Chapter Four     |     Context and Rationale     |     Organisational Approach     |     Implementation Approach

Organisational Approach

Who are the individuals or actors involved in the publication and discovery of geospatial information? By defining the roles and responsibilities that these actors play, one can understand the essential functions that human or computer-assisted services should be able to conduct in the interest of resource discovery for the GSDI.



Figure 4.2 shows interactions between the Actors, the functions they perform, and the objects they interact with. The illustration uses Unified Modeling Language (UML) notation to depict processes from a functional point of view.

Figure 4.2 - Interactive diagram showing basic usage of catalog services and related SDI elements.
Using the actors from Figure 4.2, the following sections describe the organisational or operational management requirements for distributed catalog services compatible with the GSDI, organised by area of interest.

Each section will include a Use Case to focus on the roles and actions that should be considered in creating a discovery component of your SDI.

Catalog Server/Service Development

Distributed catalog services assume some degree of distributed ownership and participation. Similar activities on the Internet have taken a fully centralised approach to metadata management, placing all metadata in an index on one server or on several replicated servers. In an increasingly dynamic data management environment, keeping detailed metadata synchronised with such an index becomes ever harder. This problem is experienced daily when a Web search engine returns a link that produces a "404: File not found" error because the document has been moved or changed. In addition, metadata and data are increasingly treated as interrelated and are even managed together within a single database. Duplicating this metadata in an external index can be costly and invites problems with synchronisation among the data, its metadata, and the externally indexed metadata. Organisations that already manage spatial data and are interested in publishing it are often the most capable candidates for publishing and maintaining the metadata. Metadata co-located with data on a server tend to be more current and detailed than metadata published to an external index (harvested and indexed off-site).
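The synchronisation argument can be made concrete with a small sketch that compares a source catalog, where metadata live next to the data, with an external index harvested at some earlier time. This is an illustration under stated assumptions only: the `Record` type, the `modified` timestamp, and the function name `find_drift` are hypothetical and not part of any GSDI specification.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    identifier: str
    modified: datetime  # last-modified time of the metadata entry (assumed field)


def find_drift(source: dict[str, Record], index: dict[str, Record]) -> dict[str, list[str]]:
    """Classify how a harvested external index differs from the source catalog."""
    # Entries created at the source after the last harvest.
    missing = sorted(k for k in source if k not in index)
    # Entries still indexed but withdrawn at the source (the "404" case).
    dangling = sorted(k for k in index if k not in source)
    # Entries changed at the source after the index copy was taken.
    stale = sorted(k for k in source if k in index and source[k].modified > index[k].modified)
    return {"missing": missing, "dangling": dangling, "stale": stale}


if __name__ == "__main__":
    t0, t1 = datetime(2000, 1, 1), datetime(2000, 6, 1)
    source = {"roads": Record("roads", t1), "rivers": Record("rivers", t0), "soils": Record("soils", t0)}
    index = {"roads": Record("roads", t0), "rivers": Record("rivers", t0), "parcels": Record("parcels", t0)}
    print(find_drift(source, index))
    # {'missing': ['soils'], 'dangling': ['parcels'], 'stale': ['roads']}
```

Each of the three categories is work an external index must repeat for every source it covers, which is one reason co-located metadata tend to stay more current.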

The construction of a catalog service capability for geospatial information is built upon a commitment to collect and manage some level of geospatial metadata within an organisation. The following Use Case scenario describes the publishing of a metadata entry.

  1. A contributor of metadata receives the description of a new spatial data set developed by other professional staff. This metadata is generated in a transferable encoding format to allow exchange of the metadata without loss of context or information content.
  2. This metadata entry is passed to a catalog administrator for consideration and loading to the catalog.
  3. The catalog administrator applies any acceptance criteria on the quality of the metadata as required by the organisation. If the metadata are acceptable, they are inserted into the catalog.
  4. The catalog administrator then updates the catalog to reflect the new entry as available for public access.
  5. This data set is now considered advertised because its metadata provide a searchable and browseable record of its background, its temporal and spatial extent, and many other searchable characteristics.
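The publishing steps can be sketched as a small in-memory catalog. This is a minimal illustration, not a reference implementation: the class and method names are hypothetical, and the acceptance rule (a set of required, non-empty fields) stands in for whatever quality criteria an organisation actually applies in step 3.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical acceptance criterion: these fields must be present and non-empty.
REQUIRED_FIELDS = ("title", "abstract", "bounding_box", "time_period")


@dataclass
class Catalog:
    entries: dict[str, dict] = field(default_factory=dict)
    public: set[str] = field(default_factory=set)

    def accept(self, metadata: dict) -> list[str]:
        """Step 3: list the missing fields; an empty list means acceptable."""
        return [f for f in REQUIRED_FIELDS if not metadata.get(f)]

    def insert(self, identifier: str, metadata: dict) -> bool:
        """Step 3: insert the entry only if it passes the acceptance criteria."""
        if self.accept(metadata):
            return False
        self.entries[identifier] = metadata
        return True

    def publish(self, identifier: str) -> None:
        """Step 4: mark an inserted entry as available for public access."""
        if identifier not in self.entries:
            raise KeyError(identifier)
        self.public.add(identifier)

    def search(self, word: str) -> list[str]:
        """Step 5: only published entries are advertised to searchers."""
        word = word.lower()
        return sorted(
            i for i in self.public
            if word in self.entries[i]["title"].lower()
            or word in self.entries[i]["abstract"].lower()
        )


if __name__ == "__main__":
    catalog = Catalog()
    record = {
        "title": "Hydrography of the Example Basin",
        "abstract": "Stream network digitised at 1:24,000.",
        "bounding_box": (-105.0, 39.0, -104.0, 40.0),
        "time_period": "1998",
    }
    print(catalog.insert("hydro-1998", record))  # True: passes acceptance
    print(catalog.search("hydrography"))         # [] until published
    catalog.publish("hydro-1998")
    print(catalog.search("hydrography"))         # ['hydro-1998']
```

Separating insertion from publication mirrors steps 3 and 4: an entry can sit in the catalog under review without yet being advertised.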

There are several models for where Catalog services might be installed within or among organisations. Generally speaking, a catalog server is installed at the organisational level appropriate to the nature of the data or metadata, the organisational context or mandates, and the level at which a catalog can be operationally supported.

Because of the nature of the distributed catalog and its ability to search many servers, all of the suggested models listed are equally viable. In fact, close reading of the model descriptions will show that they represent a continuum of organisational choices that vary in complexity, governance, and the degree of integration with the data being described.

Alternative Approaches

The operational design of a distributed catalog as described above depends in large part on the ability of clients to use the proposed services. Globally, access to computers and communications networks supporting Web applications is still available only to a small minority of the population. While this is changing in almost all regions through community public access points and through the construction, subsidy, and interconnection of networks, the distributed catalog may not be well suited to conditions in many developed and developing countries where the Internet is not yet common or bandwidth is lacking. There are two solutions that have been prototyped and are suitable for public information access in such environments.

For organisations and clientele with limited access to computers or networks, metadata can be reformatted, printed, and distributed as paper catalogs. Printing and distribution costs may be significant, but a wide audience can be reached through public libraries and organisations interested in using spatial data in decision making. Keeping such paper catalogs synchronised with current data content and holdings may also be an issue. Paper catalogs can always serve as a supplement to digital information services.

If Internet services are available to the public but network bandwidth within the region of interest is limited, individual catalogs may wish to support harvesting metadata from remote sites into "mirror" catalogs. A good example would be supporting regional data discovery across multiple servers in different locations connected by low-speed links. If each catalog posted its metadata in a Web-accessible directory, a crawler or harvester program could retrieve metadata from the other sites and index it into a regional or replicated index. This methodology is being demonstrated in the United States to provide a single synchronised point of access to metadata fetched from a small to moderate number of sites. Note that the combined collection still sits behind a server with a common interface, but potentially fewer standing servers are required in this architecture. At the extreme end of this design, one could envision a few large metadata repositories with common search interfaces. The primary concerns about the scalability of this approach are supporting extremely large searchable metadata indexes and keeping those indexes synchronised with remotely held metadata and data. With current technologies, this approach is unlikely to scale to a single global collection of metadata.
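A harvester of the kind described might be sketched as follows. Network retrieval is abstracted behind a `fetch` callable so the sketch runs offline; the listing layout (identifier mapped to a last-modified stamp and a record), the site names, and the function name `harvest` are assumptions for illustration. The sketch updates only records whose stamp has changed and drops records a site no longer publishes, which is the synchronisation burden noted above.

```python
from __future__ import annotations

from typing import Callable

# Assumed listing layout: record identifier -> (modified_stamp, metadata_record).
Listing = dict[str, tuple[int, dict]]


def harvest(
    sites: list[str],
    fetch: Callable[[str], Listing],
    index: dict[tuple[str, str], tuple[int, dict]],
) -> dict[str, int]:
    """Refresh a regional index from each site's Web-accessible directory."""
    stats = {"added": 0, "updated": 0, "removed": 0}
    for site in sites:
        listing = fetch(site)  # in practice, an HTTP GET of the site's directory
        for rid, (stamp, record) in listing.items():
            key = (site, rid)
            if key not in index:
                stats["added"] += 1
            elif index[key][0] < stamp:
                stats["updated"] += 1
            else:
                continue  # unchanged since the last harvest
            index[key] = (stamp, record)
        # Drop entries that this site no longer publishes.
        for key in [k for k in index if k[0] == site and k[1] not in listing]:
            del index[key]
            stats["removed"] += 1
    return stats
```

Keying the index on (site, identifier) keeps each record traceable to its originating server, so a user of the mirror can still be pointed back to the authoritative copy.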

In environments where both data providers and clients have access to computers but not to reliable networks, CD-ROM or DVD media carrying searchable metadata (and perhaps even data) offer another outreach mechanism. Such media will be of greatest benefit where standard metadata and data approaches are followed; where a catalog already exists, its software and data could be placed on the media to minimise the cost of deployment.

These alternatives should be viewed as supplements to the catalog services recommendations described in this Chapter until the information is accessible to the majority of intended clients via the Internet. Use of the catalog services will immediately enable international academic, commercial, and governmental use of such information for regional analysis.

Catalog Gateway and Access Interface Development

Within a given geographic or discipline-based community, there will be a need to build relevant search capabilities that support intuitive searching across many servers. This problem can be divided into two interdependent parts: a user interface (Search/Browse Interface, Figure 4.2) and a query distributor (Catalog/Gateway Portal, Figure 4.2). When performed across the Internet, these functions may be logically deployed in different locations, although they tend to be coupled together in server-based or client-based search solutions.

Figure 4.3 - Configuration options for Gateway and User Interfaces to Distributed Catalog

Figure 4.3 shows the possible configurations of a catalog gateway and the user interface. Client A accesses a user interface that is downloaded (as forms or an applet) from a host on the Internet that also manages multiple connections to servers. Client B accesses a user interface hosted at a location different from that of the Gateway, an arrangement that supports the construction of customised user interfaces for a community. Client C is a fully self-contained client-side "desktop" application that includes both the user interface and the distributed query capabilities for direct connection to remote servers. Not shown in this diagram is the dependence on, or reference to, a registry or Directory of Servers, as shown in Figure 4.2, which is explained further in the next section. All three styles of interaction are known to exist in various SDIs. Because they all depend on distributed catalog servers, the three approaches are fully compatible.

Two styles of interaction found in Web search interfaces apply equally well to distributed catalog access. The first is query, in which the user specifies search criteria through simple to advanced interfaces. The second is browse, in which the user is presented with categories of information and selects paths or groupings, often arranged hierarchically, to traverse.

The query approach to interaction with distributed catalogs can give advanced users extra precision in selecting spatial data of interest. It is often used iteratively, to discover what effect individual parts of a query have on the pattern of results returned. The browse approach has great appeal to novice users, who may wish to navigate by reference without knowing the proper search words or fields a priori. The challenge of constructing and supporting a browse mechanism across a global collection of servers is the work required to build and maintain a universal classification vocabulary and its hierarchy or word space, known as an ontology. Because this service lies at the intersection of many disciplines, constructing a single classification system is an extremely daunting and improbable task. Intelligent classification systems that run externally on collections, using neural networks, Bayesian probabilities, and other estimates of "context", may become available in the coming years to help users navigate through heterogeneous geospatial information.

A Use Case scenario for a query user is as follows:

  1. A User uses client software to discover that a distributed catalog search service exists.
  2. User opens the user interface and assembles the query elements required to narrow down a search of available information.
  3. The search request is passed to one or more servers based on user requirements through a gateway function. The search may be iterative, repeating or refining queries based on new interactions with the user.
  4. Results are returned from each server and are collated and presented to the User. Types of response styles may include: a list of "hits" in title and link format, a brief formatting of information, or a full presentation of metadata. Visualisation of multiple results may also be available through display of data set locations on a map, thematic groupings, or temporal extent.
  5. User selects the relevant metadata entry by name or reference and selects the presentation content (brief, full, other) and the format (HTML, XML, Text, other) for further review.
  6. User decides whether to acquire the data set through linkages in the metadata. By clicking on embedded Uniform Resource Locators (URLs) the user can directly access online ordering or downloadable resources, whereas distribution information lists alternate forms of access.
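The gateway function in steps 3 and 4 can be sketched as a parallel fan-out followed by collation. This is a minimal Python sketch, not a protocol implementation: each "server" is a hypothetical callable that takes a query dictionary and returns a list of hit dictionaries, and the `style` switch only loosely mirrors the brief/full choice in step 5. A server that raises an error is reported separately rather than aborting the whole search.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A "server" here is any callable that takes a query and returns a list of hits.
Server = Callable[[dict], list[dict]]


def distributed_search(
    servers: dict[str, Server], query: dict, style: str = "brief"
) -> tuple[list[dict], dict[str, str]]:
    """Send one query to every server in parallel, then collate the hits."""
    hits: list[dict] = []
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {name: pool.submit(srv, query) for name, srv in servers.items()}
        for name, future in futures.items():
            try:
                # Tag every hit with the server it came from.
                hits.extend({"server": name, **hit} for hit in future.result())
            except Exception as exc:
                # One failing server is reported; the other results are still used.
                errors[name] = repr(exc)
    # Collate into one ordered result list (here simply by title, then server).
    hits.sort(key=lambda h: (h.get("title", ""), h["server"]))
    if style == "brief":
        # Brief style: title-and-link "hits" only.
        hits = [{"server": h["server"], "title": h.get("title", ""), "url": h.get("url", "")} for h in hits]
    return hits, errors
```

Sorting by title is only one possible collation; a real gateway might rank by spatial overlap with the query extent or group results by server.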

A Use Case scenario for a browse user is as follows:

  1. A User uses client software to discover that a distributed catalog search service exists. This may be done through a search of Web resources, a saved bookmark, reference from a referring page, or word-of-mouth referral.
  2. User opens the user interface and selects the parameters required to narrow down a search of available information based on topics/subjects, organisations, geographic location, or other criteria. Parameters are usually grouped into hierarchies for the user to navigate.
  3. Requests are made to each server through a distributed request mechanism.
  4. Results from each server are collated and presented to the User. The user interface and gateway work together to control how results are organised, so that they form a uniform result space.
  5. User selects the relevant metadata entry by name or reference and selects the presentation content (brief, full, other) and the format (HTML, XML, Text, other) for further review.
  6. User decides whether to acquire the data set through linkages in the metadata. By clicking on embedded Uniform Resource Locators (URLs) the user can directly access online ordering or downloadable resources, whereas distribution information lists alternate forms of access.
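Step 2 of the browse scenario, in which parameters are grouped into hierarchies, can be illustrated with a small topic tree. In this minimal sketch the category names and the convention of turning a selected path into a keyword query are hypothetical; as noted earlier, no universal classification exists, so any real tree would be specific to a community.

```python
from __future__ import annotations

# Hypothetical, community-specific topic hierarchy (not a standard vocabulary).
TOPICS: dict = {
    "Environment": {"Hydrography": {}, "Soils": {}},
    "Transportation": {"Railways": {}, "Roads": {}},
}


def children(path: list[str], tree: dict = TOPICS) -> list[str]:
    """List the categories one level below the path the user has selected."""
    node = tree
    for name in path:
        node = node[name]  # raises KeyError for a path not in the hierarchy
    return sorted(node)


def path_to_query(path: list[str]) -> dict:
    """Convert a browse selection into a query the gateway can distribute."""
    if not path:
        raise ValueError("select at least one category")
    return {"theme": " > ".join(path), "keyword": path[-1]}


if __name__ == "__main__":
    print(children([]))               # top-level categories
    print(children(["Environment"]))  # one level down
    print(path_to_query(["Environment", "Soils"]))
```

Converting the browse path into an ordinary query lets the same distributed request mechanism serve both query and browse users.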

Registering Catalog Servers

The nature of distributed catalogs requires that the existence and properties of each catalog participating in a community be known to that community. In support of GSDI concepts, a dynamic directory of catalog servers is ever more important. The directory of servers concept allows an individual catalog operator to construct service metadata and register it with a central authority. This registry is then a searchable catalog in its own right, so that software may discover suitable catalog targets based on their predominant geographic extent, descriptive words or classification, country of operation, or organisational affiliation, among other properties. National listings of compatible catalog servers have already been built, but operating a global network of catalog servers within GSDI will require a common directory of servers, built and managed to assure current content, distributed ownership, and authoritative reference to servers.
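Because the directory of servers is itself a searchable catalog of service metadata, client software can use it to choose targets before distributing a query. The sketch below assumes a hypothetical record layout: a bounding box as (west, south, east, north) in decimal degrees, an ISO 3166 alpha-2 country code, and a set of lower-case keywords. The host names and approximate extents are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServerEntry:
    name: str
    host: str                                # hypothetical address of the catalog server
    country: str                             # ISO 3166 alpha-2 country code
    bbox: tuple[float, float, float, float]  # (west, south, east, north), degrees
    keywords: frozenset[str]                 # lower-case descriptive words


def overlaps(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if two (west, south, east, north) boxes intersect; ignores the dateline."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_servers(directory: list[ServerEntry], bbox=None, country=None, keyword=None) -> list[str]:
    """Return names of servers whose service metadata match every given criterion."""
    return [
        s.name
        for s in directory
        if (bbox is None or overlaps(s.bbox, bbox))
        and (country is None or s.country == country)
        and (keyword is None or keyword.lower() in s.keywords)
    ]


if __name__ == "__main__":
    directory = [
        ServerEntry("ca-topo", "catalog.ca.example", "CA", (-141.0, 41.7, -52.6, 83.1), frozenset({"topography"})),
        ServerEntry("au-marine", "catalog.au.example", "AU", (112.9, -43.7, 153.6, -10.7), frozenset({"marine"})),
    ]
    print(select_servers(directory, bbox=(-80.0, 43.0, -75.0, 46.0)))  # ['ca-topo']
    print(select_servers(directory, keyword="Marine"))                # ['au-marine']
```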

The features of the directory of servers include:

Several national distributed catalog activities support management services for server-level metadata and contain references to servers predominantly in their own country. The GSDI has a responsibility to sponsor a common directory of servers registry for all countries to use, with authority delegated to participating countries to manage and validate host information for their servers. This follows the Domain Name System (DNS) model of the Internet and, if implemented in a similar way, would assure scalability and ownership within the global community.
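The delegation model can be sketched as a root directory that holds only pointers to national registries, each of which alone may register entries for its own country's servers, loosely in the spirit of DNS zone delegation. All class names, country codes, and the single validation rule shown are hypothetical.

```python
from __future__ import annotations


class NationalRegistry:
    """Authoritative list of catalog servers for one country."""

    def __init__(self, country: str) -> None:
        self.country = country
        self.servers: dict[str, str] = {}  # server name -> host

    def register(self, name: str, host: str, country: str) -> None:
        # Delegated authority: only this country's servers may be registered here.
        if country != self.country:
            raise PermissionError(f"{self.country} registry cannot register {country} servers")
        self.servers[name] = host


class RootDirectory:
    """Global directory that holds delegations, not individual server entries."""

    def __init__(self) -> None:
        self.delegations: dict[str, NationalRegistry] = {}

    def delegate(self, country: str) -> NationalRegistry:
        """Create (or return the existing) registry delegated to a country."""
        return self.delegations.setdefault(country, NationalRegistry(country))

    def resolve(self, country: str, name: str) -> str | None:
        """Follow the delegation to the national registry and look up a server."""
        registry = self.delegations.get(country)
        return None if registry is None else registry.servers.get(name)


if __name__ == "__main__":
    root = RootDirectory()
    mx = root.delegate("MX")
    mx.register("national-cat", "catalog.mx.example", "MX")  # hypothetical host
    print(root.resolve("MX", "national-cat"))
```

Because the root never stores server entries itself, its size grows with the number of participating countries rather than the number of servers, which is the scalability property the DNS analogy points to.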

Relevant Standards

The GSDI distributed catalog has been designed with maximum reliance on existing technologies and standards. Because of this, existing software can be re-utilised or adapted to support geospatial information without requiring special investment in new technologies. Key standardisation efforts in access to catalogs are found in the ISO 23950 Search and Retrieve Protocol, the Catalog Services Specification recently passed within the OpenGIS Consortium, and relevant standards or "recommendations" of the World Wide Web Consortium (W3C).

ISO 23950, also known as ANSI/NISO Z39.50, is a search and retrieval protocol developed initially in the library community for access to virtual catalogs. Key features of the ISO 23950 protocol include:

The use of a generalised query protocol based on ISO 23950 permits a migration from national forms of metadata to future forms being developed through international consensus under ISO Technical Committee 211 and its draft metadata standard 19115. Even though the metadata standard will change, the GEO Profile specifies the meaning of search fields in a way that allows them to be mapped to multiple metadata schemas. Under the GEO Profile, international metadata can be searched today across collections in Europe (Global Environmental Locator Service, GELOS), the United States, Canada, Latin America, and Australia in a single search, even though different underlying metadata models exist.
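The idea that one set of abstract search fields can be mapped onto several metadata schemas can be sketched as a crosswalk table. The field names and element paths below are illustrative only: they are loosely modelled on FGDC and ISO 19115 element names and simplified for the sketch, and they are not taken from the GEO Profile, which defines its search fields as ISO 23950 attributes.

```python
from __future__ import annotations

# Illustrative crosswalk: abstract search field -> element path in each schema.
# Paths are simplified for this sketch and are not normative.
CROSSWALK: dict[str, dict[str, str]] = {
    "title": {
        "fgdc": "idinfo/citation/citeinfo/title",
        "iso19115": "identificationInfo/citation/title",
    },
    "abstract": {
        "fgdc": "idinfo/descript/abstract",
        "iso19115": "identificationInfo/abstract",
    },
    "west_bound": {
        "fgdc": "idinfo/spdom/bounding/westbc",
        "iso19115": "identificationInfo/extent/geographicElement/westBoundLongitude",
    },
}


def translate(query: dict[str, str], schema: str) -> dict[str, str]:
    """Rewrite a query on abstract fields into one schema's element paths."""
    unknown = [f for f in query if f not in CROSSWALK]
    if unknown:
        raise ValueError(f"unsupported search fields: {unknown}")
    return {CROSSWALK[f][schema]: value for f, value in query.items()}
```

Each server translates the same abstract query against its own schema, which is what lets a single search span collections built on different metadata models.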

The OpenGIS Consortium published a Catalog Services Specification in 1999 that provides a general model for geospatial data discovery through a catalog that includes management, discovery, and data access services. These general services are described for implementation in the OLEDB, CORBA, and WWW environments. The management functions include the ability to specify interfaces for creation, entry, update, and deletion of metadata entries to a catalog. The discovery functions include the ability to search for and retrieve metadata entries from a catalog with embedded references within the formal metadata to on-line data access, where available. The access functions support extended access to or ordering of spatial data based on references established in the metadata. Only the discovery functions are deemed mandatory in the Catalog Services implementations; guidance is provided for implementation of optional management and access (really ordering) in interoperable ways.
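The split between mandatory discovery functions and optional management functions can be expressed as separate interfaces. This is a hypothetical Python rendering for illustration; the class and method names are not those of the OGC specification, and the optional access (ordering) functions are omitted for brevity.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


def _title_search(records: dict[str, dict], query: dict) -> list[str]:
    """Shared helper: case-insensitive substring match on record titles."""
    text = query.get("text", "").lower()
    return sorted(i for i, r in records.items() if text in r.get("title", "").lower())


class DiscoveryService(ABC):
    """Mandatory part: search for entries and retrieve them."""

    @abstractmethod
    def search(self, query: dict) -> list[str]: ...

    @abstractmethod
    def present(self, identifier: str) -> dict: ...


class ManagementService(ABC):
    """Optional part: create, update, and delete metadata entries."""

    @abstractmethod
    def create(self, identifier: str, record: dict) -> None: ...

    @abstractmethod
    def update(self, identifier: str, record: dict) -> None: ...

    @abstractmethod
    def delete(self, identifier: str) -> None: ...


class ReadOnlyCatalog(DiscoveryService):
    """Discovery only: the minimum a conforming service must offer."""

    def __init__(self, records: dict[str, dict]) -> None:
        self._records = dict(records)

    def search(self, query: dict) -> list[str]:
        return _title_search(self._records, query)

    def present(self, identifier: str) -> dict:
        return self._records[identifier]


class ManagedCatalog(ReadOnlyCatalog, ManagementService):
    """Discovery plus the optional management functions."""

    def __init__(self) -> None:
        super().__init__({})

    def create(self, identifier: str, record: dict) -> None:
        if identifier in self._records:
            raise KeyError(f"{identifier} already exists")
        self._records[identifier] = record

    def update(self, identifier: str, record: dict) -> None:
        if identifier not in self._records:
            raise KeyError(identifier)
        self._records[identifier] = record

    def delete(self, identifier: str) -> None:
        del self._records[identifier]


def supports_management(catalog: DiscoveryService) -> bool:
    """A client can check for the optional capability before relying on it."""
    return isinstance(catalog, ManagementService)
```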

At the August OGC meeting in Southampton, U.K., a common catalog services approach built upon the essential search and retrieval model of ISO 23950 was presented and demonstrated. Implementation specifications were submitted for CORBA, OLEDB, and the Web. Distributed parallel search across these different protocols was demonstrated through an extension of commercially available gateway software.

The Web Profile of the OGC Catalog Services Specification includes two implementation paths: one permits the use of existing ISO 23950 servers (on TCP/IP), and the second specifies XML encoding of queries and responses over HTTP. The XML Encoding Rules (XER) approach was demonstrated at Southampton through client and server software developed by the Joint Research Centre of the European Commission. Because the server is implemented on HTTP, metadata providers need only install the server and index software as a component or module of their web server. Firewall issues associated with using a separate TCP port are minimised because all queries can use the web server's communication port.
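The HTTP path can be sketched as building an XML query document, wrapping it in an ordinary HTTP POST, and parsing an XML response. The element names (`searchRequest`, `term`, `record`) and the URL are hypothetical and are not the XER encoding itself, which is derived from the protocol's formal definitions; the sketch only shows why no extra TCP port is needed.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET


def build_search_request(terms: dict[str, str], max_records: int = 10) -> bytes:
    """Encode a search as an XML document (hypothetical element names)."""
    root = ET.Element("searchRequest", {"maxRecords": str(max_records)})
    query = ET.SubElement(root, "query")
    for field_name, value in terms.items():
        ET.SubElement(query, "term", {"field": field_name}).text = value
    return ET.tostring(root, encoding="utf-8")


def make_http_request(url: str, body: bytes) -> urllib.request.Request:
    """Wrap the XML in an HTTP POST; it travels on the web server's usual port."""
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml"}, method="POST"
    )


def parse_search_response(body: bytes) -> list[tuple[str, str]]:
    """Extract (identifier, title) pairs from a hypothetical XML response."""
    root = ET.fromstring(body)
    return [(rec.get("id", ""), rec.findtext("title", "")) for rec in root.iter("record")]


if __name__ == "__main__":
    body = build_search_request({"title": "soils", "country": "CA"})
    req = make_http_request("http://catalog.example.org/search", body)  # hypothetical URL
    print(req.get_method(), req.full_url)
    # Sending would be urllib.request.urlopen(req).read(), then parse_search_response(...).
```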

CORBA and OLEDB implementations provide solutions for organisations that are already using these two technologies.

The International Organization for Standardization (ISO) has a Technical Committee, TC 211, dedicated to the standardisation of abstract concepts relating to geospatial data, services, and the geomatics field in general. The draft metadata content standard (ISO 19115) provides a comprehensive vocabulary and structure of metadata that should be used to characterise geographic data. The development of national and discipline-oriented profiles of ISO 19115 will facilitate exchange of information using common semantics. ISO 19115 includes a recommendation on the encoding of metadata for exchange, using the TC 211 encoding recommendation, represented in XML format.
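Exchange "without loss of context or information content" can be checked with a simple round trip: encode a record as XML, parse it back, and compare. This sketch uses a flat record, element names loosely borrowed from ISO 19115 (`MD_Metadata`, `fileIdentifier`), and a hypothetical namespace; the real TC 211 encoding is far more deeply structured.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = "urn:example:iso19115-sketch"  # hypothetical namespace for this sketch


def encode(record: dict[str, str]) -> bytes:
    """Serialise a flat metadata record as XML for exchange."""
    root = ET.Element(f"{{{NS}}}MD_Metadata")
    for name in sorted(record):
        ET.SubElement(root, f"{{{NS}}}{name}").text = record[name]
    return ET.tostring(root, encoding="utf-8")


def decode(body: bytes) -> dict[str, str]:
    """Parse the XML back into a record, stripping the namespace from tags."""
    root = ET.fromstring(body)
    return {child.tag.split("}", 1)[1]: child.text or "" for child in root}


if __name__ == "__main__":
    record = {"fileIdentifier": "hydro-1998", "title": "Hydrography", "abstract": "Streams & lakes"}
    print(decode(encode(record)) == record)  # True: nothing lost in the round trip
```

Characters with special meaning in XML, such as `&`, are escaped on output and restored on parsing, so the record survives the trip intact.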

The World Wide Web Consortium (W3C) is a group of implementing organisations interested in developing common specifications, known as "recommendations", for wide support on the Web. One key set of recommendations and work items focuses on the Extensible Markup Language (XML), a markup language specifically geared to encoding the structured content of information. Companion activities include the XML-Schema activity, which is defining schemas and data types for XML documents, and XML-Query, at present only a design activity for a request syntax for XML-structured documents. The XML 1.0 Recommendation is in general use now and is seeing wider application in the geographic software field as an increasingly rich means to encode and transfer structured information of all types.
