Chapter Four     |     Context and Rationale     |     Organisational Approach     |     Implementation Approach

Implementation Approach

Operational distributed catalog services have been under development in a number of countries, with the United States, Canada, Mexico, Australia, and South Africa as primary examples. The software systems used to implement the ISO 23950 and Web-based services have been developed largely through governmental support, resulting in both open source and commercial software solutions. The evolution of protocols and industry practices is difficult to predict, but this section provides a review of available solutions.

Let's review a technical use case scenario for access to a distributed catalog:

  1. A User uses client software to discover that a distributed catalog search service exists. This may be done through a search of Web resources, a saved bookmark, reference from a referring page, or word-of-mouth referral.
  2. User opens the user interface and assembles the parameters required to narrow down a search of available information.
  3. The search request is passed to one or more servers based on user requirements through a gateway service. The search may be iterative, repeating or refining queries based on new interactions with the user.
  4. Results are returned from each server and are collated and presented to the User. Types of response styles may include: a list of "hits" in title and link format, a brief formatting of information, or a full presentation of metadata. Visualisation of multiple results may also be available through display of data set locations on a map, thematic groupings, or temporal extent.
  5. User selects the relevant metadata entry by name or reference and selects the presentation content (brief, full, other) and the format (HTML, XML, Text, other) for further review.
  6. User decides whether to acquire the data set through linkages in the metadata. By clicking on embedded Uniform Resource Locators (URLs) the user can directly access online ordering or downloadable resources, whereas distribution information lists alternate forms of access.
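
Step 5 of the scenario, in which the user picks a content level and a format, can be illustrated with a minimal sketch. The record fields, the contents of the "brief" view, and the function name are assumptions made for this example and are not taken from any catalog software.

```python
from html import escape

# Hypothetical metadata record; the field names are illustrative only.
record = {
    "title": "Roads",
    "link": "http://example.org/roads",
    "abstract": "Road centrelines for the study area.",
}

# Assumption: a "brief" presentation carries only the title and link.
BRIEF_FIELDS = ("title", "link")


def present(rec, content="brief", fmt="text"):
    """Render one metadata record at the requested content level and format."""
    if content == "brief":
        fields = [k for k in BRIEF_FIELDS if k in rec]
    elif content == "full":
        fields = list(rec)
    else:
        raise ValueError(f"unsupported content level: {content}")

    if fmt == "text":
        return "\n".join(f"{k}: {rec[k]}" for k in fields)
    if fmt == "html":
        items = "".join(
            f"<dt>{escape(k)}</dt><dd>{escape(str(rec[k]))}</dd>" for k in fields
        )
        return f"<dl>{items}</dl>"
    raise ValueError(f"unsupported format: {fmt}")
```

Calling present(record, "full", "html") would return every field as an HTML definition list, while the defaults return only the title and link as plain text.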

The Distributed Catalog is implemented using a multi-tier software architecture that includes a Client tier, a middleware or "Gateway" tier, and a server tier, as illustrated in Figure 4.4. The client tier is realised by a traditional Web browser or a native search client application. The Web browser uses conventional Hypertext Transfer Protocol (HTTP) communications, whereas the native search client uses the ISO 23950 protocol directly against a set of servers. It is also possible to collapse this multi-tier architecture into two tiers where middle-tier functionality is present in the client. A commercial Java-based Distributed Catalog client, "Meta-Data Browser," was released in September 1999 by MapInfo to provide desktop access to Catalog servers through a map and tab based search design.

Figure 4.4 -- Implementation view of distributed catalog services

The middle tier in the architecture includes a World Wide Web to catalog services protocol gateway. A Gateway effectively converts an HTTP POST or GET request into multiple catalog service client sessions that run either in series or in parallel. Gateway solutions provide parallel distributed search of multiple catalog servers from a single client Web session. At present, Gateways have been installed in the U.S., Canada, Mexico, South Africa, and Australia to provide regional points of access. The forms and interfaces installed at each are identical, and each hosts parallel search of all servers. In order to track a large number of Distributed Catalog servers, a list of known, compatible servers called a Directory of Servers or Registry must also be managed. This service contains server or collection-level metadata that can itself be searched as a special catalog. In this way, an intelligent one-pass search of eligible servers can be performed instead of requiring the user to select servers from a list, or to have all queries passed to all servers.
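
The fan-out performed by a Gateway can be sketched as follows. This is a minimal illustration under stated assumptions, not the design of any deployed Gateway: the server list is an in-memory stand-in, and search_server is a hypothetical placeholder for a real ISO 23950 client session.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote catalog servers.
SERVERS = [
    {"name": "server-a", "titles": ["Road network 1999", "Land cover"]},
    {"name": "server-b", "titles": ["Hydrography", "Rural roads"]},
]


def search_server(server, query):
    """Placeholder for one catalog client session against one server.

    A real gateway would open an ISO 23950 session here; this stub only
    does a case-insensitive substring match on titles.
    """
    return [t for t in server["titles"] if query.lower() in t.lower()]


def gateway_search(servers, query, max_workers=8):
    """Send the same query to every server in parallel and collate the hits."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {s["name"]: pool.submit(search_server, s, query) for s in servers}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=30)
            except Exception:
                # One failing or slow server should not sink the whole search.
                results[name] = []
    return results
```

Results are keyed by server name so that the interface can show which catalog each hit came from. Running the searches in series instead would only require replacing the thread pool with a plain loop.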

At the bottom tier of the service architecture are the catalog servers. These servers can be accessed using the GEO Profile of the ISO 23950 protocol. CORBA and OLEDB specifications also exist, but they are not available as of February 2000. The GEO Profile of ISO 23950 is available to implementors in the geospatial community as an extended set of the traditional bibliographic fields that can be searched. GEO includes geospatial coordinates (latitude and longitude) and temporal fields in addition to free text (e.g. search for a word anywhere in the metadata entry). ISO 23950 servers may be implemented on top of XML document databases, object-relational, or relational database systems in which structured metadata are stored for search and presentation.
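
How a server might combine a free-text search with a coordinate search can be sketched in a few lines. The GeoQuery class, the record layout, and the overlap rule are assumptions for illustration; they do not reproduce the GEO Profile's attribute definitions, and temporal fields are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (west, south, east, north) in decimal degrees; an assumed convention.
BBox = Tuple[float, float, float, float]


@dataclass
class GeoQuery:
    text: Optional[str] = None   # word to find anywhere in the entry
    bbox: Optional[BBox] = None  # area of interest


def boxes_overlap(a, b):
    """True when two boxes intersect (ignores boxes crossing the 180 degree meridian)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matches(record, query):
    """Apply every criterion that the query actually sets."""
    if query.text is not None:
        haystack = " ".join(str(v) for v in record.values()).lower()
        if query.text.lower() not in haystack:
            return False
    if query.bbox is not None and not boxes_overlap(record["bbox"], query.bbox):
        return False
    return True


# Hypothetical metadata entries.
records = [
    {"title": "Sydney road centrelines", "bbox": (150.5, -34.2, 151.5, -33.4)},
    {"title": "Ottawa land cover", "bbox": (-76.4, 45.0, -75.2, 45.6)},
]
```

A query such as GeoQuery(text="road", bbox=(140, -40, 155, -30)) would select only the first entry, since it is the only one that satisfies both criteria.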

The ISO 23950 protocol was selected for use in the Distributed Catalog for several reasons. First, the library catalog service community existed with relevant software and specifications that could be enhanced for geospatial search. By adopting compatible terms, library catalogs can be searched with GEO catalogs. Second, the ISO 23950 protocol specifies only client and search behavior and does not specify the native data structures or query language used to manage the metadata behind the server. Abstraction of query allows for a public query on "well known" fields that can be translated at each server into local equivalents. This lets one keep current database structures and names but supports alternative access through this geospatial public "view," expressed in XML or HTML reporting forms. This common search functionality across hundreds of servers is a prerequisite to distributed search. It allows for local database management autonomy yet supports federated search. Third, the protocol is independent of computer platform. ISO 23950 search clients and servers exist for many types of UNIX and Windows platforms, and Java libraries are available for additional client and server programming.

This separation between local and public metadata search fields has allowed for the ISO 23950 search of many different types of metadata collections that support the GEO Profile, even though they may not support the same metadata model. For example, the Australia and New Zealand Land Information Council (ANZLIC) metadata contains different tag names than FGDC metadata in the US. Through standard translation tables in the server, search against ANZLIC's "Data Set Name" field is associated with "Title" (the query labels this as attribute number 4) in the registered public fields. As a result, Australian catalog servers can be searched through the FGDC Clearinghouse Gateways but return metadata records of a different structure. The same approach could be applied to other community metadata services, such as those employed by the Directory Interchange Format (DIF) files used in the space and global change disciplines or other metadata standards with similar content. Ideally, metadata formats should be delivered in such a structure that they could be converted or translated for consistent presentation, even if they come from different communities. The Extensible Markup Language (XML) and translator software are starting to enable the transformation of different XML documents in different schemas.
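
The translation-table idea can be shown with a small sketch. Attribute number 4 for "Title" and the ANZLIC "Data Set Name" field come from the text above; the table layout, the FGDC field name, the sample records, and the function name are assumptions for illustration only.

```python
TITLE = 4  # public attribute number for "Title", as cited in the text

# Hypothetical per-server translation tables: public attribute -> local field.
ANZLIC_FIELDS = {TITLE: "Data Set Name"}
FGDC_FIELDS = {TITLE: "Title"}

# Hypothetical records, each stored under its community's own tag names.
anzlic_records = [{"Data Set Name": "Murray River Basin Hydrology"}]
fgdc_records = [{"Title": "Mississippi River Hydrography"}]


def search_local(records, field_map, attribute, term):
    """Translate a public attribute to the local field name, then search it."""
    local_field = field_map[attribute]
    return [r for r in records if term.lower() in r.get(local_field, "").lower()]


# The same public query runs against both collections unchanged.
anzlic_hits = search_local(anzlic_records, ANZLIC_FIELDS, TITLE, "river")
fgdc_hits = search_local(fgdc_records, FGDC_FIELDS, TITLE, "river")
```

Each server returns records in its own structure, which is why the hits from the two collections carry different field names even though the query was identical.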

Catalog Server/Service Development

To encourage widespread participation in the Clearinghouse, catalog service software has been developed under direction of the FGDC and other coordination organisations around the world. Reference implementations of software exist to provide a free or low-cost example of metadata management and Distributed Catalog service that can be quickly implemented. The software can also be used as reference by commercial developers to test anticipated functionality and interoperability and to develop value-added products.

A catalog service that participates in a distributed catalog should fulfill the following requirements:

Available Software Implementations

The Isite software suite is a reference implementation of the Catalog server that includes an XML document database and an ISO 23950 server supporting the GEO Profile for use on Windows and UNIX platforms. The U.S. Federal Geographic Data Committee is one of several sponsors that continue to support the development of this open-source software code. Isite supports document types conforming to the ANZLIC (Australia/New Zealand) metadata, the Directory Interchange Format (DIF), and the Federal Geographic Data Committee's (FGDC) Content Standard for Digital Geospatial Metadata, and is used in a number of countries that support these content standards. As soon as ISO 19115 is available as a Draft International Standard, support of an ISO document type can begin for Essential (core metadata) and Full Profiles within the Isite package.

Several commercial catalog services supporting the OpenGIS Consortium Catalog Services Specification Version 1.0 Web Profile via ISO 23950 are available on the market today. Links to known commercial solutions are posted on the Federal Geographic Data Committee web site (http://www.fgdc.gov/clearinghouse). When Version 1.1 of the OGC Catalog Services specification is released and conformance testing methodologies are available, validated OGC-compliant software will also be listed on the OpenGIS web site (http://www.opengis.org).

Catalog Gateway and Access Interface Development

As depicted in Figures 4.3 and 4.4, there is often a need for an intermediary to provide application integration for an end user. Known as "application servers" or middleware, these hosts allow for the storage, construction, and download of user interfaces to end users and communicate with multiple catalog servers simultaneously -- a feat not supported by many web browsers due to security settings.

Software systems, such as application servers, that integrate catalog search and other GIS and mapping functions benefit from the community development of software development kits (SDKs) based on standards. SDKs can provide client and server libraries for catalog search and other services based on standard interfaces. Through component architecture, these SDKs expedite development of advanced software by combining appropriate pieces of software together as needed, reducing the need for a programmer to learn the intricacies of a given service.

A UNIX-based reference implementation gateway from the World Wide Web to multiple ISO 23950 targets, known as ZAP, is available for non-commercial use from IndexData in Denmark (http://www.indexdata.dk). A Perl-based programming client library for ISO 23950 is also available from the Joint Research Centre in Italy (http://perlz.jrc.it/download). A Java-based distributed search module to multiple ISO 23950 targets from common web servers is also being commissioned as open source software by the US FGDC, as is a client-side Java library.

Registering Catalog Servers

The operation of a growing network of distributed catalog servers requires the management of server-level information in a central location. This registry server, shown in Figure 4.4, essentially houses server or collection-level metadata for search and retrieval and use in distributed query. In this way a search may be first made of the registry of servers to identify candidate servers to target the query, and as a broker, the registry returns the list of likely targets based on criteria such as geographic and temporal extent and other search limits. A registry facility greatly improves the scalability of a national, regional or global network of catalogs.
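
The brokering role of the registry can be sketched as a filter over collection-level metadata. The ServerEntry class, the server names, and the extents below are hypothetical values chosen for illustration; a real registry would hold richer metadata and would itself be searched as a catalog.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ServerEntry:
    name: str
    bbox: Tuple[float, float, float, float]  # (west, south, east, north)
    years: Tuple[int, int]                   # (first_year, last_year) of holdings


# Hypothetical registry contents.
REGISTRY = [
    ServerEntry("au-catalog", (112, -44, 154, -10), (1990, 2000)),
    ServerEntry("za-catalog", (16, -35, 33, -22), (1980, 1994)),
    ServerEntry("ca-catalog", (-141, 41, -52, 84), (1998, 2000)),
]


def candidate_servers(registry, bbox, years) -> List[str]:
    """Return the names of servers whose extents intersect the query."""
    hits = []
    for entry in registry:
        w, s, e, n = entry.bbox
        in_space = w <= bbox[2] and bbox[0] <= e and s <= bbox[3] and bbox[1] <= n
        in_time = entry.years[0] <= years[1] and years[0] <= entry.years[1]
        if in_space and in_time:
            hits.append(entry.name)
    return hits
```

Only the servers returned by candidate_servers would then receive the user's query, which is what keeps the number of sessions manageable as the network grows.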

In the context of the GSDI, a coordinated registry of catalog (and other) services is needed. If all catalogs were registered in a common, distributed registry akin to the way the Domain Name System (DNS) works, global resolution of appropriate hosts of geospatial information would be enabled.

Based on activities within the Web Mapping Testbed of the OpenGIS Consortium (see Chapter 5), a catalog service for services and "services metadata" are being defined and prototyped by participating organisations. The FGDC also hosts an XML and Isite-based registry prototype generated from an Access database (http://clearinghouse4.fgdc.gov/registry/). This will be replaced with a conformant OpenGIS Catalog solution supporting ISO metadata in the coming year. A coordinated registry between the U.S. and Canada is proposed through an interagency agreement between the FGDC and Geomatics Canada.

Recommendations

The use of the OpenGIS Catalog Services Specification, and in particular its Web Profile (ISO 23950), has increasing support from information locator activities on the Web. Existing reference implementation software allows organisations to participate at a very low cost; commercial implementations allow organisations to grow their collections and applications.

The operation of a GSDI service registry is not within the scope of an individual national organisation or a consortium such as OpenGIS. The GSDI is the rightful host for a service registry and a policy forum for adjudication of the policies associated with such a registry.

References and Linkages

Catalog Services Specification Version 1.0, 1999, Open GIS Consortium,
(http://www.opengis.org/techno/specs.htm#implementation)

Z39.50 International Standard Agency Home Page,
(http://lcweb.loc.gov/z3950/agency/)
