Data managers tend to be either technically literate scientists or scientifically literate computer specialists. Creating correct metadata is like library cataloguing, except the creator needs to know more of the scientific information behind the data in order to properly document them. Don't assume that every professional needs to be able to create proper metadata. They may complain that it is too hard and they may not recognise the benefits. In this case, ensure that there is good communication between the metadata producer and the data producer; the former may have to ask questions of the latter to collaboratively develop adequate documentation.
The form for maintaining metadata will depend on a number of factors:
Indeed, many organisations will start with a basic audit of their data holdings that will alert them to the vast wealth of data that they possess and to where it is being used, replicated or improved across the organisation. As the data holdings become larger and access to the data becomes distributed, organisations will look at more advanced methods for maintaining metadata for their data holdings. These advanced tools may consist of commercial or self-developed forms-based systems, which may also form part of the operational GI systems and extract aspects of the metadata automatically from the data itself.
How does one deal with people who complain that it's too hard? The solution in most cases is to redesign the workflow rather than to develop new tools or training. People often assume that data producers must generate their own metadata. Certainly they should provide informal, unstructured documentation, but they may not need to go through the rigors of fully structured formal metadata. For scientists or GIS specialists who produce one or two data sets per year it may not be worth their time to fully learn a complex metadata standard. Instead, they might be asked to fill out a less complicated form or template that will be rendered in the proper format by a data manager or cataloguer who is familiar (not necessarily expert) with the subject and well-versed in the metadata standard. If twenty or thirty scientists are passing data to the data manager in a year, it is worth the data manager's time to learn the complex metadata standard. With good communication, this strategy complements the existing combination of software tools and training.
The metadata standard
The first data set documented is always the worst. The other aspect to "It's too hard" is that documenting a data set fully requires a (sometimes) uncomfortably close look at the data and brings home the realisation of how little is really known about its processing history.
"Insufficient time" to document data sets is also a common complaint. This is a situation in which managers who appreciate the value of GIS data sets can set priorities to protect their data investment by allocating time to document it. Spending one or two days documenting a data set that may have taken months or years to develop at thousands of dollars in cost hardly seems like an excessive amount of time.
These 'pain' and 'time' concerns have some legitimacy, especially for agencies that may have hundreds of legacy data sets which could be documented, but for which the time spent documenting them takes away from current projects. At present, it seems much more useful to have a lot of 'shortcut' metadata rather than a small amount of full-blown metadata. So what recommendations can be made to these agencies with regard to a sort of 'minimum metadata' or means to reduce the documentation load?
In some operations, small amounts of metadata, or "notes", are collected sporadically during the data processing flow. These hints can then be assembled more readily later into a clear statement of the history and processing of the dataset. This can present a less daunting task at the end of a project, as most of the details are already documented, a little at a time. Increasingly, GIS and image processing software is capable of collecting and reporting quantitative metadata that can be filled in for the user rather than expecting human input. These procedures can amount to significant savings in overall time and effort over a single manual metadata preparation process at the conclusion of a project.
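The idea of collecting small notes during processing and assembling them later can be sketched as follows. This is a minimal illustration only: the `ProcessingLog` class, its method names and the sample data set are all hypothetical, not part of any GIS package or metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProcessingLog:
    """Collects short notes during data processing (hypothetical helper)."""
    dataset: str
    notes: list = field(default_factory=list)

    def note(self, when: date, text: str) -> None:
        # Record one short note at the moment a processing step is done.
        self.notes.append((when, text))

    def lineage(self) -> str:
        # Assemble the notes, oldest first, into one history statement.
        lines = [f"Processing history for {self.dataset}:"]
        for when, text in sorted(self.notes, key=lambda n: n[0]):
            lines.append(f"  {when.isoformat()}: {text}")
        return "\n".join(lines)


# Notes may be entered in any order; the assembled statement is chronological.
log = ProcessingLog("roads_2000")
log.note(date(2000, 3, 14), "Reprojected source tiles to UTM zone 17")
log.note(date(2000, 2, 1), "Digitised from 1:50,000 paper sheets")
print(log.lineage())
```

The assembled text could then be pasted into the lineage or processing-history element of whichever standard is in use.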
Don't invent your own standard. Select a supported international standard wherever possible. Try to stay within its constructs. Subtle changes from an international standard, such as collapsing compound elements, may be costly in the long run - you won't be able to use standard metadata tools and your metadata may not be directly exchangeable or parseable by software.
Don't confuse the metadata presentation (view) with the metadata itself. There is a temptation to lump form and content into the same bin (e.g. "What I see in my database is what I print"). However, the ability to differentiate the contents of the metadatabase (the columns or fields) from its presentation (writing formatted reports) is now commonplace in desktop database software packages. This allows users to consider more flexibly how to present what information.
There are typically three forms of metadata that should be recognized and supported in systems: the implementation form (within a database or software system), the export or encoding format (a machine-readable form designed for transfer of metadata between computers), and the presentation form (a format suitable to viewing by humans). By recognizing the connections between these dispositions of metadata, one can build systems that support mission requirements, standard encoding for exchange, and permit many "report" views of the metadata to satisfy the needs and experience of different user constituencies.
The Extensible Markup Language (XML) provides two solutions to this metadata problem. First, it includes a capable markup language with structural rules enforced through a control file to validate document structure. Second, through a companion standard (the Extensible Stylesheet Language, or XSL), an XML document may be used along with a style sheet to produce standardised presentations of content, allowing the user to shuffle field order, change tag names, or show only certain fields of information. Used together, XML and style sheets allow for a structured exchange format and for flexible presentation. Thus, a metadata entry can be rendered in many ways from the same, single structured encoding.
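To illustrate rendering one structured encoding in several ways, here is a minimal sketch. It does not use XSL itself; plain Python view definitions stand in for style sheets, and the element names in the record are invented for illustration rather than taken from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element names are illustrative, not from any standard.
RECORD = """
<metadata>
  <title>County road centrelines</title>
  <abstract>Road centrelines digitised from 1:50,000 sheets.</abstract>
  <lineage>Digitised 2000-02; reprojected 2000-03.</lineage>
</metadata>
"""

# Each "view" lists which fields to show, in what order, under which label.
BRIEF_VIEW = [("title", "Name")]
FULL_VIEW = [("title", "Title"), ("lineage", "History"), ("abstract", "Summary")]


def render(xml_text, view):
    """Render one structured record according to a view definition."""
    root = ET.fromstring(xml_text)
    lines = []
    for tag, label in view:
        value = root.findtext(tag)
        if value is not None:
            lines.append(f"{label}: {value.strip()}")
    return "\n".join(lines)


print(render(RECORD, BRIEF_VIEW))
print(render(RECORD, FULL_VIEW))
```

Both views shuffle field order, rename labels and hide fields without touching the stored record, which is the separation of content from presentation that a real style sheet would provide.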
XML is a widely accepted encoding methodology, supported internationally by a large body of software, both free and commercial. However, the metadata-producing community does not yet have much experience using it to solve problems. Through reference implementations of software and experimentation, local Spatial Data Infrastructures can share their successes and failures in applying this new technology to fullest community benefit.
Consider data granularity. Can you document many of your data sets (or tiles) under an umbrella parent? Prioritise your data. Begin by documenting those data sets that have current or anticipated future use, data sets that form the framework upon which others are based, and data sets that represent your organisation's largest commitment in terms of effort or cost.
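One way to picture documenting tiles under an umbrella parent is sketched below. The field names, tile identifiers and values are hypothetical; the point is only that each tile stores what differs from the parent and inherits the rest.

```python
# Hypothetical field names; a real record would use the chosen standard's elements.
PARENT = {
    "title": "National elevation tiles",
    "originator": "Example Mapping Agency",
    "lineage": "Derived from 1:50,000 contour sheets",
}

# Each tile records only what differs from the parent.
TILES = {
    "tile_A1": {"bounding_box": (0.0, 0.0, 1.0, 1.0)},
    "tile_A2": {"bounding_box": (1.0, 0.0, 2.0, 1.0),
                "lineage": "Resurveyed by GPS in 1999"},
}


def full_record(tile_id):
    """Merge parent metadata with a tile's overrides (tile values win)."""
    record = dict(PARENT)          # copy, so the parent is never modified
    record.update(TILES[tile_id])
    record["identifier"] = tile_id
    return record


print(full_record("tile_A2"))
```

Under this arrangement the shared documentation is written once, and only exceptional tiles need any additional effort.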
Document at a level that preserves the value of the data within your organisation. Consider how much you would like to know about your data sets if one of your senior GIS operators left suddenly in favour of a primitive lifestyle on a tropical island.
How do I create metadata?
First, you should understand both the data you are trying to describe and the standard itself. Then you must decide how to encode the information. Historically, one creates a single text file for each metadata record; that is, one disk file per data set. Typically a software program is used to assist the entry of information so that the metadata conform to the standard.
Specifically:
The various metadata standards are truly content standards. They do not dictate the layout of metadata in computer files. Since the standard is so complex, this has the practical effect that almost any metadata can be said to conceptually conform to the standard; the file containing metadata need only contain the appropriate information, and that information need not be easily interpretable or accessible by a person or even a computer.
This rather broad notion of conformance is not very useful. Unfortunately it is rather common. To be truly useful, the metadata must be clearly comparable with other metadata, not only in a visual sense, but also to software that indexes, searches, and retrieves the documents over the Internet. To accomplish this, there are several encoding standards that specify the content of a metadata entry for exchange between computers. For real value, metadata must be both parseable, meaning machine-readable, and interoperable, meaning they work with software used in services such as the FGDC Clearinghouse through OpenGIS Catalogue Services. Both the FGDC and the ISO 19115 efforts have encoding standards to assist in this effort.
Parseable
To parse information is to analyse it by disassembling it and recognising its components. Metadata that are parseable clearly separate the information associated with each element from that of other elements. Moreover, the element values are not only separated from one another but are clearly related to the corresponding element names, and the element names are clearly related to each other as they are in the standard.
In practice this means that your metadata are usually arranged in a hierarchy, just as the elements are in the standard, and they must use standard names for the elements as a way to identify the information contained in the element values.
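As a rough illustration of what parseable means in practice, the sketch below walks a record's hierarchy and reports any element whose name or position is not expected. The `STANDARD` table is a toy hierarchy invented for this example; it is not taken from the FGDC standard or ISO 19115.

```python
import xml.etree.ElementTree as ET

# Toy hierarchy: parent element name -> allowed child element names.
# These names are invented for illustration, not taken from FGDC or ISO 19115.
STANDARD = {
    "metadata": {"identification", "quality"},
    "identification": {"title", "abstract"},
    "quality": {"lineage"},
}


def unknown_elements(xml_text):
    """Return paths of elements whose name or position the toy standard does not allow."""
    problems = []

    def walk(element, path):
        allowed = STANDARD.get(element.tag, set())
        for child in element:
            child_path = f"{path}/{child.tag}"
            if child.tag not in allowed:
                problems.append(child_path)
            else:
                walk(child, child_path)

    root = ET.fromstring(xml_text)
    if root.tag != "metadata":
        return [root.tag]
    walk(root, root.tag)
    return problems


good = "<metadata><identification><title>Roads</title></identification></metadata>"
bad = ("<metadata><identification><name>Roads</name></identification>"
       "<lineage>x</lineage></metadata>")
print(unknown_elements(good))
print(unknown_elements(bad))
```

In the second record, `name` is not a standard element name and `lineage` sits at the wrong level of the hierarchy, so software expecting the standard layout could not reliably find either value.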
Interoperable
To operate with metadata service software, your metadata must be readable by that software. Generally this means that they must be parseable and must identify the elements in the manner expected by the software.
There is a general consensus that metadata should be exchanged in Extensible Markup Language (XML) conforming to a Document Type Definition (DTD). In the World Wide Web Consortium, there is progress on developing a successor to the DTD, known as XML Schema. Support for XML in parsing and presentation solutions is widespread on the Web and is presumed in current draft standards of the ISO TC 211 and OpenGIS specifications.
What software is available to create and validate metadata?
No tool can check the accuracy of metadata. Moreover, no tool can determine whether the metadata properly include elements designated by the Standard to be conditional, or 'mandatory if applicable.' Consequently, some level of human review is required. But human review should be simpler in those cases where the metadata is known to have the correct syntactical structure.
Software cannot be said to conform to the Standard. Only metadata records in a given encoding form can be said to conform or not. A program that claimed to conform to the Standard would have to be incapable of producing output that did not conform. Such a tool would have to anticipate all possible data sets. Instead, tools should assist you in entering your metadata, and the output records must be checked for both conformance and accuracy in separate steps. At best one can describe or anticipate compatibility testing among software components.
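The division of labour between software checks and human review might look like the following sketch. The element lists are hypothetical; a real profile would take its mandatory and conditional elements from the chosen standard.

```python
# Hypothetical element lists; a real profile would come from the chosen standard.
MANDATORY = ["title", "abstract", "originator"]
CONDITIONAL = ["vertical_datum", "cloud_cover"]  # "mandatory if applicable"


def check_structure(record):
    """Report what software can decide and what is left to a human reviewer."""
    # A mandatory element that is absent or empty is a definite failure.
    missing = [name for name in MANDATORY if not record.get(name)]
    # Software cannot know whether a conditional element applies to this
    # data set, so absent ones are only flagged for review, never failed.
    review = [name for name in CONDITIONAL if name not in record]
    return {"missing_mandatory": missing, "needs_human_review": review}


report = check_structure({"title": "Roads", "abstract": "", "cloud_cover": "0%"})
print(report)
```

Note that the check says nothing about accuracy: a record that passes may still describe the data wrongly, which only a knowledgeable reviewer can detect.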
When searching for information, the inquirer may not find any references based on the words used to describe the information sought. This problem can be overcome by use of a thesaurus. In the context of metadata and other electronic documents, a thesaurus is a tool for the organisation and retrieval of information in electronic materials. It allows data to be indexed and retrieved in a consistent manner. It permits the display of hierarchies of concepts and ideas, leading the user, whether as indexer or information seeker, to define his or her search in terms that are most likely to lead to the retrieval of relevant information.
For example, it will allow improved information retrieval by providing successful searching on synonyms - if the user enters the term "farming" the thesaurus will find the term "agriculture". Hierarchies of meaning can be shown - the term "Great Britain" may retrieve data indexed with that term but could also expand the search to retrieve data on England, Wales and Scotland which have been indexed under those three terms. The term "meals on wheels", although in a hierarchy of terms related to food, could also be linked to concepts relating to personal social services and to the different categories of recipients and a user can elect to follow and retrieve these related terms. Consistent searching for metadata will be achieved if all those who prepare metadata use the same thesaurus.
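The synonym and hierarchy behaviour described above can be sketched with a toy thesaurus. The terms follow the examples in the text; the data structures, function names and sample index are assumptions made for illustration.

```python
# A toy thesaurus; the terms follow the examples in the text, the structure is assumed.
SYNONYMS = {"farming": "agriculture"}          # non-preferred -> preferred term
NARROWER = {"great britain": ["england", "wales", "scotland"]}


def expand(term):
    """Return the set of index terms to search for a user's query term."""
    term = term.lower()
    preferred = SYNONYMS.get(term, term)
    terms = {preferred}
    # Follow the hierarchy downwards so broader queries reach narrower records.
    stack = [preferred]
    while stack:
        for child in NARROWER.get(stack.pop(), []):
            if child not in terms:
                terms.add(child)
                stack.append(child)
    return terms


def search(term, index):
    """index maps a data set name to the set of terms it was indexed under."""
    wanted = expand(term)
    return sorted(name for name, terms in index.items() if terms & wanted)


# Hypothetical catalogue index.
index = {
    "crop_survey": {"agriculture"},
    "welsh_rivers": {"wales"},
    "paris_streets": {"france"},
}
print(search("Farming", index))
print(search("Great Britain", index))
```

The search only works consistently because the indexer and the searcher share one vocabulary; two catalogues using different thesauri would need a mapping between them.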
Minimum collaboration with users during the definition and implementation phases: a user-friendly focus is needed
For a non-professional user, finding the information wanted is very difficult. Even if 'Help' or 'Tutorial' can be found in some metadata services, it is not very easy to understand what to do and where to type. Efforts must be made to explain what to ask for and to develop user-friendly and multi-lingual interfaces. If it takes too much time to understand how to react to metadata services, users will not stay long and will immediately complain! A dictionary, multilingual thesauri or catalogues with keywords, should be provided to users to ensure that the same vocabulary is used. One of the most important things is to develop services that are not technology dependent and technology driven. Projects must be done in collaboration with users (who must first be identified).
User-expected content
Given the complex metadata models deployed, we can be reasonably certain that the metadata now presented by catalogue services is almost always more than end users expect. The current tendency seems to be to propose a complex database approach that is very 'data producer oriented'. One can imagine that users are more interested in examples of, and benefits from, using the proposed data sets than in a detailed description of their structure and content. This can be accomplished through special presentations of metadata.
It is important to separate the content of spatial metadata from its means of presentation. Through technologies such as the Extensible Markup Language (XML), documents with extensive detail can be rendered through different style sheets from one content source into many presentation forms suitable to different audiences. Further work on developing presentation methodologies is required to simplify the burden of understanding metadata by all.
Metadata for applications
There is a tendency to adapt the metadata structure and content to applications, for example, electronic commerce or data management within an organisation. Metadata that is created to satisfy a real need, rather than because it is seen as something that should be done in the general interest, is more likely to be well-written and maintained.
The OpenGIS Consortium is developing metadata structures and fields to describe software interfaces, exposed as "services" for external use. This services metadata will help intelligent software, through brokers known as service catalogues, to discover available services that could ultimately be chained together to form new composite operations. Services also have necessary links to data classes and instances. The OGC Web Mapping Testbed is documenting this interaction as a contribution to metadata in ISO 19115.
A geographic information product identification mechanism
There is no current mechanism to provide identification numbers (Id) to the different GI products produced and offered to users. This missing element is a very important issue for those who are implementing in parallel a metadata service and an e-commerce solution.
To make e-commerce of GI a reality, a study should be made of how a GI numbering system could be organised and implemented, and by whom. This system could be similar to the ones used for other products, such as books. It would be extremely helpful if the Global Spatial Data Infrastructure activity could develop initial guidance on the technical and political issues involved in establishing a data product identifier system that will work globally on digital and non-digital geospatial information.
Incentives for metadata development
The impressive list of incentives, which includes financial resources, knowledge and expertise, standards and tools, provided by the FGDC (Federal Geographic Data Committee - http://www.fgdc.gov) to stimulate the creation and maintenance of metadata content and services within the concept of the US Clearinghouse appears to have been a key success factor of the US metadata initiative. It is important that national and regional governments evaluate, recognize, and provide such incentives to metadata builders and managers. Some have started: France, Canada, Australia, the United States and other countries develop and provide free software to metadata builders. It is anticipated that the widespread adoption of the ISO 19115 metadata standard will further encourage the development of an international base of free and commercial tools around a common standard.
Envisage legislation for public sector metadata content
In countries where legislation is the main engine for creating new or adapting existing public sector activities, new laws may be needed to encourage or require the collection and distribution of standards-based metadata by the GI public sector and by commercial enterprises that collect geospatial data for the public sector.
Recommendations
References and Links
A digression on conformance and interoperability
Issues in Implementation
Vocabularies, gazetteers and Thesauri
Standards are very expensive to create and build implementations for. National standards should be adopted with the intention of supporting the ISO 19115 metadata content standard when it becomes available. This will provide the greatest interoperability rewards in a global environment.
Begin by documenting those data sets that have current or anticipated future use, data sets that form the framework upon which others are based, and data sets that represent your organisation's largest commitment in terms of effort or cost. Framework layers and special, unique layers of great interest should be adequately documented for use within your organisation and by those on the outside. Of course, all published data warrant documentation this way, but through setting priorities you will know what work you have ahead of you.
For detailed metadata standards such as FGDC and ISO, an enormous amount of possible information can be collected. Although the fields are never all filled in, the standard structure provides an opportunity to store specific properties in their correct location. This facilitates their storage and discovery in catalogues (See Chapter 4). If certain types of metadata are collected during the data collection process as part of the current workflow, then many 20-second notes can amount to a substantial story later on. This type of information cannot be easily collected after the fact.
The GSDI Technical Working Group with policy assistance from the Steering Committee should develop initial guidance on the technical and political issues involved in establishing a data product identifier system that will work globally on digital and non-digital geospatial information.
Whereas ISO TC 211 is developing general specifications and methodologies, and the OpenGIS Consortium is building software interfaces, no convened global organisation is known to be co-ordinating a common classification system for geospatial data. As a result, the use of competing thematic thesauri makes distributed search difficult.