White Paper on the Use of Numeric Tokens in Resource Descriptions
Submitted by Philip Coombs
Washington State Archivist
February 23, 2000
ABSTRACT
Resource descriptions employ the attribute-value pair concept to record a resource's properties. Whether in HTML or XML, the current practice is to add qualifiers and extensions to this pair to explain the semantic parentage of the elements used. This paper presents an alternative approach to qualifying elements: numeric tokens.
ISSUE
For the past several years, an effort has been under way to establish a semantic registry that records the conceptual mapping between the attribute sets in circulation. The objective is a table that provides a path for interoperability between semantically equivalent attributes. One example is "Mapping between metadata formats" by Michael Day at http://www.ukoln.ac.uk/metadata/interoperability/.
The issue with this 'external' mapping is that, for international interoperability, each element must be mapped to every related element in every other attribute set in the world, a daunting task at best. The responsibility for maintaining such a registry would fall to a central bureau of a worldwide organization. Document Type Definitions (DTDs) would contain mappings to target database elements for queries and responses. Schemas would explain the origins of the semantic set and its descriptions.
In short, 'external mapping' relies on a crosswalk apparatus maintained outside each database.
By contrast, an 'internal mapping' methodology may be considered.
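The difference in scale between the two approaches can be made concrete with a rough count. The sketch below assumes the simplest case: external mapping needs one crosswalk for every pair of attribute sets, while internal mapping needs one mapping per attribute set, maintained locally against a single reference schema. The function names are illustrative only.

```python
def external_crosswalks(n_sets: int) -> int:
    """Crosswalks needed when every attribute set is mapped to every other one."""
    return n_sets * (n_sets - 1) // 2


def internal_mappings(n_sets: int) -> int:
    """Mappings needed when each attribute set is mapped once to one reference schema."""
    return n_sets


# Print the two counts side by side for a few registry sizes.
for n in (5, 20, 100):
    print(n, external_crosswalks(n), internal_mappings(n))
```

Under these assumptions the external count grows with the square of the number of attribute sets, while the internal count grows only linearly.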
TOKENS DESCRIBED
The core concept of internal mapping is the use of a common metadata element set as a reference schema. The reference schema could be any of a dozen popular, widely used attribute sets. It should cover common data element concepts, not every possible element. Such a list of concepts could serve as a common reference point for all elements.
Each attribute in an attribute-value pair could be qualified by an additional reference to the common schema and to the appropriate, related Data Element Concept (DEC). This allows any local element name to be used, as long as its common meaning (its DEC) is also expressed.
Not every element in an attribute set will correspond to a DEC in the reference schema. But common elements such as 'description', 'title', 'keywords' and 'date created' will be found in the schema. Cross-database queries using these common attributes will be possible, even across multilingual databases.
The responsibility for mapping related local terms to the central, common schema rests with the local database administrator. If the choice of like concepts is illogical, queries to that database will return false drops and incorrect references.
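As a minimal sketch of how such a cross-database query might work, the example below models two databases that use different local element names but tag the title element with the same token. The record layout, the database names, and the sample values are assumptions made for illustration; only token 4 for a title comes from the examples in this paper.

```python
# Hypothetical records: each attribute is (local_name, value, token).
# A token of None marks a local element with no DEC in the reference schema.
ENGLISH_DB = [
    [("dc.title", "Annual report", 4), ("shelf", "B-12", None)],
]
FRENCH_DB = [
    [("titre", "Rapport annuel", 4), ("rayon", "C-3", None)],
]


def values_for_token(records, token):
    """Return (local_name, value) pairs whose token matches, ignoring local names."""
    matches = []
    for record in records:
        for local_name, value, element_token in record:
            if element_token == token:
                matches.append((local_name, value))
    return matches


def search_all(databases, token):
    """Run the same token query against every database."""
    return {name: values_for_token(records, token)
            for name, records in databases.items()}


hits = search_all({"english": ENGLISH_DB, "french": FRENCH_DB}, 4)
print(hits)
```

The query never needs to know that one database says 'dc.title' and the other says 'titre'. If a local administrator had assigned token 4 to an unrelated element, the same query would return that element as a false drop.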
TOKENS AS APPLIED
In several states and in the federal GILS program, the bib-1 attribute set was chosen as the reference schema. This schema is the foundation for Z39.50 stateful queries. Using the same foundation for non-Z39.50 data element queries opens a possible cross-protocol option.
The bib-1 schema URL is http://lcweb.loc.gov/z3950/agency/defns/bib1.html. Recently, the federal GILS has created a bib-1 to token number table at http://www.gils.net/elements.html. Note the tag number assigned to each element. This is the working description that is sent during Z39.50 queries. The same tag number has value as a token (qualifier) during stateless exchanges in HTTP. The challenge has been to establish the best way to add this tag number as a DEC to a local attribute-value pair.
In addition, a name for numeric tokens was needed to identify them during crawling and parsing. Eliot Christian of the USGS suggested using simply "Z" and created a chart of GILS elements and their bib-1 origins at http://www.gils.net/elements.html.
Consider the following examples of numeric tokens:
In META tags:
<META NAME="dc.title" CONTENT="The name of the object" Z="4">
<META NAME="titre" CONTENT="The name of the object" Z="4">
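A gatherer could group such META tags by token rather than by local name. The following is a minimal sketch using Python's standard html.parser module; the class name TokenMetaParser and the sample page are assumptions made for illustration, and the Z attribute is the proposal described in this paper, not part of standard HTML.

```python
from html.parser import HTMLParser


class TokenMetaParser(HTMLParser):
    """Collect META tags that carry a Z token, keyed by the token value."""

    def __init__(self):
        super().__init__()
        self.by_token = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        token = attributes.get("z")
        if token is None:
            return  # an ordinary META tag without a token
        # Tolerate the nonstandard CONTENTS spelling alongside CONTENT.
        value = attributes.get("content", attributes.get("contents"))
        self.by_token.setdefault(token, []).append((attributes.get("name"), value))


SAMPLE = """<html><head>
<META NAME="dc.title" CONTENT="The name of the object" Z="4">
<META NAME="titre" CONTENT="The name of the object" Z="4">
<META NAME="generator" CONTENT="example">
</head></html>"""

parser = TokenMetaParser()
parser.feed(SAMPLE)
parser.close()
print(parser.by_token)
```

Both local names, 'dc.title' and 'titre', end up under the same key, which is the behavior a token-aware crawler would need. Tags without a Z attribute are simply skipped.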
In XML:
<?xml version="1.0"?>
<!DOCTYPE report SYSTEM "report.dtd">
<report>
<description>
<title>The name of the object
<Z>4</Z></title>
</description>
<text_of_the_report>
XXXXXXXXXXXXXX
</text_of_the_report>
</report>
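Reading the token back out of an XML record is equally straightforward. The sketch below uses Python's standard xml.etree.ElementTree module on a small well-formed record modeled on the example above; the DOCTYPE line is left out so the sample stands alone, and the function name elements_by_token is illustrative only.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<report>
  <description>
    <title>The name of the object<Z>4</Z></title>
  </description>
  <text>XXXXXXXXXXXXXX</text>
</report>"""


def elements_by_token(xml_text):
    """Map each Z token to the (element name, text) pairs that carry it."""
    root = ET.fromstring(xml_text)
    found = {}
    for element in root.iter():
        z = element.find("Z")  # look only at direct children named Z
        if z is None or z.text is None:
            continue
        value = (element.text or "").strip()  # text that precedes the Z child
        found.setdefault(z.text.strip(), []).append((element.tag, value))
    return found


tokens = elements_by_token(SAMPLE)
print(tokens)
```

Elements without a Z child, such as the report body, are ignored, so untagged local elements do not interfere with token-based queries.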
FURTHER CONSIDERATIONS OF TOKENS
There are also discussions regarding the potential of combining Z numbers to create extensions / qualifiers below a core element.
For example:
Z="1161,1172;5" could convey that the element inherits the concept of Use-Constraints (1161) together with the concept of Contact-Telephone (1172). Taken together, the resulting concept is "the telephone number to call about copyright restrictions".
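A parser for combined tokens might look like the sketch below. It assumes that commas separate the combined concept numbers. The meaning of the part after the semicolon is not explained in this paper, so the sketch keeps it as an uninterpreted string rather than guessing. The names CombinedToken, parse_z, and NAMES are hypothetical, and the lookup table holds only the numbers mentioned in this paper.

```python
from typing import NamedTuple, Optional, Tuple


class CombinedToken(NamedTuple):
    concepts: Tuple[int, ...]   # the combined concept numbers, in order
    suffix: Optional[str]       # text after ';', kept as-is (meaning not defined here)


def parse_z(value: str) -> CombinedToken:
    """Split a Z value such as '1161,1172;5' into concept numbers and a suffix."""
    head, separator, tail = value.partition(";")
    concepts = tuple(int(part) for part in head.split(",") if part.strip())
    return CombinedToken(concepts, tail.strip() if separator else None)


# Only the tokens named in this paper; a real table would come from the GILS chart.
NAMES = {4: "Title", 1161: "Use-Constraints", 1172: "Contact-Telephone"}

combined = parse_z("1161,1172;5")
labels = [NAMES.get(number, "unknown") for number in combined.concepts]
print(combined, labels)
```

A single token such as Z="4" passes through the same function and yields one concept with no suffix, so simple and combined tokens can share one code path.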
GROWING SUPPORT OF NUMERIC TOKENS
A few vendors have expressed interest in the benefits of using numeric tokens as element qualifiers. One such vendor is Hiawatha Island Software of Concord, New Hampshire. They have developed an outstanding META tag embedding application that also supports tokens (http://www.hisoftware.com/Taggen-cmz.htm). Many federal GILS records have already been updated to carry tokens. For an example, view the META tags at http://www.gils.net/index.html.
In addition, a few gathering vendors have expressed interest in exploring tokens. Modifications to popular gathering applications such as Netscape Compass and Microsoft Site Server are under consideration. Testing has demonstrated that crawling / harvesting robots will not fail when encountering tokens in the resource descriptions.
CONCLUSIONS
Mapping elements internally to a common, stable schema such as bib-1 will provide interoperability between databases, at least for commonly used attributes. Such an approach reduces the effort required to equate related data element concepts and is significantly easier to administer than the externally mapped registries now being considered.
CONTACT
Philip Coombs, State Archivist
Office of Secretary of State, Archives Division
1129 Washington Street SE; Mailstop 40238
Olympia, Washington, USA 98504-0238
Voice: 1-360-586-2660
FAX: 1-360-664-8814
pcoombs@secstate.wa.gov