- From: Ronald E. Daniel <rdaniel@acl.lanl.gov>
- Date: Fri, 9 Jun 1995 06:58:54 -0600
- To: uri@bunyip.com
4 Default Attribute Set
The use of inherited attribute set grammars provides a very general
capability, and there are applications that can use that generality to
improve their functionality and scalability. However, the primary
purpose of the URC service is URN to URL resolution, and the speed of
that resolution is a major concern for the URC service. Response
times for interactive browsing do not allow multiple network accesses
in order to fetch DTD fragments and build a grammar for parsing a URC.
Therefore, we provide a default attribute set whose semantics are to
be broadly understood. If a URC comes in that has been described
using this attribute set, then it can be parsed in either a formal
(according to ISO 8879) or a heuristic (anything else) fashion. If
a URC comes in with no attribute set specified, then the default
attribute set is assumed. Furthermore, if a URC comes in with a
different attribute set, we mandate that any elements it has with
the same names as elements in the default attribute set must have
similar semantics to the elements of the default set. This will
allow heuristic parsing of new attribute sets. To amplify this last
point, consider the TITLE element. Anyone creating a new attribute
set that contains TITLE must use it in a fashion similar to that in
the default attribute set. If someone wants to use TITLE for a very
different purpose (describing royalty perhaps), they must add special
information (scheme, type, or other attributes) so that it is possible
to tell that TITLE is being used in a novel fashion and simple parsers
are not led astray.
For the purposes of this draft of the specification, and the
prototypes developed from it, the AID for the default attribute set
shall be:
<urn:x-dns-2:uri.acl.lanl.gov:default-dtd>
It is anticipated that as this specification moves to the standards
track, IANA will provide the URN for the default attribute set.
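The defaulting rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name effective_aid is hypothetical, the sketch assumes the SGML syntax of section 5.3 (where the attribute set is named in the SYSTEM identifier of the document type declaration), and the constant is the provisional AID given above.

```python
import re

# Provisional AID for the default attribute set, as given in this draft.
DEFAULT_AID = "urn:x-dns-2:uri.acl.lanl.gov:default-dtd"

# Matches an SGML document type declaration that carries a SYSTEM identifier.
_DOCTYPE = re.compile(r'<!DOCTYPE\s+\w+\s+SYSTEM\s+"([^"]+)"\s*>', re.IGNORECASE)


def effective_aid(urc_text: str) -> str:
    """Return the AID named by the URC, or the default AID if none is given."""
    match = _DOCTYPE.search(urc_text)
    return match.group(1) if match else DEFAULT_AID
```

A URC with no document type declaration is treated as using the default attribute set; any other AID is returned unchanged so that the caller can decide whether to fetch the grammar or fall back to heuristic parsing.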
The default attribute set is strongly based on the work of the Dublin
metadata workshop [4]. A brief overview is given below. The
attribute set is based on a few principles. First, we provide a set
of elements with widely understood semantics (such as Title). Second,
we provide a mechanism for specifying more precise interpretations
of those elements (such as Subject (scheme=LCSH) Computer Science).
Third, we allow a variety of transfer syntaxes for the attributes.
Plain text, HTML, binary encodings, etc. can all be used, as
described further in section 5. All the elements in the default
attribute set are optional and repeatable. No particular order is
required.
Ron Daniel [Page 8]
INTERNET-DRAFT An SGML-based URC Service June 7, 1995
Identifier: String or number used to uniquely identify this object.
Typically this will be the URN for the resource. Other
identifiers may also be included.
Instance: Instance is a construct for grouping information on
particular instances of a resource, such as location, format,
price, etc.
URL: The URL element contains a URL that can be used to retrieve an
instance of the resource.
Author: The person(s) and/or organization(s) primarily responsible
for the intellectual content of the work.
Title: The name of the object.
Subject: The field of knowledge to which the work belongs.
Publisher: The agent or agency responsible for making the object
available.
Date: The date of publication.
Other Agent: Other person(s) and/or organization(s), such as editors,
transcribers, sponsors, etc. who have made significant
contributions to the work. Author and Publisher are special cases
of OtherAgent.
Object type: The abstract category of the object, such as novel,
poem, dictionary.
Form: The particular manifestation or data representation of the
object, such as PostScript file or Windows executable. For URCs,
form will typically be specified as an Internet Media Type -
formerly known as the MIME Content-type.
Relation: Relationship to other objects. This element should
identify the role of the relationship, as well as the related
objects.
Source: Objects, either print or electronic, from which this resource
was derived. This is a special case of the Relation element that
is believed to be widely useful to the humanities.
Language: Natural language of the intellectual content.
Coverage: The spatial locations and temporal durations characteristic
of the object.
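As an informal illustration of the properties listed above (every element optional, repeatable, and unordered, with an optional scheme to sharpen its interpretation), a URC could be held in memory as a simple list of elements. The class and method names below are hypothetical and chosen only for this sketch; the grouping performed by Instance is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    name: str                     # e.g. "Title", "Subject"
    value: str
    scheme: Optional[str] = None  # e.g. "LCSH" for a Subject heading


@dataclass
class URC:
    # A plain list: elements may repeat and no order is required.
    elements: List[Element] = field(default_factory=list)

    def add(self, name: str, value: str, scheme: Optional[str] = None) -> None:
        self.elements.append(Element(name, value, scheme))

    def values(self, name: str) -> List[str]:
        """All values recorded for an element name; empty if it is absent."""
        wanted = name.lower()
        return [e.value for e in self.elements if e.name.lower() == wanted]


urc = URC()
urc.add("Title", "An SGML-based URC Service")
urc.add("Subject", "Computer Science", scheme="LCSH")
urc.add("Subject", "Metadata")
```

Because every element is optional, asking for a name that was never added simply yields an empty list rather than an error.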
5 Multiple Syntaxes
Several of the URC requirements are difficult, if not impossible,
to satisfy at the same time. For example, it is a requirement to
have a human-readable, printed representation of a URC. It is also
a requirement that URCs have a consistent encoding that is suitable
for digital signature computation. Unfortunately, end-of-line
convention differences between platforms make it difficult to meet
both requirements simultaneously. A related problem is that there
will be different uses for URC information, and different encodings
will be appropriate to meet those different needs. For example, it
might be useful to obtain citations in some format (BibTeX perhaps)
for inclusion in another system.
Because of these considerations, we make another fundamental
assumption:
There is no universally applicable syntax for a URC.
Because of this assumption, this specification calls for the use of
format negotiation mechanisms, specifically the Accept: header in
HTTP [?], to indicate preferred syntaxes. Depending on the user's
intent for the URC, they can ask for an HTML encoding, a plain text
encoding, an encrypted binary encoding, a synthesized audio rendition,
etc.
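A server-side sketch of this negotiation is given below. It is a simplification made for illustration: the function names are hypothetical, only the q parameter is interpreted, and the full precedence rules of HTTP content negotiation (for example, preferring specific types over wildcards at equal q) are not implemented.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Return (media type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when it is not given
        for param in fields[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal-q entries keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_syntax(header: str, available: List[str]) -> Optional[str]:
    """Pick the first available syntax acceptable to the client, or None."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0] if available else None
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return None
```

With the header "text/html, */*; q=0.2", a server that can emit text/html returns it; a server that cannot falls through to the wildcard and returns whichever syntax it lists first.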
The following examples show the requests and responses for different
renditions of the same underlying information. All of these examples
assume we are resolving the URN:
urn:dns:pchs.k-12.okc.ok.us:student-papers-1995/geo3
5.1 Example 1: text/html
When the URC service is first deployed, there will be a large base of
existing web browsers that will not have the ability to parse URCs.
One means of remaining compatible with these browsers is to request
the URC in HTML. The user can look at information on the different
locations for a resource, pick a site, and click on a link to retrieve
a resource.
The URN resolution request sent to the URC server's HTTP daemon might
be:
GET student-papers-1995/geo3 HTTP/1.0
Accept: text/html, */*; q=0.2
which says to send text/html if possible, anything else if not.
The reply from the server might be:
HTTP/1.0 200 OK
Date: Tuesday, 08-Oct-96 21:09:16 GMT
Server: Apache/2.7
MIME-version: 1.0
Content-type: text/html
Last-modified: Friday, 26-Apr-96 21:57:12 GMT
Content-length: 129
<html>
<head>
<title>URC for urn:x-dns-2:pchs.k-12.okc.ok.us:student-papers-1995/geo3
</title>
</head>
<body>
<h1>urn:x-dns-2:pchs.k-12.okc.ok.us:student-papers-1995/geo3
</h1>
Author: Smith, Fred<br>
Title: Wanker! : A vicious, seditious, and tendentious history
of George III<br>
Subject: American Revolution<br>
Subject: (In)famous crackpots of history<br>
Location: <A HREF=
"http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html">
http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html</a><br>
Form: text/html<br>
</body>
</html>
5.2 Example 2: text/urc0
The text/html example above provides one means for backward
compatibility with legacy clients. However, the "click twice" model
is going to get old really quickly. A CCI helper application could
be constructed that would parse URCs in some trivial format, such as
text/urc0 [3], and automatically pick a URL.
GET student-papers-1995/geo3 HTTP/1.0
Accept: text/urc0, */*; q=0.2
HTTP/1.0 200 OK
Date: Tuesday, 08-Oct-96 21:09:16 GMT
Server: Apache/2.7
MIME-version: 1.0
Content-type: text/urc0
Last-modified: Friday, 26-Apr-96 21:57:12 GMT
Content-length: 62
=====
http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html
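A helper application of the kind described above might pick a URL from such a response roughly as follows. The text/urc0 format itself is defined in [3]; this sketch assumes only what is visible in the example, namely that each URL sits on a line of its own, and the function name and the list of URL schemes are illustrative.

```python
from typing import Optional

# URL schemes the helper is willing to follow; an assumption for this sketch.
KNOWN_SCHEMES = ("http://", "ftp://", "gopher://")


def pick_url(body: str) -> Optional[str]:
    """Return the first line of the body that looks like a URL, or None."""
    for line in body.splitlines():
        candidate = line.strip()
        if candidate.startswith(KNOWN_SCHEMES):
            return candidate
    return None
```

A real helper would presumably apply some policy (nearest site, preferred form) rather than taking the first URL it sees.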
5.3 Example 3: text/sgml
This example assumes the URC is prepared according to the default
attribute set. Note that the document declaration is included in the
response, and that a URN is used in the SYSTEM identifier. This is
how the attribute set is indicated for the SGML syntax. This is
somewhat contrary to the intent of SGML, which wants to use SYSTEM
to locate local information, and PUBLIC for well-known, publicly
accessible information. I have gone against that convention, since
SGML PUBLIC identifiers have a restricted character set, perform some
processing on the characters, and encourage a syntax (known as a
formal public identifier) that has structure likely to be incompatible
with URNs. By using a SYSTEM identifier we get a more liberal
character set, and the string is handed off to an ``entity manager''
for processing so as to fetch the appropriate file. Using a URN here
will require an enhanced entity manager, but such things are already
part of some commercial SGML products.
The URN resolution request sent to the URC server's HTTP daemon might
be:
GET student-papers-1995/geo3 HTTP/1.0
Accept: text/sgml, */*; q=0.2
The reply from the server might be:
HTTP/1.0 200 OK
Date: Tuesday, 08-Oct-96 21:09:16 GMT
Server: Apache/2.7
MIME-version: 1.0
Content-type: text/sgml
Last-modified: Friday, 26-Apr-96 21:57:12 GMT
Content-length: 129
<!DOCTYPE URC SYSTEM "urn:x-dns-2:uri.acl.lanl.gov:default-1-dtd">
<urc>
<identifier scheme="URN">
urn:dns:pchs.k-12.okc.ok.us:student-papers-1995/geo3
</identifier>
<author>
Smith, Fred
</author>
<title>
Wanker! : A vicious, seditious, and tendentious history of George III
</title>
<Subject>American Revolution</subject>
<Subject>(In)famous crackpots of history</subject>
<instance>
<URL>
http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html
</URL>
<form scheme="IMT">text/html</form>
</instance>
</URC>
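For clients without a full SGML parser, the heuristic style of parsing mentioned in section 4 could look something like the sketch below. It is deliberately naive and is not a conforming ISO 8879 parser: it uses a regular expression to find elements that contain only text, ignores grouping elements such as instance, and recognizes only a double-quoted scheme attribute. All names in it are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# An element whose content holds no child tags; names compare case-insensitively.
_LEAF = re.compile(r"<(\w+)([^<>]*)>([^<]*)</\1\s*>", re.IGNORECASE)
_SCHEME = re.compile(r'scheme\s*=\s*"([^"]*)"', re.IGNORECASE)


def heuristic_parse(sgml: str) -> List[Tuple[str, Optional[str], str]]:
    """Return (name, scheme, value) triples for text-only elements."""
    found = []
    for name, attrs, text in _LEAF.findall(sgml):
        scheme = _SCHEME.search(attrs)
        found.append((
            name.lower(),                          # fold case: <Subject> == <subject>
            scheme.group(1) if scheme else None,
            " ".join(text.split()),                # collapse line breaks and padding
        ))
    return found
```

Applied to the response above, this would yield one triple each for identifier, author, title, url, and form, and two for subject, in document order.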
The examples above have suggested particular syntaxes. Determining
the set of ``well-known'' syntaxes that all URC servers must be able
to emit is one direction in which this standard could be extended.
6 Query Languages
Diversity is the hallmark of the URC service. We want to encourage
the formation of a variety of value-added services, therefore we need
a standard means of dealing with unique capabilities. For example,
different search services will have different information and support
different forms of queries. This section defines a trivial query
language that all URC servers and clients must support in order to
claim conformance with the standard. This section also describes the
scheme by which non-standard queries can be identified.
6.1 Trivial Query Language
All queries in the trivial query language are HTTP GET requests. The
simplest query is to ask for the URC of a resource by providing
its URN. Note that the HTTP spec [?] says that the protocol and
hostname parts of a URL are assumed to be known. This is not a valid
assumption for URNs. The server will return the full URC for the
resource, in whatever format has been negotiated.
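As a rough sketch of the simplest query, a client might split the URN into its parts and form the request text shown in the section 5 examples. The assumptions here are that the URN has the shape urn:SCHEME:FQDN:REQUEST-ID used in those examples and that the resolver is reached at the FQDN named in the URN; the function names are hypothetical, and no network access is attempted.

```python
from typing import Tuple


def parse_urn(urn: str) -> Tuple[str, str, str]:
    """Split 'urn:SCHEME:FQDN:REQUEST-ID' into (scheme, fqdn, request_id)."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "urn":
        raise ValueError("not a URN of the expected form: %r" % urn)
    _, scheme, fqdn, request_id = parts
    return scheme, fqdn, request_id


def build_request(urn: str, accept: str = "*/*") -> Tuple[str, str]:
    """Return (host to contact, HTTP/1.0 request text) for resolving the URN."""
    _, fqdn, request_id = parse_urn(urn)
    request = "GET %s HTTP/1.0\r\nAccept: %s\r\n\r\n" % (request_id, accept)
    return fqdn, request
```

Splitting at most three times keeps any further colons inside the request identifier, so the identifier is passed to the server unchanged.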
A few other queries are defined to obtain information on the resolver,
resources, etc. These are also HTTP GET requests, and they all begin
with the reserved string "urn+". Where the request is to get server
information, the FQDN shall be that of the server. The requests and
their meanings are:
o urn+m Resolver meta-information
(e.g. <urn:x-dns-2:FQDN:urn+m>)
Returns a URC indicating URLs where resolver metainformation,
such as the administrative contact, sponsoring organization,
publication policy, etc. can be retrieved.
o urn+a list of All RequestIDs
(e.g. <urn:x-dns-2:FQDN:urn+a>)
Returns a URC indicating URLs where a list of all URNs provided by
the publisher can be obtained. The request must be understood by
all resolvers, but the response may have zero URLs in it. In such
a case the URC should contain a message saying the equivalent of
``sorry''.
o urn+c Child naming authorities
(e.g. <urn:x-dns-2:FQDN:urn+c>)
Returns a URC indicating URLs where a list of any child naming
authorities licensed by the naming authority associated with the
FQDN can be found.
o urn+p Parent naming authorities
(e.g. <urn:x-dns-2:FQDN:urn+p>)
Returns a URC indicating URLs where a list of any parent naming
authorities licensing the naming authority associated with the
FQDN can be found.
o urn+aids
(e.g. <urn:x-dns-2:FQDN:urn+aids>)
Returns a URC indicating the URLs that can be used to fetch a list
of all the attribute sets used to describe information on this
server.
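The reserved queries listed above lend themselves to a small lookup table. The sketch below is illustrative only: the table and function names are hypothetical, and the x-dns-2 scheme is taken from the examples in the list.

```python
# Reserved request identifiers from the list above and what each returns.
RESERVED_QUERIES = {
    "urn+m": "resolver meta-information",
    "urn+a": "list of all RequestIDs",
    "urn+c": "child naming authorities",
    "urn+p": "parent naming authorities",
    "urn+aids": "attribute sets used on this server",
}


def reserved_query_urn(fqdn: str, query: str, scheme: str = "x-dns-2") -> str:
    """Build the URN for one of the reserved queries against a resolver."""
    if query not in RESERVED_QUERIES:
        raise ValueError("unknown reserved query: %r" % query)
    return "urn:%s:%s:%s" % (scheme, fqdn, query)


def is_reserved(request_id: str) -> bool:
    """True if a request identifier falls in the reserved 'urn+' space."""
    return request_id.startswith("urn+")
```

A resolver could use is_reserved to route incoming requests to its administrative handlers before attempting an ordinary URN lookup.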
6.2 Query Language Identification
This relatively minimal set of queries does not allow the construction
of complex queries, such as "gimme the URNs and titles of all
resources with 'Smith' as an author and 'food' as the subject".
Different organizations will wish to add value to their URC
collections in different fashions, so it is highly unlikely that
one query language will meet all needs. Therefore, we provide an
additional well-known query:
o urn+qls
(e.g. <urn:x-dns-2:FQDN:urn+qls>)
Returns a URC that gives URNs and URLs for all the query languages
known by this server, their conditions of use, etc.
6.3 Random Notes on Querying
An idea that has been independently suggested by several parties
is to use a partially completed URC as a template to find full URCs
that bear some relation to the template. Dealing with the inheritance
of attribute sets, the specification of relation, and server load
issues are all future work for this specification, as is additional
work on the unique identification of query languages.
Received on Friday, 9 June 1995 08:58:55 UTC