URC spec 4/6

Ronald E. Daniel (rdaniel@acl.lanl.gov)
Fri, 9 Jun 1995 06:58:54 -0600


From: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>
Date: Fri, 9 Jun 1995 06:58:54 -0600
Message-Id: <199506091258.GAA20129@idaknow.acl.lanl.gov>
To: uri@bunyip.com
Subject: URC spec 4/6


4 Default Attribute Set


The use of inherited  attribute set grammars  provides a very  general
capability, and there are applications that can use that generality to
improve their functionality  and scalability.    However, the  primary
purpose of the URC service is URN to URL resolution, and  the speed of
that resolution is  a major  concern for  the URC service.    Response
times for interactive browsing do not allow multiple  network accesses
in order to fetch DTD fragments and build a grammar for parsing a URC.
Therefore, we provide a default  attribute set whose semantics  are to
be broadly understood.    If a URC  comes in  that has been  described
using this attribute  set, then it  can be parsed  in either a  formal
(according to ISO 8879)  or a heuristic (anything  else) fashion.   If
a URC  comes in  with no  attribute set  specified,  then the  default
attribute set is  assumed.   Furthermore,  if a  URC comes  in with  a
different attribute  set, we  mandate that  any elements  it has  with
the same names  as elements  in the  default attribute  set must  have
similar semantics  to the  elements of  the default  set.   This  will
allow heuristic parsing of new attribute  sets.  To amplify  this last
point, consider the TITLE  element.   Anyone creating a new  attribute
set that contains TITLE must  use it in a  fashion similar to that  in
the default attribute set.  If  someone wants to use TITLE for  a very
different purpose (describing royalty perhaps), they must  add special
information (scheme, type, or other attributes) so that it is possible
to tell that TITLE is being used in a novel fashion and simple parsers
are not led astray.

For the  purposes  of  this  draft  of  the  specification,   and  the
prototypes developed from it, the  AID for the default  attribute set
shall be:

<urn:x-dns-2:uri.acl.lanl.gov:default-dtd>

It is anticipated that  as this specification  moves to the  standards
track, IANA will provide the URN for the default attribute set.
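
The fallback rule above (no attribute set specified means the default
set is assumed) can be sketched as follows.  This is only an
illustration: the function names are hypothetical, and the handling of
the angle brackets around the AID is an assumption about how a client
might normalize the string.

```python
# AID for the default attribute set, as given in this draft.
DEFAULT_AID = "urn:x-dns-2:uri.acl.lanl.gov:default-dtd"


def effective_aid(declared_aid=None):
    """Return the attribute set to assume for an incoming URC.

    A missing or empty declaration falls back to the default set.
    """
    if declared_aid is None:
        return DEFAULT_AID
    # Assumption: surrounding whitespace and <...> delimiters are not
    # part of the AID itself.
    cleaned = declared_aid.strip().strip("<>")
    return cleaned or DEFAULT_AID


def uses_default_set(declared_aid=None):
    """True when the URC should be read with the default attribute set."""
    return effective_aid(declared_aid) == DEFAULT_AID
```

A client could call uses_default_set() before deciding whether it
needs to fetch any DTD fragments at all.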

The default attribute set is strongly based on the work  of the Dublin
metadata workshop  [4].    A  brief overview  is  given  below.    The
attribute set is based on a few  principles.  First, we provide  a set
of elements with widely understood semantics (such as Title).  Second,
we provide  a mechanism  for specifying  more precise  interpretations
of those elements  (such as  Subject (scheme=LCSH) Computer  Science).
Third, we allow  a variety  of transfer syntaxes  for the  attributes.
Plain text,  HTML,  binary  encodings, etc.     can all  be  used,  as
described further  in section  5.   All  the elements  in the  default
attribute set are  optional and repeatable.    No particular order  is
required.



Identifier: String or  number used to  uniquely identify this  object.

Ron Daniel                                                    [Page 8]


INTERNET-DRAFT          An SGML-based URC Service         June 7, 1995

    Typically  this  will  be  the  URN  for  the  resource.     Other
    identifiers may also be included.

Instance: Instance  is   a  construct  for  grouping  information   on
    particular instances  of a  resource,  such as  location,  format,
    price, etc.

URL: The URL  element contains a URL that  can be used to  retrieve an
    instance of the resource.

Author: The  person(s)  and/or organization(s)  primarily  responsible
    for the intellectual content of the work.

Title: The name of the object.

Subject: The field of knowledge to which the work belongs.

Publisher: The  agent  or agency  responsible  for making  the  object
    available.

Date: The date of publication.

Other Agent: Other person(s) and/or organization(s),  such as editors,
    transcribers, sponsors, etc., who have made significant
    contributions to the work.  Author and Publisher are special cases
    of Other Agent.

Object type: The  abstract category  of  the  object, such  as  novel,
    poem, dictionary.

Form: The  particular  manifestation  or data  representation  of  the
    object, such as PostScript file or Windows executable.   For URCs,
    form will  typically be  specified  as an  Internet Media  Type  -
    formerly known as the MIME Content-type.

Relation: Relationship  to  other  objects.      This  element  should
    identify the  role of  the relationship,  as well  as the  related
    objects.

Source: Objects, either print or electronic,  from which this resource
    was derived.  This is a special case of the  Relation element that
    is believed to be widely useful to the humanities.

Language: Natural language of the intellectual content.

Coverage: The spatial locations and  temporal durations characteristic
    of the object.
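
Because every element is optional and repeatable, and order does not
matter, one way a client might hold a URC in memory is as a plain list
of elements.  The sketch below is an assumption about a convenient
representation, not part of the specification; the class and field
names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    name: str                      # e.g. "Title", "Subject"
    value: str
    scheme: Optional[str] = None   # e.g. "LCSH", "URN", "IMT"


@dataclass
class URC:
    # Top-level elements; any of them may be absent or repeated.
    elements: List[Element] = field(default_factory=list)
    # Each Instance groups elements (URL, Form, ...) for one copy.
    instances: List[List[Element]] = field(default_factory=list)

    def values(self, name: str) -> List[str]:
        """All values recorded for an element name, ignoring case."""
        wanted = name.lower()
        return [e.value for e in self.elements if e.name.lower() == wanted]


geo3 = URC(
    elements=[
        Element("Author", "Smith, Fred"),
        Element("Subject", "American Revolution"),
        Element("Subject", "(In)famous crackpots of history"),
    ],
    instances=[[
        Element("URL", "http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html"),
        Element("Form", "text/html", scheme="IMT"),
    ]],
)
```

Asking for an element that is not present simply yields an empty list,
which matches the "all optional" rule.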







5 Multiple Syntaxes


Several of  the URC  requirements are  difficult,  if not  impossible,
to satisfy at  the same time.   For  example, it  is a requirement  to
have a human-readable,  printed representation  of a URC.  It is  also
a requirement that URCs  have a consistent  encoding that is  suitable
for  digital  signature  computation.     Unfortunately,   end-of-line
convention differences  between platforms  make it  difficult to  meet
both requirements simultaneously.    A related problem  is that  there
will be different  uses for URC  information, and different  encodings
will be appropriate to  meet those different needs.   For example,  it
might be useful to obtain  citations in some format (BibTeX  perhaps)
for inclusion in another system.

Because  of  these   considerations,  we   make  another   fundamental
assumption:


    There is no universally applicable syntax for a URC.


Because of this assumption,  this specification calls  for the use  of
format negotiation  mechanisms, specifically  the Accept:   header  in
HTTP [?], to  indicate preferred syntaxes.    Depending on the  user's
intent for the URC, they  can ask for an  HTML encoding, a plain  text
encoding, an encrypted binary encoding, a synthesized audio rendition,
etc.
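
As a rough illustration of the negotiation step, a server might rank
the syntaxes it can emit against the client's Accept: header as shown
below.  This is a simplified sketch: it honors only the q parameter
and the type/subtype wildcards, and the function names are
hypothetical rather than taken from any HTTP library.

```python
def parse_accept(header):
    """Split an Accept: header value into (media_type, q) pairs."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs


def match_quality(media_type, prefs):
    """Quality of the most specific Accept entry matching media_type."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for pattern, q in prefs:
        if pattern == media_type:
            specificity = 2
        elif pattern == main + "/*":
            specificity = 1
        elif pattern == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_syntax(header, available):
    """Pick the available syntax the client prefers, or None."""
    prefs = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in available:
        q = match_quality(media_type.lower(), prefs)
        if q > best_q:  # ties keep the server's own ordering
            best, best_q = media_type, q
    return best
```

For example, choose_syntax("text/html, */*; q=0.2", ["text/sgml",
"text/html"]) selects text/html, while a server that can only emit
text/sgml would still be allowed to answer through the */* entry.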

The following examples show the requests and responses for different
renditions of the same underlying information.  All of these examples
assume we are resolving the URN:


    urn:dns:pchs.k-12.okc.ok.us:student-papers-1995/geo3



5.1 Example 1:  text/html


When the URC service is first deployed, there will be a  large base of
existing web browsers that  will not have  the ability to parse  URCs.
One means of remaining  compatible with these  browsers is to  request
the URC in  HTML. The user  can look at  information on the  different
locations for a resource, pick a site, and click on a link to retrieve
a resource.

The URN resolution request sent to the URC server's HTTP  daemon might
be:




GET student-papers-1995/geo3 HTTP/1.0
Accept: text/html, */*; q=0.2


which says to send text/html if possible, anything else if not.

The reply from the server might be:


HTTP/1.0 200 OK
Date: Tuesday, 08-Oct-96 21:09:16 GMT
Server: Apache/2.7
MIME-version: 1.0
Content-type: text/html
Last-modified: Friday, 26-Apr-96 21:57:12 GMT
Content-length: 129

<html>
<head>
<title>URC for urn:x-dns-2:pchs.k-12.okc.ok.us:student-papers-1995/geo3
</title>
</head>
<body>
<h1>urn:x-dns-2:pchs.k-12.okc.ok.us:student-papers-1995/geo3
</h1>
Author: Smith, Fred<br>
Title: Wanker! : A vicious, seditious, and tendentious history of George III<br>
Subject: American Revolution<br>
Subject: (In)famous crackpots of history<br>
Location: <A HREF=
"http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html">
http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html</a><br>
Form: text/html<br>
</body>
</html>
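
A server could produce a page like the one above from its stored
attributes with a few lines of code.  The sketch below is one possible
approach, not the prototype's implementation; render_html and its
parameters are hypothetical names, and the layout simply mirrors the
example reply.

```python
from html import escape


def render_html(urn, fields, url):
    """Render a URC as a small HTML page for legacy browsers.

    fields is a list of (name, value) pairs such as ("Author", "...").
    """
    lines = [
        "<html>",
        "<head>",
        f"<title>URC for {escape(urn)}</title>",
        "</head>",
        "<body>",
        f"<h1>{escape(urn)}</h1>",
    ]
    for name, value in fields:
        # Escape values so characters such as & or < cannot break the page.
        lines.append(f"{escape(name)}: {escape(value)}<br>")
    href = escape(url, quote=True)
    lines.append(f'Location: <a href="{href}">{escape(url)}</a><br>')
    lines += ["</body>", "</html>"]
    return "\n".join(lines)
```

The Content-length header of the reply would then be the byte length
of the returned string once it is encoded for transmission.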




5.2 Example 2:  text/urc0


The  text/html  example   above  provides   one  means  for   backward
compatibility with legacy clients.   However, the "click  twice" model
is going to get old  really quickly.   A CCI helper application  could
be constructed that would parse URCs  in some trivial format,  such as
text/urc0 [3], and automatically pick a URL.


GET student-papers-1995/geo3 HTTP/1.0
Accept: text/urc0, */*; q=0.2


HTTP/1.0 200 OK
Date: Tuesday, 08-Oct-96 21:09:16 GMT
Server: Apache/2.7
MIME-version: 1.0
Content-type: text/urc0
Last-modified: Friday, 26-Apr-96 21:57:12 GMT
Content-length: 62

=====
http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html
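
The helper application's job of picking a URL automatically might look
like the sketch below.  The text/urc0 format is defined in [3], not
here, so the parsing rules are assumptions drawn only from the example
reply: lines made up entirely of "=" are treated as separators, and
the first remaining line that looks like a URL is chosen.

```python
def pick_url(body):
    """Return the first URL-like line of a text/urc0 body, or None."""
    for line in body.splitlines():
        candidate = line.strip()
        # Assumption: a run of "=" characters is a record separator.
        if not candidate or set(candidate) == {"="}:
            continue
        if "://" in candidate:
            return candidate
    return None
```

A real helper would presumably weigh several candidate URLs (by
location, format, or cost) rather than taking the first one.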



5.3 Example 3:  text/sgml


This example  assumes the  URC is  prepared according  to the  default
attribute set.  Note that the document declaration is  included in the
response, and that a URN  is used in the  SYSTEM identifier.  This  is
how the attribute  set is  indicated for  the SGML  syntax.   This  is
somewhat contrary to  the intent of  SGML, which  wants to use  SYSTEM
to locate  local  information,  and PUBLIC  for  well-known,  publicly
accessible information.   I have gone  against that convention,  since
SGML PUBLIC identifiers have a restricted character set,  perform some
processing on  the characters,  and  encourage a  syntax  (known as  a
formal public identifier) that has structure likely to be incompatible
with URNs.    By  using a  SYSTEM  identifier we  get a  more  liberal
character set, and the string  is handed off to an  ``entity manager''
for processing so as to fetch the appropriate file.  Using  a URN here
will require an enhanced entity  manager, but such things  are already
part of some commercial SGML products.

The URN resolution request sent to the URC server's HTTP  daemon might
be:


GET student-papers-1995/geo3 HTTP/1.0
Accept: text/sgml, */*; q=0.2


The reply from the server might be:


HTTP/1.0 200 OK
Date: Tuesday, 08-Oct-96 21:09:16 GMT
Server: Apache/2.7
MIME-version: 1.0
Content-type: text/sgml
Last-modified: Friday, 26-Apr-96 21:57:12 GMT
Content-length: 129

<!DOCTYPE URC SYSTEM "urn:x-dns-2:uri.acl.lanl.gov:default-1-dtd">


<urc>
<identifier scheme="URN">
urn:dns:pchs.k-12.okc.ok.us:student-papers-1995/geo3
</identifier>
<author>
Smith, Fred
</author>
<title>
Wanker! : A vicious, seditious, and tendentious history of George III
</title>
<Subject>American Revolution</subject>
<Subject>(In)famous crackpots of history</subject>
<instance>
<URL>
http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html
</URL>
<form scheme="IMT">text/html</form>
</instance>
</URC>
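
A heuristic (that is, not ISO 8879 conformant) reading of the reply
above can be done with a simple tag scanner.  The sketch below uses
Python's standard html.parser module purely as a convenient tokenizer;
the class name, the parse_urc helper, and the flat (name, scheme,
text) result are assumptions made for illustration.

```python
from html.parser import HTMLParser

SAMPLE = """<!DOCTYPE URC SYSTEM "urn:x-dns-2:uri.acl.lanl.gov:default-1-dtd">
<urc>
<identifier scheme="URN">
urn:dns:pchs.k-12.okc.ok.us:student-papers-1995/geo3
</identifier>
<Subject>American Revolution</subject>
<instance>
<URL>
http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html
</URL>
<form scheme="IMT">text/html</form>
</instance>
</URC>
"""


class URCParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.attribute_set = None  # taken from the SYSTEM identifier
        self.elements = []         # (name, scheme, text) tuples
        self._stack = []

    def handle_decl(self, decl):
        # decl looks like: DOCTYPE URC SYSTEM "urn:..."
        parts = decl.split('"')
        if decl.upper().startswith("DOCTYPE") and len(parts) >= 2:
            self.attribute_set = parts[1]

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        self._stack.append([tag, dict(attrs).get("scheme"), []])

    def handle_data(self, data):
        if self._stack:
            self._stack[-1][2].append(data)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            name, scheme, chunks = self._stack.pop()
            text = "".join(chunks).strip()
            if text:  # containers such as <urc> hold only whitespace
                self.elements.append((name, scheme, text))


def parse_urc(text):
    parser = URCParser()
    parser.feed(text)
    parser.close()
    return parser
```

Calling parse_urc(SAMPLE) records the attribute set named in the
document declaration and collects the leaf elements; grouping elements
under their enclosing instance is left out to keep the sketch short.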




The examples above  have suggested particular  syntaxes.   Determining
the set of ``well-known'' syntaxes  that all URC servers must  be able
to emit is one direction this standard could be extended.


6 Query Languages


Diversity is the hallmark of  the URC service.   We want to  encourage
the formation of a variety of value-added services; therefore, we need
a standard means of  dealing with unique  capabilities.  For  example,
different search services will have different information  and support
different forms of  queries.   This  section defines  a trivial  query
language that all  URC servers and  clients must  support in order  to
claim conformance with the standard.  This section also  describes the
scheme by which non-standard queries can be identified.


6.1 Trivial Query Language


All queries in the trivial query language are HTTP GET requests.  The
simplest query is to ask for the URC of a resource by providing
its URN.  Note that the HTTP spec [?] says that the protocol and
hostname parts of a URL are assumed to be known.  This is not a valid
assumption for URNs.  The server will return the full URC for the
resource, in whatever format has been negotiated.




A few other queries are defined to obtain information on the resolver,
resources, etc.  These are also HTTP GET requests, and they all begin
with the reserved string "urn+".  Where the request is to get server
information, the FQDN shall be that of the server.  The requests and
their meanings are:



 o  urn+m Resolver meta-information
    (e.g.  <urn:x-dns-2:FQDN:urn+m>)
    Returns a  URC  indicating  URLs where  resolver  metainformation,
    such  as  the  administrative contact,   sponsoring  organization,
    publication policy, etc.  can be retrieved.

 o  urn+a list of All RequestIDs
    (e.g.  <urn:x-dns-2:FQDN:urn+a>)
    Returns a URC indicating URLs where a list of all URNs provided by
    the publisher can be obtained.  The request must  be understood by
    all resolvers, but the response may have zero URLs in it.  In such
    a case the URC should  contain a message saying the  equivalent of
    ``sorry''.

 o  urn+c Child naming authorities
    (e.g.  <urn:x-dns-2:FQDN:urn+c>)
    Returns a URC  indicating URLs where  a list  of any child  naming
    authorities licensed by the  naming authority associated with  the
    FQDN can be found.

 o  urn+p Parent naming authorities
    (e.g.  <urn:x-dns-2:FQDN:urn+p>)
    Returns a URC indicating  URLs where a  list of any parent  naming
    authorities licensing  the naming  authority  associated with  the
    FQDN can be found.

 o  urn+aids
    (e.g.  <urn:x-dns-2:FQDN:urn+aids>)
    Returns a URC indicating the URLs that can be used to fetch a list
    of all the  attribute sets  used to describe  information on  this
    server.


6.2 Query Language Identification


This relatively minimal set of queries does not allow the construction
of complex  queries,  such  as  "gimme  the URNs  and  titles  of  all
resources with  'Smith'  as an  author  and  'food' as  the  subject".
Different  organizations  will  wish  to   add  value  to  their   URC
collections in  different  fashions, so  it  is highly  unlikely  that
one query language  will meet  all needs.   Therefore,  we provide  an
additional well-known query:



 o  urn+qls
    (e.g.  <urn:x-dns-2:FQDN:urn+qls>)
    Returns a URC that gives URNs and URLs for all the query languages
    known by this server, their conditions of use, etc.
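
The well-known queries of sections 6.1 and 6.2 all follow the same
pattern, so a client could build them mechanically.  The sketch below
assumes the <urn:x-dns-2:FQDN:urn+...> form shown in the examples; the
function names and the table layout are hypothetical.

```python
# Reserved suffixes and their meanings, as listed in this draft.
WELL_KNOWN = {
    "m": "resolver meta-information",
    "a": "list of all RequestIDs",
    "c": "child naming authorities",
    "p": "parent naming authorities",
    "aids": "attribute sets used on this server",
    "qls": "query languages known by this server",
}


def well_known_urn(fqdn, query):
    """Build the URN for one of the reserved "urn+" queries."""
    if query not in WELL_KNOWN:
        raise ValueError(f"not a well-known query: {query!r}")
    return f"urn:x-dns-2:{fqdn}:urn+{query}"


def is_reserved(request_id):
    """True if a request ID falls in the reserved "urn+" space."""
    return request_id.startswith("urn+")
```

For instance, well_known_urn("uri.acl.lanl.gov", "qls") gives the URN
a client would resolve to learn which query languages that server
supports.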



6.3 Random Notes on Querying


Several parties have independently suggested using a partially
completed URC as a template to find full URCs that bear some relation
to the template.  Dealing with the inheritance of attribute sets,
specification of relation, and server load issues is all future work
for this specification, as is additional work on the unique
identification of query languages.