URC spec 3/6

From: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>
Date: Fri, 9 Jun 1995 06:58:40 -0600
Message-Id: <199506091258.GAA20125@idaknow.acl.lanl.gov>
To: uri@bunyip.com
Subject: URC spec 3/6

3 Attribute Sets

The primary purpose of  the URC service is  to resolve URNs into  URLs
for the  purpose of  resource retrieval.    However, the  URC makes  a
very convenient place  to store  metadata - data  about the  resource.

Ron Daniel                                                    [Page 5]

INTERNET-DRAFT          An SGML-based URC Service         June 7, 1995

Frequently this will  be bibliographic information,  but [1]  requires
that there be no restrictions  on the data that  can be placed in  the
URC.

The  URC  is  intended  to  be  a  container  for  metadata   about  a
wide variety  of  Internet  resources.     Satellite  images,   poems,
scientific datasets,  fine  art  images,  gene sequences,  ...     are
all reasonable candidates  for publication  in the Internet's  Uniform
Resource Architecture.     All  of  these  resource  types  will  need
different sorts of metadata.   Other  attributes, such as ``subject'',
may be used in different fashions.  Because of this diversity, we make
the fundamental assumption:

    There  are  no  metadata elements  (such  as  author,   title,
    subject, etc.)  that are applicable to all resources.

Because of  this  assumption,  we  need  a means  of  specifying  what
attributes are being used in a particular URC, as well as their syntax
and semantics.  This need brings up the notion of an attribute set and
the attribute set identifier.

    An Attribute Set (AS) is the particular collection of elements
    that may appear in a particular URC.
    An  Attribute  Set  Definition  (ASD)  is  a  machine-parsable
    specification of the elements in an attribute set.
    An Attribute  Set  Identifier  (AID)  is a  URN  that  can  be
    resolved to obtain the attribute set definition.

Using a URN to identify the attribute set of a URC has two advantages.
First, URNs  are  unambiguous,  so we  can  tell  if the  contents  of
one ``subject'' field  are comparable  to another.    Second, using  a
URN lets  us retrieve  the attribute  set definition  if  we need  to.
The definition is  a machine-parsable  grammar  specification for  the
URCs.  This allows us  to parse novel URCs, although dealing  with the
semantics of novel elements is still  an unsolved problem.   A further
enhancement to  this model  is that  an AS  can be  a modification  of
an existing AS.  The child  AS would  specify only  the additions  and
changes to  the parent  AS. Thus,  attribute  sets can  form a  single
inheritance scheme back to  some presumably well-known base  attribute
set.  Multiple Inheritance  (MI) of attribute sets was  considered and
explicitly rejected for reasons of complexity, robustness, complexity,
poor behavior in  distributed systems, complexity,  lack of  universal
language support, and  complexity.   Furthermore, the author  believes
that MI is just too complex.  Dig?
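
The single inheritance scheme described above can be sketched in a few
lines. This is only an illustration, not part of the proposal: the
class name, the URNs, and the element names are hypothetical, and the
content models are stored as plain strings rather than as a real DTD.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AttributeSet:
    """One attribute set; a child lists only additions and changes."""
    aid: str                      # hypothetical URN identifying this set
    elements: Dict[str, str]      # element name -> content model (illustrative)
    parent: Optional["AttributeSet"] = None


def effective_elements(attribute_set: AttributeSet) -> Dict[str, str]:
    """Walk the single-parent chain and merge, letting children override."""
    chain = []
    node = attribute_set
    while node is not None:
        chain.append(node)
        node = node.parent
    merged: Dict[str, str] = {}
    # Apply the base set first so that each child overrides its ancestors.
    for node in reversed(chain):
        merged.update(node.elements)
    return merged


# Hypothetical base set and a child that changes one element and adds one.
base = AttributeSet(
    "urn:example:as:base",
    {"url": "(#PCDATA)", "subject": "(#PCDATA)"},
)
child = AttributeSet(
    "urn:example:as:images",
    {"subject": "(keyword+)", "resolution": "(#PCDATA)"},
    parent=base,
)
```

Because every set has at most one parent, the merge order is never
ambiguous, which is the practical benefit of rejecting multiple
inheritance.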

The attribute set definition shall be an SGML DTD.  Parameter entities
shall be  used to  allow element  definitions to  be  overridden in  a
single inheritance  scheme.     Such an  approach  is  illustrated  in
Appendices B and D.
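
The override mechanism relies on the SGML rule that when a parameter
entity is declared more than once, the first declaration is the one
that binds. A child DTD can therefore declare its changes first and
then pull in its parent. The sketch below imitates that rule in Python
for simple quoted entity values only. It is not an SGML parser, and the
entity and element names are hypothetical rather than taken from the
appendices.

```python
import re

# Matches only the simple form: <!ENTITY % name "value">
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r'%([\w.-]+);')


def collect_parameter_entities(*dtd_fragments):
    """Read fragments in order (child first); the first declaration wins."""
    entities = {}
    for fragment in dtd_fragments:
        for name, value in ENTITY_DECL.findall(fragment):
            entities.setdefault(name, value)  # later duplicates are ignored
    return entities


def expand(text, entities):
    """Replace each %name; reference with its bound value."""
    return ENTITY_REF.sub(lambda m: entities[m.group(1)], text)


# Hypothetical parent DTD fragment with an overridable content model.
PARENT_DTD = '''<!ENTITY % subject.model "(#PCDATA)">
<!ELEMENT subject - - %subject.model;>'''

# Hypothetical child fragment: it only redeclares what it changes.
CHILD_DTD = '<!ENTITY % subject.model "(keyword+)">'

entities = collect_parameter_entities(CHILD_DTD, PARENT_DTD)
subject_decl = expand('<!ELEMENT subject - - %subject.model;>', entities)
```

Here `subject_decl` carries the child's content model, because the
child's declaration was seen before the parent's.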

The AS definition specifies  the syntax of  a URC in a  machine-usable
fashion.  There are three complications to this model.  First, we must
also provide a  specification of the  semantics of the  elements.   At
this time, we are unaware of any machine-usable semantic specification
schemes with the generality  needed for the URC  task.  Therefore,  we
rely on human-readable specification of the semantics of the elements.
The semantics of the elements in the attribute set shall  be indicated
by comments in the DTD.  (To do: check with the comp.text.sgml
community on schemes for automatically extracting comments for
documentation purposes.)
Another mechanism is available for locating  machine-parsable semantic
definitions once they become  available.   But before we can  describe
that, we must talk about the other complications.

A second complication concerns the  URC syntax.  Having  the attribute
set defined  as an  SGML DTD  only allows  us  to automatically  parse
URCs that are conveyed in  an SGML transfer syntax.   Note that  other
syntaxes are explicitly allowed as a feature of this proposal.   Thus,
if a  request is  made for  a  text/plain syntax,  the  result is  not
parsable using  the AS  definition.    This is  not  a great  problem.
First, it is  easy enough to  request the URC  in a text/sgml  syntax,
which is required to be conformant with the AS DTD. Second,  we rarely
care about parsing according to ISO 8879.  Because the primary use for
the URC service is  URN to URL resolution,  we will usually parse  the
URC in a  heuristic fashion,  rather than retrieve  all the  inherited
DTD fragments.   The default AS  is provided to  simplify the task  of
heuristic parsing.
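
As a rough illustration of heuristic parsing, a client that only wants
URLs might scan the URC text for URL-shaped strings and ignore the DTD
entirely. The sample URC below is invented for this sketch: its tag
names, the URN, and the list of URL schemes are assumptions, not the
syntax defined by this proposal.

```python
import re

# URL-shaped tokens for a few schemes; the scheme list is an assumption.
URL_PATTERN = re.compile(r'\b(?:http|ftp|gopher)://[^\s<>"]+', re.IGNORECASE)


def heuristic_urls(urc_text):
    """Return URL-looking strings in document order, without duplicates."""
    found = []
    for candidate in URL_PATTERN.findall(urc_text):
        if candidate not in found:
            found.append(candidate)
    return found


# Hypothetical URC in an SGML-like syntax with omitted end tags.
SAMPLE_URC = """<urc aid="root">
<urn>urn:example:report-42
<url>http://example.org/report-42.ps
<url>ftp://example.org/pub/report-42.txt
</urc>"""

urls = heuristic_urls(SAMPLE_URC)
```

A scan like this never needs the inherited DTD fragments, at the cost
of ignoring everything it does not recognize.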

A third complication arises as  a result of using  a URN for the  AID.
Assume we have  retrieved a  URC, call  it URC-1,  that specifies  its
AID (AID-1).  Also assume  that we wish to retrieve the  attribute set
definition.   We resolve AID-1,  which is  a URN, and  get back a  URC
(URC-2) that lists locations for the  AS definition.  What is  the AID
in URC-2?  How do we avoid infinite regress?  This  standard defines a
basic meta-attribute set definition that is suitable for the URC of an
attribute set (see Appendix C).   To avoid infinite regress, an AID can
be either a URN or the distinguished string "root".
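
The termination rule can be sketched as a loop that follows AIDs until
it meets the distinguished string "root". The dictionaries standing in
for URCs, the `resolve_urn` callback, the example URNs, and the hop
limit are all hypothetical; real resolution would go through the URC
service.

```python
ROOT_AID = "root"  # the distinguished string named in the text


def follow_aids(urc, resolve_urn, max_hops=8):
    """Follow AIDs from a URC until one is "root"; return the URNs resolved."""
    hops = []
    current = urc
    while current["aid"] != ROOT_AID:
        aid = current["aid"]
        # Guard against chains that loop or never reach "root".
        if aid in hops or len(hops) >= max_hops:
            raise ValueError("AID chain does not end at 'root': %r" % (hops + [aid],))
        hops.append(aid)
        current = resolve_urn(aid)  # yields the URC of the attribute set
    return hops


# URC-2: the URC of an attribute set, described by the meta-attribute set.
REGISTRY = {
    "urn:example:as:biblio": {
        "aid": ROOT_AID,
        "locations": ["http://example.org/as/biblio.dtd"],
    },
}

# URC-1: an ordinary URC that names its attribute set by URN (AID-1).
URC_1 = {
    "aid": "urn:example:as:biblio",
    "locations": ["http://example.org/paper.ps"],
}

chain = follow_aids(URC_1, REGISTRY.__getitem__)
```

In this example the chain has a single hop: URC-1 points at AID-1, and
the URC returned for AID-1 carries "root", so no further lookup is
needed.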

Providing a URC for the AS  definition is a complication, but  it also
provides us with a  natural extension mechanism  for dealing with  the
semantics of an attribute set.   Just as a normal document  might have
ASCII and  PostScript representations,  the AS  definition might  have
SGML and KQML  representations.   These alternate representations  are
how we can provide versions of an AS definition  with machine-readable
semantic definitions.
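
A client might choose among such alternate representations by content
type, as in the sketch below. The record layout, the URLs, and the
"application/x-kqml" type label are assumptions made for illustration;
only text/sgml is named in this document.

```python
def pick_representation(locations, preferred_types):
    """Return the first location matching the caller's type preferences."""
    for wanted in preferred_types:
        for location in locations:
            if location["type"] == wanted:
                return location
    return None  # nothing acceptable is available


# Hypothetical locations listed in the URC of an attribute set.
AS_LOCATIONS = [
    {"type": "text/sgml", "url": "http://example.org/as/biblio.dtd"},
    {"type": "application/x-kqml", "url": "http://example.org/as/biblio.kqml"},
]

# Prefer machine-readable semantics, fall back to the SGML DTD.
chosen = pick_representation(AS_LOCATIONS, ["application/x-kqml", "text/sgml"])
```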
