Re: Library Standards and URIs

Ronald E. Daniel (rdaniel@acl.lanl.gov)
Sat, 31 Dec 1994 17:40:54 -0700


From: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>
Date: Sat, 31 Dec 1994 17:40:54 -0700
Message-Id: <199501010040.RAA00574@collie.acl.lanl.gov>
To: hoymand@gate.net
Subject: Re: Library Standards and URIs
Cc: michael.mealling@oit.gatech.edu, uri@bunyip.com

Michael Mealling said:
> >I can deal with this. Are you suggesting then that a URC is defined
> >as this limited set? If so I would like to see a level attribute to
> >the <urc> tag so that we can extend the complexity of URCs.

Dirk Hoyman said:
> I offer this as an example of how a URC with SGML tagging would look, not
> as a specification.  My goal at this point is to get the idea of using SGML
> on the table for discussion.  If we wish to pursue this, I would think some
> sub-group, of which I would be happy to volunteer for, would work on an
> acceptable tag set.

I would also like to work on that task. Before we do that, we should decide
on the scale of the project. It will be impossible to satisfy everyone. Some
people will place a higher priority on keeping it simple; others will want a
very expressive URC.

Fortunately, I think that the requirement that we be able to put just about
ANYTHING into the URC points us to a possible solution. If people can add
their own attributes to their URCs (which I think is good), we are going to
see name clashes (which is bad). We also have the problem of knowing how to
interpret these new attributes; after all, someone else might want to
use them. To overcome these problems, I suggest that non-standard
attributes carry along the URN of a human-readable explanation of their
purpose, semantics, and syntax. For example, we might have two
different "subject" entries in the URC for some resource. The first might
identify LC subject headings; the second would be an encoded trigram
vector:

  <urc>
  ...
  <subject scheme="urn:iana:lc.gov:subj-headings-1995">computing, history of
  </subject>
  <subject scheme="urn:iana:c-3.lanl.gov:trigram-scheme-4">
  JFJ438RJFU4RJFRU4N;OIGH3P48HSJNVEORYHG3O4HFJJGH3PGHJNVERIGTHEINVOR
  NRGH8YWJGNP5G;RJGHEWPO5TGJJNERUGH4WIGHRHGF43GFRKBVEIUGH3LUGBERUGHR
  ...
  3POIHFP4YJRBVI4IGHFLRF437HSRGHEWIYGFELRFBH57GHRF47GHSLSGRFRYGFEEWS
  </subject>
  ...
  </urc>
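
To show how a client might consume entries tagged this way, here is a
minimal Python sketch. It assumes the URC arrives as well-formed XML-style
markup (looser SGML would need a real SGML parser). The function name
subjects_by_scheme, the shortened sample, and the placeholder trigram
string are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened URC. The second subject body is a placeholder,
# not a real encoded trigram vector.
SAMPLE_URC = """
<urc>
  <subject scheme="urn:iana:lc.gov:subj-headings-1995">computing, history of
  </subject>
  <subject scheme="urn:iana:c-3.lanl.gov:trigram-scheme-4">
  PLACEHOLDER
  </subject>
</urc>
"""


def subjects_by_scheme(urc_text):
    """Map each scheme URN to the list of subject values tagged with it."""
    root = ET.fromstring(urc_text.strip())
    result = {}
    for elem in root.iter("subject"):
        scheme = elem.get("scheme")  # None when no scheme attribute is given
        value = (elem.text or "").strip()
        result.setdefault(scheme, []).append(value)
    return result


subjects = subjects_by_scheme(SAMPLE_URC)
for scheme, values in subjects.items():
    print(scheme, values)
```

Grouping by scheme URN lets a client skip entries whose scheme it does not
understand while still using the ones it does.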

Some people might suggest that the URN should point to a machine-readable
description of the semantics instead of a human-readable one. That would be nice,
except that I don't think there are any existing schemes for describing
ARBITRARY semantics. We could allow for another, optional, URN to go into an
attribute so that the people who want to experiment with machine-readable
descriptions could do so.

As for the battle over the core elements, we could push the scheme described
above to its extreme and say "there are NO core elements, we require ALL
attributes to have the URN of their explanation". I think that is a bad
idea for a couple of reasons: chaos and length. Chaos would occur when we
had 150,000 URNs for "author", "title", etc. Length is pretty obvious: would
you rather type <author> or <author scheme="urn:iana:here.there.org:foo">?

In talking with Larry Masinter about this in San Jose, he suggested that
we put the URN of the "attribute set" being used at the top of the URC.
This is similar in some ways to Michael Mealling's request that we put a
level number in the URC so we could change things as we went along.
However, with the URN scheme, attributes that do not have an explicit URN
inherit the URN for the core set. Those with an explicit URN use it to
override the core set's semantics if there is a collision. The trade-off
between the two approaches seems to be that a simple level number
minimizes chaos, while a URN is a more general solution.
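
The inheritance rule for the URN approach can be sketched in a few lines of
Python. This is only an illustration under assumptions: the attribute names
attrset and scheme, the function name effective_schemes, and the sample
author value are hypothetical, and the markup is assumed to be well-formed
enough for an XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical URC: the attribute set URN sits on the outer <urc> tag, and
# only <subject> carries its own explicit scheme URN.
SAMPLE_URC = """
<urc attrset="urn:iana:here.there.org:foo">
  <author>A. N. Author</author>
  <subject scheme="urn:iana:lc.gov:subj-headings-1995">computing, history of</subject>
</urc>
"""


def effective_schemes(urc_text):
    """Return (element name, governing URN) pairs for each top-level element.

    An element with an explicit scheme uses it; any other element inherits
    the URN of the attribute set named on <urc>.
    """
    root = ET.fromstring(urc_text.strip())
    default_urn = root.get("attrset")
    return [(child.tag, child.get("scheme", default_urn)) for child in root]


resolved = effective_schemes(SAMPLE_URC)
for name, urn in resolved:
    print(name, urn)
```

With a plain level number instead, the lookup would reduce to checking one
integer, which is simpler but leaves no room for per-element overrides.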

Initially I favored a simple level number, such as <urc 1>, but the greater
generality of Larry's suggestion is very appealing. Of course, saying
<urc attrset="urn:iana:here.there.org:foo"> is very similar to
specifying a DTD - although I would hope the URN names a resource that
is more readable to the general user population than a DTD!

The elements we have so far (<author>, <title>, <data>, <extent>,
<locationGroup>, <list>, and <item>) are a good start. I would certainly add
<subject> and <signature>. We can discuss others soon. Before we go on to
decide on what attributes go into the different levels, I would like
to know the group's feelings on the question:

   Should we use a simple indication of level or use a URN to
   identify the attribute set in a URC?


Ron