Re: Library Standards and URIs from Terry Winograd on 1995-01-04 (uri@w3.org from January 1995)

From: Terry Winograd <winograd@cs.stanford.edu>
Date: Wed, 4 Jan 1995 13:58:02 -0800
To: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>, hoymand@gate.net
Cc: michael.mealling@oit.gatech.edu, uri@bunyip.com
Message-Id: <v03000a12ab30b9594bc0@[192.203.7.234]>
At 5:40 PM 12/31/94, Ronald E. Daniel wrote:
>Fortunately, I think that the requirement that we be able to put just about
>ANYTHING into the URC points us to a possible solution. If people can add
>their own attributes to their URCs (which I think is good), we are going to
>see name clashes (which is bad). We also have the problem of knowing how to
>interpret these new attributes, after all, someone else might want to
>utilize them. To overcome these problems, I suggest that non-standard
>attributes carry along the URN of a human-readable explanation of their
>purpose, semantics, and syntax
...
>In talking with Larry Masinter about this in San Jose, he suggested that
>we put the URN of the "attribute set" being used at the top of the URC.

Coming into this discussion from a programming-language background rather
than a library background, I have the feeling that we are slowly moving
towards something that is already standard in the programming domain --
extensible collections of class or record definitions.  They provide a
general way to deal with the inevitable tension between the desire for
simplicity and predictability (wired-in fixed schemes that everyone uses)
and the desire for flexibility (add whatever you need on the fly).  For
something as simple as mail-headers, it has worked relatively well to
simply merge these -- a fixed set of attributes defined in the RFC that
everyone uses and an "X..." set that has no regularity at all -- you just
have to hope that the program reading an attribute does something related
to the intention of the one that wrote it.

In a class-based system (which can be, but need not be, hierarchical with
inheritance) you have a two-level structure -- a class name specifies the
kind of description and the attributes are specific to the class.  I have
worked with systems that allow a single object to have multiple
simultaneous class assignments (multiple descriptors), so you might end up
with something like the following for a single document:


[URC:
  Class  Corebib <url:http://mysite.net/corebibdef>
  Class Rated <url:http://yoursite.com/ratingsdef>
  Class MARC <urn:loc.uri/official/marcdef>

   [COREBIB
    [urn   mysite.uri/myauth/11122233]
    [title   My really good resource]
    [author   Ima Nutt]
    [date   December 22, 1994]
    [locations
      ( [url   http://www.mysite.com/myresource]
        [extent 24567 bytes]
        [format text/html]    )
      ( [url   ftp://ftp.mysite.com/pub/myresource.txt]
        [extent 12543 bytes]
        [format text/plain]    )]   ]

  [RATED
    [violence 8.a.G ]
    [sex  0.ssd.Y]
    [language 4.woi.L]   ]

  [MARC
    [040$c_Transcribing_Institution  Stanford University]
    [780-3_Supersedes_In_Part urn:mysite.net/old/resource.txt] ] ]
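
As a rough illustration of the two-level structure above, here is a minimal
sketch in Python. The choice of Python and the names ClassRef, URC, declare,
set, and lookup are assumptions made for this illustration only; nothing in
the thread proposes them. The "language" attribute under Corebib is also
hypothetical, added to show that two classes can use the same attribute name
without clashing because every attribute is read through its class.

```python
from dataclasses import dataclass, field


@dataclass
class ClassRef:
    """A class name plus a locator (URL or URN) for its definition file."""
    name: str
    locator: str


@dataclass
class URC:
    """A description holding any number of class-scoped attribute sections."""
    classes: list = field(default_factory=list)   # declared ClassRef entries
    sections: dict = field(default_factory=dict)  # class name -> {attribute: value}

    def declare(self, name, locator):
        """Attach a class (descriptor) to this URC and open an empty section."""
        self.classes.append(ClassRef(name, locator))
        self.sections.setdefault(name.upper(), {})

    def set(self, class_name, attribute, value):
        """Store an attribute value inside the section of a declared class."""
        key = class_name.upper()
        if key not in self.sections:
            raise KeyError(f"class {class_name!r} was not declared")
        self.sections[key][attribute] = value

    def lookup(self, class_name, attribute):
        """Read an attribute; the class name disambiguates equal names."""
        return self.sections[class_name.upper()][attribute]


urc = URC()
urc.declare("Corebib", "http://mysite.net/corebibdef")
urc.declare("Rated", "http://yoursite.com/ratingsdef")
urc.set("Corebib", "title", "My really good resource")
urc.set("Corebib", "language", "English")  # hypothetical attribute, see note above
urc.set("Rated", "language", "4.woi.L")
```

Here urc.lookup("Rated", "language") and urc.lookup("Corebib", "language")
return different values, which is the point of qualifying each attribute by
its class.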

I have not used SGML syntax here, although there is an obvious translation,
because of the DTD problem -- since the class definitions (which form an
open-ended extensible collection) each have their own attributes, there is
no convenient way to map attribute names to tag names in a DTD -- we can't
have one huge DTD for everyone's classes, and we don't want a
mix-and-match DTD specific to every document.   From my point of view, this
is a strike against using SGML (as opposed to some other variant of
nested, tagged syntax which doesn't employ the same definition mechanisms).

The example uses three schemes, one for simple core bibliographies, one for
ratings, and one for MARC compatibility.  The syntax allows the part for
each section to contain ANY NUMBER of the attributes for its class.  In
cases where it is necessary to have all attributes, that would be checked
as a semantic/pragmatic issue, not part of the basic syntax.  The class
definitions themselves  have URNs (and URLs, etc.) so they can be looked up
by programs that use them.  I am being a little pessimistic about
URN-availability, assuming the schema writer can specify any kind of
effective locator (in the above example there are two URLs and a URN).
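
The split between syntax and semantic checking could look roughly like the
following Python sketch. It assumes a definition file can be reduced to a set
of allowed attribute names and a set of required ones; that layout, the
DEFINITIONS table (a stand-in for files that would really be fetched by their
locators), and the function name check_section are all hypothetical. Which
attributes count as required is likewise invented for the example.

```python
# Hypothetical stand-in for definition files fetched by URL or URN.
DEFINITIONS = {
    "COREBIB": {
        "allowed": {"urn", "title", "author", "date", "locations"},
        "required": {"urn", "title"},  # assumed for illustration only
    },
}


def check_section(class_name, attributes, definitions=DEFINITIONS):
    """Return (unknown, missing) attribute names for one class section.

    The section is syntactically valid with any number of attributes;
    this check is the separate semantic/pragmatic step.
    """
    definition = definitions[class_name]
    present = set(attributes)
    unknown = sorted(present - definition["allowed"])
    missing = sorted(definition["required"] - present)
    return unknown, missing


# A partial section: well-formed, but it omits an attribute assumed required.
partial = {"title": "My really good resource", "author": "Ima Nutt"}
unknown, missing = check_section("COREBIB", partial)
```

A reader that only needs the title can use the partial section as it stands;
a stricter consumer can reject it when missing is non-empty.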

Anyone can add arbitrary attributes, by providing a class definition on the
net.  This could be a class with only one attribute, but more likely would
have several related ones. This is akin to your proposal that:

>I suggest that non-standard
>attributes carry along the URN of a human-readable explanation of their
>purpose, semantics, and syntax

The proposal here is to make that notion fully general -- that every
attribute is identified when used as belonging to a class, and every class
has a definitions file on the net -- all the way from the widespread
standards (e.g., core biblio, MARC, etc.) to idiosyncratic onesies
("X-phase-of-the-sun").   As you point out, there needs to be further
discussion of what goes into these files and how much is machine-readable
or human-readable.

All this leads to a somewhat more complex system than traditional
header/attribute models (although probably not more complex than the
SGML/DTD mechanism).  It's a design tradeoff, but my sense is that we are
now at a stage where people making use of these tools can move up a level
of sophistication from the raw PERL script munging of strings.  I agree
with the importance of keeping things implementable, but if we go for the
lowest common denominator we may lose capacities that will be really
important down the line.

--t


--------------------------------------------
Terry Winograd, Department of Computer Science
Stanford University Stanford, CA 94305-2140
Email: winograd@cs.stanford.edu
Phone: 415/723-2780
Fax: 415/725-7411
WWW: http://www-pcd.stanford.edu/winograd
Received on Wednesday, 4 January 1995 16:57:16 UTC