- From: Terry Winograd <winograd@cs.stanford.edu>
- Date: Wed, 4 Jan 1995 13:58:02 -0800
- To: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>, hoymand@gate.net
- Cc: michael.mealling@oit.gatech.edu, uri@bunyip.com
At 5:40 PM 12/31/94, Ronald E. Daniel wrote:

>Fortunately, I think that the requirement that we be able to put just about
>ANYTHING into the URC points us to a possible solution. If people can add
>their own attributes to their URCs (which I think is good), we are going to
>see name clashes (which is bad). We also have the problem of knowing how to
>interpret these new attributes, after all, someone else might want to
>utilize them. To overcome these problems, I suggest that non-standard
>attributes carry along the URN of a human-readable explanation of their
>purpose, semantics, and syntax
...
>In talking with Larry Masinter about this in San Jose, he suggested that
>we put the URN of the "attribute set" being used at the top of the URC.

Coming into this discussion from a programming-language background rather
than a library background, I have the feeling that we are slowly moving
towards something that is already standard in the programming domain --
extensible collections of class or record definitions. They provide a
general way to deal with the inevitable tension between the desire for
simplicity and predictability (wired-in fixed schemes that everyone uses)
and desire for flexibility (add whatever you need on the fly).

For something as simple as mail-headers, it has worked relatively well to
simply merge these -- a fixed set of attributes defined in the RFC that
everyone uses and an "X..." set that has no regularity at all -- you just
have to hope that the program reading an attribute does something related
to the intention of the one that wrote it.

In a class-based system (which can, but need not be, hierarchical with
inheritance) you have a two-level structure -- a class name specifies the
kind of description and the attributes are specific to the class.
I have worked with systems that allow a single object to have multiple
simultaneous class assignments (multiple descriptors), so you might end up
with something like the following for a single document:

[URC:
  Class Corebib <url:http://mysite.net/corebibdef>
  Class Rated <url:http://yoursite.com/ratingsdef>
  Class MARC <urn:loc.uri/official/marcdef>
  [COREBIB
    [urn mysite.uri/myauth/11122233]
    [title My really good resource]
    [author Ima Nutt]
    [date December 22, 1994]
    [locations
      ( [url http://www.mysite.com/myresource]
        [extent 24567 bytes]
        [format text/html] )
      ( [url ftp://ftp.mysite.com/pub/myresource.txt]
        [extent 12543 bytes]
        [format text/plain] )]
  ]
  [RATED
    [violence 8.a.G]
    [sex 0.ssd.Y]
    [language 4.woi.L]
  ]
  [MARC
    [040$c_Transcribing_Institution Stanford University]
    [780-3_Supersedes_In_Part urn:mysite.net/old/resource.txt]
  ]
]

I have not used SGML syntax here, although there is an obvious translation,
because of the DTD problem -- since the class definitions (which form an
open-ended extensible collection) each have their own attributes, there is
no convenient way to map attribute names to tag names in a DTD -- we can't
have one huge DTD for everyone's classes, but you don't want a mix-and-match
DTD specific to every document. From my point of view, this is a strike
against using SGML (as opposed to some other variant of nested, tagged
syntax which doesn't employ the same definition mechanisms).

The example uses three schemes, one for simple core bibliographies, one for
ratings, and one for MARC compatibility. The syntax allows the part for each
section to contain ANY NUMBER of the attributes for its class. In cases
where it is necessary to have all attributes, that would be checked as a
semantic/pragmatic issue, not part of the basic syntax.

The class definitions themselves have URNs (and URLs, etc.) so they can be
looked up by programs that use them.
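
As a rough illustration of the two-level structure (class sections, each holding attributes specific to that class), here is a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the proposal itself: the names `ClassDef`, `URC`, `check`, and `registry`, the idea that a definition file reduces to a set of attribute names, and the choice of `title` as a required COREBIB attribute. The syntax-level rule (any number of attributes per section) is kept apart from the semantic check (unknown or missing attributes).

```python
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    """A class definition, as it might be fetched from its locator (hypothetical)."""
    name: str
    locator: str                        # URL or URN of the definition file
    attributes: frozenset               # attribute names the class defines
    required: frozenset = frozenset()   # enforced semantically, not by the syntax


@dataclass
class URC:
    """A URC holding one section of (attribute, value) pairs per class."""
    sections: dict = field(default_factory=dict)   # class name -> list of pairs

    def add(self, class_name, attribute, value):
        self.sections.setdefault(class_name, []).append((attribute, value))


def check(urc, registry):
    """Return a list of problems; an empty list means the URC passes."""
    problems = []
    for class_name, pairs in urc.sections.items():
        cdef = registry.get(class_name)
        if cdef is None:
            problems.append(f"{class_name}: no class definition available")
            continue
        used = {attr for attr, _ in pairs}
        for attr in sorted(used - cdef.attributes):
            problems.append(f"{class_name}: unknown attribute {attr!r}")
        for attr in sorted(cdef.required - used):
            problems.append(f"{class_name}: missing required attribute {attr!r}")
    return problems


# Hypothetical definitions; in practice these would be looked up on the net.
registry = {
    "COREBIB": ClassDef(
        "COREBIB", "url:http://mysite.net/corebibdef",
        frozenset({"urn", "title", "author", "date", "locations"}),
        required=frozenset({"title"}),
    ),
    "RATED": ClassDef(
        "RATED", "url:http://yoursite.com/ratingsdef",
        frozenset({"violence", "sex", "language"}),
    ),
}

urc = URC()
urc.add("COREBIB", "title", "My really good resource")
urc.add("COREBIB", "author", "Ima Nutt")
urc.add("RATED", "violence", "8.a.G")
urc.add("RATED", "phase-of-the-sun", "waning")   # not defined by RATED

print(check(urc, registry))
```

In this sketch an attribute name only has meaning inside its class section, so two classes could both define `language` without clashing.
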
I am being a little pessimistic about URN-availability, assuming the schema
writer can specify any kind of effective locator (in the above example there
are two URLs and a URN).

Anyone can add arbitrary attributes, by providing a class definition on the
net. This could be a class with only one attribute, but more likely would
have several related ones. This is akin to your proposal that:

>I suggest that non-standard
>attributes carry along the URN of a human-readable explanation of their
>purpose, semantics, and syntax

The proposal here is to make that notion fully general -- that every
attribute is identified, when used, as belonging to a class, and every class
has a definitions file on the net -- all the way from the widespread
standards (e.g., core biblio, MARC, etc.) to idiosyncratic onesies
("X-phase-of-the-sun"). As you point out, there needs to be further
discussion of what goes into these files and how much is machine-readable or
human-readable.

All this leads to a somewhat more complex system than traditional
header/attribute models (although probably not more complex than the
SGML/DTD mechanism). It's a design tradeoff, but my sense is that we are now
at a stage where people making use of these tools can move up a level of
sophistication from the raw PERL script munging of strings. I agree with the
importance of keeping things implementable, but if we go for the lowest
common denominator we may lose capacities that will be really important down
the line.

--t
--------------------------------------------
Terry Winograd, Department of Computer Science
Stanford University
Stanford, CA 94305-2140
Email: winograd@cs.stanford.edu
Phone: 415/723-2780   Fax: 415/725-7411
WWW: http://www-pcd.stanford.edu/winograd
Received on Wednesday, 4 January 1995 16:57:16 UTC