- From: Terry Allen <terry@ora.com>
- Date: Sun, 22 Jan 1995 12:41:10 PST
- To: davenport@ora.com, uri@bunyip.com, hackers@ora.com
Proposal for Davenport Group work on URCs

Terry Allen
22 January 1995

INTRODUCTION

URCs, or Uniform Resource Characteristics, are being discussed, inter alia, as a means of supplying bibliographic metadata about online documents. At the Davenport Group meeting of 17--19 January 1995, I proposed that the group attempt to construct a trial URC resolution service for online computer documentation (the area of interest of the Davenport Group). The Davenport sponsors were quite interested in the proposal; this document is an expansion of what I said at the meeting, outlining my thoughts on what is needed for such a service. Comments are more than welcome. (And thanks to Ron Daniel and Roy Fielding for helpful correspondence on UR issues; neither of them is responsible for this proposal, which is not intended as an RFC, at least not at this stage.)

This proposal is intended primarily to establish a format for the metadata and secondarily to determine what other pieces are required to make an URC resolution service work. The aims stated are deliberately circumscribed so as to avoid several large issues not directly related to the metadata format, and no attempt is made to generalize that format beyond computer documentation (which for the purpose at hand I assume to be in SGML, though it doesn't really matter).

As Ron Daniel has put forward a concrete proposal for URCs

   http://www.acl.lanl.gov/URI/SGML/overview.html

that envisions them as sets of information, I use the term "URC set." The format for the metadata is SGML, but I am not advocating the use of SGML for this purpose in the general case---it just so happens that for Davenport, SGML apparently will work.

I believe the following pieces are needed:

   Engine to generate a correct URC set from the Bookinfo in a Docbook DTD-encoded document

   DTD for the URC set

   Engine to store URC sets, concatenate them, and permit authorized revision of them (the info need not be stored in SGML; this is a separate layer [or layers?]
   that could be implemented differently by different services)

   Server site(s) for public URC resolution (the issue of authorized revision can be avoided, from the standpoint of this project, if each publisher maintains its own site, although I don't expect that to be the eventual general case)

   Local URC resolution service (check to see whether the target document is available on the local system)

   Very simple URL(?) format to return answers to only 2 types of queries: 1) given a URN, return all URLs; 2) given URC for TITLE, return all URLs (anything more demands that we decide upon a query language, which is a contentious matter that is not the point of this exercise).

   Some machinery for the browser to choose a URL from among those returned

Then it has to be wrapped up and made to work so that I can write a link in my Docbook document (Docbook has a Ulink element to hold URLs) like this, for a URN:

   ... blarty foo <ulink url="the.urn.goes.here">Windows 3.1 User's Guide</ulink> blarts more blarts

or this, for a URC title query:

   ... blarty foo <ulink url="the.urc.for.title.goes.here">Windows 3.1 User's Guide</ulink> blarts more blarts

and when the user clicks on the hot spot, the intended document is fetched and displayed from the local installation or from the Internet, assuming the user is connected to it. It may be desirable to extend <ulink> with attributes additional to the present URL attribute.

The browser has to transmit the URN or URC to the local URC resolution service, then if need be to the publisher's URC resolution service site, and upon receipt of the response, to invoke the "some machinery" to pick a URL and fetch it. The resolution service needs to parse the complete URC set, or consult a preparsed table, to return the appropriate info.
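
The lookup order described above (local service first, then the publisher's site, then "some machinery" to pick one URL) can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the proposal: the names UrcResolver, choose_url, and resolve are hypothetical, the "preparsed table" is modeled as plain in-memory dictionaries, and the selection rule (take the first URL returned) is a placeholder for whatever the browser would really do.

```python
from typing import Dict, List, Optional, Sequence


class UrcResolver:
    """Hypothetical resolution service backed by a preparsed table.

    Only the two query types named in the proposal are supported:
    URN -> all URLs, and TITLE -> all URLs.
    """

    def __init__(self, by_urn: Dict[str, List[str]],
                 by_title: Dict[str, List[str]]) -> None:
        self.by_urn = by_urn
        self.by_title = by_title

    def lookup(self, kind: str, key: str) -> List[str]:
        if kind == "urn":
            return list(self.by_urn.get(key, []))
        if kind == "title":
            return list(self.by_title.get(key, []))
        raise ValueError("unsupported query type: " + kind)


def choose_url(urls: Sequence[str]) -> Optional[str]:
    """Placeholder for the "some machinery": take the first URL, if any."""
    return urls[0] if urls else None


def resolve(kind: str, key: str,
            resolvers: Sequence[UrcResolver]) -> Optional[str]:
    """Ask each service in order (local first) and stop at the first answer."""
    for resolver in resolvers:
        urls = resolver.lookup(kind, key)
        if urls:
            return choose_url(urls)
    return None


# Assumed sample data, borrowing the URN and URLs from the sample URC set.
local = UrcResolver(by_urn={}, by_title={})
publisher = UrcResolver(
    by_urn={"Very.Fine.Example": ["http://com.com.com/very.fine",
                                  "http://edu.edu.edu/v.fine.example"]},
    by_title={"X Window System User's Guide: electronic edition":
              ["http://com.com.com/very.fine"]},
)

print(resolve("urn", "Very.Fine.Example", [local, publisher]))
```

Keeping the resolvers in an ordered list means the local check and the publisher's site share one interface, so a further public server could be appended without touching the browser-side logic.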
FORMAT of the URC INFORMATION

As the documents in question are electronic books, it seems appropriate to use either a TEI header or the USMARC format to represent the bibliographic metadata, as both formats are well worked out. I chose TEI because I think it will be easier for Davenporters to use and I didn't want to learn USMARC rules.

Here's a sample set of information marked up in strict accordance with the TEI P3 DTD. It may be desirable to define a subset of this DTD for Davenport purposes; I am still exploring the possibilities offered by the TEI header; see recent posts to TEI-L. This set has a bit more info than is strictly needed.

<!doctype teiheader system "tei2.dtd"[
<!ENTITY % TEI.mixed 'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!entity % isonum system "iso-num.gml"> %isonum;
]>
<!-- need isonum for the ampersand in O'Reilly and Asso -->
<teiheader>
<!-- as for most elements, the attributes of teiheader are not really needed for an elementary URC -->
  <filedesc>
    <titlestmt>
      <title>X Window System User's Guide: electronic edition</>
      <!-- TEI recommends that you distinguish the titles of print works and electronic versions in this fashion, using one of two set phrases, the other one being "a machine readable transcription" -->
      <author>Valerie Quercia</>
      <author>Tim O'Reilly</>
    </>
    <editionstmt>
      <edition>OSF/Motif 1.2 Edition</>
    </editionstmt>
    <publicationstmt>
      <publisher>O'Reilly &amp; Associates, Inc.</>
      <idno type=ISBN>12345678-9</>
      <!-- ISBN of the electronic edition, not of the print book -->
      <date>1 April 1994</>
    </publicationstmt>
    <seriesstmt>
      <title>X Window System</>
      <idno type=vol>3</>
    </seriesstmt>
    <sourcedesc>
      <p>written as an etext </>
    </sourcedesc>
  </filedesc>
  <encodingdesc>
    <classdecl>
      <taxonomy id=LCSH>
        <bibl>Library of Congress Subject Headings </bibl>
      </taxonomy>
    </classdecl>
  </encodingdesc>
  <profiledesc>
    <textclass>
      <keywords scheme=LCSH>
        <list>
          <item>Computer software documentation</>
          <item>Computer software configuration
          management</>
        </list>
      </keywords>
    </textclass>
  </profiledesc>
</teiheader>

THE TEIHEADER IN AN URC SET

Here's a sample document, with DTD, that wraps the above TEI header along with URNs and URLs into one large element I called URC. The content model of URC is arranged simply for convenience in the present trial, and should be regarded pretty much as a placeholder or strawman. I use <urc.etc> to represent all the other flavors of URC sets that might exist. The prologue includes the DTD and the pieces from the prologue of the sample TEI header shown above; the included entity would be the <teiheader>...</teiheader> part of the sample above.

<!doctype urc [
<!element urc - - (urc.tei.davenport*, urc.etc*)>
<!element urc.etc - - (#pcdata) -- placeholder -- >
<!element urc.tei.davenport - - (teiheader, ((URN+, URL*) | URL+))
  -- at least one URN or at least one URL, per comments at Davenport meeting, and thanks to Eve Maler -- >
<!element (urn|url) - - (#PCDATA)>
<!entity % isonum system "iso-num.gml"> %isonum;
<!ENTITY % TEI.general 'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!entity % teidtd system "tei2.dtd"> %teidtd;
<!entity teix system "tei.exmpl.v3m">
<!-- teix is the teiheader example shorn of its doctype decl -->
]>
<urc>
  <urc.tei.davenport>
    &teix;
    <urn>Very.Fine.Example </urn>
    <url>http://com.com.com/very.fine </url>
    <url>http://edu.edu.edu/v.fine.example </url>
  </urc.tei.davenport>
</urc>

OPEN ISSUES

Practically everything is open, but here's a short list.

   Is the list of pieces given above complete? correctly divided into components?

   What should be the syntax of the URL attribute values for URN and URC/title? should Docbook's Ulink be extended with additional attributes?

   Can the "local resolution service" for URN>URL resolution be so simple as an SGML entity catalogue in the style set up by SGML Open?
   can the local service for simple URC/title queries be specified so that it could be implemented *as a layer distinct from the URC set and document encoding* in Hytime by those interested in doing so?

   Is LCSH an appropriate choice for a keyword thesaurus (beyond the scope of the project, really, but something to be thinking about)?

   If one wishes to establish URNs for sections and subsections of a document, how should they be nested, if at all, in the overall URC set?

   Who would be interested in helping with some of the other pieces? There's no money in this project, at least at this stage of development, and maybe not any glory, either.

--
Terry Allen (terry@ora.com)
O'Reilly & Associates, Inc.
Editor, Digital Media Group
101 Morris St.
Sebastopol, Calif., 95472

A Davenport Group sponsor. For information on the Davenport Group see
ftp://ftp.ora.com/pub/davenport/README.html
or http://www.ora.com/davenport/README.html
Received on Sunday, 22 January 1995 17:25:04 UTC