URC proposal for Davenport Group

Terry Allen (terry@ora.com)
Sun, 22 Jan 1995 12:41:10 PST


Message-Id: <199501222041.MAA02978@rock>
To: davenport@ora.com, uri@bunyip.com, hackers@ora.com

Proposal for Davenport Group work on URCs
Terry Allen     22 January 1995


INTRODUCTION

URCs, or Uniform Resource Characteristics, are being 
discussed, inter alia, as a means of supplying bibliographic 
metadata about online documents.  At the Davenport Group
meeting of 17--19 January 1995, I proposed that the group
attempt to construct a trial URC resolution service for
online computer documentation (the area of interest of
the Davenport Group).  The Davenport sponsors were quite
interested in the proposal; this document is an expansion
of what I said at the meeting, outlining my thoughts on
what is needed for such a service.  Comments are more than
welcome.  (And thanks to Ron Daniel and Roy Fielding for
helpful correspondence on UR issues; neither of them is
responsible for this proposal, which is not intended as an
RFC, at least not at this stage.)

This proposal is intended primarily to establish a format
for the metadata and secondarily to determine what other
pieces are required to make a URC resolution service work.
The aims stated are deliberately circumscribed so as to
avoid several large issues not directly related to the
metadata format, and no attempt is made to generalize
that format beyond computer documentation (which for the
purpose at hand I assume to be in SGML, though it doesn't
really matter).  

As Ron Daniel has put forward a concrete proposal for URCs
	http://www.acl.lanl.gov/URI/SGML/overview.html
that envisions them as sets of information, I use the term
"URC set."  The format for the metadata is SGML, but I am
not advocating the use of SGML for this purpose in the 
general case---it just so happens that for Davenport,
SGML apparently will work.  
 
I believe the following pieces are needed:

Engine to generate a correct URC set from the Bookinfo in a
	Docbook DTD-encoded document 

DTD for the URC set

Engine to store URC sets, concatenate them, and permit authorized
	revision of them (the info need not be stored in SGML; this
	is a separate layer [or layers?] that could be implemented 
	differently by different services)

Server site(s) for public URC resolution (the issue of authorized
	revision can be avoided, from the standpoint of this
	project, if each publisher maintains its own site, although
	I don't expect that to be the eventual general case)

Local URC resolution service (check to see whether the target
	document is available on the local system)

Very simple URL(?) format to return answers to only 2 types of 
	queries:  1) given a URN, return all URLs; 2) given a
	URC title, return all URLs (anything more demands
	that we decide upon a query language, which is a contentious
	matter that is not the point of this exercise).  

Some machinery for the browser to choose a URL from among 
	those returned
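
As a rough illustration of the two query types in the list above,
here is a minimal Python sketch.  The class names, the record layout,
and the matching rule for titles are assumptions made for the example;
the sample URN and URLs are the placeholder values used in the sample
URC set later in this proposal.  Nothing here is a proposed query
syntax or wire format.

```python
from dataclasses import dataclass, field


@dataclass
class UrcRecord:
    """One URC set reduced to the fields the two queries need (hypothetical layout)."""
    title: str
    urns: list = field(default_factory=list)
    urls: list = field(default_factory=list)


class UrcResolver:
    """Answers only the two query types: URN -> URLs and title -> URLs."""

    def __init__(self, records):
        self.records = list(records)

    def urls_for_urn(self, urn):
        # Collect URLs from every record that lists this URN, without duplicates.
        found = []
        for rec in self.records:
            if urn in rec.urns:
                found.extend(u for u in rec.urls if u not in found)
        return found

    def urls_for_title(self, title):
        # Exact match after case and whitespace normalization (an assumption);
        # anything fuzzier would amount to a query language.
        wanted = " ".join(title.split()).lower()
        found = []
        for rec in self.records:
            if " ".join(rec.title.split()).lower() == wanted:
                found.extend(u for u in rec.urls if u not in found)
        return found


resolver = UrcResolver([
    UrcRecord(
        title="X Window System User's Guide:  electronic edition",
        urns=["Very.Fine.Example"],
        urls=["http://com.com.com/very.fine",
              "http://edu.edu.edu/v.fine.example"],
    ),
])
```

Both `resolver.urls_for_urn("Very.Fine.Example")` and a title query
for the same book return the full list of URLs; an unknown URN
returns an empty list.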

Then it has to be wrapped up and made to work so that I can write a link
in my Docbook document (Docbook has a Ulink element to hold
URLs) like this, for a URN:
   ... blarty foo <ulink url="the.urn.goes.here">Windows 3.1
	User's Guide</ulink> blarts more blarts

or this, for a URC title query:

   ... blarty foo <ulink url="the.urc.for.title.goes.here">Windows 3.1
	User's Guide</ulink> blarts more blarts

and when the user clicks on the hot spot, the intended document
is fetched and displayed from the local installation or from the
Internet, assuming the user is connected to it.  It may be
desirable to extend <ulink> with attributes additional to 
the present URL attribute.

The browser has to transmit the URN or URC to the local URC 
resolution service, then if need be to the publisher's URC
resolution service site, and upon receipt of the response,
to invoke the "some machinery" to pick a URL and fetch it.
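
The lookup order just described (local service first, then the
publisher's site, then a choice among the URLs returned) could be
sketched as follows.  The function names and the scheme-preference
rule are hypothetical; the proposal deliberately leaves the "some
machinery" unspecified, and the two lookup tables merely stand in
for real services.

```python
def resolve(query, local_lookup, remote_lookup):
    """Ask the local service first; fall back to the publisher's site."""
    urls = local_lookup(query)
    if urls:
        return urls
    return remote_lookup(query)


def choose_url(urls, preferred_schemes=("file", "ftp", "http")):
    """Pick one URL by scheme preference (a made-up policy for illustration)."""
    for scheme in preferred_schemes:
        for url in urls:
            if url.lower().startswith(scheme + ":"):
                return url
    # No preferred scheme matched: take the first URL, or None if there is none.
    return urls[0] if urls else None


# Stand-ins for the two services: nothing is installed locally here.
local_table = {}
publisher_table = {
    "Very.Fine.Example": ["http://com.com.com/very.fine",
                          "http://edu.edu.edu/v.fine.example"],
}

candidates = resolve(
    "Very.Fine.Example",
    lambda q: local_table.get(q, []),
    lambda q: publisher_table.get(q, []),
)
chosen = choose_url(candidates)
```

With an empty local table the query falls through to the publisher's
table, and `chosen` is the first HTTP URL in the list returned.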

The resolution service needs to parse the complete URC set, or
consult a preparsed table, to return the appropriate info.


FORMAT of the URC INFORMATION

As the documents in question are electronic books, it seems
appropriate to use either a TEI header or the USMARC format
to represent the bibliographic metadata, as both formats
are well worked out.  I chose TEI because I think it will be
easier for Davenporters to use and I didn't want to learn
USMARC rules.  Here's a sample set of information marked
up in strict accordance with the TEI P3 DTD.  It may be
desirable to define a subset of this DTD for Davenport
purposes; I am still exploring the possibilities offered
by the TEI header; see recent posts to TEI-L.  This
set has a bit more info than is strictly needed.


<!doctype teiheader system "tei2.dtd"[
<!ENTITY % TEI.mixed 'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!entity % isonum system "iso-num.gml">
	%isonum;
]>
<!-- isonum is needed for the ampersand in "O'Reilly & Associates" -->
<teiheader>
<!-- as for most elements, the attributes of teiheader are not 
	really needed for an elementary URC -->
<filedesc>
<titlestmt>
<title>X Window System User's Guide:  electronic edition</>
<!-- TEI recommends that you distinguish the titles of print works
	and electronic versions in this fashion, using one of two 
	set phrases, the other one being "a machine readable 
	transcription" -->
<author>Valerie Quercia</>
<author>Tim O'Reilly</>
</>

<editionstmt>
<edition>OSF/Motif 1.2 Edition</>
</editionstmt>

<publicationstmt>
<publisher>O'Reilly &amp; Associates, Inc.</>
<idno type=ISBN>12345678-9</>
<!-- ISBN of the electronic edition, not of the print book -->
<date>1 April 1994</>
</publicationstmt>

<seriesstmt>
<title>X Window System</>
<idno type=vol>3</>
</seriesstmt>

<sourcedesc>
<p>written as an etext
</>
</sourcedesc>

</filedesc>

<encodingdesc>
<classdecl>
<taxonomy id=LCSH>
<bibl>Library of Congress Subject Headings
</bibl>
</taxonomy>
</classdecl>
</encodingdesc>

<profiledesc>
<textclass>
<keywords scheme=LCSH>
<list>
<item>Computer software documentation</>
<item>Computer software configuration management</>
</list>
</keywords>
</textclass>
</profiledesc>
</teiheader>


THE TEIHEADER IN A URC SET

Here's a sample document, with DTD, that wraps the above 
TEI header along with URNs and URLs into one large element I 
called URC.  The content model of URC is arranged simply for 
convenience in the present trial, and should be regarded pretty 
much as a placeholder or strawman.  I use <urc.etc> to
represent all the other flavors of URC sets that might
exist.  The prologue includes the DTD and the pieces from
the prologue of the sample TEI header shown above; the
included entity would be the <teiheader>...</teiheader>
part of the sample above.

<!doctype urc [
<!element urc - - (urc.tei.davenport*, urc.etc*)>
<!element urc.etc - - (#pcdata) -- placeholder -- >
<!element urc.tei.davenport - - (teiheader, 
	((URN+, URL*) | URL+)) -- at least one URN or
	at least one URL, per comments at Davenport 
	meeting, and thanks to Eve Maler -- >
<!element (urn|url) - - (#PCDATA)>
<!entity % isonum system "iso-num.gml">
	%isonum;
<!ENTITY % TEI.general 'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!entity % teidtd system "tei2.dtd">
	%teidtd;
<!entity teix system "tei.exmpl.v3m">
<!-- teix is the teiheader example shorn of its doctype decl -->
]>
<urc>
<urc.tei.davenport>
&teix;
<urn>Very.Fine.Example
</urn>
<url>http://com.com.com/very.fine
</url>
<url>http://edu.edu.edu/v.fine.example
</url>
</urc.tei.davenport>
</urc>
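
To illustrate how a resolution service might reduce a URC set like
the one above to a preparsed table, here is a minimal Python sketch.
It assumes the URC set has first been normalized to well-formed XML
(end tags spelled out, entities expanded), because the standard
library parser used here cannot read SGML short end tags such as
</>.  The function name and the table layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: a normalized, cut-down copy of the sample URC set above.
URC_XML = """
<urc>
  <urc.tei.davenport>
    <teiheader>
      <filedesc>
        <titlestmt>
          <title>X Window System User's Guide:  electronic edition</title>
        </titlestmt>
      </filedesc>
    </teiheader>
    <urn>Very.Fine.Example
</urn>
    <url>http://com.com.com/very.fine
</url>
    <url>http://edu.edu.edu/v.fine.example
</url>
  </urc.tei.davenport>
</urc>
"""


def build_tables(urc_xml):
    """Return (URN -> URLs, title -> URLs) dictionaries for one URC document."""
    by_urn, by_title = {}, {}
    root = ET.fromstring(urc_xml)
    for entry in root.iter("urc.tei.davenport"):
        # Strip the line breaks that surround the URN and URL text.
        urls = [(u.text or "").strip() for u in entry.iter("url")]
        for urn in entry.iter("urn"):
            by_urn.setdefault((urn.text or "").strip(), []).extend(urls)
        # Use only the title inside titlestmt, not the series title.
        for stmt in entry.iter("titlestmt"):
            for title in stmt.iter("title"):
                by_title.setdefault((title.text or "").strip(), []).extend(urls)
    return by_urn, by_title


by_urn, by_title = build_tables(URC_XML)
```

The two dictionaries are enough to answer the URN and title queries
without reparsing the URC set on every request.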


OPEN ISSUES

Practically everything is open, but here's a short list.

Is the list of pieces given above complete?  Is it correctly
divided into components?  

What should be the syntax of the URL attribute values for
URN and URC/title?  Should Docbook's Ulink be extended with
additional attributes?  

Can the "local resolution service" for URN-to-URL resolution
be as simple as an SGML entity catalogue in the style set up
by SGML Open?  Can the local service for simple URC/title
queries be specified so that it could be implemented *as a
layer distinct from the URC set and document encoding* in
HyTime by those interested in doing so?
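
To make the first question concrete, here is a minimal Python sketch
of a catalogue-backed local lookup.  It assumes that a URN can be
written where an SGML Open catalogue entry expects a public
identifier, which is exactly the point in question; the file paths
and function names are invented for the example.

```python
import re

# Hypothetical catalogue text in the SGML Open style: each PUBLIC entry
# maps an identifier to a local system identifier (here, a file path).
CATALOG = '''
  -- local documentation installed on this machine --
PUBLIC "Very.Fine.Example" "/usr/doc/ora/very.fine.sgm"
PUBLIC "Another.Example"   "/usr/doc/ora/another.sgm"
'''

# Matches only simple double-quoted PUBLIC entries; other entry types are ignored.
ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def load_catalog(text):
    """Return a dict mapping each PUBLIC identifier to its system identifier."""
    return {pubid: sysid for pubid, sysid in ENTRY.findall(text)}


def resolve_locally(urn, catalog):
    """Return a one-element URL list if the URN is installed locally, else []."""
    path = catalog.get(urn)
    return ["file:" + path] if path else []


catalog = load_catalog(CATALOG)
```

A URN found in the catalogue resolves to a single file: URL; an
empty result would send the query on to the publisher's site.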

Is LCSH an appropriate choice for a keyword thesaurus
(beyond the scope of the project, really, but something
to be thinking about)?

If one wishes to establish URNs for sections and subsections
of a document, how should they be nested, if at all, in
the overall URC set?

Who would be interested in helping with some of the other
pieces?  There's no money in this project, at least at this
stage of development, and maybe not any glory, either. 



-- 
Terry Allen  (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472
A Davenport Group sponsor.  For information on the Davenport 
  Group see ftp://ftp.ora.com/pub/davenport/README.html
	or  http://www.ora.com/davenport/README.html