From: "Ronald E. Daniel" <firstname.lastname@example.org>
Date: Fri, 9 Jun 1995 06:58:54 -0600
Message-Id: <199506091258.GAA20129@idaknow.acl.lanl.gov>
To: email@example.com
Subject: URC spec 4/6

4 Default Attribute Set

The use of inherited attribute set grammars provides a very general capability, and some applications can use that generality to improve their functionality and scalability. However, the primary purpose of the URC service is URN to URL resolution, and the speed of that resolution is a major concern for the URC service. Response times for interactive browsing do not allow multiple network accesses to fetch DTD fragments and build a grammar for parsing a URC.

Therefore, we provide a default attribute set whose semantics are intended to be broadly understood. A URC described using this attribute set can be parsed either formally (according to ISO 8879) or heuristically (by any other means). If a URC arrives with no attribute set specified, the default attribute set is assumed. Furthermore, if a URC arrives with a different attribute set, we mandate that any of its elements that share a name with an element in the default attribute set must have semantics similar to that element of the default set. This allows heuristic parsing of new attribute sets.

To amplify this last point, consider the TITLE element. Anyone creating a new attribute set that contains TITLE must use it in a fashion similar to that in the default attribute set. If someone wants to use TITLE for a very different purpose (describing royalty, perhaps), they must add special information (scheme, type, or other attributes) so that it is possible to tell that TITLE is being used in a novel fashion and simple parsers are not led astray.
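The two rules above (assume the default set when none is named; read shared element names with default semantics) can be illustrated with a minimal Python sketch. It assumes a URC has already been tokenized into (element name, text) pairs, the function names are hypothetical, and the spellings "otheragent" and "objecttype" for the two-word elements are guesses. It does not attempt to detect the scheme or type attributes that flag a novel use of a shared name.

```python
from typing import Iterable, List, Optional, Tuple

# AID of the default attribute set, as given for this draft (section 4).
DEFAULT_AID = "urn:x-dns-2:uri.acl.lanl.gov:default-dtd"

# Element names of the default attribute set, lower-cased because SGML
# element names are case-insensitive under the reference concrete syntax.
# The spellings "otheragent" and "objecttype" are assumptions.
DEFAULT_ELEMENTS = {
    "identifier", "instance", "url", "author", "title", "subject",
    "publisher", "date", "otheragent", "objecttype", "form",
    "relation", "source", "language", "coverage",
}

Element = Tuple[str, str]  # (element name, text content)


def effective_aid(declared_aid: Optional[str]) -> str:
    """A URC that names no attribute set is read with the default set."""
    return declared_aid if declared_aid else DEFAULT_AID


def understood_elements(declared_aid: Optional[str],
                        elements: Iterable[Element]) -> List[Element]:
    """Return the elements a simple parser may read with default semantics."""
    if effective_aid(declared_aid) == DEFAULT_AID:
        # Default set: every element is assumed to be defined by it
        # (a validating parser would check this formally).
        return list(elements)
    # Other sets: a shared name must carry similar semantics, so keep
    # those elements and skip names this simple parser does not know.
    return [(name, text) for name, text in elements
            if name.lower() in DEFAULT_ELEMENTS]


# Usage: a hypothetical attribute set that reuses TITLE and adds ROYALTY.
print(understood_elements("urn:x-example:other-set",
                          [("TITLE", "Wanker!"), ("ROYALTY", "George III")]))
# [('TITLE', 'Wanker!')]
```

The filter is deliberately conservative: an element the parser cannot place is dropped rather than guessed at, which is the failure mode the shared-semantics rule is meant to avoid.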
For the purposes of this draft of the specification, and the prototypes developed from it, the AID for the default attribute set shall be:

    <urn:x-dns-2:uri.acl.lanl.gov:default-dtd>

It is anticipated that as this specification moves to the standards track, IANA will provide the URN for the default attribute set.

The default attribute set is strongly based on the work of the Dublin metadata workshop. A brief overview is given below.

The attribute set is based on a few principles. First, we provide a set of elements with widely understood semantics (such as Title). Second, we provide a mechanism for specifying more precise interpretations of those elements (such as Subject (scheme=LCSH) Computer Science). Third, we allow a variety of transfer syntaxes for the attributes: plain text, HTML, binary encodings, etc. can all be used, as described further in section 5.

All the elements in the default attribute set are optional and repeatable. No particular order is required.

Identifier: A string or number used to uniquely identify this object. Typically this will be the URN for the resource. Other identifiers may also be included.

Ron Daniel                                                  [Page 8]

INTERNET-DRAFT          An SGML-based URC Service          June 7, 1995

Instance: A construct for grouping information on particular instances of a resource, such as location, format, price, etc.
URL: A URL that can be used to retrieve an instance of the resource.
Author: The person(s) and/or organization(s) primarily responsible for the intellectual content of the work.
Title: The name of the object.
Subject: The field of knowledge to which the work belongs.
Publisher: The agent or agency responsible for making the object available.
Date: The date of publication.
Other Agent: Other person(s) and/or organization(s), such as editors, transcribers, sponsors, etc., who have made significant contributions to the work. Author and Publisher are special cases of Other Agent.
Object type: The abstract category of the object, such as novel, poem, or dictionary.
Form: The particular manifestation or data representation of the object, such as a PostScript file or a Windows executable. For URCs, form will typically be specified as an Internet Media Type, formerly known as the MIME Content-type.
Relation: Relationship to other objects. This element should identify the role of the relationship, as well as the related objects.
Source: Objects, either print or electronic, from which this resource was derived. This is a special case of the Relation element that is believed to be widely useful to the humanities.
Language: The natural language of the intellectual content.
Coverage: The spatial locations and temporal durations characteristic of the object.

5 Multiple Syntaxes

Several of the URC requirements are difficult, if not impossible, to satisfy at the same time. For example, one requirement is a human-readable, printed representation of a URC; another is a consistent encoding that is suitable for digital signature computation. Unfortunately, differences in end-of-line conventions between platforms make it difficult to meet both requirements simultaneously.

A related problem is that URC information will be put to different uses, and different encodings will be appropriate to meet those different needs. For example, it might be useful to obtain citations in some format (BibTeX, perhaps) for inclusion in another system.

Because of these considerations, we make another fundamental assumption: there is no universally applicable syntax for a URC. This specification therefore calls for the use of format negotiation mechanisms, specifically the Accept: header in HTTP [?], to indicate preferred syntaxes.
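As an illustration of Accept-based negotiation, here is a minimal Python sketch of how a URC server might pick one of the syntaxes it is able to emit. It is modeled loosely on HTTP's q-value convention (the most specific matching media range supplies the q-value, and q=0 means "not acceptable"); the function names are hypothetical, and media-type parameters other than q are ignored.

```python
from typing import List, Optional, Sequence, Tuple

MediaRange = Tuple[str, float]  # (media range, q-value)


def parse_accept(header: str) -> List[MediaRange]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1."""
    ranges: List[MediaRange] = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media, q))
    return ranges


def quality(media_type: str, ranges: Sequence[MediaRange]) -> float:
    """q-value of the most specific range matching media_type (0 if none)."""
    major = media_type.split("/")[0]
    best_q, best_rank = 0.0, -1
    for media_range, q in ranges:
        if media_range == media_type:
            rank = 2
        elif media_range == major + "/*":
            rank = 1
        elif media_range == "*/*":
            rank = 0
        else:
            continue
        if rank > best_rank:
            best_q, best_rank = q, rank
    return best_q


def choose_syntax(accept: str, available: Sequence[str]) -> Optional[str]:
    """Pick the server-side URC syntax the client rates highest."""
    ranges = parse_accept(accept)
    chosen, chosen_q = None, 0.0
    for media_type in available:
        q = quality(media_type.lower(), ranges)
        if q > chosen_q:  # ties keep the server's earlier preference
            chosen, chosen_q = media_type, q
    return chosen


print(choose_syntax("text/html, */*; q=0.2",
                    ["text/sgml", "text/urc0", "text/html"]))  # text/html
```

Returning None when nothing is acceptable leaves the server free to decide how to report the failure.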
Depending on the user's intent for the URC, they can ask for an HTML encoding, a plain text encoding, an encrypted binary encoding, a synthesized audio rendition, etc. A set of examples is provided that shows the requests and responses for different renditions of the same underlying information. All of these examples assume we are resolving the URN:

    urn:dns:pchs.k-12.okc.ok.us:student-papers-1995/geo3

5.1 Example 1: text/html

When the URC service is first deployed, there will be a large base of existing web browsers that cannot parse URCs. One means of remaining compatible with these browsers is to request the URC in HTML. The user can look at information on the different locations for a resource, pick a site, and click on a link to retrieve the resource.

The URN resolution request sent to the URC server's HTTP daemon might be:

    GET student-papers-1995/geo3 HTTP/1.0
    Accept: text/html, */*; q=0.2

which says to send text/html if possible, and anything else if not. The reply from the server might be:

    HTTP/1.0 200 OK
    Date: Tuesday, 08-Oct-96 21:09:16 GMT
    Server: Apache/2.7
    MIME-version: 1.0
    Content-type: text/html
    Last-modified: Friday, 26-Apr-96 21:57:12 GMT
    Content-length: 129

    <html>
    <head>
    <title>URC for urn:x-dns-2:pchs.k-12.okc.ok.us:student-papers-1995/geo3
    </title>
    </head>
    <body>
    <h1>urn:x-dns-2:pchs.k-12.okc.ok.us:student-papers-1995/geo3
    </h1>
    Author: Smith, Fred<br>
    Title: Wanker! : A vicious, seditious, and tendentious history of George III<br>
    Subject: American Revolution<br>
    Subject: (In)famous crackpots of history<br>
    Location: <A HREF=
    "http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html">
    http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html</a><br>
    Form: text/html<br>
    </body>
    </html>

5.2 Example 2: text/urc0

The text/html example above provides one means for backward compatibility with legacy clients.
However, the "click twice" model is going to get old really quickly. A CCI helper application could be constructed that would parse URCs in some trivial format, such as text/urc0, and automatically pick a URL. The request and reply might be:

    GET student-papers-1995/geo3 HTTP/1.0
    Accept: text/urc0, */*; q=0.2

    HTTP/1.0 200 OK
    Date: Tuesday, 08-Oct-96 21:09:16 GMT
    Server: Apache/2.7
    MIME-version: 1.0
    Content-type: text/urc0
    Last-modified: Friday, 26-Apr-96 21:57:12 GMT
    Content-length: 62

    =====
    http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html

5.3 Example 3: text/sgml

This example assumes the URC is prepared according to the default attribute set. Note that the document type declaration is included in the response, and that a URN is used in the SYSTEM identifier. This is how the attribute set is indicated for the SGML syntax.

This is somewhat contrary to the intent of SGML, which uses SYSTEM identifiers to locate local information and PUBLIC identifiers for well-known, publicly accessible information. I have gone against that convention because SGML PUBLIC identifiers have a restricted character set, are subject to some processing of their characters, and encourage a syntax (known as a formal public identifier) whose structure is likely to be incompatible with URNs. By using a SYSTEM identifier we get a more liberal character set, and the string is handed off to an ``entity manager'' for processing so as to fetch the appropriate file. Using a URN here will require an enhanced entity manager, but such things are already part of some commercial SGML products.
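To make the role of the SYSTEM identifier concrete, the following Python sketch extracts it from a document type declaration and decides whether an enhanced entity manager would need to resolve it as a URN. The regular expression is a simplification that accepts only the SYSTEM form used in the example (no PUBLIC identifiers, no internal subset), and the function names and routing labels are hypothetical.

```python
import re
from typing import Optional

# <!DOCTYPE name SYSTEM "literal"> (or 'literal'); documents that use a
# PUBLIC identifier or an internal subset are not handled by this sketch.
DOCTYPE_SYSTEM = re.compile(
    r"<!DOCTYPE\s+([A-Za-z][\w.-]*)\s+SYSTEM\s+(\"|')(.*?)\2\s*>",
    re.IGNORECASE | re.DOTALL,
)


def system_identifier(sgml_text: str) -> Optional[str]:
    """Return the SYSTEM identifier of the document type declaration, if any."""
    match = DOCTYPE_SYSTEM.search(sgml_text)
    return match.group(3).strip() if match else None


def entity_manager_route(system_id: str) -> str:
    """Decide how a (hypothetical) enhanced entity manager fetches the DTD."""
    if system_id.lower().startswith("urn:"):
        return "resolve-urn"   # hand the URN to the URC service
    return "local-file"        # conventional SYSTEM identifier


doc = '<!DOCTYPE URC SYSTEM "urn:x-dns-2:uri.acl.lanl.gov:default-1-dtd">\n<urc></urc>'
sid = system_identifier(doc)
print(sid, entity_manager_route(sid))
# urn:x-dns-2:uri.acl.lanl.gov:default-1-dtd resolve-urn
```

Because the URN is simply an opaque SYSTEM literal, an unenhanced entity manager would treat it as a file name and fail; the routing step is the only new machinery the approach requires.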
The URN resolution request sent to the URC server's HTTP daemon might be:

    GET student-papers-1995/geo3 HTTP/1.0
    Accept: text/sgml, */*; q=0.2

The reply from the server might be:

    HTTP/1.0 200 OK
    Date: Tuesday, 08-Oct-96 21:09:16 GMT
    Server: Apache/2.7
    MIME-version: 1.0
    Content-type: text/sgml
    Last-modified: Friday, 26-Apr-96 21:57:12 GMT
    Content-length: 129

    <!DOCTYPE URC SYSTEM "urn:x-dns-2:uri.acl.lanl.gov:default-1-dtd">
    <urc>
    <identifier scheme="URN">
    urn:dns:pchs.k-12.okc.ok.us:student-papers-1995/geo3
    </identifier>
    <author> Smith, Fred </author>
    <title> Wanker! : A vicious, seditious, and tendentious history of George III </title>
    <Subject>American Revolution</subject>
    <Subject>(In)famous crackpots of history</subject>
    <instance>
    <URL>
    http://www.pchs.k-12.okc.ok.us/student-papers/1995/smith/geo3.html
    </URL>
    <form scheme="IMT">text/html</form>
    </instance>
    </URC>

The examples above have suggested particular syntaxes. Determining the set of ``well-known'' syntaxes that all URC servers must be able to emit is one direction in which this standard could be extended.

6 Query Languages

Diversity is the hallmark of the URC service. We want to encourage the formation of a variety of value-added services; therefore, we need a standard means of dealing with unique capabilities. For example, different search services will have different information and support different forms of queries. This section defines a trivial query language that all URC servers and clients must support in order to claim conformance with the standard. It also describes the scheme by which non-standard queries can be identified.

6.1 Trivial Query Language

All queries in the trivial query language are HTTP GET requests. The simplest query asks for the URC of a resource by providing its URN. Note that the HTTP spec [?] says that the protocol and hostname parts of a URL are assumed to be known.
This is not a valid assumption for URNs. The server will return the full URC for the resource, in whatever format has been negotiated.

A few other queries are defined to obtain information on the resolver, resources, etc. These are also HTTP GET requests. Where the request is for server information, the FQDN shall be that of the server. They all begin with the reserved string "urn+". The requests and their meanings are:

o urn+m  Resolver meta-information (e.g. <urn:x-dns-2:FQDN:urn+m>)
  Returns a URC indicating URLs where resolver meta-information, such as the administrative contact, sponsoring organization, publication policy, etc., can be retrieved.

o urn+a  List of all RequestIDs (e.g. <urn:x-dns-2:FQDN:urn+a>)
  Returns a URC indicating URLs where a list of all URNs provided by the publisher can be obtained. The request must be understood by all resolvers, but the response may contain zero URLs. In such a case the URC should contain a message saying the equivalent of ``sorry''.

o urn+c  Child naming authorities (e.g. <urn:x-dns-2:FQDN:urn+c>)
  Returns a URC indicating URLs where a list of any child naming authorities licensed by the naming authority associated with the FQDN can be found.

o urn+p  Parent naming authorities (e.g. <urn:x-dns-2:FQDN:urn+p>)
  Returns a URC indicating URLs where a list of any parent naming authorities licensing the naming authority associated with the FQDN can be found.

o urn+aids  (e.g. <urn:x-dns-2:FQDN:urn+aids>)
  Returns a URC indicating the URLs that can be used to fetch a list of all the attribute sets used to describe information on this server.

6.2 Query Language Identification

This relatively minimal set of queries does not allow the construction of complex queries, such as "gimme the URNs and titles of all resources with 'Smith' as an author and 'food' as the subject".
Different organizations will wish to add value to their URC collections in different ways, so it is highly unlikely that one query language will meet all needs. Therefore, we provide an additional well-known query:

o urn+qls  (e.g. <urn:x-dns-2:FQDN:urn+qls>)
  Returns a URC that gives URNs and URLs for all the query languages known by this server, their conditions of use, etc.

6.3 Random Notes on Querying

A notion that has been independently suggested by several parties is to use a partially completed URC as a template to find full URCs that bear some relation to the template. Dealing with the inheritance of attribute sets, the specification of relations, and server load issues are all future work for this specification, as is additional work on the unique identification of query languages.
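The reserved queries of sections 6.1 and 6.2 all share the form <urn:x-dns-2:FQDN:urn+...>. A minimal Python sketch of building those URNs and telling them apart from ordinary resource URNs follows; it assumes the colon-separated urn:x-dns-2:FQDN:RequestID layout used in the examples, and its function names are hypothetical.

```python
from typing import Optional

# Reserved "urn+" queries (sections 6.1 and 6.2) and what each returns a URC for.
WELL_KNOWN_QUERIES = {
    "urn+m": "resolver meta-information",
    "urn+a": "list of all RequestIDs",
    "urn+c": "child naming authorities",
    "urn+p": "parent naming authorities",
    "urn+aids": "attribute sets used on this server",
    "urn+qls": "query languages known by this server",
}


def well_known_urn(fqdn: str, query: str) -> str:
    """Build <urn:x-dns-2:FQDN:urn+...> for a reserved query against a server."""
    if query not in WELL_KNOWN_QUERIES:
        raise ValueError(f"not a well-known query: {query}")
    return f"urn:x-dns-2:{fqdn}:{query}"


def classify(urn: str) -> Optional[str]:
    """Return the reserved query a URN names, or None for an ordinary URN."""
    parts = urn.split(":")
    if (len(parts) >= 4 and parts[0].lower() == "urn"
            and parts[-1] in WELL_KNOWN_QUERIES):
        return parts[-1]
    return None


for urn in (well_known_urn("uri.acl.lanl.gov", "urn+qls"),
            "urn:dns:pchs.k-12.okc.ok.us:student-papers-1995/geo3"):
    # Reserved queries get a server-information URC; anything else is a
    # request for the full URC of the named resource.
    print(urn, "->", classify(urn))
```

Since "urn+" is a reserved prefix, a server can make this check before any lookup, so ordinary RequestIDs never collide with the well-known queries.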