Date: Thu, 28 Nov 91 08:57:45 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111280757.AA16501@nxoc01.cern.ch>
To: www-talk
Subject: Document identifiers

[from clifford lynch via brewster kahle]

The Coalition for Networked Information
Architectures & Standards Working Group
Workshop on ID and Reference Structures for Networked Information

There is an increasingly urgent need to develop working standards for referencing networked information objects. This has a wide range of applications, including links from MARC records to source material, references from courseware to published material in electronic form, networked hypertext pointers, and digital document IDs of the sort used in the Wide Area Information Server (WAIS) system. Many projects underway today need these types of identifiers, and a number of efforts have developed ad-hoc solutions so that they can progress. Unfortunately, the proliferation of these ad-hoc solutions is a major barrier to interoperability.

Responding to this need, the Coalition for Networked Information's Architectures and Standards Working Group is initiating an effort to develop such a working standard, or agreement. One outcome of this work may be a draft specification that is forwarded to standards-making bodies such as the National Information Standards Organization for consideration as the basis of an actual standard. In addition, the resulting specification may be submitted to the Internet Engineering Task Force for consideration as a draft Request for Comments (RFC).

I propose the following process to reach agreement. I am distributing this announcement, which includes a number of assumptions towards such a specification; redistribution is encouraged.
Discussion can be carried out electronically on the new LISTSERV mailing list that has been set up for the Architectures and Standards Working Group, which you can subscribe to by sending a mail message of the form

    SUB CNI-ARCH yourname

to LISTSERV@UCCVMA.BITNET.

Barring the unlikely event that rapid and full agreement on the specification is reached through electronic discussion, CNI will sponsor a one-day invitational meeting in early November (date and place to be determined). If you have a strong interest in this topic and feel you should attend the meeting, contact me either by electronic mail (CALUR@UCCMVSA.BITNET or CALUR@UCCMVSA.UCOP.EDU) or by telephone at (510) 987-0522 to have your name added to the invitation list.

Aspects of the problem that need to be addressed include those below, which I have listed along with some assumptions (all subject to question) to provide a starting point for our discussions. I do not claim that this list is complete; please look for areas I have overlooked as well as reacting to those mentioned. Many people have contributed ideas that appear in the list below, but I must make special note of the contributions of Brewster Kahle of Thinking Machines and his excellent document "Document Identifiers, or International Standard Book Numbers for the Electronic Age" (5/9/90).

1. The need for identifiers, as distinct from location information. This is best handled by a number (much like an ISSN or ISBN), but the system must accommodate multiple number-assigning agencies. Thus, the identifier is proposed as <numbering-authority>,<identifier> where numbering authorities are registered.

2. The pointers must be representable as an ASCII string to facilitate inclusion in a wide range of material, including documents and electronic mail.

3. Location information must support multiple locations for the document, including the "location of record" and one or more redistribution centers, local caches, etc.
The means of specifying a location should be sufficiently general to span at least the set of networks covered under the Internet Domain Naming system (DNS).

4. Objects may be retrieved by a variety of access mechanisms from servers, including FTP, LISTSERV, Z39.50, and perhaps FTAM and SQL-based database access, as well as requests for paper copies. The location information should be sufficiently general to include information about these different types of access techniques, and extensible to include new access methods that may develop in future.

5. Perhaps the location identifier should include some information about the format and size of the object; on the other hand, perhaps it should not. Discussion?

6. It should be possible to further qualify a reference to a "sublocation" within an object (which would have meaning only to the server that houses it). This is needed, for example, for hypertext-type links. Such a sublocation might be the 25th paragraph of a text, for a hypertext-type pointer.

7. Indirection should be supported. In other words, one should be able to format the location as the name of a server that can be passed the identifier and which would return location information. The protocol mechanism(s) for doing this need to be specified as well.

8. While full rights and permissions data would seem to be outside the scope of such a pointer, it might be useful to include at least some basic information. This might be an indication that the object is not copyrighted and can be freely distributed, that it is copyrighted but can be freely distributed, that it can be redistributed for noncommercial use, or that restrictions apply to redistribution. Also, it might make sense to include a pointer of some sort (an e-mail address? a host address?) for further information about rights.

9. Perhaps there might be some type of checksum that can be calculated on the retrieved object to ensure that the pointer and the object have not gotten out of synch?
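
To make the assumptions in items 1, 3, 4, 6 and 9 more concrete, here is one possible way to model such a pointer, written in Python. It is only a sketch for discussion. The only piece taken from the list above is the <numbering-authority>,<identifier> form; the class and field names, the authority name "EXAMPLE-AUTH", the example.org and example.net hosts, the "para=25" sublocation syntax, and the choice of SHA-256 as the checksum are all hypothetical and are not part of the proposal.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass(frozen=True)
class Identifier:
    authority: str  # registered numbering authority (item 1)
    number: str     # identifier assigned by that authority

    @classmethod
    def parse(cls, text: str) -> "Identifier":
        # Item 2: the pointer must be representable as an ASCII string.
        if not text.isascii():
            raise ValueError("identifier must be plain ASCII")
        # Split on the first comma only, so the part assigned by the
        # authority may itself contain commas.
        authority, sep, number = text.partition(",")
        authority, number = authority.strip(), number.strip()
        if not sep or not authority or not number:
            raise ValueError(f"expected '<authority>,<identifier>': {text!r}")
        return cls(authority, number)

    def __str__(self) -> str:
        return f"{self.authority},{self.number}"


@dataclass(frozen=True)
class Location:
    method: str            # access mechanism, e.g. "ftp" (item 4)
    host: str              # DNS name of the server (item 3)
    path: str              # server-specific name of the object
    sublocation: str = ""  # meaningful only to the server (item 6)


@dataclass
class Pointer:
    identifier: Identifier
    # Assumption: the first entry is the "location of record" (item 3).
    locations: list[Location] = field(default_factory=list)
    checksum: str = ""     # hex digest of the object (item 9)

    def matches(self, data: bytes) -> bool:
        """Return True if the retrieved bytes agree with the stored checksum."""
        return hashlib.sha256(data).hexdigest() == self.checksum


body = b"example document text"
ptr = Pointer(
    identifier=Identifier.parse("EXAMPLE-AUTH,12345"),
    locations=[
        Location("ftp", "archive.example.org", "/pub/doc12345.txt"),
        Location("ftp", "mirror.example.net", "/cache/doc12345.txt",
                 sublocation="para=25"),
    ],
    checksum=hashlib.sha256(body).hexdigest(),
)
print(ptr.identifier)           # EXAMPLE-AUTH,12345
print(ptr.matches(body))        # True
print(ptr.matches(b"changed"))  # False
```

Keeping the identifier separate from the list of locations reflects item 1: the same identifier stays valid when copies move or new caches appear. Indirection (item 7) and rights information (item 8) are left out of the sketch because the list above does not yet say what form they would take.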