Document identifiers

Tim Berners-Lee (timbl)
Thu, 28 Nov 91 08:57:45 GMT+0100

Date: Thu, 28 Nov 91 08:57:45 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9111280757.AA16501@ >
To: www-talk
Subject: Document identifiers

[from clifford lynch via brewster kahle]

The Coalition for Networked Information
Architectures & Standards Working Group

Workshop on ID and Reference Structures
for Networked Information

There is an increasingly urgent need to develop working
standards for referencing networked information objects.
This has a wide range of applications, including links from
MARC records to source material, references from courseware
to published material in electronic form, networked
hypertext pointers, and digital document IDs of the sort
used in the Wide Area Information Server (WAIS) system.
Many projects underway today need these types of
identifiers, and a number of efforts have developed ad-hoc
solutions so that they can progress. Unfortunately, the
proliferation of these ad-hoc solutions is a major barrier
to interoperability.

Responding to this need, the Coalition for Networked
Information's Architectures and Standards Working Group is
initiating an effort to develop such a working standard, or
agreement. One outcome of this work may be a draft
specification that is forwarded to standards-making bodies
such as the National Information Standards Organization for
consideration as the basis of an actual standard. In
addition, the resulting specification may be submitted to
the Internet Engineering Task Force for consideration as a
draft Request for Comments (RFC).

I propose the following process to reach agreement. I am
distributing this announcement, which includes a number of
assumptions towards such a specification; redistribution is
encouraged. Discussion can be carried out electronically on
the new LISTSERV mailing list that has been set up for the
Architectures and Standards Working Group, which you can
subscribe to by sending a mail message in the form

SUB CNI-ARCH yourname



Barring the unlikely event that rapid and full agreement on
the specification is reached through electronic discussion,
CNI will sponsor a one-day invitational meeting in early
November (date and place to be determined). If you have a
strong interest in this topic and feel you should attend the
meeting, contact me either by electronic mail or by
telephone at (510) 987-0522 to have your name added to the
invitation list.

Aspects of the problem that need to be addressed include
those below, which I have listed along with some assumptions
(all subject to question) to provide a starting point for
our discussions. I do not claim that this list is complete;
please look for overlooked areas as well as reacting to those
mentioned. Many people have contributed ideas that appear in
the list below, but I must make special note of the
contributions of Brewster Kahle of Thinking Machines and his
excellent document "Document Identifiers, or International
Standard Book Numbers for the Electronic Age" (5/9/90).

1. The need for identifiers, as distinct from location
information. This is best handled by a number (much like an
ISSN or ISBN), but the system must accommodate multiple
number-assigning agencies. Thus, the identifier is proposed
as <numbering-authority>,<identifier> where numbering
authorities are registered.

2. The pointers must be representable as an ASCII string to
facilitate inclusion in a wide range of material, including
documents and electronic mail.

3. Location information must support multiple locations for
the document, including the "location of record" and one or
more redistribution centers, local caches, etc. The means of
specifying a location should be sufficiently general to span
at least the set of networks covered under the Internet
Domain Name System (DNS).

4. Objects may be retrieved by a variety of access
mechanisms from servers, including FTP, LISTSERV, Z39.50,
and perhaps FTAM and SQL-based database access, as well as
requests for paper copies. The location information should
be sufficiently general to include information about these
different types of access techniques, and extensible to
include new access methods that may develop in future.

5. Perhaps the location identifier should include some
information about the format and size of the object; on the
other hand, perhaps it should not. Discussion?

6. It should be possible to further qualify a reference to a
"sublocation" within an object (which would have meaning
only to the server that houses it). This is needed, for
example, for hypertext-type links. Such a sublocation might
be the 25th paragraph of a text, for a hypertext-type link.

7. Indirection should be supported. In other words, one
should be able to format the location as the name of a
server that can be passed the identifier and which would
return location information. The protocol mechanism(s) for
doing this need to be specified as well.

8. While full rights and permissions data would seem to be
outside the scope of such a pointer, it might be useful to
include at least some basic information. This might be an
indication that the object is not copyrighted and can be
freely distributed, that it is copyrighted but can be freely
distributed, that it can be redistributed for noncommercial
use, or that restrictions apply to redistribution. Also, it
might make sense to include a pointer of some sort (an
e-mail address? a host address?) for further information
about rights.

9. Perhaps there might be some type of checksum that can be
calculated on the retrieved object to ensure that the
pointer and the object have not gotten out of synch?
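
As a rough illustration of how items 1 to 4, 6 and 9 might fit
together, here is a minimal Python sketch of a reference record.
Everything in it is an assumption made for illustration: the class
and field names, the example authority "EXAMPLE-AUTH", the host
names, the "para=25" sublocation syntax, and the use of SHA-256 as
the checksum are all hypothetical. The announcement itself fixes only
the <numbering-authority>,<identifier> form and leaves the rest open
for discussion.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Identifier:
    """Item 1: an identifier of the form <numbering-authority>,<identifier>."""
    authority: str
    number: str

    @classmethod
    def parse(cls, text: str) -> "Identifier":
        # Item 2: the pointer must be representable as an ASCII string.
        if not text.isascii():
            raise ValueError("identifier must be ASCII")
        # Split on the first comma only. Assumption: authority names
        # never contain a comma.
        authority, sep, number = text.partition(",")
        authority, number = authority.strip(), number.strip()
        if not sep or not authority or not number:
            raise ValueError(f"expected authority,identifier but got {text!r}")
        return cls(authority, number)

    def __str__(self) -> str:
        return f"{self.authority},{self.number}"


@dataclass(frozen=True)
class Location:
    """Items 3, 4 and 6: one place from which the object can be obtained."""
    method: str                        # access mechanism, e.g. "FTP", "Z39.50"
    host: str                          # a DNS name
    path: str                          # meaning depends on the access method
    of_record: bool = False            # True for the "location of record"
    sublocation: Optional[str] = None  # opaque; only the server interprets it


@dataclass
class Reference:
    identifier: Identifier
    locations: List[Location] = field(default_factory=list)
    checksum: Optional[str] = None     # item 9; hex digest, algorithm assumed

    def location_of_record(self) -> Optional[Location]:
        # Return the first location flagged as the location of record.
        return next((loc for loc in self.locations if loc.of_record), None)

    def matches(self, data: bytes) -> bool:
        """Item 9: compare retrieved bytes against the stored checksum."""
        if self.checksum is None:
            return True  # no checksum recorded, so nothing to compare
        return hashlib.sha256(data).hexdigest() == self.checksum


# Hypothetical usage: one location of record plus one cached copy.
ref = Reference(
    identifier=Identifier.parse("EXAMPLE-AUTH,12345"),
    locations=[
        Location("FTP", "archive.example.org", "/pub/doc.txt", of_record=True),
        Location("FTP", "mirror.example.net", "/cache/doc.txt",
                 sublocation="para=25"),
    ],
    checksum=hashlib.sha256(b"document body").hexdigest(),
)
```

Keeping the identifier separate from the list of locations mirrors
the distinction drawn in item 1: the same identifier stays valid
while copies are added, moved, or removed.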