Re: Draft: Universal Document Identifiers

Tim Berners-Lee (timbl)
Wed, 4 Mar 92 10:42:32 GMT+0100


Date: Wed, 4 Mar 92 10:42:32 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203040942.AA17545@nxoc01.cern.ch>
To: jcurran@nnsc.nsf.net
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet@nnsc.nsf.net, wais-talk@think.com,

> Date: Thu, 27 Feb 92 19:45:42 -0500
> From: jcurran@nnsc.nsf.net

> Even if the exact scheme is not used, the requirement
> discussion contained in the paper is quite valuable.
> I have a few comments:
> 

>] Terms
>]
>] The objects on the network which are to be named include
>] objects which can be retrieved, and objects which can be searched.

> Using this definition, one would infer that document identifiers
> would allow reference to a distinct file, a particular mail
> message, news article, etc. I would not anticipate that a document
> identifier would be used to identify a newsgroup, interactive
> service, archive directory, or a wais source. Are we trying to
> define a universal id or a universal document id?  Might it be
> better to defer the definition of non-document resources and then
> come back and make the document specific id's be a subset of a
> future general resource identifier?

You are right that the UDIs were intended to be able to refer to any
of those things. (In the W3 world, they all look pretty similar
anyway: they are all represented as [hyper]text objects.) It is
largely in order to be able to make references to any of those things
that we need a UDI rather than a WAIS-DI and a W3-DI and a news-DI,
etc. A UDI allows references between systems, and expandability
for the future. My answer would be that we are trying to define a
universal document id, but where "document" has the very wide
interpretation of any data which can be retrieved, viewed or
searched: anything to which you might want to make a reference.
For example, a person is not a document (although having a document
on the net representing each person might be useful: their
signature/disclaimer with links to their published works, etc.).
If we can't cope with the objects which are on the net now, how can
we hope to cope with the weird things to come, such as video clips
from last night's news?


] Relevance
]
] The life of a name is limited by any information contained within it which
] may become prematurely invalid. It is therefore necessary to limit the
] contents of a name to the information required for the operations above.
] Other extraneous information about the document (its size, data format,
] authorization details, etc) may in general change with time and should
] not be part of the name.

> The proposed document identifiers have many characteristics which
> may change with time: storage location, access protocol, format,
> etc. If we focus instead on the "information content" of a
> document, then it might be possible to form identifiers that are
> more robust. Many people consider:
>
> file://info.cern.ch./pub/www/doc/udi1.ps      and
> file://info.cern.ch./pub/www/doc/udi1.txt
>
> to be the same document; just in different formats.

Precisely. We look forward to the day when a name like

	x500:/CH/CERN/CN/TBL/TechNote-15

will be put through a name server which will return a set of
addresses. In the meantime, we don't have that ubiquitous name
server (directory) facility, so we have to make do with physical
addresses, and different versions of the same document look like
different documents. It's a shame. The plan is that UDIs can migrate
from physical addresses to registered names.



> It would be nice to be able to recognize this
> and allow the user (and user interface) to determine which
> instance should be used for retrieval.

Yes. Absolutely. (The neatest way is for the client to send a set of
preferences over with the request, and for the server to decide which
format to send. This is a suggestion for an evolved wais and/or
http protocol.) Another way is for the client to ask a name server
for addresses, and retrieve the headers of each one to find out which
representation he'd prefer; but I'd prefer all the representations
of the document to have the same name right down to the retrieval
protocol level.
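A minimal sketch of the first approach, in which the client sends an ordered list of preferred formats and the server decides which representation to return. The format names, the `choose_representation` function and the in-memory `REPRESENTATIONS` table are hypothetical illustrations, not part of any existing wais or http protocol.

```python
from typing import List, Optional

# Hypothetical server-side table: one document name mapped to the
# formats held for it and the file containing each representation.
REPRESENTATIONS = {
    "udi1": {"postscript": "udi1.ps", "text": "udi1.txt"},
}

def choose_representation(name: str, preferences: List[str]) -> Optional[str]:
    """Return the file for the first format in the client's preference
    list that the server holds for `name`, or None if nothing matches."""
    available = REPRESENTATIONS.get(name, {})
    for fmt in preferences:  # ordered: most preferred format first
        if fmt in available:
            return available[fmt]
    return None
```

For example, a client sending `["dvi", "text", "postscript"]` for `udi1` gets `udi1.txt`: the name stays the same, and only the server's choice of representation varies.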

> This recognition may only
> be performed if the document ids (now being used as document content
> ids) contain only location- and format-independent data. It is easy
> to imagine that uniqueness could be assured by combining
> an organization, author, and title:
>
> cern.ch:www-staff:udi1
>
> ietf:osids:archdirectory-00

There are two functions: one, to find out whether two documents are
the same; two, to derive a (set of) addresses for retrieval of the
document. To do the first, any unique id (like OSF/DCE UUIDs or
RFCxxxx message ids) will work. To do the second, a directory
service is needed.
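The split between the two functions can be sketched as follows. The identifier strings come from John's examples and the addresses reuse the info.cern.ch paths quoted earlier; the `DIRECTORY` table is a hypothetical stand-in for a real directory service (such as X.500), and `same_document` and `resolve` are illustrative names only.

```python
from typing import Dict, List

# Hypothetical directory: registered unique names mapped to the physical
# addresses where copies of the document can currently be found.
DIRECTORY: Dict[str, List[str]] = {
    "cern.ch:www-staff:udi1": [
        "file://info.cern.ch./pub/www/doc/udi1.ps",
        "file://info.cern.ch./pub/www/doc/udi1.txt",
    ],
}

def same_document(id_a: str, id_b: str) -> bool:
    """Function one: with globally unique ids, sameness is a plain
    string comparison; no network lookup is needed."""
    return id_a == id_b

def resolve(unique_id: str) -> List[str]:
    """Function two: deriving retrieval addresses needs the directory.
    Returns an empty list when the name is not registered."""
    return list(DIRECTORY.get(unique_id, []))
```

Only `resolve` depends on the directory; `same_document` works with any unique id scheme, which is why the first function is available today and the second is not.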

> Note that the actual location of the information might be far
> removed from the point of creation, and the format might be
> changed:
>
>cern.ch:www-staff:udi1;file://ftp.uu.net/doc/univeral-docids.PS.Z
>cern.ch:www-staff:udi1;news:<1992Feb21.121919.1@quake.think.com>
>cern.ch:www-staff:udi1;wais://nnsc.nsf.net/info-retrieval-notes?udi1

I see the usefulness of quoting both the unique identifier and the  
physical address. I hope that in the future, though, one will only  
need the first part "cern.ch:www-staff:udi1". That, fed into the  
directory service, will produce a list of addresses.
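The combined form in John's examples can be taken apart mechanically, assuming (as all three examples do) that the first ";" separates the unique identifier from the physical address. This is a minimal sketch; `split_reference`, `addresses_for` and the dictionary passed in as the directory are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def split_reference(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'unique-id;address' into its two parts. A bare unique id
    yields (id, None). Only the first ';' separates the parts, so any
    later ';' stays inside the address."""
    unique_id, sep, address = ref.partition(";")
    return unique_id, (address if sep else None)

def addresses_for(ref: str, directory: Dict[str, List[str]]) -> List[str]:
    """Collect the directory's addresses for the unique id, then add the
    physical address quoted in the reference if it is not already listed."""
    unique_id, address = split_reference(ref)
    found = list(directory.get(unique_id, []))
    if address is not None and address not in found:
        found.append(address)
    return found
```

With an empty directory, only the quoted physical address is available; once the name is registered, the bare identifier on its own is enough to produce the list.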

You can, of course, still quote both: "You need document  
x500:/cern.ch/www-staff/udi1 which I found on  
file://ftp.uu.net/doc/univeral-docids.PS.Z".

I would also suggest that if a document has a unique registered name,
then it should certainly contain that name, so that if you find it
some other way, you can refer to it (make links to it) by its official
name.

> That's all
> /John

Good points; thanks for the input. I think more needs to go in
about registered unique names in the document.

	Tim BL