Date: Wed, 4 Mar 92 10:42:32 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203040942.AA17545@nxoc01.cern.ch>
To: jcurran@nnsc.nsf.net
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet@nnsc.nsf.net, wais-talk@think.com,

> Date: Thu, 27 Feb 92 19:45:42 -0500
> From: jcurran@nnsc.nsf.net

> Even if the exact scheme is not used, the requirement
> discussion contained in the paper is quite valuable.
> I have a few comments:
>
>] Terms
>]
>] The objects on the network which are to be named include
>] objects which can be retrieved, and objects which can be searched.

> Using this definition, one would infer that document identifiers
> would allow reference to a distinct file, a particular mail
> message, news article, etc. I would not anticipate that a document
> identifier would be used to identify a newsgroup, interactive
> service, archive directory, or a wais source. Are we trying to
> define a universal id or a universal document id? Might it be
> better to defer the definition of non-document resources and then
> come back and make the document-specific ids be a subset of a
> future general resource identifier?

You are right that the UDIs were intended to be able to refer to
any of those things. (In the W3 world, they all look pretty similar
anyway: they are all represented as [hyper]text objects.) It is
largely in order to be able to make references to any of those
things that we need a UDI rather than a WAIS-DI and a W3-DI and a
news-DI etc etc. A UDI allows references between systems, and
expandability for the future.

My answer would be that we are trying to define a universal
document id, but where "document" has the very wide interpretation
of any data which can be retrieved, viewed or searched: anything to
which you might want to make a reference. For example, a person is
not a document (although to have a document on the net representing
each person might be useful... their signature/disclaimer with links
to their published works, etc etc.) If we can't cope with the
objects which are on the net now, how can we hope to cope with the
weird things to come... video clips from the news last night etc...

] Relevance
]
] The life of a name is limited by any information contained within it which
] may become prematurely invalid. It is therefore necessary to limit the
] contents of a name to the information required for the operations above.
] Other extraneous information about the document (its size, data format,
] authorization details, etc) may in general change with time and should
] not be part of the name.

> The proposed document identifiers have many characteristics which
> may change with time: storage location, access protocol, format,
> etc. If we focus instead on the "information content" of a
> document, then it might be possible to form identifiers that are
> more robust. Many people consider:
>
>     file://info.cern.ch./pub/www/doc/udi1.ps and
>     file://info.cern.ch./pub/www/doc/udi1.txt
>
> to be the same document, just in different formats.

Precisely. We look forward to the day when a name like
x500:/CH/CERN/CN/TBL/TechNote-15 will be put through a name server
which will return a set of addresses. In the meantime, we don't have
that ubiquitous name server (directory) facility, so we have to make
do with physical addresses, and different versions of the same
document look like different documents. It's a shame. The plan is
that UDIs can migrate from physical addresses to registered names.

> It would be nice to be able to recognize this
> and allow the user (and user interface) to determine which
> instance should be used for retrieval.

Yes. Absolutely. (The neatest way is for the client to send a set of
preferences over with the request, and for the server to decide which
format to send. This is a suggestion for an evolved wais and/or http
protocol.)
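To make the preference idea concrete, here is a minimal Python sketch of server-side format choice. It assumes the client sends an ordered list of format names with its request and that the server holds the same document in several formats (as with the udi1.ps / udi1.txt pair). The function name `choose_format` and the format labels are hypothetical illustrations, not part of WAIS, HTTP, or the UDI draft.

```python
from typing import Optional, Sequence


def choose_format(preferences: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Pick the client's most-preferred format that the server can supply.

    Hypothetical helper, for illustration only.
    preferences: format names in the client's order of preference.
    available:   formats in which the server holds this one document.
    Returns None when there is no overlap, leaving the server to decide
    what to do next (refuse, or fall back to some default).
    """
    held = set(available)
    # Walk the client's list in order so its first acceptable choice wins.
    for fmt in preferences:
        if fmt in held:
            return fmt
    return None


if __name__ == "__main__":
    # One document, one name, two stored representations.
    server_has = ["postscript", "text"]
    print(choose_format(["postscript", "text"], server_has))
    print(choose_format(["hypertext", "text"], server_has))
    print(choose_format(["video"], server_has))
```

Because the server makes the choice, the client asks for a single name and still receives a representation it can use; it never needs to know that several formats exist.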
Another way is for the client to ask a name server for addresses,
and retrieve the headers of each one to find out which representation
he'd prefer. But I'd prefer all the representations of the document
to have the same name right down to the retrieval protocol level.

> This recognition may only
> be performed if the document ids (now being used as document content
> ids) contain only location- and format-independent data. It is easy
> to imagine that uniqueness could be assured by combining
> an organization, author, and title:
>
>     cern.ch:www-staff:udi1
>     ietf:osids:archdirectory-00

There are two functions. One, to find out whether two documents are
the same. Two, to derive a (set of) addresses for retrieval of the
document. To do the first, any unique id (like OSF/DCE UUIDs or
RFCxxxx message ids) will work. To do the second, a directory
service is needed.

> Note that the actual location of the information might be far
> removed from the point of creation, and the format might be
> changed:
>
>cern.ch:www-staff:udi1;file://ftp.uu.net/doc/univeral-docids.PS.Z
>cern.ch:www-staff:udi1;news:<1992Feb21.121919.1@quake.think.com>
>cern.ch:www-staff:udi1;wais://nnsc.nsf.net/info-retrieval-notes?udi1

I see the usefulness of quoting both the unique identifier and the
physical address. I hope, though, that in the future one will only
need the first part, "cern.ch:www-staff:udi1". That, fed into the
directory service, will produce a list of addresses. You can, of
course, still quote both: "You need document
x500:/cern.ch/www-staff/udi1 which I found on
file://ftp.uu.net/doc/univeral-docids.PS.Z".

I would also suggest that if a document has a unique registered name
then it should certainly contain that name, so that if you find it
some other way, you can refer to it (make links to it) by its
official name.

> That's all
> /John

Good points, thanks for the input. I think more needs to go in about
registered unique names in the document.

Tim BL