Re: Draft: Universal Document Identifiers

Tim Berners-Lee (timbl)
Mon, 2 Mar 92 12:36:33 GMT+0100


Date: Mon, 2 Mar 92 12:36:33 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203021136.AA14036@nxoc01.cern.ch>
To: bcn@isi.edu (Clifford Neuman)
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com,

Cliff,

Thanks for your input, with explanations of addressing in Prospero.

Prospero should certainly go into the document. Indeed, it seems to
fit in very well. The small differences raise some interesting
questions -- reactions off the top of my head follow, in the sequence
of your message.

	Tim
  _______________________________________________

> Date: Thu, 27 Feb 92 10:52:44 PST
> From: bcn@isi.edu (Clifford Neuman)
> 

> I have glanced through your document on universal directory
> identifiers, and you seem to have left out Prospero.

Omission was from ignorance of the details you provide here and will  
certainly be corrected. Prospero is very relevant.

> In particular, a Prospero link consists of two
> parts, a host name, and a name of the object on that host.  The
> latter part is usually a path name, but in reality, it can be any
> string, including simply a unique ID.  Thus, a Prospero link might
> look like
>
> TGO.ISI.EDU /a/b/c  or   GUM.ISI.EDU 27

The UDI syntax //TGO.ISI.EDU/a/b/c or //GUM.ISI.EDU/27 matches that
very well.  I suggest the prefix "prospero:" for Prospero addresses.

> A Prospero link has a few other fields as well, but perhaps less
> important.  There is a type field for the hostname.  It indicates
> whether the hostname is an Internet name or address, or perhaps some
> other kind of name or address.  Only one type is presently supported
> (INTERNET-D) though, and that type includes Internet host names or
> addresses, with or without an optional Internet UDP port.
>
>  examples: TGO.ISI.EDU, TGO.ISI.EDU(191), 128.9.224.123, or
>  128.9.224.123(191)

The UDI scheme foresees these possibilities. These would map onto
//TGO.ISI.EDU/, //TGO.ISI.EDU:191/, //128.9.224.123/ and
//128.9.224.123:191/ respectively. The whole UDI of the file above
would be (if quoted outside the "prospero:" context),

	prospero://TGO.ISI.EDU:191/a/b/c
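
The mapping just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the regular expression and the handling of names without a leading slash are hypothetical, not part of Prospero or of the UDI draft.

```python
import re

# Hypothetical helper; the name and the regular expression are illustrative only.
# Accepts "HOST" or "HOST(port)", where HOST is a name or a dotted address.
_HOST_RE = re.compile(r"^(?P<host>[^()\s]+)(?:\((?P<port>\d+)\))?$")


def prospero_to_udi(host_field, name, prefix="prospero:"):
    """Map a Prospero (host, name) pair onto the UDI syntax discussed above."""
    m = _HOST_RE.match(host_field)
    if m is None:
        raise ValueError(f"unrecognised host field: {host_field!r}")
    authority = m.group("host")
    if m.group("port"):
        # TGO.ISI.EDU(191) becomes TGO.ISI.EDU:191
        authority += ":" + m.group("port")
    # Assumption: names such as "27" get a leading slash so the path is rooted.
    path = name if name.startswith("/") else "/" + name
    return f"{prefix}//{authority}{path}"
```

For instance, prospero_to_udi("TGO.ISI.EDU(191)", "/a/b/c") gives the UDI shown above, and passing prefix="" gives the form used inside a "prospero:" context.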


We, also, wondered about how to extend the system when other
underlying protocols are used with the same higher-level protocol.
Suppose, for example, that one later adds dial-up Prospero. Should one write

	prospero://dialup:+12025672654:200/a/b/c

or 	prospero-dialup:/+12025672654:200/a/b/c ?

My feeling is that the number of underlying network layers which have
complete world-wide coverage will remain low. Furthermore, one can
even imagine gateways there, so that those without X.25 access, say,
can go through some transport-level gateway from TCP/IP if the need
arises. This suggests putting other low-level addresses into the
"host/port" field, encoded in some fashion. One would hope that there
will be fewer forms of transport service access point address than
there will be application layer protocols.

> The name relative to the host is also typed.  Presently, the only type
> supported is ASCII, but the type field is there just in case.

The rule we have used is to put type information, if part of the
link, into the path.  Protocols differ on whether they regard it as
part of the link or return it when you try to retrieve the data.
In the latter case (which I prefer) it should not be in the UDI at
all.

> Three other fields are a version number, a unique ID, and a type.  


The version number should, I suggest, be part of the path. Its
significance will tend to vary between servers. The trouble is, as
you say, no one has really put up a system dealing with multiple
versions. We imagined having hidden links from a document to its
previous, next and latest versions, and to a table of versions.

> The purpose of the unique ID is ... to provide a mechanism for
> detecting when an object has been deleted and replaced with an
> object of the same name.  In some cases, it might be important to
> note that the object being retrieved is not the same as the one to
> which the original link was made.

This is non-obvious.  My feeling is that a unique id is a useful
thing, which I would regard as "header" information, i.e. information
you can ask the server for.  Putting it into the link I'm not so sure
about.  Suppose, for example, the retrieval goes through several
stages of pointers, being referenced by several servers. Do you want
to check that the final document, or the first link, was really the
same as the one you made the original link to?

> Binding to an access method is accomplished by sending
> a message to the Prospero server at the address in the link, and
> requesting the access method for the named object.  The response
> includes a sequence of tokens, the first identifies the access method,
> and the remainder identify the information specific to the access
> method (beyond that which already is part of the link).  If you
> understand the access method, then you also know how to interpret the
> remaining tokens.

That "late binding" is just the sort of "name-server" function which
I was talking about, and which for example X.500 might also fit into.
So long as both the input and the output to the process are UDIs,
it's very flexible.

> For example, a response indicating access by anonymous FTP might be
>
>  ANONYMOUS-FTP /pub/pfs/guest/README BINARY

We'd write that now as file:/(samehost)/pub/pfs/guest/README.
Currently, if the access protocol has to be specified, then the host
does too. It could default to the host of the context of the UDI even
when protocol fields are different.

The "binary" flag is an interesting one and a perennial question.  My  
assumption was that if you know how to handle a file when you've got  
it, then you must know how to transfer it.  In practice with FTP both  
mean that you have to have a table of file suffixes.
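
A minimal sketch of such a suffix table, in Python. The suffixes, the modes assigned to them and the default are assumptions chosen for illustration; nothing here comes from the FTP or Prospero specifications.

```python
import os

# Illustrative table only: these suffixes and modes are assumptions.
SUFFIX_MODES = {
    ".txt": "ascii",
    ".ps": "ascii",
    ".tar": "binary",
    ".Z": "binary",
}


def transfer_mode(path, default="binary"):
    """Guess the FTP transfer mode from the file suffix alone."""
    _, suffix = os.path.splitext(path)
    # Files with no suffix, or an unknown one, fall back to the default.
    return SUFFIX_MODES.get(suffix, default)
```

A file such as /pub/pfs/guest/README has no suffix at all, so the table cannot help and the default decides, which is exactly the weakness of the approach.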

> Similar responses are supported for other methods, and a response
> might include more than one access method, in which case the
> application choose the method that best suits its needs.

Sounds fine.

> Now, back to the type field.  One of the shortcomings of the approach
> as described so far is that it requires a Prospero server to run on
> the system storing the object to be referenced.  This shortcoming is
> addressed by the external link.  The type field in a Prospero link
> provides information on what can be done with the link.  The three
> common types are FILE, DIRECTORY, and EXTERNAL.  The links described
> above were of type FILE.  If a links type is directory, its contents
> can be listed by contacting the Prospero server (i.e. the links in the
> directory can be returned).  If a links type is EXTERNAL, it means
> that the object should be accessed without contacting a Prospero
> server to obtain the access method (usually because a Prospero server
> is not running on the target site).  Instead, the access information
> that would otherwise have been returned is encoded as part of the
> type.  Thus for example the type of an external link to the file
> mentioned above would be.
>
>   EXTERNAL(AFTP,BINARY)

Your "EXTERNAL" type is a pointer to a document in another naming
scheme, which is neat and expandable -- I like it.  The UDI syntax was
basically invented to allow one to do that, so that all these systems
can work together. Basically, type EXTERNAL(xxx) maps onto putting an
xxx: prefix on the UDI. In your example, it maps to giving a file:
reference.
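
As a rough illustration of that correspondence, the Python sketch below turns an EXTERNAL link into a prefixed UDI. Only the AFTP-to-"file:" pairing comes from the discussion; the function name, the prefix table, the "//host/path" layout of the result and the host in the usage note are assumptions.

```python
import re

# Assumed table from Prospero access-method names to UDI prefixes.
# Only AFTP -> "file:" is taken from the discussion above.
METHOD_PREFIX = {"AFTP": "file:"}


def external_to_udi(link_type, host, name):
    """Turn an EXTERNAL(method,...) link into a UDI with the matching prefix."""
    m = re.fullmatch(r"EXTERNAL\(([^,()]+)(?:,[^()]*)?\)", link_type)
    if m is None:
        raise ValueError(f"not an EXTERNAL link type: {link_type!r}")
    method = m.group(1).strip()
    if method not in METHOD_PREFIX:
        raise KeyError(f"no UDI prefix known for method {method!r}")
    # Extra tokens such as BINARY are dropped: they are not part of the UDI here.
    path = name if name.startswith("/") else "/" + name
    return f"{METHOD_PREFIX[method]}//{host}{path}"
```

With a made-up host, external_to_udi("EXTERNAL(AFTP,BINARY)", "ftp.example.org", "/pub/pfs/guest/README") yields file://ftp.example.org/pub/pfs/guest/README.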

You have, for Prospero, the flag in the link as to whether the object
is a directory or a file.  So does Gopher.  This is useful for
displaying different icons, etc. for the user.  A snag is that if we
include anonymous FTP file systems, the NLST command doesn't tell
you that information, so it doesn't map.  You have to try to retrieve
it and if that fails, cd to it.  If the flag is considered useful,
then we could use the convention (of ls -F) that a/b/c/ is a
directory and a/b/c is a file. The trouble is that you can't get
that information from an FTP server without assuming Unix to parse a
long listing.
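
If the trailing-slash convention were adopted, reading and writing the flag could be as simple as the Python sketch below. Both helper names are hypothetical, and treating a bare "/" as always a directory is an assumption.

```python
def is_directory(path):
    """True if the path carries the trailing-slash directory mark."""
    return path.endswith("/")


def mark(path, directory):
    """Add or strip the trailing slash so the path itself records the flag."""
    stripped = path.rstrip("/")
    if stripped == "":
        # Assumption: the root is always a directory.
        return "/"
    return stripped + "/" if directory else stripped
```

So mark("a/b/c", True) gives a/b/c/, and is_directory then recovers the flag without asking the server.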

Do I _have_ to know in advance whether a Prospero item is a directory  
or a file?

> Note that for external links using the AFTP or FTP method, the name
> field of the link contains the path name to be passed to FTP.  For
> other access methods, the meaning of the field is defined by the
> particular access method to be used.

Yup - the UDI assumptions exactly.

> Anyway, I hope this adequate explains the form of Prospero
> identifiers, and I hope that you can fit it in to your proposed
> format.
>
>	~ Cliff

Thanks for a very clear explanation.  It sounds as though Prospero
will fit very well into the format.  I'll put it into the next draft
of the document.

	- Tim