Date: Mon, 2 Mar 92 12:36:33 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203021136.AA14036@nxoc01.cern.ch>
To: bcn@isi.edu (Clifford Neuman)
Subject: Re: Draft: Universal Document Identifiers
Cc: cni-arch@uccvma.bitnet, www-talk@nxoc01.cern.ch, wais-talk@think.com

Cliff,

Thanks for your input, with explanations of addressing in Prospero.
Prospero should certainly go into the document. Indeed, it seems to
fit in very well. The small differences raise some interesting
questions -- reactions off the top of my head follow, in the sequence
of your message.

Tim
_______________________________________________

> Date: Thu, 27 Feb 92 10:52:44 PST
> From: bcn@isi.edu (Clifford Neuman)
>
> I have glanced through your document on universal directory
> identifiers, and you seem to have left out Prospero.

The omission was from ignorance of the details you provide here, and
will certainly be corrected. Prospero is very relevant.

> In particular, a Prospero link consists of two
> parts, a host name, and a name of the object on that host. The
> latter part is usually a path name, but in reality, it can be any
> string, including simply a unique ID. Thus, a Prospero link might
> look like
>
> TGO.ISI.EDU /a/b/c or GUM.ISI.EDU 27

The UDI syntax //TGO.ISI.EDU/a/b/c or //GUM.ISI.EDU/27 matches that
very well. I suggest the prefix "prospero:" for prospero addresses.

> A Prospero link has a few other fields as well, but perhaps less
> important. There is a type field for the hostname. It indicates
> whether the hostname is an Internet name or address, or perhaps some
> other kind of name or address. Only one type is presently supported
> (INTERNET-D) though, and that type includes Internet host names or
> addresses, with or without an optional Internet UDP port.
>
> examples: TGO.ISI.EDU, TGO.ISI.EDU(191), 128.9.224.123, or 128.9.224.123(191)

The UDI scheme foresees these possibilities.
These would map onto //TGO.ISI.EDU/, //TGO.ISI.EDU:191/,
//128.9.224.123/ and //128.9.224.123:191/ respectively. The whole UDI
of the file above would be (if quoted out of the "prospero:" context),

    prospero://TGO.ISI.EDU:191/a/b/c

We also wondered about how to extend the system when other underlying
protocols are used with the same higher-level protocol. Suppose, for
example, that one later adds dial-up prospero. Should one write

    prospero://dialup:+12025672654:200/a/b/c
or
    prospero-dialup:/+12025672654:200/a/b/c ?

My feeling is that the number of underlying network layers which have
complete world-wide coverage will remain low. Furthermore, one can
even imagine gateways there, so that those without X.25 access, say,
can go through some transport-level gateway from TCP/IP if the need
arises. This suggests putting other low-level addresses into the
"host/port" field, encoded in some fashion. One would hope that there
will be fewer forms of transport service access point address than
there will be application layer protocols.

> The name relative to the host is also typed. Presently, the only type
> supported is ASCII, but the type field is there just in case.

The rule we have used is to put type information, if part of the
link, into the path. Protocols differ on whether they regard it as
part of the link or return it when you try to retrieve the data. In
the latter case (which I prefer) it should not be in the UDI at all.

> Three other fields are a version number, a unique ID, and a type.

The version number should, I suggest, be part of the path. Its
significance will tend to vary between servers. The trouble is, as
you say, no one has really put up a system dealing with multiple
versions. We imagined having hidden links from a document to its
previous, next and latest versions, and to a table of versions.

> The purpose of the unique ID is ... to provide a mechanism for
> detecting when an object has been deleted and replaced with an
> object of the same name.
> In some cases, it might be important to note that the object being
> retrieved is not the same as the one to which the original link was
> made.

This is non-obvious. My feeling is that a unique id is a useful
thing, which I would regard as "header" information, i.e. information
you can ask the server for. Putting it into the link I'm not so sure
about. Suppose, for example, the retrieval goes through several
stages of pointers, being referenced by several servers. Do you want
to check that the final document, or the first link, was really the
same as the one you made the original link to?

> Binding to an access method is accomplished by sending
> a message to the Prospero server at the address in the link, and
> requesting the access method for the named object. The response
> includes a sequence of tokens, the first identifies the access method,
> and the remainder identify the information specific to the access
> method (beyond that which already is part of the link). If you
> understand the access method, then you also know how to interpret the
> remaining tokens.

That "late binding" is just the sort of "name-server" function which
I was talking about, and which for example X.500 might also fit into.
So long as both the input and the output to the process are UDIs,
it's very flexible.

> For example, a response indicating access by anonymous FTP might be
>
> ANONYMOUS-FTP /pub/pfs/guest/README BINARY

We'd write that now as file:/(samehost)/pub/pfs/guest/README.
Currently, if the access protocol has to be specified, then the host
does too. It could default to the host of the context of the UDI even
when protocol fields are different.

The "binary" flag is an interesting one and a perennial question. My
assumption was that if you know how to handle a file when you've got
it, then you must know how to transfer it. In practice with FTP both
mean that you have to have a table of file suffixes.
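
As a rough illustration of the two mappings discussed so far, here is
a minimal Python sketch. It is not part of the UDI draft or of
Prospero: the function names are hypothetical, the "prospero:" and
"file:" prefixes are the ones suggested in this message, and the
"file://host/path" form with an explicit host is an assumption based
on the //host/path syntax used above.

```python
import re

# Prospero INTERNET-D host field: a name or address, optionally
# followed by a port in parentheses, e.g. "TGO.ISI.EDU(191)".
_HOST_RE = re.compile(r"^(?P<host>[^()\s]+)(?:\((?P<port>\d+)\))?$")


def prospero_to_udi(host_field, name, prefix="prospero"):
    """Map a Prospero (host, name) pair onto a prefix://host[:port]/name string.

    Hypothetical helper for illustration only.
    """
    m = _HOST_RE.match(host_field)
    if m is None:
        raise ValueError("unrecognised host field: %r" % host_field)
    authority = m.group("host")
    if m.group("port"):
        # TGO.ISI.EDU(191) becomes TGO.ISI.EDU:191
        authority += ":" + m.group("port")
    # The name may be a path (/a/b/c) or a bare unique ID (27).
    path = name if name.startswith("/") else "/" + name
    return "%s://%s%s" % (prefix, authority, path)


def access_response_to_udi(response, host):
    """Map an ANONYMOUS-FTP access-method response onto a file: string.

    Assumption: the host is given explicitly by the caller. The
    transfer-mode token (e.g. BINARY) is dropped rather than encoded,
    following the view that type information returned at retrieval
    time should stay out of the identifier.
    """
    tokens = response.split()
    if len(tokens) < 2 or tokens[0] != "ANONYMOUS-FTP":
        raise ValueError("unsupported access response: %r" % response)
    path = tokens[1]
    return "file://%s%s" % (host, path)


if __name__ == "__main__":
    print(prospero_to_udi("TGO.ISI.EDU(191)", "/a/b/c"))
    print(prospero_to_udi("GUM.ISI.EDU", "27"))
    print(access_response_to_udi(
        "ANONYMOUS-FTP /pub/pfs/guest/README BINARY", "TGO.ISI.EDU"))
```

Only the ANONYMOUS-FTP method is handled here; other access methods
would need their own prefix and token rules.
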
> Similar responses are supported for other methods, and a response
> might include more than one access method, in which case the
> application choose the method that best suits its needs.

Sounds fine.

> Now, back to the type field. One of the shortcomings of the approach
> as described so far is that it requires a Prospero server to run on
> the system storing the object to be referenced. This shortcoming is
> addressed by the external link. The type field in a Prospero link
> provides information on what can be done with the link. The three
> common types are FILE, DIRECTORY, and EXTERNAL. The links described
> above were of type FILE. If a links type is directory, its contents
> can be listed by contacting the Prospero server (i.e. the links in the
> directory can be returned). If a links type is EXTERNAL, it means
> that the object should be accessed without contacting a Prospero
> server to obtain the access method (usually because a Prospero server
> is not running on the target site). Instead, the access information
> that would otherwise have been returned is encoded as part of the
> type. Thus for example the type of an external link to the file
> mentioned above would be.
>
> EXTERNAL(AFTP,BINARY)

Your "EXTERNAL" type is a pointer to a document in another naming
scheme, which is neat and expandable -- I like it. The UDI syntax was
basically invented to allow one to do that, so that all these systems
can work together. Basically, type EXTERNAL(xxx) maps onto putting an
xxx: prefix on the UDI. In your example, it maps to giving a file:
reference.

You have, for prospero, the flag in the link as to whether the object
is a directory or a file. So does Gopher. This is useful for
displaying different icons, etc. for the user. A snag is that if we
include anonymous FTP file systems, the NLST command doesn't tell you
that information, so it doesn't map. You have to try to retrieve it
and if that fails, cd to it.
If the flag is considered useful, then we could use the convention
(of ls -F) that a/b/c/ is a directory and a/b/c is a file. The
trouble is that you can't get that information from an FTP server
without assuming Unix to parse a long listing. Do I _have_ to know in
advance whether a Prospero item is a directory or a file?

> Note that for external links using the AFTP or FTP method, the name
> field of the link contains the path name to be passed to FTP. For
> other access methods, the meaning of the field is defined by the
> particular access method to be used.

Yup - the UDI assumptions exactly.

> Anyway, I hope this adequate explains the form of Prospero
> identifiers, and I hope that you can fit it in to your proposed
> format.
>
> ~ Cliff

Thanks for a very clear explanation. It sounds as though Prospero
will fit very well into the format. I'll put it into the next draft
of the document.

- Tim