Opening the WAIS document-id syntax

Tim Berners-Lee (timbl)
Thu, 26 Mar 92 15:25:12 GMT+0100


Date: Thu, 26 Mar 92 15:25:12 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9203261425.AA23337@nxoc01.cern.ch>
To: Jonny Goldman <jonathan@think.com>
Subject: Opening the WAIS document-id syntax
Cc: www-talk@nxoc01.cern.ch, wais-talk@think.com

> Date: Tue, 24 Mar 92 09:46:21 PST
> From: Jonny Goldman <jonathan@think.com>

Jonny,

This is relevant to the WAIS-FTP work Jim is doing.

Unfortunately none of the WAIS crowd could get to discussions at the IETF -- though  
John Curran represented the WAIS side. Those discussions were very interesting. 


The data model of WAIS (documents in databases) could be deconstrained to allow  
documents themselves to be or contain lists of documents, and for lists of  
documents to point to things other than documents in the same database.

This is the way the second part can work.  Normally, a search returns a list of  
doc-ids, each one (basically) like

	/usr/local/lib/wais/mydatabase/fred/myfile.txt

which is in fact a filename. There's a load of other stuff in there which we can  
ignore for now.  What a WAIS search needs to be able to do, when you are pointing
to files, is to return a pointer to a file in FTP say. We do that in two steps.
First, we recognise that that id is local to the context of a WAIS server on host
myhost and port myport. When the server returns that string, the client
uses knowledge of the context in which it was quoted to expand that to

	wais://myhost.dom.net:myport/usr/local/lib/wais/mydatabase/fred/myfile.txt

This is a reference you can quote to anyone as it makes sense anywhere. No context.
I called it a UDI but we'll have to change the name. Document Access Token maybe.
It's like Brewster's proposal but extendable to other protocols.  [Yes, WAIS is a  
good protocol but there are others. Including name servers and directories which  
will be needed for long-lived but movable documents.]
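
The expansion step described above can be sketched roughly as follows. This is
only an illustration: the function name expand_doc_id is hypothetical, the port
number 210 in the usage line is an assumed value standing in for "myport", and
the "load of other stuff" carried in a real WAIS doc-id is ignored.

```python
def expand_doc_id(local_id, host, port):
    """Turn a server-local doc-id into a reference that needs no context.

    local_id is the string the server returned (basically a filename);
    host and port describe the server that returned it. All three names
    are illustrative assumptions, not part of any WAIS client library.
    """
    # Normalise the leading slash so we emit exactly one after the port.
    path = "/" + local_id.lstrip("/")
    return "wais://%s:%s%s" % (host, port, path)


# Usage: the client supplies the context in which the id was quoted.
full_id = expand_doc_id(
    "/usr/local/lib/wais/mydatabase/fred/myfile.txt",
    "myhost.dom.net",
    210,  # assumed port, standing in for "myport"
)
```

The point of the sketch is that the server keeps returning short local ids,
and only the client, which knows which server it asked, adds the prefix.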

Now suppose one day a server returns a doc-id INCLUDING the protocol, host, etc.  
For example, your WAIS FTP engine (like the ARCHIE WAIS) returns what are basically
pointers to files. Just now, because of the constraints of the model, it has to  
return a part of a file within the database. Suppose we change that, so that
in your case it just returns a doc-id which specifies anonymous ftp access, like:

	file://otherhost.com/pub/doc/mydoc.txt

The client has a general retrieval engine which can accept doc-ids in many domains  
-- not just WAIS. That allows it to go out over a different protocol to retrieve  
the object.
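
A general retrieval engine of that kind might look something like the sketch
below. It is a guess at the shape, not a description of any existing client:
the handler functions are hypothetical stubs that only report what they would
do, the HANDLERS table and retrieve function are invented names, and the test
for "://" as the sign of a fully qualified doc-id is an assumption.

```python
from urllib.parse import urlsplit


def fetch_wais(host, port, path):
    # Hypothetical stub: a real client would speak the WAIS protocol here.
    return ("wais", host, port, path)


def fetch_anonymous_ftp(host, port, path):
    # Hypothetical stub: a real client would open an anonymous FTP session.
    return ("anonymous-ftp", host, port, path)


# One entry per access method the client understands (assumed table).
HANDLERS = {
    "wais": fetch_wais,
    "file": fetch_anonymous_ftp,
}


def retrieve(doc_id, context_host, context_port):
    """Fetch a document given either a local or a fully qualified doc-id."""
    # A doc-id without a protocol prefix is local to the server that
    # returned it, so expand it using that server's host and port.
    if "://" not in doc_id:
        doc_id = "wais://%s:%s/%s" % (
            context_host, context_port, doc_id.lstrip("/"))

    parts = urlsplit(doc_id)
    handler = HANDLERS.get(parts.scheme)
    if handler is None:
        raise ValueError("no retrieval method for %r" % parts.scheme)
    return handler(parts.hostname, parts.port, parts.path)
```

With this shape, a search result of file://otherhost.com/pub/doc/mydoc.txt is
routed to the FTP handler, while a bare filename from the same search still
goes back to the WAIS server that produced it. Supporting another protocol
means adding one entry to the table.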

This is the way WWW and Gopher work.  They are open systems -- you can link into
any other system within reason.  That's why the fuss about universal document  
identifiers.  Maybe the WAIS people would like to incorporate them -- that is, just
make sure that the normal WAIS servers return things which are -- like the one
above -- special cases of the more general syntax.

I haven't had much comment from the WAIS side about the UDIs, but I'd like to have  
some. (file://info.cern.ch/pub/www/doc/udi1.ps was background for the IETF  
discussions.) We plan a small working group hacking out the details before an RFC  
is submitted.


> I like the idea of generalized interfaces, customized servers.

You bet!


- Tim BL