Date: Thu, 26 Mar 92 09:47:34 PST
Message-Id: <9203261747.AA00262@philo.quake.think.com.>
From: Jonny Goldman <jonathan@think.com>
To: timbl@nxoc01.cern.ch
Cc: www-talk@nxoc01.cern.ch, wais-talk@think.com
In-Reply-To: Tim Berners-Lee's message of Thu, 26 Mar 92 15:25:12 GMT+0100
        <9203261425.AA23337@nxoc01.cern.ch>
Subject: Openning the WAIS document-id syntax

First, I'd like to point out that WAIS-FTP doesn't mean a client or
server understands the FTP protocol. It's simply a customized server
that functions like FTP (but is read-only). It's mainly an experiment
in modifying servers and providing services under WAIS.

   Date: Thu, 26 Mar 92 15:25:12 GMT+0100
   From: timbl@nxoc01.cern.ch (Tim Berners-Lee)

   [...]

   The data model of WAIS (documents in databases) could be
   deconstrained to allow documents themselves to be or contain lists
   of documents, and for lists of documents to point to things other
   than documents in the same database.

I take it you're suggesting a new TYPE for a document: Derived types?
In a sense the catalog is one of these.

   This is the way the second part can work. Normally, a search
   returns a list of doc-ids, each one (basically) like

        /usr/local/lib/wais/mydatabase/fred/myfile.txt

   which is in fact a filename.

Let me also point out that this is just the method used in the sample
server. The CM server does not return DocID's that are derived from
filenames. In fact, DocID's are "any"s, and that means they can have
anything in them, so long as the server understands how to return a
specified amount of data to a client when presented a DocID and a
range.

   There's a load of other stuff in there which we can ignore for now.
   What a WAIS search needs to be able to do, when you are pointing to
   files, is to return a pointer to a file in FTP say. We do that in
   two steps.

I don't agree. I think the server should do the retrieval. The client
should not have to know anything about the REAL location of the
document. More on that below.

   First, we recognise that that id is local to the context of a wais
   server on host myhost and port myport. When the server returns that
   string, the client uses knowledge of the context in which it was
   quoted to expand that to

        wais://myhost.dom.net:myport/usr/local/lib/wais/mydatabase/fred/myfile.txt

   This is a reference you can quote to anyone as it makes sense
   anywhere. No context. I called it a UDI but we'll have to change
   the name. Document Access Token maybe. It's like Brewster's
   proposal but extendable to other protocols. [Yes, WAIS is a good
   protocol but there are others. Including name servers and
   directories which will be needed for long-lived but movable
   documents.]

This is a good idea, but I feel rather strongly that we should be very
careful in overloading the protocol. Specifying a syntax for DocID's is
one way of overloading the protocol. Standardizing types is another.

   Now suppose one day a server returns a doc-id INCLUDING the
   protocol, host, etc. For example, your WAIS FTP engine (like the
   ARCHIE WAIS) returns what are basically pointers to files. Just
   now, because of the constraints of the model, it has to return a
   part of a file within the database. Suppose we change that, so that
   in your case it just returns a doc-id which specifies anonymous ftp
   access, like:

WAIS-FTP doesn't return pointers to remote files. It returns local
DocIDs for use in retrieving a file local to the server. Archie WAIS
(and ftpable-readmes) returns these pointers. That's a different story.

Now for a small discussion of WAIS DocID's. So far WAIS DocID's have
only a few fields:

typedef struct DocID{
  any* originalServer;
  any* originalDatabase;
  any* originalLocalID;
  any* distributorServer;
  any* distributorDatabase;
  any* distributorLocalID;
  long copyrightDisposition;
} DocID;

The part you refer to is just the LocalID part.
If you look at some of the DocID's returned by the serial server,
you'll see the other fields are filled in (though the Server fields
don't contain much useful information - it's that part we were trying
to standardize with the doc-id proposal).

        file://otherhost.com/pub/doc/mydoc.txt

   The client has a general retrieval engine which can accept doc-ids
   in many domains -- not just WAIS. That allows it to go out over a
   different protocol to retrieve the object.

There are two ways to handle this, of course. Either the client or the
server could do the retrieval. I believe the server should handle the
protocol part (if the document is stored on some FTP server somewhere,
the WAIS server can just fetch the file, and return it to the client).
This reduces client complexity. I have no objection to specifying the
protocol/server in the DocID (perhaps with another field), but we must
standardize the meanings.

   This is the way WWW and Gopher work. They are open systems -- you
   can link into any other system within reason. That's why the fuss
   about universal document identifiers. Maybe the WAIS people would
   like to incorporate them -- that is, just make sure that the normal
   WAIS server returns things which are -- like the one above --
   special cases of the more general syntax.

   I haven't had much comment from the WAIS side about the UDIs, but
   I'd like to have some.
   (file://info.cern.ch/pub/www/doc/udi1.ps was background for the
   IETF discussions.) We plan a small working group hacking out the
   details before an RFC is submitted.

Come up with an RFC, and we'll try to abide by it. I'd like to caution
you against overloaded strings. We've got enough of them already. For
a start, I'd suggest we use the originalServer as the identifier for
the HOST, and the originalDatabase can inform us of the protocol.

- Jonny G
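
A minimal C sketch of that closing suggestion, for illustration only:
the host travels in originalServer, the protocol in originalDatabase,
and the path in originalLocalID, so no single string is overloaded.
The layout of "any" (a size plus a byte pointer) is an assumption, since
the message only says an "any" can hold anything; any_from_cstr and
docid_to_locator are hypothetical helpers, not part of any WAIS
release.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: "any" is a length-plus-bytes blob. */
typedef struct any {
    unsigned long size;
    char *bytes;
} any;

/* The DocID fields as listed in the message. */
typedef struct DocID {
    any *originalServer;      /* suggested use: the HOST      */
    any *originalDatabase;    /* suggested use: the protocol  */
    any *originalLocalID;     /* server-local id, e.g. a path */
    any *distributorServer;
    any *distributorDatabase;
    any *distributorLocalID;
    long copyrightDisposition;
} DocID;

/* Hypothetical helper: wrap a C string in an "any" without copying. */
any any_from_cstr(char *s)
{
    any a;
    a.size = (unsigned long)strlen(s);
    a.bytes = s;
    return a;
}

/* Hypothetical helper: render "protocol://host" followed by the local
   id. Returns 0 on success, -1 if a field is missing or the buffer is
   too small. */
int docid_to_locator(const DocID *id, char *out, size_t outlen)
{
    int n;

    if (id == NULL || id->originalServer == NULL ||
        id->originalDatabase == NULL || id->originalLocalID == NULL)
        return -1;

    /* "%.*s" prints exactly size bytes, so the fields need not be
       NUL-terminated. */
    n = snprintf(out, outlen, "%.*s://%.*s%.*s",
                 (int)id->originalDatabase->size, id->originalDatabase->bytes,
                 (int)id->originalServer->size, id->originalServer->bytes,
                 (int)id->originalLocalID->size, id->originalLocalID->bytes);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With originalDatabase set to "file", originalServer to "otherhost.com",
and originalLocalID to "/pub/doc/mydoc.txt", docid_to_locator produces
the file://otherhost.com/pub/doc/mydoc.txt form quoted in the thread.
Whether the client or the server performs that expansion is exactly
the open question discussed above; the sketch takes no side.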