Message-Id: <9207160406.AA28878@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: RE: Minutes of the "UDI" BOF at the 24th IETF
Date: Wed, 15 Jul 92 23:06:17 CDT
From: Dan Connolly <connolly@pixel.convex.com>

<!DOCTYPE HTML SYSTEM>
<H1>RE: <A HREF=
"http://info.cern.ch/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html">
Minutes of the "UDI" BOF at the 24th IETF</A></H1>
[The gopher folks are discussing how to add the "reply" feature from USENET newsreaders to gopher clients. I wish www had that feature too. The "followup" feature is currently beyond the scope of WWW, but...] <p>
[I'm beginning to wish www was a mail user agent and a gopher client and a news reader and a WAIS client all in one. Soon...] <p>
[We need a DTD that captures the USENET thread model, with posts, followups, quotes, points, counterpoints, flames, signatures, etc.<p>
Right now, I have to modify a destination document (add an anchor) to link to an element of that document. We should look at what HyTime and the TEI use for references to SGML elements without IDs. I saw a HyTime description that said you could link to non-HyTime documents. Maybe we could use this to reference passages of news articles, etc. ] <p>
<H2>Quote</H2>
The information one quoted in a reference to an object could comprise many things, among which were possible one unique name, (Unique Resource Number, URN was one acronym), and zero or more addresses (Uniform Resource Locators or URLs) which gave instructions for retrieving the object.
<h2>Response</H2>
Cool! The model for global hypertext that swims around in my mind includes references which consist of identifiers and addresses. On the subject of URNs, it sure would be nice to be able to verify that the data mentioned in the link and the data retrieved from the URL are the same. <p>
For instance, I could reference an FTP file, and somebody could write over it. I might look like a fool arguing against old information.
If I give a URN in the link, the browser could warn me when it doesn't match the URN of the retrieved document.
<h2>Quote</h2>
NOT to be discussed were the differences between names and addresses, URN schemes (which are not yet well enough defined), the full set of information to be given in a reference, or IPv7. <p>
<h2>Response</h2>
I think Internet message IDs make great URNs.
<h2>Quote</h2>
To be discussed were the overall string syntax, including allowed characters and escaping systems for unallowed characters, the order of components (little/big-endian), punctuation characters, the particular prefix to be used to identify each namespace.
<h2>Response</H2>
Cool. This makes URLs orthogonal to MIME external body part references. I believe that where they are not orthogonal, they should coincide.
<h2>Quote</h2>
It was pointed out that for WAIS one could imagine a separate name space for databases and for documents. If this was taken further, a separate prefix would be used for each type of object. It was on balance agreed that this could go too far. One prefix should be used per protocol, but it should be made clear how to determine the type of an object from the URL.
<h2>Response</h2>
Note how nicely all this coincides with MIME external body part constructs. Such a body part looks like, for example:<XMP>
Content-Type: message/external-body; access-type=x-http;
        host="info.cern.ch";
        path="/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html"

Content-Description: Minutes of the "UDI" BOF at the 24th IETF
Content-Type: text/x-html
</XMP>
<p>
In general, a MIME external body part looks like:<XMP>
Content-Type: message/external-body;
        access-type=_type_;    /* local-file, anon-ftp, afs, or x-token */
        _other_parameters_     /* path, host, name, database, etc. */

Content-ID: _message-id_       /* of external body part */
Content-Description: here's what you'll get!
Content-Type: _base_/_subtype_ /* type of data in external body part */

ghost body goes here.
This is _not_ the contents of the body part, but it is available
to the user agent that's fetching the data. It could be used, for
example, for the seed-words of a WAIS reference.
</XMP>
It looks like a URL is a condensed version of the MIME external body part headers. The URL syntax scheme:____blah____ maps nicely to the MIME form access-type=scheme; ___parameters for blah___.
<h2>Quote</h2>
The class of object you get back should be predictable (--C Lynch). W3 has a real problem with that, since everything is a "document" and handled in a similar way.
<h2>Response</h2>
I don't agree that "everything is a 'document'" to the W3 browser. The browser knows it's getting gopher directory info from gopher UDIs, for example. I think the type of data a UDI returns can and should be classified by the MIME typing system, even if only implicitly.
<h2>Quote</h2>
Should one use punctuation, or attribute-value pairs? Attribute value pairs get misspelt. (note x.400 vs. internet addresses)<p>
It was decided to use a short string with punctuation rather than an attribute-value pair system.<p>
<h2>Response</h2>
I doubt that all this information (scheme, host, path/selector string, type, etc.) can be encoded in something akin to a phone number that can be written on one line of text with no spaces. I think that folks develop printable syntaxes for references within each scheme (ange-ftp, WAIS source files, etc.).<p>
But the scope of URLs is so vast that I wonder whether folks will form habits across this whole domain.<p>
I advocate that the W3 format include, at least experimentally, an SGML element for each access type, with the URL pre-parsed into attribute-value pairs. The anchor element could become more complex, including sub-elements for URLs and URNs. Data type information could be included somewhere.<p>
The HTTP access type doesn't require type information: format negotiation is part of the HTTP protocol.
But WAIS and Gopher references require these types, and they would be nice to have for FTP references (at least to choose between image and text transfers).<p>
I'll think it over and work on a DTD that uses these pre-parsed URLs. How do these examples look? <XMP>
<A HREF=
"http://info.cern.ch/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html">
Minutes of the "UDI" BOF at the 24th IETF< /A>

becomes

<A><HTTP host="info.cern.ch"
         path="/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html">
Minutes of the "UDI" BOF at the 24th IETF< /A>

and

<A NAME=gopher HREF=gopher://gopher.micro.umn.edu:70/11/Other%20Gopher%20and%20Information%20Servers>
list of sites< /A>

becomes

<A NAME=gopher>
<Gopher type="text/x-gopher-1" host="gopher.micro.umn.edu" port=70
        selector="1/Other Gopher and Information Servers">
list of sites< /A>
</XMP>
The idea here is that we've got a parser already: the SGML parser. Why not use it to parse the various pieces of information we need to reference data located elsewhere?
<h2>Quote</h2>
A separate issue of whether human or only machine readable. Previously, included issue of printable. This is needed because don't have names now. Question arose of whether once these addresses exist will be replaceable with names - will be presented as new functionality, not replacing existing systems. Agreement on some way of specifying class of objects.
<h2>Response</h2>
This reminds me of <A HREF="AUGMENT:132082,#11l"> Knowledge-Domain Interoperability & an Open Hyperdocument System </A> by Douglas C. Engelbart, in which he lays out requirements for his system. <p>
One of them is:
<H4>Hard-Copy Print Options to Show Addresses of Objects and Address Specification of Links</H4>
<h5> ...
so that, besides online workers being able to follow a link-citation path (manually, or via an automatic link jump), people working with associated hard copy can read and interpret the link-citation, and follow the indicated path to the cited object in the designated hard-copy document.<p>
Also, suppose that a hard-copy worker wants to have a link to a given object established in the online file. By visual inspection of the hard copy, he should be able to determine a valid address path to that object and, for instance, hand-write an appropriate link specification for later online entry, or dictate it over a phone to a colleague.</H5>
That document deserves a thorough reading by the whole comp.infosystems.* community.
<h2>Quote</h2>
IT WAS AGREED that the context, or namespace, prefix be the first (leftmost) part of the URL, and be separated from the rest of the URL by a colon.
<h2>Response</h2>
Has anybody given any thought to a syntax with implied schemes, so that the ange-ftp-style URLs and Internet message-ID URNs already out there could be used as they are?<p>
If we reserved characters to _start_ UDIs, then we could try to infer the scheme of any string that doesn't start with one of them. Let's take () for URL schemes and [] for URN schemes. <XMP>
For example:
host:path          == (ANON-FTP)host:path
path@host          == (ANON-FTP)host:path
<message-id@host>  == [rfc-822]<message-id@host>
</XMP>
Well, I suppose this type of thing is really akin to the W3 local UDI scheme: it's application-specific.<p>
<h6>Postscript: This document was prepared using Emacs with the help of Erik Naggum's <A HREF="file://ftp.ifi.uio.no/pub/SGML/elisp/sgml-mode.el"> sgml-mode</A>, and verified by sgmls-0.8 with the DTD in <A HREF="message-id:<9207160335.AA24812@pixel.convex.com>">html.dtd</a>. This is a proof of concept that the W3 browser handles conforming SGML. I also wrote <a href="message-id:<9207160349.AA25229@pixel.convex.com>"> a short Perl script</a> that will bring many existing HTML files into conformance.
</h6>
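<p>
The implied-scheme idea in the last response (before the postscript) could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions, not a proposed standard: the function name canonical_udi is hypothetical, "(...)" and "[...]" are taken as the reserved URL and URN scheme brackets suggested above, and the three inference rules are exactly the ones in the example table (host:path and path@host default to ANON-FTP; a bare &lt;message-id@host&gt; becomes an rfc-822 URN). Rejecting mismatched brackets and treating a string containing both "@" and ":" as host:path are further assumptions of the sketch.
<XMP>
import re

# Assumption: an explicit UDI starts with "(SCHEME)" for URL schemes
# or "[SCHEME]" for URN schemes, as suggested in the message.
EXPLICIT = re.compile(r"^([(\[])([^()\[\]]+)([)\]])(.*)$")


def canonical_udi(s: str) -> str:
    """Hypothetical helper: rewrite a UDI so that its scheme is explicit.

    Strings that already start with a reserved bracket are returned
    unchanged; otherwise the scheme is inferred from the example rules.
    """
    s = s.strip()

    m = EXPLICIT.match(s)
    if m:
        opener, _scheme, closer, _rest = m.groups()
        # Assumption: "(" must pair with ")" and "[" with "]".
        if (opener, closer) not in {("(", ")"), ("[", "]")}:
            raise ValueError(f"mismatched scheme brackets: {s!r}")
        return s

    # <message-id@host>  ==  [rfc-822]<message-id@host>
    if s.startswith("<") and s.endswith(">") and "@" in s:
        return "[rfc-822]" + s

    # path@host  ==  (ANON-FTP)host:path   (only when there is no colon)
    if "@" in s and ":" not in s:
        path, host = s.rsplit("@", 1)
        return f"(ANON-FTP){host}:{path}"

    # host:path  ==  (ANON-FTP)host:path
    if ":" in s:
        host, path = s.split(":", 1)
        return f"(ANON-FTP){host}:{path}"

    raise ValueError(f"cannot infer a scheme for {s!r}")


if __name__ == "__main__":
    for udi in ["ftp.ifi.uio.no:/pub/SGML/elisp/sgml-mode.el",
                "/pub/SGML/elisp/sgml-mode.el@ftp.ifi.uio.no",
                "<9207160335.AA24812@pixel.convex.com>",
                "(ANON-FTP)ftp.ifi.uio.no:/pub/SGML"]:
        print(udi, "==", canonical_udi(udi))
</XMP>
<p>
Checking for an explicit bracket first means a string that already names its scheme is never reinterpreted. The order of the implied rules also matters: a message ID contains an "@", so it has to be recognized before the path@host rule gets a chance to claim it.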