Re: Minutes of the "UDI" BOF at the 24th IETF

Dan Connolly (connolly@pixel.convex.com)
Wed, 15 Jul 92 23:06:17 CDT


Message-Id: <9207160406.AA28878@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: RE: Minutes of the "UDI" BOF at the 24th IETF
Date: Wed, 15 Jul 92 23:06:17 CDT
From: Dan Connolly <connolly@pixel.convex.com>

<!DOCTYPE HTML SYSTEM>
<H1>RE: 
<A HREF=
"http://info.cern.ch/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html">
Minutes of the "UDI" BOF at the 24th IETF</A></H1>

[The gopher folks are discussing how to add the "reply"
feature from USENET newsreaders to gopher clients.
I wish www had that feature too. The "followup" feature
is currently beyond the scope of WWW, but...]
<p>

[I'm beginning to wish www was a mail user agent and
a gopher client and a news reader and a WAIS client
all in one. Soon...]
<p>

[We need a DTD that models the USENET thread model,
with posts, followups, quotes, points, counterpoints,
flames, signatures, etc.<p>

Right now, I have to modify a destination document (add an anchor) to
link to an element of that document. We should look at what HyTime and
the TEI use for references to SGML elements without IDs. I saw a
HyTime description that said you could link to non-HyTime documents.
Maybe we could use this to reference passages of news articles etc.  ]

<p>

<H2>Quote</H2>

The information one quoted in a reference to an object could comprise
many things, among which were possible one unique name, (Unique
Resource Number, URN was one acronym), and zero or more addresses
(Uniform Resource Locators or URLs) which gave instructions for
retrieving the object.

<h2>Response</H2>

Cool! The model for global hypertext that swims around in my mind
includes references which consist of identifiers and addresses. On the
subject of URNs, it sure would be nice to be able to verify that the
data mentioned in the link and the data retrieved from the URL are the
same.
<p>

For instance, I could reference an FTP file, and somebody could
write over it. I might look like a fool arguing against old
information. If I give a URN in the link, the browser could warn
that it doesn't match the URN of the retrieved document.
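<p>

A rough sketch of that check in Python, assuming the identifier carried in the link is an Internet message ID and that the retrieved document has a Message-Id header to compare against. Both are assumptions, and the name check_reference is made up for illustration:

```python
from email import message_from_string


def check_reference(expected_id: str, retrieved_text: str) -> bool:
    """Return True when the retrieved document carries the ID named in the link."""
    msg = message_from_string(retrieved_text)
    # Header lookup is case-insensitive; a missing header compares as "".
    actual = (msg.get("Message-Id") or "").strip()
    return actual == expected_id.strip()


# Hypothetical retrieved document, using this message's own ID as sample data.
retrieved = (
    "Message-Id: <9207160406.AA28878@pixel.convex.com>\n"
    "Subject: example\n"
    "\n"
    "body text\n"
)
link_id = "<9207160406.AA28878@pixel.convex.com>"

if not check_reference(link_id, retrieved):
    print("warning: retrieved document does not match the ID in the link")
```

A browser would only warn here, not refuse to display the document: the link may still be useful even when the target has been overwritten.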

<h2>Quote</h2>

NOT to be discussed were the differences between names and addresses,
URN schemes (which are not yet well enough defined), the full set of
information to be given in a reference, or IPv7.
<p>

<h2>response</h2>

I think Internet message IDs make great URNs.

<h2>quote</h2>

To be discussed were the overall string syntax, including allowed
characters and escaping systems for unallowed characters, the order of
components (little/big-endian), punctuation characters, the particular
prefix to be used to identify each namespace.

<h2>Response</h2>

Cool. This makes URLs orthogonal to MIME external body part
references. I believe where they are not orthogonal, they should
coincide.

<h2>quote</h2>

It was pointed out that for WAIS one could imagine a separate name
space for databases and for documents. If this was taken further, a
separate prefix would be used for each type of object. It was on
balance agreed that this could go too far. One prefix should be used
per protocol, but it should be made clear how to determine the type of
an object from the URL.

<h2>response</h2>

Note how nicely all this coincides with MIME external body part
constructs. Such a body part looks like, for example:<XMP>
Content-Type: message/external-body;
	access-type=x-http;
	host="info.cern.ch";
	path="/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html"

Content-Description: Minutes of the "UDI" BOF at the 24th IETF
Content-Type: text/x-html

</XMP>
<p>

In general, a MIME external body part looks like:<XMP>
Content-Type: message/external-body;
	access-type=_type_; /* local-file, anon-ftp, afs, or x-token */
	_other_parameters_ /* path, host, name, database, etc. */

Content-ID: _message-id_ /* of external body part */
Content-Description: here's what you'll get!
Content-Type: _base_/_subtype_ /* type of data in external body part */

ghost body goes here. This is _not_ the contents of the body part,
but it is available to the user agent that's fetching the data.
It could be used, for example, for the seed-words of a WAIS reference.
</XMP>

It looks like a URL is a condensed version of the MIME external body
part headers. The URL scheme:____blah____ syntax maps nicely to
MIME access-type=scheme; ___parameters for blah___.
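<p>

To make the mapping concrete, here is a minimal Python sketch that expands a URL into the external-body header shown in the first example. It assumes every scheme maps to an access-type of x-scheme and that only host and path parameters are needed; the function name url_to_external_body is hypothetical.

```python
from urllib.parse import urlsplit


def url_to_external_body(url: str) -> str:
    """Expand a URL into a message/external-body Content-Type header."""
    parts = urlsplit(url)
    # Assumption: each URL scheme becomes an experimental "x-" access type.
    fields = ["Content-Type: message/external-body",
              f"access-type=x-{parts.scheme}"]
    if parts.hostname:
        fields.append(f'host="{parts.hostname}"')
    if parts.path:
        fields.append(f'path="{parts.path}"')
    # Parameters are separated by ";" and continued on tab-indented lines.
    return ";\n\t".join(fields)


print(url_to_external_body(
    "http://info.cern.ch/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html"))
```

Going the other way (headers back to a URL) would need per-scheme rules for parameters such as name or database, which this sketch does not attempt.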

<h2>quote</h2>

The class of object you get back should be predictable (--C Lynch).
W3 has a real problem with that, since everything is a "document" and
handled in a similar way.

<h2>response</h2>

I don't agree that "everything is a 'document'" to the W3 browser. The
browser knows it's getting gopher directory info from gopher UDIs,
for example. I think the type of data it returns can and should be
classified by the MIME typing system, even if it does so implicitly.

<h2>quote</h2>

Should one use punctuation, or attribute-value pairs? Attribute value
pairs get mispelt. (note x.400 vs.internet addresses)<p>
   
It was decided to use a short string with punctuation rather than an
attribute-value pair system.<p>

<h2>response</h2>
I doubt that all this information (scheme, host, path/selector-string,
type, etc.) can be encoded in something akin to a phone number that can
be written on one line of text with no spaces. I think that within each
scheme, folks develop printable syntaxes for making references
(ange-ftp, WAIS source files, etc.).<p>

But the scope of URLs is so vast that I wonder if folks will form
habits over this whole domain.<p>

I advocate that the W3 format include, at least experimentally, an
SGML element for each access type, with the URL pre-parsed into
attribute-value pairs. The anchor element could become more complex,
including sub-elements for URLs and URNs. Data type information could
be included somewhere.<p>

The HTTP access type doesn't require type information: format
negotiation is part of the HTTP protocol. But WAIS and Gopher
references require these types, and it would be nice for FTP
references (at least to choose between image and text transfers.)<p>

I'll think it over and work on a DTD that uses these pre-parsed URLs.

How do these examples look?
<XMP>
<A HREF=
"http://info.cern.ch/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html">
Minutes of the "UDI" BOF at the 24th IETF< /A>

becomes

<A><HTTP host="info.cern.ch"
 path="/timbl/Public/USTrip1992/IETF-24/UDI_BOF_Minutes.html">
Minutes of the "UDI" BOF at the 24th IETF< /A>

and

<A NAME=gopher
HREF=gopher://gopher.micro.umn.edu:70/11/Other%20Gopher%20and%20Information%20Servers>
list of sites< /A>

becomes

<A NAME=gopher>
<Gopher type="text/x-gopher-1"
        host="gopher.micro.umn.edu" port=70
        selector="1/Other Gopher and Information Servers">
list of sites< /A>
</XMP>

The idea here is that we've got a parser already: the SGML parser.
Why not use it to parse the various bits of data we need to reference
data located elsewhere?
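<p>

For converting existing documents, the pre-parsing step itself is mechanical. Here is a Python sketch for the gopher case, assuming the first character after the leading slash is the gopher item type and that the type attribute is always spelled text/x-gopher-N as in the example. The function name gopher_url_to_element is made up.

```python
from urllib.parse import urlsplit, unquote


def gopher_url_to_element(url: str) -> str:
    """Rewrite a gopher URL as a pre-parsed Gopher element start tag."""
    parts = urlsplit(url)
    if parts.scheme != "gopher":
        raise ValueError("not a gopher URL: " + url)
    # Drop the leading "/" and undo %XX escapes.
    raw = unquote(parts.path[1:])
    # Assumption: first character is the item type, the rest is the selector.
    item_type, selector = raw[:1], raw[1:]
    port = parts.port or 70  # 70 is the usual gopher port
    return (f'<Gopher type="text/x-gopher-{item_type}" '
            f'host="{parts.hostname}" port={port} '
            f'selector="{selector}">')


print(gopher_url_to_element(
    "gopher://gopher.micro.umn.edu:70/11/"
    "Other%20Gopher%20and%20Information%20Servers"))
```

Once the reference is in this form, the SGML parser hands the client each attribute separately and no URL-specific parsing is needed at display time.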

<h2>quote</h2>

A separate issue of whether human or only machine readable.
Previously, included issue of printable.  This is needed because don't
have names now.  Question arose of whether once these addresses exist
will be replaceable with names - will be presented as new
functionality, not replacing existing systems. Agreement on some way
of specifying class of objects.

<h2>response</h2>

This reminds me of <A HREF="AUGMENT:132082,#11l"> Knowledge-Domain
Interoperability & an Open Hyperdocument System
</A>
 by Douglas C. Engelbart in which he gives requirements for his
system.
<p>

One of them is:

<H4>Hard-Copy Print Options to Show Addresses of Objects and Address
Specification of Links</H4>

<h5> ... so that, besides online workers being
able to follow a link-citation path (manually, or via an automatic
link jump), people working with associated hard copy can read and
interpret the link-citation, and follow the indicated path to the cited
object in the designated hard-copy document.<p>

Also, suppose that a hard-copy worker wants to have a link to a given
object established in the online file. By visual inspection of the
hard copy, he should be able to determine a valid address path to that
object and for instance hand-write an appropriate link specification
for later online entry, or dictate it over a phone to a colleague.</H5>

That document deserves a thorough reading by the whole
comp.infosystems.* community.

<h2>quote</h2>

IT WAS AGREED that the context, or namespace, prefix be the first
(leftmost) part of the URL, and be separated from the rest of the URL
by a colon.

<h2>response</h2>

Has anybody given any thought to a syntax with implied schemes so that
the ange-ftp style URLs and internet message ID URNs that are out
there can be used?<p>

If we reserved a character to _start_ UDIs, then we could try to infer
the scheme of strings that don't start with that char. Let's take
() for URL schemes and [] for URN schemes.

<XMP>
For example: host:path == (ANON-FTP)host:path
             path@host == (ANON-FTP)host:path
             <message-id@host> == [rfc-822]<message-id@host>
</XMP>

Well, I suppose this type of thing is really akin to the W3 local
UDI scheme: it's application specific.<p>
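<p>

The inference rules in the example above can be sketched in a few lines of Python. This assumes exactly the three patterns shown and nothing more; the function name normalize_udi is hypothetical, and a real client would need more careful rules to tell the forms apart.

```python
def normalize_udi(s: str) -> str:
    """Add an explicit (URL) or [URN] scheme prefix to a bare reference."""
    s = s.strip()
    if s.startswith(("(", "[")):
        return s  # scheme is already explicit
    if s.startswith("<") and s.endswith(">") and "@" in s:
        # Looks like an Internet message ID: treat it as a URN.
        return "[rfc-822]" + s
    if "@" in s:
        # path@host form: reorder to host:path.
        path, host = s.rsplit("@", 1)
        return f"(ANON-FTP){host}:{path}"
    if ":" in s:
        # host:path form is used unchanged.
        return "(ANON-FTP)" + s
    raise ValueError("cannot infer a scheme for: " + s)


for ref in ("host:path", "path@host", "<message-id@host>"):
    print(ref, "==", normalize_udi(ref))
```

The order of the tests matters: the message ID check has to come before the path@host check, since both forms contain an at-sign.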

<h6>Postscript: This document was prepared using emacs with the help
of Erik Naggum's
<A HREF="file://ftp.ifi.uio.no/pub/SGML/elisp/sgml-mode.el">
sgml-mode</A>, and verified by sgmls-0.8 with the DTD in <A
HREF="message-id:<9207160335.AA24812@pixel.convex.com>">html.dtd
</a>
. This is a proof of concept that the W3 browser handles conforming
SGML. I also wrote
<a
href="message-id:<9207160349.AA25229@pixel.convex.com>"> a short perl
script</a> that will bring many existing HTML files into conformance.

</h6>