URC Davnpt; re Dan's brain dump

Terry Allen (terry@ora.com)
Sun, 29 Jan 1995 11:36:58 PST

Message-Id: <199501291936.LAA16150@rock>
From: Terry Allen <terry@ora.com>
Date: Sun, 29 Jan 1995 11:36:58 PST
To: uri@bunyip.com, davenport@ora.com
Subject: URC Davnpt; re Dan's brain dump

Dan's post is useful in itself and also helps define some of the 
issues.  I want first to comment on the "brain dump" section
and then, in a second post, respond to the specifics of the
proposal in Dan's second section. 

| Date: Fri, 27 Jan 95 19:04:04 EST
| Message-Id: <9501272343.AA22920@ulua.hal.com>
| From: "Daniel W. Connolly" <connolly@hal.com>
| To: Multiple recipients of list <html-wg@oclc.org>
| Subject: Redundancy in links, Davenport Prososal [long]

| [I copied all these lists because I think there may be interested folks
| on all these lists. I suggest follow-ups be sent only to
| uri@bunyip.com and davenport@ora.com.]

Yes, please.

| In message <199501271917.LAA24883@rock>, Terry Allen writes:
| >Dan says
| >>For example, if there's a postscript file on an FTP server out there
| >>called "report_127," you effectively can't link to it given today's
| >>web.
| >But doesn't that mean simply that not enough info is being sent
| >about the file by the server, or that the client isn't smart enough?
| >Putting a content-type att on <A> seems like a fragile solution
| >to the problem, as it shifts responsibility to the author of
| >the doc, who is in most cases just a poor dumb human.
| Yes, it's fragile, but it's better than completely broken.
| This is _distributed_ hypertext. It spans domains of authority. As an
| author, I have authority over the info I put in the link, but I may
| not have the authority to change the filename on the server.  So I'm
| stuck.

I would much rather have the client deal with the situation
than the human author, who, after all, has actually pointed
at the right thing.  When you load a file in some foreign
format into Word for Windows (and does someone know of Word
for DOS that will run on a 486?) the program checks out
the file, guesses at its format, and offers you a bunch of
options for conversion.  Except for the conversion step,
there's no reason Web clients shouldn't do the same thing.
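
To make the "checks out the file, guesses at its format" step
concrete, here is a minimal sketch in Python.  It is not what any
existing client does; the function name and the short signature table
are my own assumptions for illustration.  The idea is only that the
client trusts the server's type when one is sent and otherwise looks
at the first few bytes of the object itself.

```python
# Hypothetical signature table: leading bytes -> content type.
# A real client would need a much longer list than this.
SIGNATURES = [
    (b"%!PS", "application/postscript"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def guess_content_type(data, server_type=None):
    """Prefer the server's answer; otherwise guess from the leading bytes."""
    if server_type:
        return server_type
    for magic, content_type in SIGNATURES:
        if data.startswith(magic):
            return content_type
    return None  # unknown: offer the user a choice instead of guessing


# The FTP server says nothing about "report_127", so the client looks inside.
print(guess_content_type(b"%!PS-Adobe-3.0\n%%Title: report\n"))
```

Nothing in the link had to change for this to work, which is the point.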


| From the evidence that I have studied, the way to make links more
| reliable is not to deploy some new centralized namespace (ala URNs
| with publisher id's), but to put more redundant info in links.
| Rather than looking at the web as documents addressed by an
| identifier, I think we should look at it as a great big
| content-addressable-memory.  "Give me the document written by Fred in
| 1992 whose title is 'authentication in distributed systems'."
| I think the same sort of thing that makes for a high-quality citation
| in written materials will make for a reliable link in a distributed
| hypermedia system. A robust _link_ should look like a BibTeX entry
| (MARC record, etc.)

Ah, yes, a link that is expected to be robust might well make
reference to an entry in its document's bibliography.  Writers
of long documented papers do well to construct the bibliography
and full footnotes as they go along, rather than having to
scurry around the library getting all that info at the end of
the job (he says priggishly); similarly, if you really want
to have a robust reference to something on the Net you'd do
well to collect its URC when you first link to it, for later
reference.  But I wouldn't want to put the info directly in
the link.  And at the other end I think there has to be some
index to the content-addressable-memory space, of which URNs
are only a part.
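
As a rough sketch of what I mean (in Python, with every name invented
for illustration: the entry key, the field names, and the FTP location
are assumptions, not a proposed URC format), the link carries only a
key into the bibliography, and the descriptive information lives in
the entry, where it can also be searched by content:

```python
# Hypothetical bibliography kept alongside the document.
BIBLIOGRAPHY = {
    "fred92": {
        "author": "Fred",
        "year": 1992,
        "title": "authentication in distributed systems",
        "locations": ["ftp://ftp.example.org/pub/report_127"],
    },
}


def find_entries(index, **wanted):
    """Return the keys of entries whose fields match every requested value."""
    return [
        key
        for key, record in index.items()
        if all(record.get(field) == value for field, value in wanted.items())
    ]


# The link itself stays small: just text and a citation key.
link = {"text": "Fred's paper", "cite": "fred92"}

# Following the link consults the entry for where to look.
print(BIBLIOGRAPHY[link["cite"]]["locations"])

# The same entries answer "the document written by Fred in 1992".
print(find_entries(BIBLIOGRAPHY, author="Fred", year=1992))
```

If the location goes stale, the entry is corrected in one place and
every link that cites it benefits.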

| Given a system like harvest[2], 

| So if I as the link author know more than the reader's client can get
| from the FTP server, I should be _able_ to contribute the knowledge
| that I have. Making all the authors put content type info in their
| links is the wrong answer; the optimal solution is for the
| provider to adapt to the .ps convention. But the link author should
| be able to add value and quality despite the poor efforts of the
| FTP server maintainer.

I think the link author shouldn't have to add that info no matter
what.  If clients can't handle FTPable .ps files without the name
extension, something's broken between the FTP server and the client.
Fixing it in the markup of the document is patching the wrong

| "But the link author could just copy that file and put a .ps extension
| on his own machine," you might reply. This doesn't allow for the case

No, I thought of that (and another strategy of mapping locally 
a filename.ps to the remote filename) and rejected both for 
the reason you now give:

| when the document in question changes daily, and it doesn't provide an
| audit trail, and it violates my #1 engineering principle: never
| maintain the same information in more than one place.

Exactly.  The information about the content type of the file
should be maintained by the server, but is also inherent in
the object.  If my client can't get it from the server it can do 
some minor work to deduce that information.  But the link itself is 
about the worst place to put that information from the standpoint of 
human engineering:  entry of the information is prone to error,
the information can't be validated by parsing the document,
and the format of the target may change unbeknownst to the
link author.

| The URN model of publisher ID/local-identifier may be sufficient for
| the applications of moving the traditional publishing model onto the
| web. But that is only one application of the technology that it takes
| to achieve high quality links. Another application may have some other
| idea of what the "critical meta-information" is. For example, for bulk
| file distribution (ala archie/ftp), the MD5 is critical.

But you wouldn't suggest adding an attribute for HTML to allow 
putting MD5 info in the link, would you?  (This thread was originally
about HTML markup design.)  That info could easily go in a
bibliographic entry pointed to by the link, to achieve the robustness
you rightly desire.
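
For instance, a client that had fetched the object could check it
against the MD5 recorded in the entry.  This is only a sketch in
Python under my own assumptions: the entry layout, the field names,
and the FTP location are hypothetical, and the digest is computed
in-line here simply to stand in for a value recorded when the
citation was made.

```python
import hashlib

# Hypothetical bibliographic entry; field names are illustrative only.
entry = {
    "title": "report_127",
    "location": "ftp://ftp.example.org/pub/report_127",
    # Stand-in for the digest noted down at citation time.
    "md5": hashlib.md5(b"%!PS-Adobe-3.0\n").hexdigest(),
}


def matches_entry(data, entry):
    """True if the fetched bytes have the MD5 recorded in the entry."""
    return hashlib.md5(data).hexdigest() == entry["md5"]


print(matches_entry(b"%!PS-Adobe-3.0\n", entry))  # same bytes as cited
print(matches_entry(b"something else\n", entry))  # the target has changed
```

The link markup is untouched; only the entry knows about MD5.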

| OK... so... now that I've a brain dump

On to the next post ...

Terry Allen  (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472
monthly column at:  http://www.ora.com/gnn/meta/imedia/webworks/allen/

A Davenport Group sponsor.  For information on the Davenport 
  Group see ftp://ftp.ora.com/pub/davenport/README.html
	or  http://www.ora.com/davenport/README.html