Re: Historical - Re: Proposed IETF/W3C task force: "Resource meaning" Review of new HTTPbis text for 303 See Other from Ray Denenberg, Library of Congress on 2009-08-25 (www-tag@w3.org from August 2009)

From: Ray Denenberg, Library of Congress <rden@loc.gov>
Date: Tue, 25 Aug 2009 10:17:42 -0400
To: "W3C TAG" <www-tag@w3.org>
Message-ID: <017901ca258e$cf1082f0$18af938c@lib.loc.gov>
I try to follow this as closely as I can, and I try to stay out of the 
discussion as much as possible.  But I'd like to weigh in on "document". I 
don't like it.

It was once explained to me that "document" applies to a resource that is 
relatively stable - relative for example to one which is updated regularly. 
Thus for example http://www.loc.gov/  (Library of Congress home page)  is a 
document but http://weather.yahoo.com/forecast/USDC0001.html  (DC weather) 
is not.  That's not to say that the LC home page doesn't change, but it is 
*relatively* stable.   (On Noah's  stock quote example, a stock quote web 
page would not be a document by this quasi-definition. A "bag of stock 
quotes" would.)

This explanation was given at the very first Dublin Core meeting (Dublin 
Ohio, March 1995), and the framers themselves were uncomfortable with 
"document" so they crafted the term "Document Like Object", DLO,  which was 
popular for a while but ultimately abandoned, because nobody could really 
figure out what it was, just like nobody really knows what a document is. 
Now you may say that "document" in the current context isn't intended to 
mean this at all, but to many of us it will always have that connotation.

Is "document" being tossed around because of dissatisfaction with 
"information resource"?  On one hand there has been a failure to define 
"information resource" to everyone's satisfaction, but on the other hand 
there is still a desire to definitionally represent the difference between 
information resource and  non-information resource.  Do people think there 
will be more success if we start over and use "document" instead? 
Personally I'd stick with  "information resource".

--Ray

----- Original Message ----- 
From: <noah_mendelsohn@us.ibm.com>
To: "Tim Berners-Lee" <timbl@w3.org>
Cc: "Roy T. Fielding" <fielding@gbiv.com>; "Julian Reschke" 
<julian.reschke@gmx.de>; "Larry Masinter" <masinter@adobe.com>; "Mark 
Nottingham" <mnot@mnot.net>; "Pat Hayes" <phayes@ihmc.us>; "W3C TAG" 
<www-tag@w3.org>
Sent: Monday, August 24, 2009 8:24 PM
Subject: Re: Historical - Re: Proposed IETF/W3C task force: "Resource 
meaning" Review of new HTTPbis text for 303 See Other


> Tim Berners-Lee wrote:
>
>> I would like to see what the documents all look like if edited to
>> use the words Document and Thing, and eliminate Resource. That's my
>> best bet as to two English words which mean as close as we can get
>> to what we want.
>
> Yes on "thing"; as you've heard me say from time to time, I continue to
> have reservations about the word "document".  No doubt "document" seems
> less intimidating than IR, and is often suggestive of what we mean. Still,
> I think it's actually too narrow, or at least troublingly ambiguous.
>
> Maybe I've hung out with the XML crowd too long, but one of the things that
> I tend to think of as characteristic of "documents", as opposed to "data",
> is that they tend to have ordered content.  The order of the paragraphs in
> this email document is significant.
>
> Now, let's say that I have a resource (thing) that consists of an
> unordered set of stock quotes.  Each quote is a {company name, price}
> pair, but there is no inherent or preferred order for the quotes.  As a
> practical matter, any particular representation sent through HTTP will
> likely have the quotes in one order or another, but that order is an
> artifact of the representation technology, just like the angle brackets,
> whitespace or other delimiters for the quotes.  A representation with the
> order changed would be equally appropriate.
>
> Question: is it OK to return a 200 for this bag of quotes?  I hope so.  Do
> we call an unordered bag of quotes a document?  Well, we can, but I think
> it's a stretch.
>
> I played some role in suggesting the term "Information Resource" to the
> TAG in 2004.  I acknowledge and regret that few seem to be pleased with
> it, but let me at least remind those who don't know how it came about.  I
> wanted to find a term that more clearly covered cases like the one above
> (and relational tables, trees, graphs, and other data-like abstractions).
> It occurred to me that Claude Shannon, in his theory of Information,
> seemed to deal with exactly the sorts of abstractions for which we wanted
> to allow 200; i.e., those that could be represented by a sequence of
> bits, of agreed encoding.   Can you apply Shannon's theory (which is
> really about error rates and reliability) to attempts to transmit the text
> of the Gettysburg address?  Yes, presuming sender and receiver can agree
> on an encoding.  Can you apply Shannon's theory to my bag of stock quotes
> or to the information filling the (unordered!) rows and columns of a
> relational table?  Yes.  Can you apply it to attempts to somehow transmit
> me, the three dimensional living TAG member with the unruly hair?  No. So,
> it's just the distinction we want.
>
> If everyone decides that on balance "document" is the lesser of the evils,
> I suppose I could go along with it, but I don't think it's quite right. If
> we use it, we should at least try to explain what's really covered and
> what's not.  I still think that IR, in the sense intended, is closer to
> what we really mean.  (If I have to return a 303 for a bag of stock
> quotes, I'm going to be annoyed.)
>
> Noah
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
Received on Tuesday, 25 August 2009 14:18:21 UTC