Re: Historical - Re: Proposed IETF/W3C task force: "Resource meaning" Review of new HTTPbis text for 303 See Other from Mike Bergman on 2009-08-25 (www-tag@w3.org from August 2009)

From: Mike Bergman <mike@mkbergman.com>
Date: Mon, 24 Aug 2009 20:01:22 -0500
To: noah_mendelsohn@us.ibm.com
CC: Tim Berners-Lee <timbl@w3.org>, "Roy T. Fielding" <fielding@gbiv.com>, Julian Reschke <julian.reschke@gmx.de>, Larry Masinter <masinter@adobe.com>, Mark Nottingham <mnot@mnot.net>, Pat Hayes <phayes@ihmc.us>, W3C TAG <www-tag@w3.org>
Message-ID: <4A9337E2.70307@mkbergman.com>
Hi All,

I don't know whether they would accept the commission (as I have 
suggested before [1]), but I again suggest the TAG appoint Roy 
Fielding and Pat Hayes to work jointly to present to the TAG a 
resolution to these vexing terminology and semantics issues.

Further, if the TAG were to agree in advance to accept a 
consensus recommendation from them, I think that goodwill and 
intelligence will prevail. I, for one, would agree to the 
recommendation.

Thanks, Mike

[1] 
http://www.mkbergman.com/426/the-shaky-semantics-of-the-semantic-web/

noah_mendelsohn@us.ibm.com wrote:
> Tim Berners-Lee wrote:
> 
>> I would like to see what the documents all look like if edited to 
>> use the words Document and Thing, and eliminate Resource. That's my 
>> best bet as to two english words which mean as close as we can get 
>> to what we want.
> 
> Yes on "thing"; as you've heard me say from time to time, I continue to 
> have reservations about the word "document".  No doubt "document" seems 
> less intimidating than IR, and is often suggestive of what we mean. Still, 
> I think it's actually too narrow, or at least troublingly ambiguous.
> 
> Maybe I've hung out with the XML crowd to long, but one of the things that 
> I tend to think of as characteristic of "documents", as opposed to "data", 
> is that they tend to have ordered content.  The order of the paragraphs in 
> this email document is significant.
> 
> Now, let's say that I have a resource (thing) that consists of an 
> unordered set of stock quotes.  Each quote is a {company name, price} 
> pair, but there is no inherent or prefered order for the quotes.  As a 
> practical matter, any particular representation sent through HTTP will 
> likely have the quotes in one order or another, but that order is an 
> artifact of the representation technology, just like the angle brackets, 
> whitespace or other delimiters for the quotes.  I representation with the 
> order changed would be equally appropriate.
> 
> Question: is it OK to return a 200 for this bag of quotes?  I hope so.  Do 
> we call an unordered bag of quotes a document?  Well, we can, but I think 
> it's a stretch. 
> 
> I played some role in suggesting the term "Information Resource" to the 
> TAG in 2004.  I acknowledge and regret that few seem to be pleased with 
> it, but let me at least remind those who don't know how it came about.  I 
> wanted to find a term that more clearly covered cases like the one above 
> (and relational tables, trees, graphs, and other data-like abstractions). 
> It occurred to me that Claude Shannon, in his theory of Information, 
> seemed to deal with exactly the sorts of abstractions for which we wanted 
> to allow 200;  I.e., those that could be represented by a sequence of 
> bits, of agreed encoding.   Can you apply Shannon's theory (which is 
> really about error rates and reliablity) to attempts to transmit the text 
> of the Gettysburg address?  Yes, presuming sender and receiver can agree 
> on an encoding.  Can you apply Shannon's theory to my bag of stock quotes 
> or to the information filling the (unordered!) rows and columns of a 
> relational table?  Yes.  Can you apply it to attempts to somehow transmit 
> me, the three dimensional living TAG member with the unruly hair?  No. So, 
> it's just the distinction we want.
> 
> If everyone decides that on balance "document" is the lesser of the evils, 
> I suppose I could go along with it, but I don't think it's quite right. If 
> we use it, we should at least try to explain what's really covered and 
> what's not.  I still think that IR, in the sense intended, is closer to 
> what we really mean.  (If I have to return a 303 for a bag of stock 
> quotes, I'm going to be annoyed.) 
> 
> Noah
> 
> --------------------------------------
> Noah Mendelsohn 
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
> 
> 
> 
> 
> 
> 
> 
> 
> Tim Berners-Lee <timbl@w3.org>
> Sent by: www-tag-request@w3.org
> 08/01/2009 10:14 PM
>  
>         To:     Pat Hayes <phayes@ihmc.us>
>         cc:     "Roy T. Fielding" <fielding@gbiv.com>, Larry Masinter 
> <masinter@adobe.com>, Julian Reschke <julian.reschke@gmx.de>, Mark 
> Nottingham <mnot@mnot.net>, W3C TAG <www-tag@w3.org>, (bcc: Noah 
> Mendelsohn/Cambridge/IBM)
>         Subject:        Historical - Re: Proposed IETF/W3C task force: 
> "Resource meaning" Review of new HTTPbis text for 303 See Other
> 
> 
> 
> On 2009-07 -20, at 16:27, Pat Hayes wrote:
> [...]
> 
> . But this thread started because HTTPbis explicitly disagrees with RFC 
> 3986 on what a resource is. Surely these various documents should at least 
> agree on their uses of the basic technical terminology.
> 
> I agree. 
> 
> Historically, URIs were used to point to thinks like web pages and files 
> and movies, on the web, useful documents, or "online resources" in the 
> sense of useful things out there. FTP. Gopher and HTTP sites served up 
> various types of online resources.  People got used to http://example.com/ 
> being a web page and http://example.com/#contact being an anchor within 
> it.
> 
> The Online Information community, into whose domain the web stuff was put 
> for standardization at the IETF, referred to these things like web pages 
> as resources, and changed the original "D" for "Document"  in "UDI" to 
> "R".
> Some felt that resource was more appropriate term, maybe because 
> "document" wasn't wide enough to include things like movies.
> 
> Now the URI spec actually allowed URIs for completely different things, 
> such as telephone end points, and wisely the URI spec does not make any 
> arbitrary constraint on what a resource should be, especially a resource 
> denoted by a URI in a new scheme to be invented.
> 
> Meanwhile, the HTTP spec was polished and elaborated basically as a 
> document delivery system, plus other methods for updating documents, plus 
> POST.  (POST started historically as a way of introducing a new web page y 
> posting it to a list, just as in NNTP.  It then almost immediately got 
> used as a catch-all extension method. I will ignore it in this overview).
> 
> There was no real definition of what a resource or document was -- maybe 
> because it seemed obvious. The HTTP spec did not even specify whether the 
> URI denoted a person or a document about them, it just explained that the 
> thing returned representation of the resource.
> 
> Roy's REST work then came along to formalize HTTP as REST and declared 
> that a resource was a time-varying mapping between URI and representation. 
> That was good enough for HTTP. It didn't have enough for the AWWW, when it 
> came along, to be able to describe how the web worked.
> 
> In fact, the AWWW document, to explain how to use the web properly, had to 
> add in a bunch of stuff about the social expectations -- things like, yes, 
> the mapping from URI to representation is a function of time, but not just 
> any old one -- a random function is not typically very useful. There are 
> expectations about it can change with time.  Persistence, consistency, 
> with various common patterns which allow the web to be a useful medium. 
> The AWWW decided to use the term "Information Resource" for a thing like a 
> web page which contains information, and "Resource" for any old thing at 
> all.
> 
> So HTTP and the REST work of was done very much in this space of document 
> delivery, editing and update.  There was no philosophical need to talk 
> about what he URI denoted (the person, the web page about the person) 
> until RDF came along, when there was an immediate need.
> 
> When RDF was first developed, it was motivated by the need for data about 
> resources very much in the online information sense: data about documents, 
> or 'metadata'.  In fact it was designed to be able to describe anything, 
> but many early users of RDF referred to it as metadata technology.  RDF 
> used the word "resource" rather awkwardly in fact as it turned out.  In 
> the beginning, many of the things being described were documents, and so 
> the online information meaning of resource made sense. But in fact in RDF 
> the resource was allowed to be anything at all. A class, rdf:Resource even 
> used the term as the universal class of all things.  A little later, the 
> Web Ontology Language decided to use Thing for that. 
> 
> RDF came along in what I think was a neat way.  It used completely 
> existing web protocol extension devices to introduce a new system which 
> was fundamentally different from the old HTTP+HTML one.  The HTML web was 
> a hypertext model, which pages and anchors. The RDF model was a knowledge 
> representation one of arbitrary things.  It did this by using the fact 
> that a new language can define whatever it likes as what a local 
> identifier denotes.  A graphic language might use local identifier to 
> denote lines and points. HTML used local identifiers to identify hypertext 
> anchors.  RDF used them to identify arbitrary concepts, people, whatever.
> 
> The web architecture gave all these languages a common way of building a 
> global identifier for the thing denoted by a local identifier in a given 
> document.   The semantics of the hash sign are defined web-wide to mean 
> that "a#b" can be used to denote whatever is denoted by "b" in the 
> document denoted by "a".
> 
> Worked a treat.  At the beginning of the century, people played around and 
> gave all kinds of things URIs like "http://example.com/foo.rdf#color". 
> Some of us did lots of work and made all kinds of systems which exchanged 
> and integrated data in this way.
> 
> Two snags occurred, as the years passed.  One was that a bunch of RDF 
> users got the fact that it was good to use HTTP URIs, but didn't get the 
> fact that you should put the foo.rdf online so that people can look up 
> what #color means in it.  And as they didn't do that, they didn't actually 
> bother with the "#" at all.  The second fly in the ointment was that some 
> people wanting to use RDF for large systems found that they didn't want to 
> use the "#". This was sometimes because the number of things defined in 
> the same file was too low (like 1) or too large (like a million) and it 
> was difficult to divide up the information into middle-sized chunks. Or 
> they just didn't like the "#" because it looks weird. But for one reason 
> or another people demanded the right to be able to use 
> http://example.net/people/Pat to denote Pat rather than a web page about 
> Pat. 
> 
> This potentially led to huge failures in the whole RDF world, with systems 
> already built which just used   "http://example.net/people/Pat" to 
> identify the document whether you like it or not.
> I among others pushed back against using non-hash URIs for arbitrary 
> things his but eventually gave in.
> 
> So in response to this, the HTTP protocol was, in fact, changed.
> 
> The spec wasn't changed.  The spec editors were not brought on board to 
> the new model.  The spec was interpreted.  The TAG negotiated in a way a 
> truce between the existing HTTP spec, RDF systems, and people who wanted 
> to use HTTP URIs without "#" to identify people.  That truce was 
> HTTPRange-14, which said that yoiu don't a priory know that a hashless 
> HTTP URI denoted a document, but if the server responded with a 200 then 
> you did, and you had a representation of the document.   If you did a get 
> on one of these new URIs which identified things were not documents 
> (people, RDF properties, classes, etc) them the server must not return 
> 200, it can return 303 pointing to a document which explains more.
> 
> So the HTTP protocol was, effectively,  changed.  The HTTP protocol as 
> extended now allows HTTP to be used not only for Documents but for 
> arbitrary Things.  It extends the set of things which you can ask a web 
> server about from documents to anything.  It isn't a very bad design, nor 
> very beautiful.  Other designs would have worked, but that one was the 
> only one which didn't have major problems for some community.  It could be 
> extended, but basically it works. It would be very expensive to reverse it 
> in terms of systems which have been deployed.
> 
> It is also very expensive to go on debating it as though it is an open 
> issue. It is reasonable to try to make the documents more consistent. 
> 
> Anyway, that is a simplified version of the history of all this as I saw 
> it. 
> 
> I would like to see what the documents all look like if edited to use the 
> words Document and Thing, and eliminate Resource. That's my best bet as to 
> two english words which mean as close as we can get to what we want. Note 
> however that the web is a new system, a design in which new concepts are 
> created, so we can't expect english words to exist to capture exactly the 
> concepts. So we take those nearby and abuse them as little as we can as 
> far as we can tell at the time, and then write them in initial caps to 
> recognize that that is what we have done.
> 
> Tim 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

-- 
__________________________________________

Michael K. Bergman
CEO  Structured Dynamics LLC
319.621.5225
skype:michaelkbergman
http://structureddynamics.com
http://mkbergman.com
http://www.linkedin.com/in/mkbergman
__________________________________________
Received on Tuesday, 25 August 2009 01:02:11 UTC