- From: Mike Bergman <mike@mkbergman.com>
- Date: Mon, 24 Aug 2009 20:01:22 -0500
- To: noah_mendelsohn@us.ibm.com
- CC: Tim Berners-Lee <timbl@w3.org>, "Roy T. Fielding" <fielding@gbiv.com>, Julian Reschke <julian.reschke@gmx.de>, Larry Masinter <masinter@adobe.com>, Mark Nottingham <mnot@mnot.net>, Pat Hayes <phayes@ihmc.us>, W3C TAG <www-tag@w3.org>
Hi All,

I don't know whether they would accept the commission (as I have suggested before [1]), but I again suggest the TAG appoint Roy Fielding and Pat Hayes to work jointly to present to the TAG a resolution to these vexing terminology and semantics issues. Further, if the TAG were to agree in advance to accept a consensus recommendation from them, I think that goodwill and intelligence will prevail. I, for one, would agree to the recommendation.

Thanks,
Mike

[1] http://www.mkbergman.com/426/the-shaky-semantics-of-the-semantic-web/

noah_mendelsohn@us.ibm.com wrote:
> Tim Berners-Lee wrote:
>
>> I would like to see what the documents all look like if edited to use the words Document and Thing, and eliminate Resource. That's my best bet as to two English words which mean as close as we can get to what we want.
>
> Yes on "thing"; as you've heard me say from time to time, I continue to have reservations about the word "document". No doubt "document" seems less intimidating than IR, and is often suggestive of what we mean. Still, I think it's actually too narrow, or at least troublingly ambiguous.
>
> Maybe I've hung out with the XML crowd too long, but one of the things that I tend to think of as characteristic of "documents", as opposed to "data", is that they tend to have ordered content. The order of the paragraphs in this email document is significant.
>
> Now, let's say that I have a resource (thing) that consists of an unordered set of stock quotes. Each quote is a {company name, price} pair, but there is no inherent or preferred order for the quotes. As a practical matter, any particular representation sent through HTTP will likely have the quotes in one order or another, but that order is an artifact of the representation technology, just like the angle brackets, whitespace or other delimiters for the quotes. A representation with the order changed would be equally appropriate.
>
> Question: is it OK to return a 200 for this bag of quotes? I hope so. Do we call an unordered bag of quotes a document? Well, we can, but I think it's a stretch.
>
> I played some role in suggesting the term "Information Resource" to the TAG in 2004. I acknowledge and regret that few seem to be pleased with it, but let me at least remind those who don't know how it came about. I wanted to find a term that more clearly covered cases like the one above (and relational tables, trees, graphs, and other data-like abstractions). It occurred to me that Claude Shannon, in his theory of Information, seemed to deal with exactly the sorts of abstractions for which we wanted to allow 200; i.e., those that could be represented by a sequence of bits, of agreed encoding. Can you apply Shannon's theory (which is really about error rates and reliability) to attempts to transmit the text of the Gettysburg Address? Yes, presuming sender and receiver can agree on an encoding. Can you apply Shannon's theory to my bag of stock quotes or to the information filling the (unordered!) rows and columns of a relational table? Yes. Can you apply it to attempts to somehow transmit me, the three-dimensional living TAG member with the unruly hair? No. So, it's just the distinction we want.
>
> If everyone decides that on balance "document" is the lesser of the evils, I suppose I could go along with it, but I don't think it's quite right. If we use it, we should at least try to explain what's really covered and what's not. I still think that IR, in the sense intended, is closer to what we really mean. (If I have to return a 303 for a bag of stock quotes, I'm going to be annoyed.)
>
> Noah
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
> Tim Berners-Lee <timbl@w3.org>
> Sent by: www-tag-request@w3.org
> 08/01/2009 10:14 PM
>
> To: Pat Hayes <phayes@ihmc.us>
> cc: "Roy T. Fielding" <fielding@gbiv.com>, Larry Masinter <masinter@adobe.com>, Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>, W3C TAG <www-tag@w3.org>, (bcc: Noah Mendelsohn/Cambridge/IBM)
> Subject: Historical - Re: Proposed IETF/W3C task force: "Resource meaning" Review of new HTTPbis text for 303 See Other
>
> On 2009-07-20, at 16:27, Pat Hayes wrote:
> [...]
> > But this thread started because HTTPbis explicitly disagrees with RFC 3986 on what a resource is. Surely these various documents should at least agree on their uses of the basic technical terminology.
>
> I agree.
>
> Historically, URIs were used to point to things like web pages and files and movies, on the web, useful documents, or "online resources" in the sense of useful things out there. FTP, Gopher and HTTP sites served up various types of online resources. People got used to http://example.com/ being a web page and http://example.com/#contact being an anchor within it.
>
> The Online Information community, into whose domain the web stuff was put for standardization at the IETF, referred to these things like web pages as resources, and changed the original "D" for "Document" in "UDI" to "R". Some felt that resource was a more appropriate term, maybe because "document" wasn't wide enough to include things like movies.
>
> Now the URI spec actually allowed URIs for completely different things, such as telephone end points, and wisely the URI spec does not make any arbitrary constraint on what a resource should be, especially a resource denoted by a URI in a new scheme to be invented.
>
> Meanwhile, the HTTP spec was polished and elaborated basically as a document delivery system, plus other methods for updating documents, plus POST. (POST started historically as a way of introducing a new web page by posting it to a list, just as in NNTP. It then almost immediately got used as a catch-all extension method. I will ignore it in this overview.)
>
> There was no real definition of what a resource or document was -- maybe because it seemed obvious. The HTTP spec did not even specify whether the URI denoted a person or a document about them; it just explained that the thing returned was a representation of the resource.
>
> Roy's REST work then came along to formalize HTTP as REST and declared that a resource was a time-varying mapping between URI and representation. That was good enough for HTTP. It didn't have enough for the AWWW, when it came along, to be able to describe how the web worked.
>
> In fact, the AWWW document, to explain how to use the web properly, had to add in a bunch of stuff about the social expectations -- things like, yes, the mapping from URI to representation is a function of time, but not just any old one -- a random function is not typically very useful. There are expectations about how it can change with time: persistence, consistency, with various common patterns which allow the web to be a useful medium. The AWWW decided to use the term "Information Resource" for a thing like a web page which contains information, and "Resource" for any old thing at all.
>
> So HTTP and the REST work were done very much in this space of document delivery, editing and update. There was no philosophical need to talk about what the URI denoted (the person, the web page about the person) until RDF came along, when there was an immediate need.
>
> When RDF was first developed, it was motivated by the need for data about resources very much in the online information sense: data about documents, or 'metadata'. In fact it was designed to be able to describe anything, but many early users of RDF referred to it as metadata technology. RDF used the word "resource" rather awkwardly, as it turned out. In the beginning, many of the things being described were documents, and so the online information meaning of resource made sense. But in fact in RDF the resource was allowed to be anything at all. A class, rdf:Resource, even used the term as the universal class of all things. A little later, the Web Ontology Language decided to use Thing for that.
>
> RDF came along in what I think was a neat way. It used completely existing web protocol extension devices to introduce a new system which was fundamentally different from the old HTTP+HTML one. The HTML web was a hypertext model, with pages and anchors. The RDF model was a knowledge representation one of arbitrary things. It did this by using the fact that a new language can define whatever it likes as what a local identifier denotes. A graphics language might use a local identifier to denote lines and points. HTML used local identifiers to identify hypertext anchors. RDF used them to identify arbitrary concepts, people, whatever.
>
> The web architecture gave all these languages a common way of building a global identifier for the thing denoted by a local identifier in a given document. The semantics of the hash sign are defined web-wide to mean that "a#b" can be used to denote whatever is denoted by "b" in the document denoted by "a".
>
> Worked a treat. At the beginning of the century, people played around and gave all kinds of things URIs like "http://example.com/foo.rdf#color". Some of us did lots of work and made all kinds of systems which exchanged and integrated data in this way.
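The "a#b" rule described above can be illustrated with Python's standard `urllib.parse.urldefrag`. This is a minimal sketch, not part of any spec: the helper names `split_hash_uri` and `dereference_target` are hypothetical, and the code does no networking. It only shows the client-side split: the fragment is stripped (it is never sent to the server), the part before "#" names the document to fetch, and the fragment is then resolved inside that document according to its media type (an HTML anchor, an RDF term, and so on).

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def split_hash_uri(uri: str) -> Tuple[str, Optional[str]]:
    """Split a URI into (document URI "a", local identifier "b").

    Returns None as the local identifier when the URI has no fragment.
    """
    document, fragment = urldefrag(uri)
    return document, (fragment or None)


def dereference_target(uri: str) -> str:
    """Describe, in words, how a client would resolve the URI (hypothetical helper)."""
    document, local_id = split_hash_uri(uri)
    if local_id is None:
        # No "#": the whole URI is what gets requested.
        return f"GET {document} and use the response itself"
    # With "#": only the document URI goes over the wire; the local
    # identifier is interpreted by whatever language the document is in.
    return f"GET {document}, then look up '{local_id}' in that document"


print(split_hash_uri("http://example.com/foo.rdf#color"))
# ('http://example.com/foo.rdf', 'color')
print(dereference_target("http://example.com/#contact"))
# GET http://example.com/, then look up 'contact' in that document
```

Note that nothing here says what "color" denotes; under the rule Tim describes, that is up to foo.rdf, which is why putting the document online matters.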
>
> Two snags occurred, as the years passed. One was that a bunch of RDF users got the fact that it was good to use HTTP URIs, but didn't get the fact that you should put the foo.rdf online so that people can look up what #color means in it. And as they didn't do that, they didn't actually bother with the "#" at all. The second fly in the ointment was that some people wanting to use RDF for large systems found that they didn't want to use the "#". This was sometimes because the number of things defined in the same file was too low (like 1) or too large (like a million) and it was difficult to divide up the information into middle-sized chunks. Or they just didn't like the "#" because it looks weird. But for one reason or another people demanded the right to be able to use http://example.net/people/Pat to denote Pat rather than a web page about Pat.
>
> This potentially led to huge failures in the whole RDF world, with systems already built which just used "http://example.net/people/Pat" to identify the document whether you like it or not. I among others pushed back against using non-hash URIs for arbitrary things but eventually gave in.
>
> So in response to this, the HTTP protocol was, in fact, changed.
>
> The spec wasn't changed. The spec editors were not brought on board to the new model. The spec was interpreted. The TAG negotiated, in a way, a truce between the existing HTTP spec, RDF systems, and people who wanted to use HTTP URIs without "#" to identify people. That truce was httpRange-14, which said that you don't a priori know that a hashless HTTP URI denoted a document, but if the server responded with a 200 then you did, and you had a representation of the document. If you did a GET on one of these new URIs which identified things that were not documents (people, RDF properties, classes, etc.), then the server must not return 200; it can return 303 pointing to a document which explains more.
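The httpRange-14 truce described above is essentially a rule for what a client may conclude from the status of a GET on a hashless HTTP URI. The sketch below encodes that reading in Python. It is illustrative only: the `Interpretation` type, the `interpret_get` function and the example URIs (including http://example.net/doc/Pat) are hypothetical, no request is actually made, and the status code and `Location` value are passed in by the caller. It treats the whole 2xx range like the 200 mentioned in the message, since the TAG's published resolution is phrased in terms of 2xx responses.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urldefrag


@dataclass
class Interpretation:
    """What a client may conclude about a URI after one GET (hypothetical type)."""
    uri: str
    is_document: Optional[bool]     # True = known to be a document; None = not known
    see_also: Optional[str] = None  # where to look next, e.g. the Location of a 303


def interpret_get(uri: str, status: int, location: Optional[str] = None) -> Interpretation:
    """Apply the httpRange-14 reading to the status code of a GET on `uri`."""
    document, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: the request went to `document`; what the fragment denotes
        # is defined by that document, not by the status code.
        return Interpretation(uri, None, see_also=document)
    if 200 <= status < 300:
        # Success: the URI denotes a document and the body is a representation of it.
        return Interpretation(uri, True)
    if status == 303:
        # See Other: the URI may denote anything (a person, a class, an RDF
        # property); the Location header points at a document about it.
        return Interpretation(uri, None, see_also=location)
    # Anything else (e.g. 404) says nothing about what the URI denotes.
    return Interpretation(uri, None)


result = interpret_get("http://example.net/people/Pat", 303, "http://example.net/doc/Pat")
print(result.is_document, result.see_also)  # None http://example.net/doc/Pat
```

Under this reading, Noah's bag of stock quotes served with a 200 comes out as a document (`is_document=True`), while a URI for Pat the person has to be answered with a 303 rather than a 200.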
>
> So the HTTP protocol was, effectively, changed. The HTTP protocol as extended now allows HTTP to be used not only for Documents but for arbitrary Things. It extends the set of things which you can ask a web server about from documents to anything. It isn't a very bad design, nor very beautiful. Other designs would have worked, but that one was the only one which didn't have major problems for some community. It could be extended, but basically it works. It would be very expensive to reverse it in terms of systems which have been deployed.
>
> It is also very expensive to go on debating it as though it is an open issue. It is reasonable to try to make the documents more consistent.
>
> Anyway, that is a simplified version of the history of all this as I saw it.
>
> I would like to see what the documents all look like if edited to use the words Document and Thing, and eliminate Resource. That's my best bet as to two English words which mean as close as we can get to what we want. Note however that the web is a new system, a design in which new concepts are created, so we can't expect English words to exist to capture exactly the concepts. So we take those nearby and abuse them as little as we can as far as we can tell at the time, and then write them in initial caps to recognize that that is what we have done.
>
> Tim

-- 
__________________________________________

Michael K. Bergman
CEO Structured Dynamics LLC
319.621.5225
skype:michaelkbergman
http://structureddynamics.com
http://mkbergman.com
http://www.linkedin.com/in/mkbergman
__________________________________________
Received on Tuesday, 25 August 2009 01:02:11 UTC