- From: Mike Bergman <mike@mkbergman.com>
- Date: Mon, 24 Aug 2009 20:01:22 -0500
- To: noah_mendelsohn@us.ibm.com
- CC: Tim Berners-Lee <timbl@w3.org>, "Roy T. Fielding" <fielding@gbiv.com>, Julian Reschke <julian.reschke@gmx.de>, Larry Masinter <masinter@adobe.com>, Mark Nottingham <mnot@mnot.net>, Pat Hayes <phayes@ihmc.us>, W3C TAG <www-tag@w3.org>
Hi All,
I don't know whether they would accept the commission (as I have
suggested before [1]), but I again suggest the TAG appoint Roy
Fielding and Pat Hayes to work jointly to present to the TAG a
resolution to these vexing terminology and semantics issues.
Further, if the TAG were to agree in advance to accept a
consensus recommendation from them, I think that goodwill and
intelligence will prevail. I, for one, would agree to the
recommendation.
Thanks, Mike
[1]
http://www.mkbergman.com/426/the-shaky-semantics-of-the-semantic-web/
noah_mendelsohn@us.ibm.com wrote:
> Tim Berners-Lee wrote:
>
>> I would like to see what the documents all look like if edited to
>> use the words Document and Thing, and eliminate Resource. That's my
> >> best bet as to two English words which mean as close as we can get
>> to what we want.
>
> Yes on "thing"; as you've heard me say from time to time, I continue to
> have reservations about the word "document". No doubt "document" seems
> less intimidating than IR, and is often suggestive of what we mean. Still,
> I think it's actually too narrow, or at least troublingly ambiguous.
>
> Maybe I've hung out with the XML crowd too long, but one of the things that
> I tend to think of as characteristic of "documents", as opposed to "data",
> is that they tend to have ordered content. The order of the paragraphs in
> this email document is significant.
>
> Now, let's say that I have a resource (thing) that consists of an
> unordered set of stock quotes. Each quote is a {company name, price}
> pair, but there is no inherent or preferred order for the quotes. As a
> practical matter, any particular representation sent through HTTP will
> likely have the quotes in one order or another, but that order is an
> artifact of the representation technology, just like the angle brackets,
> whitespace or other delimiters for the quotes. A representation with the
> order changed would be equally appropriate.
>
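A minimal sketch of the point above, assuming a hypothetical JSON encoding of the quotes (the company names and prices are invented for illustration, not taken from the thread). Two representations that differ only in order parse to the same unordered set, so either would be an equally appropriate representation of the same resource:

```python
import json

def parse_quotes(body: str) -> frozenset:
    """Parse a JSON list of [company, price] pairs into an unordered set."""
    return frozenset((name, price) for name, price in json.loads(body))

# Two hypothetical representations: same quotes, different order on the wire.
rep_a = '[["Acme", 118.5], ["Globex", 31.2], ["Initech", 10.0]]'
rep_b = '[["Globex", 31.2], ["Initech", 10.0], ["Acme", 118.5]]'

# The byte sequences differ, but the bag of quotes they represent is the same.
print(rep_a == rep_b)                              # False
print(parse_quotes(rep_a) == parse_quotes(rep_b))  # True
```

The order in each string is an artifact of serialization; only the set comparison reflects the state of the resource.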
> Question: is it OK to return a 200 for this bag of quotes? I hope so. Do
> we call an unordered bag of quotes a document? Well, we can, but I think
> it's a stretch.
>
> I played some role in suggesting the term "Information Resource" to the
> TAG in 2004. I acknowledge and regret that few seem to be pleased with
> it, but let me at least remind those who don't know how it came about. I
> wanted to find a term that more clearly covered cases like the one above
> (and relational tables, trees, graphs, and other data-like abstractions).
> It occurred to me that Claude Shannon, in his theory of Information,
> seemed to deal with exactly the sorts of abstractions for which we wanted
> to allow 200; i.e., those that could be represented by a sequence of
> bits, of agreed encoding. Can you apply Shannon's theory (which is
> really about error rates and reliability) to attempts to transmit the text
> of the Gettysburg address? Yes, presuming sender and receiver can agree
> on an encoding. Can you apply Shannon's theory to my bag of stock quotes
> or to the information filling the (unordered!) rows and columns of a
> relational table? Yes. Can you apply it to attempts to somehow transmit
> me, the three dimensional living TAG member with the unruly hair? No. So,
> it's just the distinction we want.
>
> If everyone decides that on balance "document" is the lesser of the evils,
> I suppose I could go along with it, but I don't think it's quite right. If
> we use it, we should at least try to explain what's really covered and
> what's not. I still think that IR, in the sense intended, is closer to
> what we really mean. (If I have to return a 303 for a bag of stock
> quotes, I'm going to be annoyed.)
>
> Noah
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
>
> Tim Berners-Lee <timbl@w3.org>
> Sent by: www-tag-request@w3.org
> 08/01/2009 10:14 PM
>
> To: Pat Hayes <phayes@ihmc.us>
> cc: "Roy T. Fielding" <fielding@gbiv.com>, Larry Masinter
> <masinter@adobe.com>, Julian Reschke <julian.reschke@gmx.de>, Mark
> Nottingham <mnot@mnot.net>, W3C TAG <www-tag@w3.org>, (bcc: Noah
> Mendelsohn/Cambridge/IBM)
> Subject: Historical - Re: Proposed IETF/W3C task force:
> "Resource meaning" Review of new HTTPbis text for 303 See Other
>
>
>
> On 2009-07-20, at 16:27, Pat Hayes wrote:
> [...]
>
> . But this thread started because HTTPbis explicitly disagrees with RFC
> 3986 on what a resource is. Surely these various documents should at least
> agree on their uses of the basic technical terminology.
>
> I agree.
>
> Historically, URIs were used to point to things like web pages and files
> and movies, on the web, useful documents, or "online resources" in the
> sense of useful things out there. FTP, Gopher and HTTP sites served up
> various types of online resources. People got used to http://example.com/
> being a web page and http://example.com/#contact being an anchor within
> it.
>
> The Online Information community, into whose domain the web stuff was put
> for standardization at the IETF, referred to these things like web pages
> as resources, and changed the original "D" for "Document" in "UDI" to
> "R".
> Some felt that resource was a more appropriate term, maybe because
> "document" wasn't wide enough to include things like movies.
>
> Now the URI spec actually allowed URIs for completely different things,
> such as telephone end points, and wisely the URI spec does not make any
> arbitrary constraint on what a resource should be, especially a resource
> denoted by a URI in a new scheme to be invented.
>
> Meanwhile, the HTTP spec was polished and elaborated basically as a
> document delivery system, plus other methods for updating documents, plus
> POST. (POST started historically as a way of introducing a new web page by
> posting it to a list, just as in NNTP. It then almost immediately got
> used as a catch-all extension method. I will ignore it in this overview).
>
> There was no real definition of what a resource or document was -- maybe
> because it seemed obvious. The HTTP spec did not even specify whether the
> URI denoted a person or a document about them, it just explained that the
> thing returned was a representation of the resource.
>
> Roy's REST work then came along to formalize HTTP as REST and declared
> that a resource was a time-varying mapping between URI and representation.
> That was good enough for HTTP. It didn't have enough for the AWWW, when it
> came along, to be able to describe how the web worked.
>
> In fact, the AWWW document, to explain how to use the web properly, had to
> add in a bunch of stuff about the social expectations -- things like, yes,
> the mapping from URI to representation is a function of time, but not just
> any old one -- a random function is not typically very useful. There are
> expectations about how it can change with time. Persistence, consistency,
> with various common patterns which allow the web to be a useful medium.
> The AWWW decided to use the term "Information Resource" for a thing like a
> web page which contains information, and "Resource" for any old thing at
> all.
>
> So HTTP and the REST work was done very much in this space of document
> delivery, editing and update. There was no philosophical need to talk
> about what the URI denoted (the person, the web page about the person)
> until RDF came along, when there was an immediate need.
>
> When RDF was first developed, it was motivated by the need for data about
> resources very much in the online information sense: data about documents,
> or 'metadata'. In fact it was designed to be able to describe anything,
> but many early users of RDF referred to it as metadata technology. RDF
> used the word "resource" rather awkwardly in fact as it turned out. In
> the beginning, many of the things being described were documents, and so
> the online information meaning of resource made sense. But in fact in RDF
> the resource was allowed to be anything at all. A class, rdf:Resource, even
> used the term as the universal class of all things. A little later, the
> Web Ontology Language decided to use Thing for that.
>
> RDF came along in what I think was a neat way. It used completely
> existing web protocol extension devices to introduce a new system which
> was fundamentally different from the old HTTP+HTML one. The HTML web was
> a hypertext model, with pages and anchors. The RDF model was a knowledge
> representation one of arbitrary things. It did this by using the fact
> that a new language can define whatever it likes as what a local
> identifier denotes. A graphic language might use local identifiers to
> denote lines and points. HTML used local identifiers to identify hypertext
> anchors. RDF used them to identify arbitrary concepts, people, whatever.
>
> The web architecture gave all these languages a common way of building a
> global identifier for the thing denoted by a local identifier in a given
> document. The semantics of the hash sign are defined web-wide to mean
> that "a#b" can be used to denote whatever is denoted by "b" in the
> document denoted by "a".
>
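A minimal sketch of the "a#b" rule described above, using Python's standard urllib.parse.urldefrag. The helper name and the toy in-memory "web" are hypothetical, and the example URI is the one used in the next paragraph of the quoted text. The global identifier splits into the document part ("a") and the local identifier ("b"), and what "b" denotes is looked up in what the document's language defines:

```python
from urllib.parse import urldefrag

def split_global_identifier(uri: str) -> tuple:
    """Split "a#b" into the document URI "a" and the local identifier "b"."""
    document_uri, local_id = urldefrag(uri)
    return document_uri, local_id

# Hypothetical stand-in for the web: each document defines what its own
# local identifiers denote (an anchor in HTML, an arbitrary thing in RDF).
toy_web = {
    "http://example.com/foo.rdf": {"color": "an RDF property (a Thing)"},
    "http://example.com/": {"contact": "a hypertext anchor"},
}

def denotation(uri: str) -> str:
    """Look up what "a#b" denotes: whatever "b" denotes in document "a"."""
    document_uri, local_id = split_global_identifier(uri)
    return toy_web[document_uri][local_id]

print(split_global_identifier("http://example.com/foo.rdf#color"))
print(denotation("http://example.com/#contact"))
```

The lookup only works if the document is actually available, which is the first of the two snags described further on.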
> Worked a treat. At the beginning of the century, people played around and
> gave all kinds of things URIs like "http://example.com/foo.rdf#color".
> Some of us did lots of work and made all kinds of systems which exchanged
> and integrated data in this way.
>
> Two snags occurred, as the years passed. One was that a bunch of RDF
> users got the fact that it was good to use HTTP URIs, but didn't get the
> fact that you should put the foo.rdf online so that people can look up
> what #color means in it. And as they didn't do that, they didn't actually
> bother with the "#" at all. The second fly in the ointment was that some
> people wanting to use RDF for large systems found that they didn't want to
> use the "#". This was sometimes because the number of things defined in
> the same file was too low (like 1) or too large (like a million) and it
> was difficult to divide up the information into middle-sized chunks. Or
> they just didn't like the "#" because it looks weird. But for one reason
> or another people demanded the right to be able to use
> http://example.net/people/Pat to denote Pat rather than a web page about
> Pat.
>
> This potentially led to huge failures in the whole RDF world, with systems
> already built which just used "http://example.net/people/Pat" to
> identify the document whether you like it or not.
> I among others pushed back against using non-hash URIs for arbitrary
> things but eventually gave in.
>
> So in response to this, the HTTP protocol was, in fact, changed.
>
> The spec wasn't changed. The spec editors were not brought on board to
> the new model. The spec was interpreted. The TAG negotiated in a way a
> truce between the existing HTTP spec, RDF systems, and people who wanted
> to use HTTP URIs without "#" to identify people. That truce was
> HTTPRange-14, which said that you don't a priori know that a hashless
> HTTP URI denoted a document, but if the server responded with a 200 then
> you did, and you had a representation of the document. If you did a GET
> on one of these new URIs which identified things that were not documents
> (people, RDF properties, classes, etc) then the server must not return
> 200, it can return 303 pointing to a document which explains more.
>
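A minimal sketch of the client-side reasoning described above, as summarized in this message rather than quoted from any specification. The function name and the returned labels are hypothetical; it covers only the two status codes the paragraph mentions:

```python
from urllib.parse import urldefrag

def what_uri_denotes(uri: str, status: int) -> str:
    """Classify what a GET response lets a client conclude about a URI."""
    _, fragment = urldefrag(uri)
    if fragment:
        # Hash URIs are outside this rule: the fragment's meaning comes
        # from the retrieved document, not from the status code.
        return "defined by the document"
    if status == 200:
        # A 200 on a hashless URI: it denotes a Document, and the body
        # is a representation of that document.
        return "Document"
    if status == 303:
        # A 303 points to a document which explains more; the URI itself
        # may denote any Thing (a person, an RDF property, a class, ...).
        return "Thing (see the redirect target for a description)"
    # Other status codes are not covered by the summary above.
    return "unknown"

print(what_uri_denotes("http://example.net/people/Pat", 303))
print(what_uri_denotes("http://example.com/", 200))
```

Before any response arrives nothing is known a priori about a hashless URI; the classification is driven entirely by what the server answers.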
> So the HTTP protocol was, effectively, changed. The HTTP protocol as
> extended now allows HTTP to be used not only for Documents but for
> arbitrary Things. It extends the set of things which you can ask a web
> server about from documents to anything. It isn't a very bad design, nor
> very beautiful. Other designs would have worked, but that one was the
> only one which didn't have major problems for some community. It could be
> extended, but basically it works. It would be very expensive to reverse it
> in terms of systems which have been deployed.
>
> It is also very expensive to go on debating it as though it is an open
> issue. It is reasonable to try to make the documents more consistent.
>
> Anyway, that is a simplified version of the history of all this as I saw
> it.
>
> I would like to see what the documents all look like if edited to use the
> words Document and Thing, and eliminate Resource. That's my best bet as to
> two English words which mean as close as we can get to what we want. Note
> however that the web is a new system, a design in which new concepts are
> created, so we can't expect English words to exist to capture exactly the
> concepts. So we take those nearby and abuse them as little as we can as
> far as we can tell at the time, and then write them in initial caps to
> recognize that that is what we have done.
>
> Tim
>
>
--
__________________________________________
Michael K. Bergman
CEO Structured Dynamics LLC
319.621.5225
skype:michaelkbergman
http://structureddynamics.com
http://mkbergman.com
http://www.linkedin.com/in/mkbergman
__________________________________________
Received on Tuesday, 25 August 2009 01:02:11 UTC