- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 22 Jan 2003 00:17:24 -0500
- To: Tim Bray <tbray@textuality.com>
- cc: David Booth <dbooth@w3.org>, Michael Mealling <michael@neonym.net>, www-tag@w3.org
> I suggest you read RFC2396 and the Webarch draft. When I say a
> formalism I mean formalism. A resource is per RFC2396 "anything that
> has identity" and a URI is that which identifies a resource.

As Mike Mealling puts it -- a "platonic ideal". There is exactly one
resource per URI by definition. (Or, roughly, until you start getting
301 responses.) We can't know what that resource "is"; it's just an
unknowable mental construct RFC2396 defines as existing.

> A resource, thus defined, has access mechanisms whereby you can retrieve
> and update representations. This formalism is complete, consistent, and
> highly robust in practice, underlying the construction of the most
> successful information system in history.

In fairness, I think this only applies to HTTP 1.1, not the entire web.
Yes, the design of HTTP 1.1 uses an elegant and effective abstraction;
from TimBL's earlier ideas of information resources (which he likes to
call documents), this protocol makes a further step into abstraction,
saying, in effect, "We don't care what an HTTP URI identifies; whatever
the "resource" is, we just handle (MIME entity) representations of it."

And that's a fine thing, within the context of HTTP itself. As far as I
can think right now, you are right here in saying this model is
consistent and has been very useful.

> I admire your chutzpah in charging here and making claims about the
> undefinedness of the term "Resource" but that doesn't mean you're
> anything but hopelessly wrong.

You could take David's message as a sign that a whole raft of
professional software developers think this notion of Resource, while
workable, is somewhere between poorly explained and imperfect.

Working *perfectly* for HTTP is not evidence that it works anywhere
else. (Other people have cited the parable of the blind men and the
elephant.) And the success of the Web is of course due to many, many
factors.

> You go on to observe correctly that once you step outside the formalism,
> a resource can in fact be all sorts of different things, and that it
> would help if we had a way to talk about what kind of thing it is. I
> agree with all of that. However, the web architecture as it stands
> works just fine without being able to talk about what any particular
> resource "is" aside from "that which is identified by its particular URI".

If web architecture == HTTP 1.1, then sure.

Once you step outside the formalism, not only do you want to know what
kind of thing a specific Resource is, but you notice that everyone is
using each URI to identify several distinct things. So the fundamental
premise of 2396 breaks as soon as you step outside the formalism.

> In the Web Architecture formalism, http://x.org/love identifies only one
> resource. In the real world, I can learn about that resource by
> retrieving representations of it (if any are available), and more by
> processing RDF assertions about it (if any are available). The Web
> architecture doesn't talk about meanings, it talks about resources and
> representations. There's nothing wrong with talking about meaning, and
> I look forward to the day when I can reliably retrieve some RDF
> assertions and learn that this particular URI identifies nothing but a
> JPG of a cute cat, and this other one identifies the inner thought of a
> drug-addled conceptual artist. This would be good and useful.

And if you HTTP GET a representation of the artist, what will the
Last-Modified field mean? It doesn't mean when the representation was
last modified, or when the resource (the artist) was last modified. To
quote RFC 2616:

    The exact meaning of this header field depends on the
    implementation of the origin server and the nature of the original
    resource. For files, it may be just the file system last-modified
    time. For entities with dynamically included parts, it may be the
    most recent of the set of last-modify times for its component
    parts. For database gateways, it may be the last-update time stamp
    of the record. For virtual objects, it may be the last time the
    internal state changed.

This is where you have to start making a distinction between objects in
the domain of discourse (like artists), and information which computer
systems hold about those objects. In object-oriented design, you
intentionally ignore this distinction, but when you start getting into
manipulating the data directly, you need to notice it again.

Can you draft text about Last-Modified that makes sense with the
resource being an artist? So maybe the a-Resource-is-anything-with-Identity
idea doesn't even really hold for HTTP 1.1.

[ Sandro continues his argument that the word "Resource" masks an
  underlying disagreement and confusion even in the design of HTTP 1.1.
  It's not so bad that an expert human implementor can't sort it out and
  know when it refers to the object in the domain of discourse and when
  it refers to the computer's information about that object, ... but it
  is bad. ]

> At the moment, speaking for myself, my impression is that the TAG has no
> intention of saying anything beyond what's in 2396 and the Webarch draft.

Then I wonder where this will get worked out. My best idea right now is
to start a collection of ontologies of the web. The need for a single
vision will be much reduced if/when the various different visions are
clearly laid out. Any fans of 2396 and 2616 psyched to encode them into
OWL? (Dan Connolly is the shoo-in, but I couldn't possibly motivate him
to do this.)

> The reason I'm willing to put so much energy into this is that I
> agonized for a long time over the fact that in reality URIs identify
> lots of different kinds of things and everybody was ignoring this
> elephant in the room. Weirdly enough, this angst never got in the way
> of my building spiders and search engines and visual maps of webspace
> and all sorts of other useful things. It is quite possible that the Web
> Architecture works *because* it works around the intractable problems of
> meaning and only deals with comparing identifiers and shuffling
> representations around; avoiding a lot of problems that historically
> have been intractable.

I wonder how different the web would be without HTTP. How much of the
web functionality we use today could be implemented just fine with a
subset of 1985's RFC 959 FTP protocol, accessed via ftp: URLs? In the
days of Mosaic, I saw web sites done like this; it worked because Mosaic
assumed Content-Type text/html when the filename ended in .html.

So the content-type abstraction would have to be done differently, and
POST would have to be done more explicitly using STOU (Store Unique),
which might be a good thing. Various performance issues (number of TCP
connections, cache support) would come out differently to be sure. But
the nature of URIs would be so much more clear when they were
"obviously" just filenames. (Of course there would be an equivalent of
CGI, it would just be imagined slightly differently.)

What does a filename (or file: URI) identify, and how is that really
different from an HTTP URI?

-- sandro
Received on Wednesday, 22 January 2003 00:19:27 UTC