- From: Tim Bray <tbray@textuality.com>
- Date: Wed, 18 Sep 2002 21:38:40 -0700
- To: WWW-Tag <www-tag@w3.org>
When I say "Webarch" I mean http://www.w3.org/TR/2002/WD-webarch-20020830/

Following on a recent TAG telecon, I took an action item to chip away at the coalface exposed by excavating around our issue HttpRange-14, in which area the TAG is more or less at an impasse. Although I say nothing about the range of HTTP URIs, I nonetheless claim this discharges my action item.

I'm going to focus on the Webarch Principle that reads "Absolute URI references are unambiguous: Each absolute URI reference unambiguously identifies one resource."

Upon consideration, I think that this principle:

(a) is probably false, and
(b) there is no way to test, observationally, whether this principle is true or false, thus it falls outside the scope of science, and
(c) has little bearing on Web architecture from the point of view of people who implement real software, and
(d) Webarch has no need to say anything about what resources are, aside from a reference to the lightweight definition found in RFC2396 ("anything that has identity"), so
(e) This principle should simply be removed from Webarch forthwith.

In fact, unless a new line of argument comes up to make me change my mind, I'm probably going to become a hard-core advocate (as in do this or take my name off the document) of losing this counter-productive alleged "principle".

To take these up in order:

(a) I could easily synthesize a URI that, when dereferenced, alternately returned representations of the current weather in Oaxaca and pictures of my adorable cat Bodoni. I could, whenever the whimsy struck me, change one of the alternate representation generators to whatever struck my whimsy that day. This URI simply does not in any practical sense identify a single anything.

(b) Notwithstanding the above, I could construct an argument (which would be pretty strained and sophistry-packed) that the URI above does represent a resource whose definition involves T. Bray's whimsy and so on. Many people would not be convinced.
It doesn't matter; since a URI is just a URI, there is no machine-testable way to ascertain whether the resource(s) whose representation(s) is(are) retrieved are one or many. Thus this so-called principle is non-testable, outside the domain of science, and can be (at best) an assertion of a particular world-view. I will freely grant that we normally assume URIs are bound unambiguously to resources that have some material or conceptual solidity, but this is such a truism that I'd feel silly spelling it out in this document.

(c) From the software builder's point of view, a URI is a character string whose syntax is governed by the appropriate RFCs upon which a small number of operations can be performed; the only one you can count on is comparison with other URIs. Some URIs can also be dereferenced to yield a representation. The workings of none of these operations are affected in the slightest by what a resource "is", granted only that the implementor bears in mind that what they're getting is a representation, with the limitations that implies.

URIs are also useful (as in RDF) in representing knowledge and building frameworks of assertions with the aim of performing inference. Clearly, chains of inference break messily if a URI is used to refer to different things. For example, if there is ambiguity about whether a particular URI refers to Moby Dick, the work, or Moby Dick, the particular printed volume on the shelves of some library, or the electronic catalogue record describing that particular paper artifact, all sorts of problems arise: is "size" to be measured in words of the novel, in pages of one printing, or in bytes consumed by a catalog record? Unfortunately, in the real world, this kind of confusion will arise - no conceivable application of computer technology or social organization can prevent it - and knowledge representation systems need to be able to deal with the resulting dissonance.
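
To make the Moby Dick confusion concrete, here is a minimal sketch of how a knowledge-representation system might notice that one URI has been asserted to be several different kinds of thing, and decline to answer a "size" query rather than guess. Everything in it is an assumption made for illustration: the example.org URI, the type labels, the tuple layout, the function names, and the size figures (which are placeholders, not real measurements).

```python
from collections import defaultdict

# Hypothetical URI; not a real identifier for anything.
MOBY = "http://example.org/moby-dick"

# Each assertion is (subject URI, asserted type, size, unit).
# The sizes are placeholder values invented for this sketch.
assertions = [
    (MOBY, "Work", 200000, "words"),
    (MOBY, "PrintedVolume", 600, "pages"),
    (MOBY, "CatalogRecord", 2048, "bytes"),
]


def find_ambiguous(assertions):
    """Map each URI asserted with more than one type to its sorted types."""
    types_by_uri = defaultdict(set)
    for uri, asserted_type, _size, _unit in assertions:
        types_by_uri[uri].add(asserted_type)
    return {
        uri: sorted(types)
        for uri, types in types_by_uri.items()
        if len(types) > 1
    }


def size_of(uri, assertions):
    """Return (size, unit) when the URI has one agreed type, else None."""
    if uri in find_ambiguous(assertions):
        # Conflicting types: refuse to infer rather than pick one silently.
        return None
    for subject, _asserted_type, size, unit in assertions:
        if subject == uri:
            return (size, unit)
    return None  # no assertions about this URI at all
```

With the three assertions above, `find_ambiguous(assertions)` reports all three types for the one URI, and `size_of(MOBY, assertions)` returns `None` instead of choosing among words, pages, and bytes.
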
The smooth operation of any system on which inference is to be based had better have a type system, and for any one object, agreement on the type of the object, before inference can proceed. In fact, detection of the case where ambiguity exists, and a robust strategy for operating when it is detected (even if it's only the KR equivalent of "404 not found") seems like a sine qua non for the progress of the Semantic Web. An assertion in Webarch that such confusion cannot occur is as futile as the medieval church's insistence that Copernicus was wrong, or the classical hypertext theorists' assertion that links must not be allowed to break.

(d) The discussion about what a resource is is a well-known rathole which has consumed endless cycles and time among some of the world's smartest and most experienced practitioners. Once we've noted what RFC2396 says on the subject, there are a bunch of other useful things we can say (all the other principles in section 2.1 of Webarch), none of which are in the slightest weakened or perceptibly affected by what a resource may or may not be. Fortunately, we need not go there.

(e) Let's get rid of this thing. We will free up time to work on the important material in Section 3, we will remove content that has to do more with metaphysics than engineering, and we will not remove the tiniest atom of value from the document.

-Tim
Received on Thursday, 19 September 2002 00:38:44 UTC