My action item on Moby Dick, issue 14, etc from Tim Bray on 2002-09-19 (www-tag@w3.org from September 2002)

From: Tim Bray <tbray@textuality.com>
Date: Wed, 18 Sep 2002 21:38:40 -0700
To: WWW-Tag <www-tag@w3.org>
Message-ID: <3D8954D0.3020208@textuality.com>
When I say "Webarch" I mean http://www.w3.org/TR/2002/WD-webarch-20020830/

Following on a recent TAG telecon, I took an action item to chip away at 
the coalface exposed by excavating around our issue HttpRange-14, in 
which area the TAG is more or less at an impasse.  Although I say 
nothing about the range of HTTP URIs, I nonetheless claim this 
discharges my action item.

I'm going to focus on the Webarch Principle that reads "Absolute URI 
references are unambiguous: Each absolute URI reference unambiguously 
identifies one resource."

Upon consideration, I think that this principle:

(a) is probably false, and
(b) cannot be tested observationally for truth or falsehood, and thus 
falls outside the scope of science, and
(c) has little bearing on Web architecture from the point of view of 
people who implement real software; moreover,
(d) Webarch has no need to say anything about what resources are, aside 
from a reference to the lightweight definition found in RFC2396 
("anything that has identity"), so
(e) this principle should simply be removed from Webarch forthwith.

In fact, unless a new line of argument comes up to make me change my 
mind, I'm probably going to become a hard-core advocate (as in do this 
or take my name off the document) of losing this counter-productive 
alleged "principle".

To take these up in order:

(a) I could easily synthesize a URI that, when dereferenced, alternately 
returned representations of the current weather in Oaxaca and pictures 
of my adorable cat Bodoni.  I could, whenever the whimsy struck me, 
change one of the alternate representation generators to whatever I 
fancied that day. This URI simply does not in any practical sense 
identify a single anything.
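
To make the thought experiment concrete, here is a minimal sketch of what 
might sit behind such a URI.  Everything in it is an assumption made for 
illustration: the class name WhimsicalResource, the two generator 
functions, and the placeholder payloads are hypothetical, and no real 
server or weather feed is involved.

```python
import itertools

# Hypothetical representation generators; the payloads are placeholders.
def oaxaca_weather():
    return ("text/plain", "Oaxaca: weather report placeholder")

def bodoni_picture():
    return ("image/jpeg", b"cat picture bytes placeholder")

class WhimsicalResource:
    """Stands behind one URI and alternates between representation generators."""

    def __init__(self, generators):
        self._generators = list(generators)
        # Cycle over slot indexes so a swapped-in generator is picked up later.
        self._slots = itertools.cycle(range(len(self._generators)))

    def replace(self, index, generator):
        """Swap one generator for another, whenever the whimsy strikes."""
        self._generators[index] = generator

    def dereference(self):
        """Return (media type, body) from whichever generator is next in turn."""
        return self._generators[next(self._slots)]()
```

A client calling dereference() repeatedly sees only a sequence of 
representations; nothing in the exchange says whether they belong to one 
resource or to several.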

(b) Notwithstanding the above, I could construct an argument (which 
would be pretty strained and sophistry-packed) that the URI above does 
represent a resource whose definition involves T. Bray's whimsy and so 
on.  Many people would not be convinced.  It doesn't matter; since a URI 
is just a URI, there is no machine-testable way to ascertain whether the 
resource(s) whose representation(s) is(are) retrieved are one or many. 
Thus this so-called principle is non-testable, outside the domain of 
science, and can be (at best) an assertion of a particular world-view. 
I will freely grant that we normally assume URIs are bound unambiguously 
to resources that have some material or conceptual solidity, but this is 
such a truism that I'd feel silly spelling it out in this document.

(c) From the software builder's point of view, a URI is a character 
string whose syntax is governed by the appropriate RFCs upon which a 
small number of operations can be performed; the only one you can count 
on is comparison with other URIs.  Some URIs can also be dereferenced to 
yield a representation.  The workings of none of these operations are 
affected in the slightest by what a resource "is", granted only that the 
implementor bears in mind that what they're getting is a representation, 
with the limitations that implies.
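
As a rough illustration of the software builder's view, the sketch below 
treats URIs purely as character strings.  The function names same_uri and 
has_scheme are hypothetical, the example.org URIs are made up, and 
comparison here is plain character-by-character equality, which is an 
assumption of this sketch rather than a rule taken from the RFCs.

```python
from urllib.parse import urlsplit

def same_uri(a: str, b: str) -> bool:
    """Compare two URIs as character strings, with no normalization."""
    return a == b

def has_scheme(uri: str) -> bool:
    """Syntax-only check: does the string start with a URI scheme?"""
    return bool(urlsplit(uri).scheme)

# Two made-up URIs; nothing here asks what resource either one identifies.
first = "http://example.org/moby-dick"
second = "http://example.org/moby-dick#size"
print(same_uri(first, first), same_uri(first, second), has_scheme(first))
```

Neither function needs to know what the resource "is" in order to do its 
job, which is the point being made above.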

URIs are also useful (as in RDF) in representing knowledge and building 
frameworks of assertions with the aim of performing inference.  Clearly, 
chains of inference break messily if a URI is used to refer to different 
things.  For example, if there is ambiguity about whether a particular 
URI refers to Moby Dick, the work, or Moby Dick, the particular printed 
volume on the shelves of some library, or the electronic catalogue 
record describing that particular paper artifact, all sorts of problems 
arise: is "size" to be measured in words of the novel, in pages of one 
printing, or in bytes consumed by a catalog record?

Unfortunately, in the real world, this kind of confusion will arise - no 
conceivable application of computer technology or social organization 
can prevent it - and knowledge representation systems need to be able to 
deal with the resulting dissonance.  Any system on which inference is to 
be based had better have a type system and, for any one object, 
agreement on the type of the object, before inference can proceed. In 
fact, detection of the case where ambiguity exists, and a robust 
strategy for operating when it is detected (even if it's only the KR 
equivalent of "404 not found"), seem like a sine qua non for the 
progress of the Semantic Web.
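
One way such detection might look, as a sketch only: record every type 
asserted for a URI and refuse to proceed when the assertions disagree.  
The names TypeRegistry and AmbiguousURI, the type labels, and the 
example.org URI are all hypothetical; this is not drawn from RDF or from 
any existing KR system.

```python
from collections import defaultdict

class AmbiguousURI(Exception):
    """Raised when a URI has been asserted to have more than one type."""

class TypeRegistry:
    """Collects type assertions about URIs and reports disagreement."""

    def __init__(self):
        self._types = defaultdict(set)

    def assert_type(self, uri, type_name):
        """Record one party's claim about the type of the thing a URI names."""
        self._types[uri].add(type_name)

    def agreed_type(self, uri):
        """Return the single agreed type, None if nothing is known,
        or raise AmbiguousURI if the assertions conflict."""
        types = self._types.get(uri, set())
        if not types:
            return None
        if len(types) > 1:
            raise AmbiguousURI(f"{uri} asserted as: {sorted(types)}")
        return next(iter(types))
```

With the Moby Dick example, asserting both "Work" and "PrintedVolume" for 
the same URI would make agreed_type() raise, so a question about "size" 
is never answered in the wrong units.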

An assertion in Webarch that such confusion cannot occur is as futile as 
the medieval church's insistence that Copernicus was wrong, or the 
classical hypertext theorists' assertion that links must not be allowed 
to break.

(d) The discussion about what a resource is is a well-known rathole 
which has consumed endless cycles and time among some of the world's 
smartest and most experienced practitioners.  Once we've noted what 
RFC2396 says on the subject, there are a bunch of other useful things we 
can say (all the other principles in section 2.1 of Webarch), none of 
which are in the slightest weakened or perceptibly affected by what a 
resource may or may not be. Fortunately, we need not go there.

(e) Let's get rid of this thing.  We will free up time to work on the 
important material in Section 3, we will remove content that has to do 
more with metaphysics than engineering, and we will not remove the 
tiniest atom of value from the document.

 -Tim
Received on Thursday, 19 September 2002 00:38:44 UTC