- From: Tim Bray <tbray@textuality.com>
- Date: Wed, 18 Sep 2002 21:38:40 -0700
- To: WWW-Tag <www-tag@w3.org>
When I say "Webarch" I mean http://www.w3.org/TR/2002/WD-webarch-20020830/
Following on a recent TAG telecon, I took an action item to chip away at
the coalface exposed by excavating around our issue HttpRange-14, in
which area the TAG is more or less at an impasse. Although I say
nothing about the range of HTTP URIs, I nonetheless claim this
discharges my action item.
I'm going to focus on the Webarch Principle that reads "Absolute URI
references are unambiguous: Each absolute URI reference unambiguously
identifies one resource."
Upon consideration, I think that:
(a) this principle is probably false, and
(b) there is no way to test, observationally, whether this principle is
true or false, thus it falls outside the scope of science, and
(c) it has little bearing on Web architecture from the point of view of
people who implement real software, and
(d) Webarch has no need to say anything about what resources are, aside
from a reference to the lightweight definition found in RFC2396
("anything that has identity"), so
(e) this principle should simply be removed from Webarch forthwith.
In fact, unless a new line of argument comes up to make me change my
mind, I'm probably going to become a hard-core advocate (as in do this
or take my name off the document) of losing this counter-productive
alleged "principle".
To take these up in order:
(a) I could easily synthesize a URI that when dereferenced, alternately
returned representations of the current weather in Oaxaca and pictures
of my adorable cat Bodoni. I could, whenever the whimsy struck me,
change one of the alternating representation generators to whatever took
my fancy that day. This URI simply does not in any practical sense
identify a single anything.
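
To make (a) concrete, here is a minimal Python sketch of such a URI's server side. Everything in it is hypothetical: the names `make_whimsical_resource`, `oaxaca_weather`, and `bodoni_picture` are invented for illustration, and the representations are placeholders rather than real weather data or image bytes.

```python
import itertools

def make_whimsical_resource(generators):
    """Return a dereference function that cycles through the generators."""
    cycle = itertools.cycle(generators)

    def dereference():
        # Each call hands back a representation from the next generator.
        return next(cycle)()

    return dereference

def oaxaca_weather():
    # Placeholder; a real generator would consult a weather service.
    return ("text/plain", "Weather in Oaxaca: (placeholder)")

def bodoni_picture():
    # Placeholder; a real generator would return actual image bytes.
    return ("image/jpeg", b"(placeholder bytes for a picture of Bodoni)")

# One "URI", two unrelated kinds of representation, served alternately.
get = make_whimsical_resource([oaxaca_weather, bodoni_picture])
first, second, third = get(), get(), get()
```

Swapping either generator for something else is a one-line change, and nothing visible to the client marks the moment it happens.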
(b) Notwithstanding the above, I could construct an argument (which
would be pretty strained and sophistry-packed) that the URI above does
represent a resource whose definition involves T. Bray's whimsy and so
on. Many people would not be convinced. It doesn't matter; since a URI
is just a URI, there is no machine-testable way to ascertain whether the
resource(s) whose representation(s) is(are) retrieved are one or many.
Thus this so-called principle is non-testable, outside the domain of
science, and can be (at best) an assertion of a particular world-view.
I will freely grant that we normally assume URIs are bound unambiguously
to resources that have some material or conceptual solidity, but this is
such a truism that I'd feel silly spelling it out in this document.
(c) From the software builder's point of view, a URI is a character
string, with syntax governed by the appropriate RFCs, upon which a
small number of operations can be performed; the only one you can count
on is comparison with other URIs. Some URIs can also be dereferenced to
yield a representation. The workings of none of these operations are
affected in the slightest by what a resource "is", granted only that the
implementor bears in mind that what they're getting is a representation,
with the limitations that implies.
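
As a rough illustration of how little the comparison operation depends on what a resource "is", here is a minimal Python sketch. `same_uri` is a hypothetical helper, and treating the scheme and host as case-insensitive is an assumption borrowed from the generic URI syntax rules, not something stated above.

```python
from urllib.parse import urlsplit

def same_uri(a: str, b: str) -> bool:
    """Compare two URIs as strings, ignoring case in scheme and authority."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        # Simplification: this lowercases the whole authority, including
        # any userinfo, which a stricter comparison would leave alone.
        and pa.netloc.lower() == pb.netloc.lower()
        # Path, query, and fragment are compared exactly, character by character.
        and (pa.path, pa.query, pa.fragment) == (pb.path, pb.query, pb.fragment)
    )
```

The function only ever looks at characters; at no point does it need to know, or guess, what either URI identifies.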
URIs are also useful (as in RDF) in representing knowledge and building
frameworks of assertions with the aim of performing inference. Clearly,
chains of inference break messily if a URI is used to refer to different
things. For example, if there is ambiguity about whether a particular
URI refers to Moby Dick, the work, or Moby Dick, the particular printed
volume on the shelves of some library, or the electronic catalogue
record describing that particular paper artifact, all sorts of problems
arise: is "size" to be measured in words of the novel, in pages of one
printing, or in bytes consumed by a catalog record?
Unfortunately, in the real world, this kind of confusion will arise - no
conceivable application of computer technology or social organization
can prevent it - and knowledge representation systems need to be able to
deal with the resulting dissonance. Any system on which inference is to
be based had better have a type system and, for any one object,
agreement on the type of that object, before inference
can proceed. In fact, detection of the case where ambiguity exists, and
a robust strategy for operating when it is detected (even if it's only
the KR equivalent of "404 not found"), seem like a sine qua non
for the progress of the Semantic Web.
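
One way to picture that detection step is the following minimal Python sketch. It assumes assertions arrive as (URI, type) pairs; the `AmbiguousURI` exception, the helper names, the example.org URIs, and the type names are all hypothetical and not drawn from RDF or any other specification.

```python
from collections import defaultdict

class AmbiguousURI(Exception):
    """Raised when a URI has been asserted to have more than one type."""

def build_type_index(assertions):
    """Collect every type asserted for each URI."""
    index = defaultdict(set)
    for uri, type_name in assertions:
        index[uri].add(type_name)
    return index

def type_of(index, uri):
    """Return the single agreed type of uri, or refuse to proceed."""
    types = index.get(uri, set())
    if not types:
        raise KeyError(uri)  # nothing is known about this URI
    if len(types) > 1:
        # The KR analogue of "404 not found": stop rather than infer.
        raise AmbiguousURI(f"{uri}: conflicting types {sorted(types)}")
    return next(iter(types))

# Hypothetical assertions: one URI used for both a work and a printed volume.
index = build_type_index([
    ("http://example.org/moby-dick", "Work"),
    ("http://example.org/moby-dick", "PrintedVolume"),
    ("http://example.org/catalogue/42", "CatalogueRecord"),
])
```

Here `type_of(index, "http://example.org/catalogue/42")` returns a type, while the Moby Dick URI raises `AmbiguousURI`, so a question like "size" is never answered against the wrong kind of thing.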
An assertion in Webarch that such confusion cannot occur is as futile as
the medieval church's insistence that Copernicus was wrong, or the
classical hypertext theorists' assertion that links must not be allowed
to break.
(d) The discussion about what a resource is is a well-known rathole
which has consumed endless cycles and time among some of the world's
smartest and most experienced practitioners. Once we've noted what
RFC2396 says on the subject, there are a bunch of other useful things we
can say (all the other principles in section 2.1 of Webarch), none of
which are in the slightest weakened or perceptibly affected by what a
resource may or may not be. Fortunately, we need not go there.
(e) Let's get rid of this thing. We will free up time to work on the
important material in Section 3, we will remove content that has to do
more with metaphysics than engineering, and we will not remove the
tiniest atom of value from the document.
-Tim
Received on Thursday, 19 September 2002 00:38:44 UTC