Does a http URI identify a "web page"?

[sorry if you see two of these, my computer woke up this morning 
thinking it was 1969, which causes breakage of odd kinds]

Tim Berners-Lee wrote:

 > A model in which URIs identify web pages is a REST model,
 > and a rather better one than one where they don't.  I would
 > really like you to work through the model and see that it doesn't
 > break anywhere. You may have to introduce another term.
 > But you'll end up with something much more useful IMHO.

Anybody who isn't, by this point, bleeding from the brain over this 
thread is a much stronger person than I.  I have made a resolution that 
I will go back and read all the messages once or twice more and think 
about it some more, but in the interim here are some more data points.

I've always been really sympathetic to what you might call the Fielding 
position ("web architecture doesn't know/care what a resource *is*, it 
just compares URIs and interchanges representations").  The corollary is 
that since people obviously care what a resource is, we need to 
establish some policy to keep things manageable ("Cool URLs don't 
change") and some mechanisms to talk about what resources are (RDF & the 
rest of the semweb stuff).

While TimBL's world-view seems consistent, I just have real trouble with 
the notion that http: URIs necessarily identify web pages, because it 
seems to me that there are lots of them that just don't.  Let me give a 
couple of examples.

1. Antarctica's Visual Net

This is the application that my company sells, of which I wrote a large 
part.  It is implemented as an Apache module, and presents maps of 
information spaces.  For a large information space with millions of 
objects, clearly an effectively infinite number of useful maps can be drawn.

Each of those maps is URI-addressable (with a certain amount of 
"?arg=value&arg=value" in the URI, but that's fair), and each 
dereference request provokes a really complex flurry of computation 
against a bunch of volatile in-memory data structures, some really 
aggressive user-agent sniffing, and the emission of pure HTML with 
bitmaps, pure HTML with a bunch of vector graphics code, or pure XML 
with no graphics code at all, in two different possible XML dialects, 
and in the future likely something completely different.  When we 
generate XML, the representation is of almost no direct use and needs 
further processing on the client side (in XSL or the Flash MX engine or 
a 3d renderer) to be useful.
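
To make the shape of that concrete, here is a minimal sketch in Python. 
It is not Visual Net's actual code: the function name, the user-agent 
tests, the example URI and the representation bodies are all invented 
for illustration. It only shows one URI yielding different 
representations depending on who asks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    media_type: str
    body: str


def render_map(uri: str, user_agent: str) -> Representation:
    """Pick one of several representations for the same map URI.

    The user-agent tests are hypothetical stand-ins for real sniffing.
    """
    ua = user_agent.lower()
    if "flash" in ua:
        # Hypothetical client that post-processes raw XML itself.
        return Representation("application/xml", f"<map src='{uri}'/>")
    if "msie" in ua:
        # Hypothetical browser assumed to handle inline vector graphics.
        return Representation(
            "text/html", f"<html><!-- vector map for {uri} --></html>"
        )
    # Fallback: plain HTML with bitmap images.
    return Representation("text/html", f"<html><img alt='map of {uri}'/></html>")


uri = "http://example.com/map?topic=books&zoom=3"
a = render_map(uri, "Mozilla/4.0 (compatible; MSIE 6.0)")
b = render_map(uri, "ExampleFlashClient/1.0")
# Same URI, two different bodies: neither one "is" the map.
```

The URI stays fixed while the bytes on the wire vary, which is the 
property the rest of this example leans on.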

We violate REST in that we use cookies, but we try really hard to pack 
as much of the map identification into the URI as possible.  We *hope* 
that the Web's caching machinery will keep clients from stupidly 
re-dereferencing a map, which would help keep our server loads 
manageable.  In some deployments, when you drill way down into the maps 
at a high level of detail, the next drill-down URI into the map space 
might well decide to branch into the underlying data store (ERP system, 
library catalog, whatever) and use its output as the representation.  We 
reserve the right in future to invent new kinds of representations that 
we can't begin to imagine now.

Anyhow, no matter how far I turn my head sideways and squint, it just 
doesn't feel correct to say that the URIs pointing into one of our map 
deployments represent, in any meaningful sense, a "web page".  That is 
to say, the representation returned by any one dereference is not 
fundamental; it is ephemeral and neither the users nor the programmers 
would for a second consider it to "be" the resource.  It feels perfectly 
comfortable to say that each of these URIs identifies a resource and 
that our software emits representations.  It feels perfectly natural to 
make RDF assertions about particular URIs in the space without worrying 
about what representation you might see next.  I'm sorry, I don't think 
these URIs identify web pages; they identify resources.

2. XML Namespace Names

Namespace names are URIs, and they were chosen this way back in 1999 
largely (in the XML community) because of their useful syntactic 
uniqueness properties and (in the nascent RDF community) because of the 
emerging grander ambitions for URIs.

For some years, I steadfastly argued that these URIs were just names and 
don't you worry your pretty little head about what they point at.  This 
position turned out to be untenable; the user population really wanted 
to dereference these and get something back.

So now we're arguing about what representations to return and the 
various flavors of RDDL.  Well, if you consider that an XML Namespace is 
a Resource, there's no inconsistency or angst here.  The resource 
previously was typically without representations and still worked OK; 
and now it turns out that a RDDL document will likely be a very useful 
representation of that resource.  Dan argues hotly that an XML Schema is 
a useful representation of a namespace-name resource and despite the 
fact that <snicker> he's clearly wrong about it being useful, it is 
undeniably some kind of a representation.
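
A small Python sketch of that view, under stated assumptions: the 
registry, the example.org namespace names and the media types chosen for 
the RDDL and schema bodies are all hypothetical. The point it tries to 
show is that a namespace name works as an identifier whether it has 
zero, one or several representations behind it.

```python
from typing import Dict, Optional

# Hypothetical registry: namespace name -> {media type: representation body}.
NAMESPACES: Dict[str, Dict[str, str]] = {
    # A name that identifies a namespace but returns nothing.
    "http://example.org/ns/old": {},
    # A name with two candidate representations.
    "http://example.org/ns/new": {
        "text/html": "<html><!-- RDDL document --></html>",
        "application/xml": "<xs:schema/>",
    },
}


def dereference(ns: str, media_type: str) -> Optional[str]:
    """Return one representation of the namespace resource, if any exists."""
    return NAMESPACES.get(ns, {}).get(media_type)


def same_namespace(a: str, b: str) -> bool:
    """Compare namespace names as plain strings; no dereference is needed."""
    return a == b
```

Here `same_namespace` never touches the network, while `dereference` 
may return a RDDL-style document, a schema, or nothing at all for the 
same kind of resource.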

Once again, no matter how hard I try, it's easy to believe that XML 
Namespaces are resources, but really hard to believe that they're web pages.

Concluding notes:
(a) In both of my examples, the resources identified by the URIs map 
fairly nicely onto the actual meaning of the English word "resource" - 
one of Antarctica's maps is a resource in human-speak (that's why people 
pay for the software), and if an XML Namespace (typically a pre-cooked 
XML vocabulary with pre-cooked semantics) isn't a resource as the word 
is normally used, I don't know what is.  My point is that not only is 
the Fielding formalism useful to programmers and self-consistent, its 
terminology is also useful to ordinary people.

(b) In my vision of the semantic web, it makes all sorts of sense to 
package up RDF assertions about Antarctica's maps or XML namespaces and 
these could be really useful without pretending, against the evidence, 
that either kind of URI actually points at a "web page".


Received on Saturday, 25 January 2003 13:15:32 UTC