Thoughts on "what does an HTTP URI identify?" from Graham Klyne on 2002-07-30 (www-tag@w3.org from July 2002)

From: Graham Klyne <GK@NineByNine.org>
Date: Tue, 30 Jul 2002 12:01:00 +0100
To: www-tag@w3.org
Message-Id: <5.1.0.14.2.20020730105718.00a2a600@127.0.0.1>

These are some observations;  I'm currently open regarding the 
conclusion.  Roughly, my view is that there's no overwhelming technical 
reason to force one kind of usage over another, but there may be practical 
reasons to prefer a particular convention.


1. On specialization of URI schemes

There is a widely held view that, in general, a URI can identify 
anything.  We can't work out what a URI identifies by peeking at its scheme 
identifier.  I don't think this position is inconsistent with the idea 
that certain *specific* URI schemes are more limited in their scope of 
identification.  For example, the tel: URI scheme 
[http://www.ietf.org/rfc/rfc2806.txt] is pretty clearly intended to be used 
for identifying telephone terminals.  One can argue that it's possible to 
use a tel: URI to identify, say, a Unicorn called Ulysses, but I can't see 
that this is really helpful.

A URI scheme defines, among other things, a naming authority structure - 
rules that determine who gets to allocate names and any constraints upon 
such allocations.  It seems quite reasonable to me to say that a given 
scheme X has name allocation rules that have the effect of constraining the 
kinds of things that can be named using X.  For example, the tel: scheme 
identifiers are clearly bound to numbers serviced by a telephone 
network;  the 'global' form of telephone number defers to the E.164 
international standard telephone numbering plan for its naming authority.
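As an illustration of a scheme whose allocation rules constrain what can be named, here is a minimal sketch of a check for the 'global' form of a tel: URI.  It is a deliberate simplification, not the RFC 2806 grammar: the function name is hypothetical, parameters after ";" are ignored rather than validated, and the 15-digit ceiling is taken from E.164.

```python
# Characters RFC 2806 allows as visual separators inside a phone number.
VISUAL_SEPARATORS = "-.()"


def is_global_tel_uri(uri):
    """Return True if uri looks like a tel: URI in 'global' (E.164) form.

    Simplified sketch: requires a leading "+", strips visual separators,
    and accepts 1 to 15 ASCII digits.  Parameters after ";" are ignored.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "tel":
        return False
    number = rest.split(";", 1)[0]
    if not number.startswith("+"):
        return False  # local numbers have no global naming authority
    digits = number[1:].translate(str.maketrans("", "", VISUAL_SEPARATORS))
    return digits.isascii() and digits.isdigit() and 1 <= len(digits) <= 15
```

Under this sketch, something like "tel:+unicorn" is rejected on syntax alone, which is the sense in which the scheme's allocation rules limit what it can name.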

So, I submit, the general principle that identification is not constrained 
by URI scheme doesn't exclude the possibility that certain specific URI 
schemes restrict what is named.  And conversely, the existence of URI schemes with 
identification constraints doesn't weaken the principle that a scheme may, 
in general, be used to identify anything.  Can there be any reasonable 
constraints on what uuid: or urn: may identify?


2. What can we say about http: URIs?

The naming authority is based on network retrieval.

In particular "The semantics are that the identified resource is located at 
the server listening for TCP connections on that port of that host..." 
[RFC2616].

[[
3.2.2 http URL

    The "http" scheme is used to locate network resources via the HTTP
    protocol. This section defines the scheme-specific syntax and
    semantics for http URLs.

    http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

    If the port is empty or not given, port 80 is assumed. The semantics
    are that the identified resource is located at the server listening
    for TCP connections on that port of that host, and the Request-URI
    for the resource is abs_path (section 5.1.2). ...
]]
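
To make the quoted grammar concrete, here is a minimal sketch that splits an http: URI into the parts the RFC 2616 text names.  The function name and the dictionary keys are illustrative choices, not anything defined by the RFC; the parsing itself is delegated to Python's standard urllib.parse.

```python
from urllib.parse import urlsplit


def http_url_parts(uri):
    """Split an http: URI into host, port, abs_path and query."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("not an http: URI: %r" % uri)
    return {
        "host": parts.hostname,
        # "If the port is empty or not given, port 80 is assumed."
        "port": parts.port if parts.port is not None else 80,
        # RFC 2616 treats an empty abs_path as equivalent to "/".
        "abs_path": parts.path or "/",
        "query": parts.query or None,
    }
```

On this reading, the host and port say which server to contact and abs_path is the Request-URI to send it; nothing in the split says anything about what kind of thing the resource is.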

Similar words are in RFC2068, which is cited in the IANA URI scheme 
registry as the defining document for the http: scheme.  These words seem 
to support TimBL's view (which under the circumstances isn't in itself an 
overwhelming argument, but is suggestive).


3. What does a representation represent?

Much of this debate seems to depend on whether one can regard (say) a JPEG 
picture of a car as being a representation of the car, or a representation 
of a document describing a car.

It seems to me that either view is sustainable, and maybe can even coexist 
(which is probably just as well because I expect folks will be doing both 
for a while).  For example, on my own web site I currently use URIs of the 
form http://id.ninebynine.org/ to identify abstract concepts related to my 
own experimental developments.  I place documents at those URIs that are 
intended to explain (more or less) what I intend the identifiers to denote 
-- and at this time, I mean the URIs to denote abstract concepts, not the 
documents.  So how can I talk about the documents that describe the 
identifiers?  In my case, that's easy:  as it happens, the URIs 
http://www.ninebynine.org/ident/... retrieve exactly the same set of 
documents.  So a possibility here is that the first form of URI directly 
references the abstractions described by the web pages, and the second form 
can directly identify the documents themselves.  I'm not claiming this is a 
Good Idea, just a possibility.  And I suspect it's a possibility we have to 
live with.


4. Questions

The questions I then ask myself are:

Would it be helpful for the community at large to have a preferred 
convention for the interpretation of what http and similar URIs identify?
- I think that the answer is probably "yes" -- if only so we don't end up 
repeating this debate over the next decade or so.

What approach is most helpful?
- I'm pretty agnostic, but I am leaning toward the idea that an HTTP URI 
directly identifies a document, rather than what the document describes.  I 
think there are other ways to capture the indirect reference (e.g. fragment 
IDs;  one proposal is at 
http://www.ninebynine.org/wip/RDF-basics/2002-07-29/Overview.htm#xtocid103660).

Does this need to be set in stone for the web to survive and grow?
- I hope not;  I don't think so.  I think there's already a diversity of 
usage and we somehow need to accommodate that.
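
The fragment-ID idea mentioned in the second answer can be sketched as follows.  This is only an illustration of the convention, assuming the part before "#" is taken to identify the document and the fragment is taken to carry the indirect reference; the function name and the example.org URI are hypothetical.

```python
from urllib.parse import urldefrag


def split_indirect_reference(uri):
    """Return (document_uri, fragment), with fragment None if absent.

    Assumed convention: document_uri identifies the document that is
    retrieved; the fragment, if any, points at what the document describes.
    """
    document_uri, fragment = urldefrag(uri)
    return document_uri, (fragment or None)


# Hypothetical example: the document about cars vs. one car it describes.
print(split_indirect_reference("http://example.org/cars#my-car"))
```

Only the document part is ever sent to the server, so this convention leaves HTTP retrieval untouched.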


#g


-------------------
Graham Klyne
<GK@NineByNine.org>
Received on Tuesday, 30 July 2002 07:30:22 UTC