
Re: RFC 2616 vs. AWWW

From: Jonathan Rees <jonathan.rees@gmail.com>
Date: Sat, 13 Oct 2007 08:40:35 -0400
Message-ID: <3cff5e070710130540o3dfdd848l4be43c214ac69c2@mail.gmail.com>
To: "Pat Hayes" <phayes@ihmc.us>
Cc: public-semweb-lifesci <public-semweb-lifesci@w3.org>

On 10/12/07, Pat Hayes <phayes@ihmc.us> wrote:
> * If a network resource responds to a GET request with a 2xx
> response, then that URI must be understood as referring to that
> network resource.
>
> >Oddly, this rule doesn't tell you which network resource is
> >referenced; it could be one that's unrelated to the network resource
> >that responds to the GET.
>
> Then that is a mistake; it should refer to the same resource.
>
> >(Aside: As I have said before, the text also needs to be changed to
> >say that the 200 only ''states'' that the referenced resource is a
> >network resource, not that it ''is'' one, since the server could be
> >wrong.)
>
> ? I don't follow you. If any response other than a 404 comes back,
> then certainly there must have been a network resource there to emit
> the response. No room for error there, right? And what the
> httpRange-14 decision says (should say) is, then that URI should be
> understood as denoting that actual network resource. That matters to
> the sender of the GET, usually, more than to the server. If I'm
> messing around with URIs, I don't really care what the server thinks
> they denote. Its job is just to respond to my requests, not to have
> opinions about them.

Maybe I'm confused, but I'd like to get this one straight. AWWW says:

"URI ownership gives the relevant social entity certain rights, including
... to associate a resource with an owned URI."

I would leap from here to saying that the URI owner has the right to
declare properties that any thing must have if that thing is going to
be the referent of the URI.

So a server responds with a 2xx. Does this constitute the URI owner
specifying that the referent is an http endpoint? What if:
  - the server has a bug, and is not correctly speaking on behalf of
the URI owner (misplaced trust of the URI owner in a piece of
hardware/software)?
  - the server has published separately and more prominently that the
referent is not an http endpoint, e.g. is a potato  (incompetent
server configuration)?
  - the domain name changes hands and the new owner specifies that the
URI is not an http endpoint? (Alan R tells me not to worry about this.
Putting the year in the URI helps avoid confusion.)

I was trying to address the first two cases by suggesting that a URI
refers to an http endpoint, and the URI owner and others can agree on
this, and any statement to the contrary by the http server has less
importance than what the owner and the community say.

If we can answer this question, then we can go on to the one that's
near the epicenter of the http/other scheme debate, which is this: is
the http endpoint to which the URI refers (we know it's an http
endpoint because of the 2xx) the actual one that the http server
identifies, or is it the ideal one that the http server is *supposed*
to identify?

The first answer (URI relates to actual behavior) is principled and
empirical; I can understand it. To get information about that
resource, I perform experiments: any response constrains the actual
endpoint, i.e. the endpoint has to be one whose nature allows it to
have responded in that way. This is a very weak constraint, but better
than nothing, as it can be used to confirm or refute statements about
the endpoint.
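To make the first answer concrete, here is a minimal Python sketch. It assumes a client has already logged each GET as a status code, content type and body. The `Observation` record and the `refutes_fixed_pdf` check are hypothetical names for illustration only. The point is that observations can refute a claim such as "this endpoint is a fixed PDF file" but can never prove it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One experiment: the result of a single GET on the URI (hypothetical record)."""
    status: int
    content_type: str
    body: bytes


def refutes_fixed_pdf(observations: list[Observation]) -> bool:
    """Return True if the observations refute 'the endpoint is a fixed PDF file'.

    A fixed PDF predicts that every successful response is application/pdf
    and byte-identical. Returning False only means "not refuted yet";
    the constraint is weak and cannot confirm the claim.
    """
    digests = set()
    for obs in observations:
        if not 200 <= obs.status < 300:
            continue  # in this sketch, non-2xx responses are simply ignored
        media_type = obs.content_type.split(";")[0].strip().lower()
        if media_type != "application/pdf":
            return True  # a non-PDF success contradicts the claim
        digests.add(hashlib.sha256(obs.body).hexdigest())
    # Two different successful bodies mean the "file" is not fixed.
    return len(digests) > 1
```

Each new observation can only shrink the set of claims still consistent with the endpoint's behavior. That is the sense in which the first answer is empirical.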

The second answer (URI relates to behavior intended by owner) is
nicer, since it allows the URI to refer to things whose behavior can
be predicted with certainty, such as a fixed PDF file. This confers
all the advantages that "identifiers" (references?) have over
locators, including rigorous notions of mirroring and citation. It is
also insurance against orphaning - you can be more confident that the
URIs will refer to things after the http server goes south (and they
may even be accessible on some other server, via proxying or URI
rewriting).

The problem with #2 is that unlike in, say, the LSID protocol, there
is no standard communication channel open that would allow the owner
to tell others what thing is intended by the URI (what the owner wants
the URI to refer to). You have to scavenge and reverse engineer. (For
example, tell me how an automated agent would or should distinguish
the endpoints denoted by the URIs
http://www.w3.org/TR/2004/REC-webarch-20041215/
and http://www.w3.org/TR/webarch/ . You and I know what they refer to,
but this isn't formalized as far as I know - at best it is documented
in prose in some unknown location.)
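Without a formal channel, an agent is reduced to guessing from the shape of the URI. The following Python sketch shows one such guess and is not a recommendation. It rests on a hypothetical heuristic: a path segment ending in an eight-digit date, as in the dated W3C URI above, is taken to mark a fixed snapshot. The absence of such a segment proves nothing either way.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical heuristic: a path segment ending in "-YYYYMMDD"
# (as in dated W3C TR URIs) is read as a sign of a fixed snapshot.
DATED_SEGMENT = re.compile(r"-(\d{8})$")


def guess_snapshot_date(uri: str) -> Optional[str]:
    """Return the YYYYMMDD date if the URI looks like a dated snapshot, else None."""
    path = urlparse(uri).path
    for segment in path.strip("/").split("/"):
        match = DATED_SEGMENT.search(segment)
        if match:
            return match.group(1)
    return None  # no evidence either way; the URI may still be fixed or "latest"
```

Applied to the two URIs above, the heuristic returns "20041215" for the dated one and None for http://www.w3.org/TR/webarch/. That is exactly the kind of reverse engineering an agent must rely on when nothing is formalized.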

The problem with #1 is that you would have to use 303s or # URIs to
obtain high-quality references, and no one (other than I) would
bother.
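The cases mentioned above (a plain URI answering 2xx, a 303, and a # URI) can be sketched as a small Python classifier. The sketch assumes an httpRange-14 style reading in which a 2xx only states that the referent is the network resource that answered, a 303 leaves the referent open, and anything else says nothing. The names `Conclusion` and `conclude` are hypothetical. The status code is assumed to come from the first response to a GET, with no redirects followed.

```python
from enum import Enum
from urllib.parse import urldefrag


class Conclusion(Enum):
    """What one GET response lets a client conclude about a URI's referent."""
    NETWORK_RESOURCE = "stated to be the network resource that answered"
    ANY_THING = "may be anything; follow the 303 for a description"
    NO_CLAIM = "the response says nothing about the referent"


def conclude(uri: str, status: int) -> Conclusion:
    """Classify the first (unredirected) GET response for `uri`."""
    _base, fragment = urldefrag(uri)
    if fragment:
        # The fragment is never sent to the server, so the response is
        # about the base document, not about the thing `uri` refers to.
        return Conclusion.NO_CLAIM
    if 200 <= status < 300:
        return Conclusion.NETWORK_RESOURCE
    if status == 303:
        return Conclusion.ANY_THING
    return Conclusion.NO_CLAIM
```

Only the 303 and # branches leave room for a URI to refer to something other than whatever happens to answer. That is why option #1 forces careful publishers onto those two patterns.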

I'm neutral on which of these two positions is taken, but I want one
or the other to be taken with conviction.

The issue for me isn't whether 'http endpoint' is a type that subsumes
documents; it's whether what's being said is clear and robust.

This isn't academic. The Library of Congress trashes the http: scheme
[1] in the same way that the LSID spec does - they say it's no good
because URIs are locators (first answer) instead of "identifiers"
(references; second answer). The justification for using http: for
literature reference, even in the best of circumstances, has got to be
better than "trust me" or "it usually works" or "you're being anal".

[1] http://www.loc.gov/standards/uri/news.html
Received on Saturday, 13 October 2007 12:40:53 GMT
