RE: 200 response as conclusive evidence of an information resource from Booth, David (HP Software - Boston) on 2008-12-02 (public-awwsw@w3.org from December 2008)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Tue, 2 Dec 2008 22:40:19 +0000
To: Jonathan Rees <jar@creativecommons.org>
CC: "public-awwsw@w3.org" <public-awwsw@w3.org>
Message-ID: <CD2B872281385A439B98164F5351E6DD39C4779F8F@GVW1144EXB.americas.hpqcorp.net>
> From: Jonathan Rees [mailto:jar@creativecommons.org]
>
> On Dec 1, 2008, at 2:47 AM, Booth, David (HP Software - Boston) wrote:
>
> > Are you saying that you think the httpRange-14 decision itself
> > contradicts the "URI owner" section of AWWW 2.2.2.1, or are you
> > saying that my interpretation contradicts it?
>
> The latter.
>
> >  And in either case, why?
>
>
> The httpRange-14 rule says, in effect, two things:
>
> 1. RFC 2616 previously licensed the http: scheme for use with network
> resources (sections 1.3 and 3.2.2), without explicitly prohibiting
> other uses. We hereby license the http: scheme so that you can use it
> with arbitrary other kinds of things as well.

Okay.

>
> 2.  RFC 2616 licensed the HTTP protocol for use with request-URIs that
> denote "network data objects or services", without explicitly
> prohibiting other URIs. We hereby license the HTTP protocol with http:
> URIs denoting arbitrary things. However, for things that are not
> "information resources" per AWWW, please don't deliver 2xx responses,
> even though 2xxs are not explicitly prohibited by the RFC in this
> case. They might mislead someone into thinking the resource is an AWWW
> "information resource".

If you preferentially view the URI owner's other declarations about the resource as being intrinsically more reliable than the server's response code, then I can see how you would interpret the httpRange-14 finding this way.  But I don't see a justification for treating the response code as less reliable than the owner's other statements, and as I pointed out in
http://lists.w3.org/Archives/Public/public-awwsw/2008Dec/0000.html
I think there are good, practical reasons for treating the 2xx response as irrefutable evidence of an AWWW information resource.

I would characterize this part of the httpRange-14 finding as:
"Please don't deliver a 2xx response, because a 2xx response means that the resource *is* an AWWW information resource.  Hence, delivering a 2xx response while also claiming that the resource is something other than an AWWW information resource would cause a URI collision
http://www.w3.org/TR/webarch/#URI-collision
which should be avoided."
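
To make that reading concrete, here is a minimal Python sketch. It assumes a hypothetical owner_says_ir flag standing in for whatever the URI owner has declared elsewhere about the resource; the function names are mine and do not come from the finding or from AWWW.

```python
from typing import Optional


def implies_information_resource(status: int) -> bool:
    """Under this reading of httpRange-14, any 2xx response means
    the resource is an AWWW information resource."""
    return 200 <= status <= 299


def has_uri_collision(status: int, owner_says_ir: Optional[bool]) -> bool:
    """Return True when a 2xx response coincides with an owner
    declaration that the resource is NOT an information resource.

    owner_says_ir is None when the owner has declared nothing
    (hypothetical encoding, for illustration only).
    """
    return implies_information_resource(status) and owner_says_ir is False
```

For example, has_uri_collision(200, False) flags the case of a concept URI that answers 200, whereas a 303 for the same URI would not be flagged.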

>
> (Oddly, it says nothing about the use of the HTTP protocol with non-
> network-resources named by URIs belonging to non-http: URI schemes...)
>
> One problem, as we've discussed, is the well-meaning off-label use of
> http: before the httpRange-14 rule came along, as with Dublin Core.
> Sure, if a URI owner agrees with the httpRange-14 rule, as you and I
> would like them to (modulo some uncertainty over what is and isn't an
> IR), then they should fix their server; but they have plausible
> deniability on their side, and could say nothing in RFC 2616 told them
> they couldn't use either the http: scheme or the HTTP protocol in the
> way they do. Bad practice by today's standards, sure, but not
> prohibited by spec. Those disagreeing with, or ignorant of, the rule
> could arguably still be in compliance with RFC 2616,

Correct.

> and nothing could
> justify imputing semantics to their 200 response.

Sure.  We would be justified in imputing the semantics of the httpRange-14 rule: that their 200 response implies an AWWW IR, even though they claim the URI denotes a concept.  Both are correct: their current server configuration causes a URI collision, so they should fix their server to comply with the AWWW's advice not to cause URI collisions.  I.e., what such a Dublin Core URI denotes is ambiguous: it denotes both an AWWW information resource *and* the concept that the Dublin Core spec says it denotes.

BTW, I discussed this in "Splitting Identities in Semantic Web Architecture":
http://dbooth.org/2007/splitting/#collision
[[
Multiple URI declarations and URI collision
The Architecture of the World Wide Web defines URI collision as "Using the same URI to directly identify different resources".  URI collision may occur if a URI has more than one URI declaration.  However, different declarations of a URI do not necessarily cause URI collision, because the constraints they express could be equivalent even though they are written differently.

How should multiple URI declarations for a URI be interpreted?  If one has a way to preferentially select one over another -- perhaps one is more recent (thus implicitly obsoleting others), or perhaps the evidence of the act of declaration is more compelling for one than another, or perhaps one can determine which URI declaration was intended when a statement author made a statement using the URI (see slide 2 at http://dbooth.org/2008/irsw/slides.ppt ) -- then it probably makes the most sense to use that URI declaration to interpret the meaning of the URI in an RDF statement.  Otherwise, one could think of the complete URI declaration for the URI as consisting of the disjunction of the individual URI declarations.
]]
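
The two options in that passage (pick a preferred declaration, or fall back to the disjunction) could be sketched roughly as follows. This is only an illustration under my own assumptions: a URI declaration is modeled as a predicate over string labels for candidate resources, and UriDeclaration, effective_constraint and the prefer scoring function are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A constraint returns True if a candidate resource (here just a string
# label, purely for illustration) satisfies the declaration.
Constraint = Callable[[str], bool]


@dataclass(frozen=True)
class UriDeclaration:
    source: str             # where the declaration came from (hypothetical)
    constraint: Constraint  # what the declaration says the URI may denote


def effective_constraint(
    decls: Sequence[UriDeclaration],
    prefer: Optional[Callable[[UriDeclaration], float]] = None,
) -> Constraint:
    """Combine several declarations of the same URI.

    If a preference score is supplied, use the highest-scoring
    declaration alone; otherwise use the disjunction of all of them.
    """
    if not decls:
        raise ValueError("at least one URI declaration is required")
    if prefer is not None:
        return max(decls, key=prefer).constraint
    return lambda candidate: any(d.constraint(candidate) for d in decls)
```

With one declaration from a spec ("concept") and one implied by a 200 response ("information resource"), the disjunction accepts either label, which is exactly the ambiguity that a URI collision produces.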

>
> This is not a criticism of the rule - I think the rule is a good
> thing, for the reason stated (lowering the risk of misinterpretation),
> and would like people to follow it. But when I connect to a server and
> look at status codes, I can't count on the server (or those that
> control it) being aware of the rule. If you know ahead of time that a
> server has chosen to follow the rule, or does so by accident, then I
> agree that its 200s imply something about the nature of the resource.
> But lacking that I don't. Just because you can't count on everyone
> following it doesn't mean it's a bad rule.

You seem to be assuming that a URI can only denote one resource.  Clearly that is the *intent*: as the AWWW says,
http://www.w3.org/TR/webarch/#URI-collision
"By design, a URI identifies one resource".  But the whole idea of URI collision is that this is *not* always the case: sometimes a URI denotes *more* than one resource, i.e., sometimes the mapping from URI to resource is ambiguous, either because a single URI declaration was ambiguous or because there was more than one URI declaration for it.

>
> A second problem is the dissonance between RFC 2616 and AWWW. You have
> to do serious semantic gymnastics to make these align. I would forgive
> anyone for saying (perhaps in RDF) that their network service, for
> which a 200 is legitimately delivered according to RFC 2616, is NOT an
> information resource in the AWWW sense.

Well, if you are using the current (flawed) definition in AWWW, then I would also, because the current definition does not adequately cover the full variety of what can legitimately yield a 200.  That's why I've been advocating a definition of "information resource" as, essentially, a function from (Time x Requests) to Representations.
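
As a rough illustration of that definition, the signature can be written down as a Python type. This is a sketch under my own assumptions: Request and Representation are deliberately stripped-down stand-ins, and clock_page is a made-up example resource, not anything defined in AWWW or RFC 2616.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Request:
    method: str  # e.g. "GET"
    accept: str  # requested media type (simplified to a single value)


@dataclass(frozen=True)
class Representation:
    media_type: str
    body: bytes


# The proposed definition: a function from (Time x Requests) to Representations.
InformationResource = Callable[[datetime, Request], Representation]


def clock_page(when: datetime, request: Request) -> Representation:
    """A hypothetical resource whose representation depends on both
    the time of the request and the request itself."""
    text = when.isoformat()
    if request.accept == "text/html":
        return Representation("text/html", f"<p>{text}</p>".encode())
    return Representation("text/plain", text.encode())
```

The point of the sketch is only that nothing in the signature requires the resource's "essential characteristics" to be conveyable in a single message; it just has to yield a representation for each (time, request) pair.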

> It is too easy to imagine that
> there are network services that CANNOT have their essential
> characteristics conveyed in a message - or at least that a careful URI
> owner wouldn't want attributed to them an assertion that their
> resource *was* an AWWW information resource. If you say that their
> 200s will be interpreted this way, they will say they are following
> RFC 2616 to the letter and don't *want* their 200s to be interpreted
> that way. It doesn't matter that *you* think there's no contradiction.
>
> (In fact we agreed that RFC 2616 "resource" and AWWW "information
> resource" are disjoint... or did I misunderstand?)
>
> (I take RFC 2616 as a starting point, by the way, not only because
> that's what server software designers consult, but because it's where
> you land first via the "follow your nose" algorithm (after the URI
> scheme registry itself).)
>
> Maybe I am being dense. Maybe it should be obvious to me that every
> network data object or service can have its essential characteristics
> conveyed in a message.

No, the problem is that that definition of "information resource" is wrong.  Try using the definition of ftrr:IR that I proposed and it should make more sense:
http://lists.w3.org/Archives/Public/public-awwsw/2008Apr/0046.html

> Or maybe it's obvious that RFC 2616 should be
> ignored, or altered, where it is dissonant with AWWW (new covenant??)
> - maybe replace its definition of "resource" with a more modern one
> such as RDF's.

No, one just needs to realize that, for historical reasons, when RFC 2616 says "resource", it means what AWWW calls "information resource" (except that the current AWWW definition needs correction).

> Or maybe AWWW can be put aside or altered: redefine
> "information resource" to mean not what the glossary says but what RFC
> 2616 means by "resource" (or some superclass of it).

Yes!  The AWWW definition of IR needs to be corrected.  And while we're at it, it would be good to point out that it corresponds to what RFC 2616 calls simply "resource".

> Or maybe we
> should throw away the specs and start de novo.  Any of these is
> possible, but we need to be honest and say what we're doing.

I see no need to change RFC 2616, and I definitely wouldn't want to throw out AWWW: it's a tremendous work, though it does have some flaws that need correction.

>
> I'm not saying we don't have *influence* here and a chance to affect
> community behavior or thought - we could rewrite RFC 2616 or AWWW, or
> produce some kind of protocol extension or addendum or finding. I
> think it would be wrong to be Talmudic and try to figure out what the
> intent was, or should have been, or should be, behind these documents.
> We should start with what they say (especially RFC 2616, since it is
> going to reflect the deployed HTTP-web), and use this foundation as a
> way to identify inadequacies, survey the design space, and make
> recommendations.
>
> In any case -- I think we should focus not on "information resource"
> or 200, but on the more central problem of describing in RDF what the
> participants in an HTTP request/response pair are saying to one
> another (to paraphrase Stuart). I know the 200/IR issue is part of
> this, but I think it will be much easier to deal with once we have an
> ontology that can address the more fundamental parts of the problem.

I agree that in modeling HTTP interactions, there is not much need to talk about IRs.  However, this discussion about whether a URI that yields a 200 response can denote a non-IR does bring out some important issues of how semantic web architecture should work, and in particular, how a URI denotes a resource.



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Statements made herein represent the views of the author and do not necessarily represent the official views of HP unless explicitly so stated.
Received on Tuesday, 2 December 2008 22:43:06 UTC