Re: Is 303 really necessary? from Nathan on 2010-11-05 (public-lod@w3.org from November 2010)

From: Nathan <nathan@webr3.org>
Date: Fri, 05 Nov 2010 10:57:03 +0000
To: Ian Davis <me@iandavis.com>
CC: Leigh Dodds <leigh.dodds@talis.com>, Harry Halpin <hhalpin@ibiblio.org>, public-lod@w3.org, Doug Schepers <schepers@w3.org>
Message-ID: <4CD3E2FF.2050902@webr3.org>
Ian Davis wrote:
> On Fri, Nov 5, 2010 at 10:05 AM, Nathan <nathan@webr3.org> wrote:
>> Not at all, I'm saying that if big-corp makes a /web crawler/ that describes
>> what documents are about and publishes RDF triples, then if you use 200 OK,
>> throughout the web you'll get (statements similar to) the following
>> asserted:
>>
>>  </toucan> :primaryTopic dbpedia:Toucan ; a :Document .
> 
> i don't think so. If the bigcorp is producing triples from their crawl
> then why wouldn't they use the triples they are sent (and/or
> content-location, link headers etc). The above looks like what you'd
> get from a third party translation of the crawl results without the
> context of actually having fetched the data from the URI.

I wouldn't be too sure about that; even the major browser vendors get it
completely wrong. For instance, do an XHR for a URI in Chrome and, even if
there are 10 redirects in the chain, the base and the document URI are the
one you requested. This is true all over the place, from using
file_get_contents in PHP to most HTTP clients in any language; the
pattern is simply:

   requested-uri = "http://...";
   doc = get(requested-uri);

The info at the end is almost always ( requested-uri, doc ); in fact,
often there's not even any way to get the redirected-to URI back out of
the HTTP client.
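
For contrast, here is a minimal sketch of a client that keeps both URIs.
It assumes Python's standard urllib, which does expose the final URL after
redirects; the names Fetched and fetch_with_provenance are hypothetical,
made up for this illustration.

```python
import urllib.request
from typing import NamedTuple


class Fetched(NamedTuple):
    requested_uri: str  # the URI the caller asked for
    document_uri: str   # the URI that actually served the body
    body: bytes


def fetch_with_provenance(requested_uri: str) -> Fetched:
    # urlopen follows 301/302/303/307 redirects on GET by default.
    with urllib.request.urlopen(requested_uri) as resp:
        # resp.url is the URL of the resource finally retrieved, so it
        # differs from requested_uri whenever a redirect was followed.
        return Fetched(requested_uri, resp.url, resp.read())
```

A crawler built this way could at least attach its statements to
document_uri rather than to requested_uri.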

As for using the triples they are sent, all you need to do is consider
an HTML crawler running over RDFa documents: it can index the page
without ever parsing the triples embedded in it.

> If the bigcorp is not linked data aware then today they will follow
> the 303 redirect as a standard HTTP redirect. rfc2616 says that the
> target URI is not a substitute for the original URI but just an
> alternate location to get a response from. The bigcorp will simply
> infer the statements you list above **even though there is a 303
> redirect**.

Exactly, which kind of semi-damns all /slash URIs, or at least requires a
load of provenance data.

> As rfc2616 itself points out, many user agents treat 302 and 303
> interchangeably. Only linked data aware agents will ascribe special
> meaning to 303 and they're the ones that are more likely to use the
> data they are sent.

God knows why linked data clients are ascribing any meaning to 303. The
pattern's there to ensure that a thing and the doc describing it have
different URIs, and to ensure that people don't say that the thing is a
document, although it's not exactly worked out that way. The use of the
particular status code 303 is only relevant if you're ascribing meaning
to the response code of GETs; if you're not, then any 3xx will do the
same job.

Out of interest, just who is trawling the web and going "301 that's an
IR, 303 that's maybe not an IR, 302 that's an IR"?
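
To make concrete what such a client would have to do, here is a rough
sketch using Python's http.client, which does not follow redirects on its
own. The function names first_hop and naive_reading are hypothetical, and
the status-to-meaning mapping is only the reading being questioned above
(plus 2xx, which I'm assuming gets read as a document), not anything a
spec mandates.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def first_hop(uri: str) -> Tuple[int, Optional[str]]:
    """GET uri once and return (status, Location), following nothing."""
    parts = urlsplit(uri)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def naive_reading(status: int) -> str:
    # Hypothetical mapping, for illustration only.
    if status == 303:
        return "maybe not an IR"
    if 200 <= status < 300 or status in (301, 302):
        return "IR"
    return "unknown"
```

A client that only ever sees the body at the end of the chain never has
the status code to feed into something like naive_reading in the first
place.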


My personal opinion on the entire thing is as simple as: give different
things different names. If there's a good chance something will take a
thing to be a different kind of thing because of a particular URI scheme
or style (like saying mailto:foo@bar.org is a mailbox), then avoid that
scheme or style when it conflicts with the kind of thing you're
describing. IMO slash URIs are often taken to mean documents, so I avoid
them. You don't, so regardless of what status code you use, or how you
deploy data, that conflation will be there. Thus my takeaway on the whole
thing for you (even though it goes against the TAG) is: just 200 your
URIs if you want to, but don't go around telling the rest of the world to
do it and promoting it as a good pattern, as it's not. The tdb scheme or
fragment URIs address the issues, whilst introducing others, but at least
the data's somewhat cleaner.

I'll roll with the "who cares" line of thinking. I certainly don't care
how you or DBpedia or FOAF or DC publish your data, so long as I can
deref it, but for God's sake don't go telling everybody that using slash
URIs and 200 is "The Right Thing TM".

Best,

Nathan
Received on Friday, 5 November 2010 10:58:18 UTC