Re: A question - use 301 instead of 406? from Nathan on 2010-03-24 (public-lod@w3.org from March 2010)

From: Nathan <nathan@webr3.org>
Date: Wed, 24 Mar 2010 15:04:17 +0000
To: Richard Cyganiak <richard@cyganiak.de>
CC: Hugh Glaser <hg@ecs.soton.ac.uk>, Linking Open Data <public-lod@w3.org>
Message-ID: <4BAA29F1.3020901@webr3.org>
Hi All,

After much thought recently I've taken the following approach (please
ignore the fact that I'm using .html etc. in examples; it's only for
clarity in this email).

Suppose I have a real world object:
http://example.com/resource/London

and then an html and rdf description
http://example.com/page/London.html
http://example.com/data/London.rdf

then I add one more resource to the equation: a resource which
identifies the description, and which acts as the point for content
negotiation:
http://example.com/descriptions/London

Thus:

REQUEST->>>
GET /resource/London HTTP/1.1
Host: example.com
Accept: text/html;q=0.5, application/rdf+xml

<<<-RESPONSE
HTTP/1.1 303 See Other
Location: http://example.com/descriptions/London

REQUEST->>>
GET /descriptions/London HTTP/1.1
Host: example.com
Accept: text/html;q=0.5, application/rdf+xml

<<<-RESPONSE
HTTP/1.1 200 OK
Content-Location: http://example.com/page/London.html
Content-Type: text/html
(and Vary: etc)

This way /descriptions/London stays in the address bar, and the request
GET /descriptions/London HTTP/1.1
Accept: application/rdf+xml
that may be used in the future stays good, because we've separated the
cross-cutting concerns.

note: it could also be a 300 Multiple Choices + Location header for that
final response.
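
To make the exchange above concrete, here is a minimal Python sketch of
the two steps (303 on the thing, content negotiation on the description).
It is only an illustration: the function names, the variant table and the
HTML fallback are my assumptions, and it assumes the RDF description
lives at /data/London.rdf.

```python
# Hypothetical variant table; the URLs follow the examples in this email.
VARIANTS = {
    "text/html": "http://example.com/page/London.html",
    "application/rdf+xml": "http://example.com/data/London.rdf",
}
DEFAULT_TYPE = "text/html"  # assumption: fall back to HTML rather than send 406


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if q > 0:
            prefs.append((q, fields[0]))
    # Stable sort: equal q-values keep the order the client sent them in.
    prefs.sort(key=lambda pref: pref[0], reverse=True)
    return [media for _, media in prefs]


def respond(path, accept):
    """Return (status, headers) for a GET on one of the London URIs."""
    if path == "/resource/London":
        # The thing itself: always redirect to its description.
        return 303, {"Location": "http://example.com/descriptions/London"}
    if path == "/descriptions/London":
        # The description: negotiate here, and only here.
        chosen = DEFAULT_TYPE
        for media in parse_accept(accept):
            if media in VARIANTS:
                chosen = media
                break
        return 200, {
            "Content-Location": VARIANTS[chosen],
            "Content-Type": chosen,
            "Vary": "Accept",
        }
    return 404, {}
```

With this sketch, respond("/descriptions/London", "application/rdf+xml")
gives a 200 whose Content-Location is the .rdf URI, while a browser
sending only text/html gets the .html one; the request URI is the same
in both cases.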

This also helps with the wrong uri in the db scenario, because even in a
worst case scenario where /descriptions/London is used rather than
/resource/London then any RDF processor can simply read that
/descriptions/London is a resource which describes /resource/London
rather than being /resource/London itself; a simple bit of reasoning
over isPrimaryTopicOf or similar will fix this.
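
As a rough illustration of that bit of reasoning, here is a small Python
sketch over plain (subject, predicate, object) tuples. The use of the
FOAF namespace, the sample triple and the function name thing_for are my
assumptions for the example; a real RDF processor would use its own
store and vocabulary.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data: what the description says about itself and its topic.
TRIPLES = {
    (
        "http://example.com/resource/London",
        FOAF + "isPrimaryTopicOf",
        "http://example.com/descriptions/London",
    ),
}


def thing_for(uri, triples):
    """Map a description URI back to the thing it describes, if known.

    Returns the URI unchanged when no matching statement is found.
    """
    for s, p, o in triples:
        # <thing> foaf:isPrimaryTopicOf <description>
        if p == FOAF + "isPrimaryTopicOf" and o == uri:
            return s
        # <description> foaf:primaryTopic <thing> (the inverse property)
        if p == FOAF + "primaryTopic" and s == uri:
            return o
    return uri
```

So a stored http://example.com/descriptions/London can be corrected to
http://example.com/resource/London later on, and a URI that already
names the thing passes through untouched.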

Comments / Corrections?

Regards!

Richard Cyganiak wrote:
> Hugh,
> 
> On 23 Mar 2010, at 22:50, Hugh Glaser wrote:
>> Assuming that we are in the usual situation of http://foo/bar doing a
>> 303 to
>> http://foo/bar.rdf when it gets an Accept: application/rdf+xml
>> http://foo/bar
>> what should a server do when it gets a request for
>> Accept: application/rdf+xml http://foo/bar.html ?
>>
>> OK, the answer is 406.
> 
> No. The answer is 200, with the HTML representation. Content negotiation
> should happen on the “generic” URI, e.g., <http://foo/bar>, but not on
> the representation format specific URIs.
> 
> The reason for having the representation format specific URIs
> </bar.html> and </bar.rdf> in the first place is to allow users to
> override their user agent's Accept header.
> 
> For example, normal web browsers accept text/html but not
> application/rdf+xml. There is no way for an average user to change the
> browser's behaviour in this regard. Thus, if I direct my browser to
> </bar> I would always get HTML. If, for whatever reason, I want to see
> the RDF/XML, there's no way I can do it. But if the </bar.rdf> URI
> is configured to always return RDF/XML, no matter what the Accept
> header says, then the HTML can include a link to </bar.rdf> and say, “go
> here if you really want RDF/XML.” Problem solved.
> 
> Sending 406 (or 301) on the representation format specific URIs like
> </bar.html> and </bar.rdf> negates the entire purpose of having those
> URIs in the first place.
> 
> A key bit of text from RFC 2616:
> 
>>> Note: HTTP/1.1 servers are allowed to return responses which are not
>>> acceptable according to the accept headers sent in the request. In
>>> some cases, this may even be preferable to sending a 406 response.
> 
> Amen. 406 is actually counterproductive IMO. It just forces user agents
> to include something like "*/*;q=0.01" in the Accept header to work
> around those overeager content negotiation implementations that are just
> looking for an excuse not to send a representation to the client.
> 
> <snip>
>> That's OK if all that happens is I use the wrong URI straight away.
>> But what happens if I then enter it into a form that requires a LD
>> URI, and
>> then perhaps goes into a DB, and becomes a small part of a later process?
>> Simply put, the process will fail maybe years later, and the
>> possibility and
>> knowledge to fix it will be long gone.
>>
>> Maybe the form validation is substandard, but I can see this as a
>> situation
>> that will recur a lot, because the root cause is that the address bar URI
>> changes from the NIR URI. And most html pages do not have links to the
>> NIR
>> of the page you are on - I am even told that it is bad practice to
>> make the
>> main label of the page a link to itself - wikipedia certainly doesn't,
>> although it is available as the "article" tab, which is not the normal
>> thing
>> of a page. So in a world where wikipedia itself became LD, it would
>> not be
>> clear to someone who wanted the NIR URI where to find it.
> 
> This is a serious problem. It is a UI problem and should be solved on
> the UI level, not on the transfer protocol level. We have lots of
> protocol people here and few UI people, so everyone tries to fix
> everything in the protocols.
> 
> A similar problem plagued RSS in its early years. The solution was
> the feed autodiscovery convention for the HTML header, and the universal
> feed icon. Linked data needs something similar.
> 
> Best,
> Richard
> 
> 
>>
>> So that is some of the context and motivation.
>> If we were to decide to be more forgiving, what might be done?
>> How about using 301?
>> <<Ducks>>
>> To save you looking it up, I have appended the RFC2616 section to this
>> email.
>> That is
>> Accept: application/rdf+xml http://foo/bar.html
>> Should 301 to http://foo/bar
>> It seems to me that it is basically doing what is required - it gives the
>> client the expected access, while telling it (if it wants to hear)
>> that it
>> should correct the mistake.
>> One worry (as Danius Michaelides pointed out to me) is that the
>> caching may
>> need careful consideration - should the response indicate that it is not
>> cacheable, or is that not necessary?
>>
>> So that's about it.
>> I am unhappy that users doing the obvious thing might get frustrated
>> trying
>> to find the URIs for their Things, so really want a solution that is
>> not just
>> 406.
>> Are there other ways of being nice to users, without putting a serious
>> burden on the client software?
>>
>> I look forward to the usual helpful and thoughtful responses!
>>
>> By the way, I see no need to 301 to http://foo/bar if you get a
>> Accept: text/html http://foo/bar.rdf as the steps to that might lead
>> to this
>> would require someone looking at an rdf document to decide to use it as a
>> NIR, which is much less likely. And the likelihood is that there is an
>> eyeball there to see the problem.
>> But maybe it should?
>>
>> Best
>> Hugh
>>
>>
>> 10.3.2 301 Moved Permanently
>>
>>   The requested resource has been assigned a new permanent URI and any
>>   future references to this resource SHOULD use one of the returned
>>   URIs.  Clients with link editing capabilities ought to automatically
>>   re-link references to the Request-URI to one or more of the new
>>   references returned by the server, where possible. This response is
>>   cacheable unless indicated otherwise.
>>
>>   The new permanent URI SHOULD be given by the Location field in the
>>   response. Unless the request method was HEAD, the entity of the
>>   response SHOULD contain a short hypertext note with a hyperlink to
>>   the new URI(s).
>>
>>   If the 301 status code is received in response to a request other
>>   than GET or HEAD, the user agent MUST NOT automatically redirect the
>>   request unless it can be confirmed by the user, since this might
>>   change the conditions under which the request was issued.
>>
>>      Note: When automatically redirecting a POST request after
>>      receiving a 301 status code, some existing HTTP/1.0 user agents
>>      will erroneously change it into a GET request.
>>
>>
> 
> 
> 
>
Received on Wednesday, 24 March 2010 15:05:02 UTC