Re: A question - use 301 instead of 406? from Richard Cyganiak on 2010-03-25 (public-lod@w3.org from March 2010)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 25 Mar 2010 12:59:44 +0000
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: Ted Thibodeau Jr <tthibodeau@openlinksw.com>, Linking Open Data <public-lod@w3.org>
Message-Id: <749623BE-08CA-47C9-BA3E-D2C278E8BF50@cyganiak.de>
On 24 Mar 2010, at 18:26, Hugh Glaser wrote:
> Richard, I'm not sure why you see it is a UI problem.
> (Remember - LD is about machine-processable data.)
> The only UI that might be involved is the web browser that was used to view
> the original LD.

Yes, this is the UI I'm talking about.

> If you meant that, then yes, it is; the address bar should be showing the
> NIR URI, not the html URI,

I disagree. Web browsers are for looking at web pages. They display  
HTML documents, not people or countries. The URL in the address bar  
should be the URI of the thing you're looking at -- which is always  
going to be a document.

You also seem to be forgetting that a document -- no matter if it's  
HTML or RDF -- often describes more than one thing. So there isn't a  
rigid 1:1:1 mapping between NIR URIs, HTML pages and RDF documents, as  
you seem to assume. It can be n:1:1 or n:0:1 or n:1:0. Look at RDF  
vocabularies as a typical case. Think about HTML pages that contain  
the results of a search, or an HTML page that lists page three of the  
products in some category. What would you want to show in the address  
bar there if not the page URL?

Imagine if there was a convention to use a certain icon or small logo  
in HTML pages to mean, “this is the link to a URI identifying the
thing talked about here”. Similar to the RDF logo you see often today,  
or similar to the universal feed icon, but with the specific meaning,  
“this is an identifier for this object”. On a DBpedia page you'd  
probably see this icon once on the top of the page (it would link to  
dbpedia.org/resource/XYZ). On a search result page you could see one  
such icon next to each result. Then you could teach people that they
should copy and paste the link behind that icon (or drag and drop it)
over into your web form.

<snip>
> Any solution that requires changes to the client software at the application
> layer is a non-starter. There is a wadge of legacy software that uses URIs
> that were html URIs that will now be the NIR URIs.

I think a better way to do this is to keep the HTML URI as an HTML
URI, and introduce a new URI for the topic of the page.
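
A rough sketch of what I mean, in Python. The /resource/ and /page/
prefixes and the function name are only my assumptions for
illustration (loosely modelled on the DBpedia layout); nothing in
this thread prescribes them:

```python
def respond(path):
    """Return (status, headers) for a request path.

    Assumed, hypothetical layout: /resource/<name> identifies the topic,
    /page/<name> is the HTML document that describes it.
    """
    if path.startswith("/resource/"):
        name = path[len("/resource/"):]
        # The topic is not a document, so redirect to the document with 303.
        return 303, {"Location": "/page/" + name}
    if path.startswith("/page/"):
        # The HTML document keeps its own URI and is served directly.
        return 200, {"Content-Type": "text/html"}
    return 404, {}
```

The address bar then shows the /page/ URI, which is the URI of the
document you are actually looking at.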

> Everything will work
> transparently (thank you 303), with the exception of when people make the
> mistake above.
> So Richard's paragraph ending "problem solved", doesn't really. If such
> behaviour was in the libraries it might, but it isn't.
> There really isn't any point looking for a solution that requires clients to
> interpret the html document to find the RDF URI etc, as it just ain't gonna
> happen.

So you are saying, “we can change all the servers but we cannot change  
all the clients.” Why do you think so?

> 200 as it stands is not the behaviour I am looking for. I was aware of the
> bit that you quote that it would be permissible to return the html document
> with a 200, but what I wanted to do was the opposite. If I am expecting rdf,
> it is not much more helpful to receive 200 with some html than a 406 - in
> fact in some ways it is less helpful.
> If it also said
>>> Note: HTTP/1.1 servers are allowed to return responses which are
>>> not the document referred to, but are acceptable according to the
>>> accept headers sent in the request.
>>> In some cases, this may even be preferable to sending a 406 response.
> then we would be alright.
> I suspect that the people who wrote the paragraph were not thinking Linked
> Data ( :-) ), and would have found it hard to think of a situation where the
> format of the response was more important than the document referred to.

Well, the people who wrote the paragraph were probably not thinking
linked data, but they have a very broad understanding of what HTTP can  
do, and linked data is just one application of that.

If you have an HTML document that exists in Chinese and Italian  
language versions, then isn't the “format” more important than what  
document it refers to? HTTP is designed with that kind of scenario in  
mind.
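
To make the server-driven behaviour concrete, here is a minimal
Python sketch of a server that picks the best representation it has
and, when nothing matches, falls back to a default with 200 instead
of sending 406, which is what the RFC 2616 note permits. The function
names are my own, and the matching is deliberately simplified: it
takes the highest q of any matching range and ignores the rule that
a more specific range overrides a wildcard.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when not given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return prefs


def matches(media_range, media_type):
    """True if a range such as text/* or */* covers media_type."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    main_type, _, sub = media_type.partition("/")
    return range_type == main_type and range_sub in ("*", sub)


def choose(accept_header, available, default):
    """Pick the available type with the highest q, else the default."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        # Simplification: highest q among all ranges that match.
        q = max((q for r, q in prefs if matches(r, media_type)), default=0.0)
        if q > best_q:
            best, best_q = media_type, q
    # Nothing acceptable: serve the default (with 200) rather than 406.
    return best if best is not None else default
```

With this, a client asking only for application/rdf+xml from a server
that only has HTML still gets the HTML, and can decide for itself
what to do with it.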

> I personally prefer the 406 solution in principle, if the specification had
> been more fleshed out to specify the alternative representations, and if the
> libraries recognised them to do the obvious thing.
> But I can't see that happening anytime soon, and so the only solution that
> will work with this legacy software (and my and other people's stupid cut
> and paste mistakes) is 301.

Or form validation. Or checking the HTML <head> for
<link rel="alternate">.
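
For what it's worth, the check is not much code. A minimal sketch
using Python's standard html.parser module; the class and function
names are made up for this example, and for simplicity it looks at
every <link> element rather than only those inside <head>:

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> elements from an HTML document."""

    def __init__(self):
        super().__init__()
        self.alternates = []  # list of (type, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("href"):
            self.alternates.append((attributes.get("type"), attributes["href"]))


def find_rdf_alternate(html_text, wanted_type="application/rdf+xml"):
    """Return the href of the first alternate link of the wanted type."""
    finder = AlternateLinkFinder()
    finder.feed(html_text)
    for media_type, href in finder.alternates:
        if media_type == wanted_type:
            return href
    return None
```

A form that wants a linked data URI could run this on whatever HTML
it got back and offer the RDF link it finds instead of failing.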

> Of course, the existing sites should at least be returning 406 in these
> circumstances, and my investigations suggest that many are not, so at least
> that should be fixed pronto.

To fix this pronto, why don't you start by telling the Apache guys
that they should change their server to return 406 instead of the 200
it has been returning since 1996 (or whenever mod_negotiation was
introduced).

The way I see it, linked data is a marriage of the existing web  
infrastructure with the RDF data model. The existing web  
infrastructure works. Don't mess with it unless you have to.

Hugh, you're a great man, but your stupid copy&paste errors are not  
reason enough to mess with the web infrastructure.

Best,
Richard


>
>
> Thank you all for all your helpful discussions.
> I anticipate the next round eagerly.
>
> Hugh
>
> On 24/03/2010 17:16, "Richard Cyganiak" <richard@cyganiak.de> wrote:
>
>> Hi Ted,
>>
>> On 24 Mar 2010, at 15:31, Ted Thibodeau Jr wrote:
>>> If I ask for application/rdf+xml representation of
>>> http://foo/bar.html, you *SHOULD NOT* 200 OK and give me text/html
>>> *unless* you are also and simultaneously providing a list of other
>>> alternatives (not formatted here as it would be in an HTTP response).
>>
>> I agree. Most importantly, wherever possible, representations should
>> always include links to other available representations.
>>
>>>     {"bar.html" 0.9 {type text/html} {language en}},
>>>     {"bar.rdf" 0.7 {type application/rdf+xml} {language en}},
>>>     {"bar.n3" 0.8 {type text/n3} {language en}}
>>>
>>> Please do not presume that the most important element of my request is
>>> the URI in the request.  The most important element may well be the
>>> representation, and *I* (my agent) should be given the power to
>>> choose.
>>>
>>> In fact, let's look at the Transparent Content Negotiation RFC, since
>>> that's really the focus of this question -- not HTTP per se.
>>
>> Wait a minute. The Transparent Content Negotiation RFC 2295 is an
>> experimental draft. It is not an IETF standard. Actually I've never
>> read it. I suppose it's great if servers implement it, but I would
>> disagree with your notion that servers that follow the HTTP spec and
>> do not implement RFC 2295 are doing something wrong.
>>
>>>> The reason for having the representation format specific URIs
>>>> </bar.html> and </bar.rdf> in the first place is to allow users to
>>>> override their user agent's Accept header.
>>>
>>> No.  Oh so very much, no.
>>
>> Well, it is the impression I get from reading this TAG finding:
>> http://www.w3.org/2001/tag/doc/alternatives-discovery.html
>>
>>>> For example, normal web browsers accept text/html but not
>>>> application/rdf+xml. There is no way an average user can change
>>>> the browser's behaviour in this regard. Thus, if I direct my
>>>> browser to </bar> I would always get HTML. If, for whatever reason,
>>>> I want to see the RDF/XML, there's no way I can do it. But if
>>>> the </bar.rdf> URI is configured to always return RDF/XML, no
>>>> matter what the Accept header says, then the HTML can include a
>>>> link to </bar.rdf> and say, “go here if you really want RDF/XML.”
>>>> Problem solved.
>>>
>>> No.  Normal web browsers Accept: */*.
>>
>> Ok, you are right on this, the web browsers have */* in their headers
>> and I didn't quite think through my example.
>>
>> I think that the point stands though -- if the server responds with
>> 406 or 301 rather than the 200 on format specific URIs, then I cannot
>> ever access any representation except those that my client happens to
>> have in its Accept header. This is what essentially forces clients to
>> add the */*.
>>
>>>> Sending 406 (or 301) on the representation format specific URIs
>>>> like </bar.html> and </bar.rdf> negates the entire purpose of
>>>> having those URIs in the first place.
>>>
>>> Nonsense.  It is arguable that 406 (or 300, which I think is
>>> a more appropriate code than 301) present performance and
>>> scalability issues, but their effect (that is, to give the
>>> user/agent a List of Versions) may be preserved in the combined
>>> Normal + Choice response --
>>>
>>>     Server _____ proxy _____ proxy _____ user
>>>     x.org        cache       cache       agent
>>>
>>>       < ----------------------------------
>>>       |      GET http://x.org/paper
>>>       |       small Accept- headers
>>>       |
>>>     able to choose on
>>>     behalf of user agent
>>>       |
>>>        ---------------------------------- >    [choice response]
>>>             return of paper.1 and list
>>>
>>>
>>> If I ask for a specific Representation of Thing, I expect to get
>>> that -- or some information about why I'm not getting it, especially
>>> if you give me something I didn't ask for (another Representation).
>>>
>>> You don't have the Representation I've asked for?  OK -- tell me
>>> so.
>>
>> All the above may be true if you implement RFC 2295.
>>
>>> If you have a Representation which you're quite certain will
>>> be OK, *and I've told you you may*, you can also deliver that
>>> *with* the advice that you don't have the Representation I asked
>>> for -- but you MUST NOT deliver anything *without* such advice
>>> and permission.
>>
>> Again, this might be what the experimental draft RFC 2295 says, but
>> the HTTP 1.1 spec states explicitly that servers are allowed to
>> deliver responses that don't match the accept header. (It has to say
>> this because the entire content negotiation thing is optional in HTTP,
>> so servers can ignore all the Accept headers and still be compliant).
>>
>> Just read the sentence from the HTTP spec I pasted below. I would
>> think that this takes precedence here?
>>
>>>> A key bit of text from RFC 2616:
>>>>
>>>>>> Note: HTTP/1.1 servers are allowed to return responses which are
>>>>>> not acceptable according to the accept headers sent in the
>>>>>> request. In some cases, this may even be preferable to sending a
>>>>>> 406 response.
>>>>
>>>> Amen. 406 is actually counterproductive IMO. It just forces user
>>>> agents to include something like "*/*;q=0.01" in the Accept header
>>>> to work around those overeager content negotiation implementations
>>>> that are just looking for an excuse not to send a representation to
>>>> the client.
>>>>
>>>> <snip>
>>>>> That's OK if all that happens is I use the wrong URI straight away.
>>>>> But what happens if I then enter it into a form that requires a LD
>>>>> URI, and then perhaps goes into a DB, and becomes a small part of a
>>>>> later process?
>>>>> Simply put, the process will fail maybe years later, and the
>>>>> possibility and knowledge to fix it will be long gone.
>>>>>
>>>>> Maybe the form validation is substandard, but I can see this as a
>>>>> situation that will recur a lot, because the root cause is that the
>>>>> address bar URI changes from the NIR URI. And most html pages do not
>>>>> have links to the NIR of the page you are on - I am even told that it
>>>>> is bad practice to make the main label of the page a link to itself -
>>>>> wikipedia certainly doesn't, although it is available as the "article"
>>>>> tab, which is not the normal thing of a page. So in a world where
>>>>> wikipedia itself became LD, it would not be clear to someone who
>>>>> wanted the NIR URI where to find it.
>>>>
>>>> This is a serious problem. It is a UI problem and should be solved
>>>> on the UI level, not on the transfer protocol level. We have lots
>>>> of protocol people here and few UI people, so everyone tries to fix
>>>> everything in the protocols.
>>>
>>> I'm quite surprised at reading this from you, one of the dominant
>>> Voices of the Pedantic Web.
>>
>> If I polled the Pedantic Web group to see who has ever heard of RFC
>> 2295, I'd expect a lot of blank looks. Checking RFC 2295 compliance is
>> not really on the Pedantic Web agenda.
>>
>>> You negate much of those efforts with this post.
>>
>> Overstatement.
>>
>>> The fix is *already in* the protocols, if they're used as designed.
>>
>> The fix is not in the protocols. The fix is, perhaps, in an
>> experimental draft RFC that is not widely implemented.
>>
>> You dug out this obscure RFC and say, “if everyone just implemented
>> this then everything would be solved.” This is exactly the attitude I
>> meant when I said that everyone tries to fix everything in the
>> protocols around here.
>>
>> I was unaware that implementing RFC 2295 is somehow required for
>> linked data servers, and that servers that do not implement it are in
>> violation of something. At least the Four Principles don't mention RFC
>> 2295 ;-)
>>
>> Put another way, I think that servers can implement server-driven
>> content negotiation as specified in the HTTP 1.1 spec, or they can
>> implement transparent content negotiation, which seems to be defined
>> in much more detail in this experimental RFC 2295. I would call either
>> of those linked data compliant (and IMO you can be even a good linked
>> data citizen while implementing neither, e.g., by publishing only
>> HTML+RDFa, without any content negotiation). All the MUSTs and SHALLs
>> you've quoted above only apply if your server chooses to implement
>> transparent content negotiation, which is a good thing to do, but I
>> would hesitate to call it mandatory for linked data.
>>
>> Best,
>> Richard
>>
>>
>>
>>
>>>
>>> Be seeing you,
>>>
>>> Ted
>>>
>>>
>>>
>>>> A similar problem has plagued RSS in its early years. The solution
>>>> was the feed autodiscovery convention for the HTML header, and the
>>>> universal feed icon. Linked data needs something similar.
>>>>
>>>> Best,
>>>> Richard
>>>>
>>>>
>>>>>
>>>>> So that is some of the context and motivation.
>>>>> If we were to decide to be more forgiving, what might be done?
>>>>> How about using 301?
>>>>> <<Ducks>>
>>>>> To save you looking it up, I have appended the RFC2616 section to
>>>>> this email.
>>>>> That is
>>>>> Accept: application/rdf+xml http://foo/bar.html
>>>>> Should 301 to http://foo/bar
>>>>> It seems to me that it is basically doing what is required - it
>>>>> gives the client the expected access, while telling it (if it wants
>>>>> to hear) that it should correct the mistake.
>>>>> One worry (as Danius Michaelides pointed out to me) is that the
>>>>> caching may need careful consideration - should the response indicate
>>>>> that it is not cacheable, or is that not necessary?
>>>>>
>>>>> So that's about it.
>>>>> I am unhappy that users doing the obvious thing might get
>>>>> frustrated trying to find the URIs for their Things, so really want a
>>>>> solution that is not just 406.
>>>>> Are there other ways of being nice to users, without putting a
>>>>> serious burden on the client software?
>>>>>
>>>>> I look forward to the usual helpful and thoughtful responses!
>>>>>
>>>>> By the way, I see no need to 301 to http://foo/bar if you get a
>>>>> Accept: text/html http://foo/bar.rdf as the steps that might lead to
>>>>> this would require someone looking at an rdf document to decide to
>>>>> use it as a NIR, which is much less likely. And the likelihood is
>>>>> that there is an eyeball there to see the problem.
>>>>> But maybe it should?
>>>>>
>>>>> Best
>>>>> Hugh
>>>>>
>>>>>
>>>>> 10.3.2 301 Moved Permanently
>>>>>
>>>>> The requested resource has been assigned a new permanent URI and any
>>>>> future references to this resource SHOULD use one of the returned
>>>>> URIs.  Clients with link editing capabilities ought to automatically
>>>>> re-link references to the Request-URI to one or more of the new
>>>>> references returned by the server, where possible. This response is
>>>>> cacheable unless indicated otherwise.
>>>>>
>>>>> The new permanent URI SHOULD be given by the Location field in the
>>>>> response. Unless the request method was HEAD, the entity of the
>>>>> response SHOULD contain a short hypertext note with a hyperlink to
>>>>> the new URI(s).
>>>>>
>>>>> If the 301 status code is received in response to a request other
>>>>> than GET or HEAD, the user agent MUST NOT automatically redirect the
>>>>> request unless it can be confirmed by the user, since this might
>>>>> change the conditions under which the request was issued.
>>>>>
>>>>>   Note: When automatically redirecting a POST request after
>>>>>   receiving a 301 status code, some existing HTTP/1.0 user agents
>>>>>   will erroneously change it into a GET request.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> A: Yes.                      http://www.guckes.net/faq/attribution.html
>>> | Q: Are you sure?
>>> | | A: Because it reverses the logical flow of conversation.
>>> | | | Q: Why is top posting frowned upon?
>>>
>>> Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
>>> Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
>>>                              //              http://twitter.com/TallTed
>>> OpenLink Software, Inc.      //              http://www.openlinksw.com/
>>>       10 Burlington Mall Road, Suite 265, Burlington MA 01803
>>>                              http://www.openlinksw.com/weblogs/uda/
>>> OpenLink Blogs               http://www.openlinksw.com/weblogs/virtuoso/
>>>                              http://www.openlinksw.com/blog/~kidehen/
>>>   Universal Data Access and Virtual Database Technology Providers
>>>
>>>
>>>
>>>
>>
>
Received on Thursday, 25 March 2010 13:00:21 UTC