Re: Request that the WG reconsider section 3.4: Content Negotiation from Henry S. Thompson on 2013-11-05 (www-tag@w3.org from November 2013)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Tue, 05 Nov 2013 14:05:34 +0000
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: ietf-http-wg@w3.org, www-tag@w3.org
Message-ID: <f5bppqfdjv5.fsf@troutbeck.inf.ed.ac.uk>

Bjoern Hoehrmann writes:

> * Henry S. Thompson wrote:
>>What does rel='alternate' have to do with conneg, or HTTP?  Its
>>semantics are defined, as you say above, at an entirely different
>>level.
>
> "Reactive negotiation" is performed "after receiving an initial response
> from the origin server that contains a list of resources for alternative
> representations." and in HTML `rel='alternate'` is a way to encode such
> information that allows selecting an alternate representation "manually
> by the user".

See my response to Julian.  Either "contains a list" is a reference to
a 300 response, in which case I repeat my request to be pointed to
_any_ user agent which implements any kind of response to a 300 (other
than to treat it as a 200), or it refers to something in the message
body, in which case it has nothing to do with HTTP as a protocol.

Similarly, the semantics of rel='alternate' are entirely a matter for
the HTML specifications; they don't interact with HTTP-level conneg at
all.  I can't use rel="alternate" to, for example, force a particular
'Accept' header, and even if I could, that _still_ wouldn't make this
reactive conneg, just good old simple proactive.

> The text in question describes using HTTP header fields to
> carry this information as an "if" and for status codes like 300 and 406
> the draft specifically says that the payload should include such a list.

Right, but until I see an example of _any_ user agent which actually
does _anything_ with a 300 response as such, I think describing 300 as
if it did anything is seriously misleading.

> There is nothing wrong with discussing in the specification that as an
> alternative to the "you say what you like and then the server chooses"
> it is also possible to implement "the server says what it has and then
> you choose" and including hyperlinks in the response body is a widely
> used way of doing so

I see nothing in the spec. that suggests that hyperlinks in the
response body are in scope in section 3.4.  From the perspective of
HTTP, the response body is opaque.  It's not part of any HTTP client's
HTTP-licensed behaviour to do anything with response bodies other than
hand them over to applications, is it?

> and it is implemented e.g. by search engines like
> Google, <https://support.google.com/webmasters/answer/189077>. When my
> server responds with, using the example there,
>
>   HTTP/1.1 200 OK
>   Link: <http://es.example.com/>; rel="alternate"; hreflang="es"

> and a search engine crawler solely interested in spanish content then
> automatically chooses to follow the link and to ignore the non-spanish
> site, how is that not "Reactive negotiation" as described in the draft?

Let's follow this hypothetical example through with a bit of care.  

 1) Crawler does 'GET' of http://www.example.com/;
 2) Response comes back with
   HTTP/1.1 200 OK
   Link: <http://es.example.com/>; rel="alternate"; hreflang="es"
   . . .
   Content-Language: [something other than es]

   . . .
   [some body]
 3) Crawler doesn't index this page because it's not Spanish
 4) Crawler does a 'GET' of http://es.example.com/ and, presumably,
    does index it.
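
The four steps above can be sketched roughly as follows.  This is only
an illustration of the hypothetical crawler, not of any real one: the
function names are made up, the Link parsing handles just the simple
<uri>; name="value" form rather than the full RFC 5988 grammar, and
the "en" value for Content-Language in the usage note is an assumed
stand-in for "[something other than es]".

```python
import re

# Simplified Link header parsing (an assumption for this sketch): only
# <uri> followed by ;name="value" or ;name=token parameters is handled.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)'
)
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return a list of (uri, params) pairs found in a Link header value."""
    links = []
    for link in _LINK_RE.finditer(value):
        params = {}
        for p in _PARAM_RE.finditer(link.group(2)):
            quoted, token = p.group(2), p.group(3)
            params[p.group(1).lower()] = quoted if quoted is not None else token
        links.append((link.group(1), params))
    return links


def crawl_decision(headers, wanted_lang):
    """Hypothetical crawler policy for steps 3 and 4.

    Returns (index_this_page, urls_to_fetch_next).
    """
    # Step 3: index the page only if it is in the wanted language.
    content_lang = headers.get("Content-Language", "").lower()
    index_this_page = content_lang == wanted_lang

    # Step 4: follow rel="alternate" links whose hreflang matches.
    follow = [
        uri
        for uri, params in parse_link_header(headers.get("Link", ""))
        if "alternate" in params.get("rel", "").lower().split()
        and params.get("hreflang", "").lower() == wanted_lang
    ]
    return index_this_page, follow
```

Called with the headers from step 2 (and Content-Language assumed to
be "en"), crawl_decision(headers, "es") declines to index the page and
returns http://es.example.com/ as the next URL to fetch.  Note that
everything here happens in application code after the HTTP exchange
has completed.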

First, note of course that this is not what Google itself does -- it
indexes the original page, it just doesn't show it to someone who asks
for Spanish-only results.

Second, whether the behaviour of your hypothetical crawler falls under
the description of reactive conneg or not is beside the point.  If it
does, then the definition of reactive conneg _in the HTTP spec_ is too
broad, because it doesn't tell us anything we need to know about how
conforming clients, proxies and origin servers have to behave wrt HTTP
requests and responses.  It doesn't have any semantics at the HTTP
level:  The Link: header in 5988 is a convenience for users that is,
effectively, layered on top of HTTP.  After all, you could have chosen
to incorporate it into HTTPbis, but you, correctly, didn't.

The fact of the matter is that at the HTTP protocol level, all that is
absolutely necessary in section 3.4 and the later sections is an
introduction of how the semantics of the Accept... request headers,
the 300 and 406 status codes and the Vary response header fit
together.
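
As a rough sketch of how those pieces fit together on the origin
server side, consider the following.  It is a deliberately simplified
illustration, not anything taken from the draft: negotiate() and its
helpers are hypothetical names, only the Accept header is considered,
q-value handling ignores specificity ordering and q=0 exclusions
beyond the pattern itself, and 300 is left out because a real server
would need a policy for when to send it.

```python
def parse_accept(value):
    """Return [(media_range, q)] sorted by descending q (simplified)."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        prefs.append((parts[0].lower(), q))
    # sorted() is stable, so ties keep the order given in the header.
    return sorted(prefs, key=lambda pref: -pref[1])


def matches(media_range, media_type):
    """True if a media range such as text/* covers the media type."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    return rtype == mtype and rsub in ("*", msub)


def negotiate(accept, available):
    """Proactive negotiation: return (status, headers) for a request.

    `available` lists the media types the server can produce, in the
    server's own order of preference.
    """
    for media_range, q in parse_accept(accept or "*/*"):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                # Vary tells caches that the choice depended on Accept.
                return 200, {"Content-Type": media_type, "Vary": "Accept"}
    # Nothing acceptable: 406, whose payload would list what is available.
    return 406, {"Vary": "Accept"}
```

For example, negotiate("application/json;q=0.5, text/html",
["application/json", "text/html"]) selects text/html with a 200, and
negotiate("image/png", ["text/html"]) yields a 406.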

Going a bit beyond that to point out that use of Accept... headers is
common practice in modern browsers, that means of controlling
Content-Type and Content-Language are sometimes made available to
users by origin servers, but that little if any support for 300
responses is found in user agents, would obviously be helpful to
implementors and users.

Going much beyond _that_, as the spec. does now, is either out of
scope or misleading or both.

ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]
Received on Tuesday, 5 November 2013 14:06:14 UTC