Re: Possible issue: Accept-language priority based on language order

On 11/25/2011 04:39 PM, Julian Reschke wrote:
> On 2011-11-25 00:15, Harald Alvestrand wrote:
>> Thanks for the datasets, Amos!
>>
>> Quick analysis of the 1742 different Accept-Language headers:
>>
>> 156 multiple languages, none with q values
>> 247 single language with no q value
>> 43 all languages with q value
>> 1255 all languages but one with q value
>> 41 multiple languages without q value, some with q value
>>
>> I didn't check whether the values were always sorted; there were some
>> like this one:
>>
>> th-th,th;q=0.8,en-us;q=0.6,en-gb;q=0.4,en;q=0.2,x-ns1rW_REX3VNhu,x-ns2p1c0Nnym7b6
>>
>> where it certainly looks as if the accept-language header was used to
>> communicate something that isn't a standard language, but strictly
>> speaking, those rightmost values sort before #2 from the left, because
>> the default q value is 1.0.
>>
>> So there are 197 examples of headers whose interpretation according to
>> the standard might be affected by the proposed interpretation (or
>> integration of information from another specification).
>
> Could you by any chance check what UAs are sending these?
>
> I just tried FF8/IE9/Chrome15/Opera11, and they all send q values. 
> (For Safari, I couldn't figure out how to get multiple languages into 
> the header).
They are all over the map, it seems.
Some Android versions, some Chrome versions, some MSIE Media Center
edition, some Safari versions, some crawlers... These seem to be the
long tail of the browser market, plus software that uses HTTP but
isn't really a browser.

Example:

Accept-Language: en-us,en
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; Valve Steam
Client/1672; ) AppleWebKit/534.1 (KHTML, like Gecko) Chrome/6.0.444.0
Safari/534.1

That's likely not a Chrome or a Safari, but an embedded browser in Valve 
Steam.

>> If practice out there is consistent with regard to this interpretation,
>> I think it is good to document it, so that we might reduce the chance of
>> future practice from diverging from current practice.
>
> I'm still unconvinced that there is a problem here:
>
> - we have no proof that the order is intentional and that it matters 
> in these cases
We have code examples that pick the leftmost language at the server.
We have no code examples that do anything else (so far).
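
For illustration only, a minimal sketch of what "pick the leftmost language" could look like on the server side. The function name leftmost_language is hypothetical and not taken from any of the code examples referred to above; it ignores q values entirely, which is exactly the behaviour under discussion.

```python
from typing import Optional


def leftmost_language(header: str) -> Optional[str]:
    """Return the first language tag in the header, ignoring any q values."""
    for item in header.split(","):
        # Drop parameters such as ";q=0.8" and surrounding whitespace.
        tag = item.split(";", 1)[0].strip()
        if tag:
            return tag.lower()
    return None  # empty or missing header
```

For the header "en-us,en" this returns "en-us", whether or not the sender meant the order to be significant.
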
>
> - there may be implementations written to what HTTP always said that 
> treat values with same q value (or missing q value) as having the same 
> preference, and a change to the spec would make those non-compliant
Hm. The Apache 1.3 documentation:
http://httpd.apache.org/docs/1.3/content-negotiation.html
seems to indicate that Apache treats them like a set.

The Apache 2.2 documentation:

http://httpd.apache.org/docs/2.2/content-negotiation.html
says "Select the variants with the best language match, using either the 
order of languages in the Accept-Language header (if present), or else 
the order of languages in the LanguagePriority directive (if present)."

(this step is applied only if two or more variants have the same preference)

So it seems that Apache 1.3 did not document the RFC 3282 behaviour, but 
Apache 2.2 documents that it behaves according to RFC 3282.
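
To make the two interpretations concrete, here is a rough sketch, not taken from Apache or any other implementation. The helper names parse_accept_language, rank_ordered and rank_as_sets are hypothetical, and treating a malformed q value as 1.0 is an assumption of this sketch. rank_ordered breaks ties between equal q values by header order (the RFC 3282-style reading); rank_as_sets treats equal q values as an unordered set.

```python
from itertools import groupby


def parse_accept_language(header):
    """Return (tag, q) pairs in header order; a missing q defaults to 1.0."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 1.0  # assumption: fall back to the default on bad input
        items.append((tag, q))
    return items


def rank_ordered(header):
    """Sort by q, highest first; equal q keeps header order (stable sort)."""
    items = sorted(parse_accept_language(header), key=lambda item: -item[1])
    return [tag for tag, _ in items]


def rank_as_sets(header):
    """Group tags sharing a q value into unordered sets, highest q first."""
    items = sorted(parse_accept_language(header), key=lambda item: -item[1])
    return [{tag for tag, _ in group}
            for _, group in groupby(items, key=lambda item: item[1])]
```

With the header "en-us,en", rank_ordered gives ['en-us', 'en'] while rank_as_sets gives a single set containing both tags. With the th-th header quoted earlier in this message, both functions place the two x-ns values ahead of th;q=0.8, because their q value defaults to 1.0.
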

>
> - changing Accept-Language would make it inconsistent with the other 
> Accept header fields (or do we want to change them all?)
What other fields do we have?
The four Accept headers are Accept, Accept-Charset, Accept-Encoding and 
Accept-Language.
>
> We *could* add a note that the interpretation is different here to the 
> MIME variant, and point out that some senders may rely on it (if we 
> really believe that).
I don't believe this header is in significant use outside HTTP. The 
writeup in RFC 3282 was intended specifically to capture HTTP usage. If 
you really have evidence that common HTTP usage is not consistent with 
RFC 3282, an erratum should be filed against RFC 3282.

I have so far seen no such evidence.

                      Harald

Received on Sunday, 27 November 2011 06:44:48 UTC