Re: #428 Accept-Language ordering for identical qvalues

On 21/01/2013 12:30 p.m., Adrien W. de Croy wrote:
>   ------ Original Message ------
> From: "James M Snell" <jasnell@gmail.com>
>>
>> +1.. in fact, for 2.0, I'd very much like to get rid of q-values 
>> entirely and depend entirely on order.
>>
> same here.
> The idea may have been laudable in 1998, but really, how can a web 
> server tell if some resource is 80% better than another? A human needs 
> to tell it, and humans have enough trouble with other things.
> the q=0 option would need to be turned into a Naccept-* header or 
> something.   But does anyone even use it outside of testing for 406 
> responses which never come?

My collection of 2 years' worth of language headers says no.

Of 2018 unique Accept-Language header field-values;
   1532 are using q-values in a strictly sorted list
   491 are not using q-values
   14 are using "q=0.0".
   5 are using q-values and non-qvalues without ordering the sent list 
(1 looks otherwise normal, the others are using puny-codes)
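
For anyone wanting to repeat the count on their own logs, here is a rough 
Python sketch of how a field-value could be sorted into the buckets above. 
It is not the script used for the numbers in this mail; the function names 
and bucket labels are made up for illustration, and "sorted" is assumed to 
mean non-increasing q-values with an omitted q-value treated as q=1 (the 
default in RFC 2616).

```python
def parse_accept_language(value):
    """Split a field-value into (tag, q) pairs; q is None when omitted."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = None
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = None  # malformed q-value, treated as omitted
        entries.append((tag.strip(), q))
    return entries


def classify(value):
    """Return a hypothetical bucket label for one Accept-Language value."""
    qs = [q for _, q in parse_accept_language(value)]
    if all(q is None for q in qs):
        return "no-qvalues"
    if any(q == 0 for q in qs if q is not None):
        return "uses-q0"
    # An omitted q-value defaults to 1 when checking the ordering.
    effective = [1.0 if q is None else q for q in qs]
    in_order = all(a >= b for a, b in zip(effective, effective[1:]))
    return "sorted" if in_order else "unordered"
```

With this sketch "en-us,en;q=0.5" lands in "sorted", "en-GB,en" in 
"no-qvalues", and a list with un-weighted entries after a weighted one 
lands in "unordered".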

The 14 are also unique in being very long and having multiple entries 
with equal q-values. Without exception they are still strictly ordered, 
with the entries lacking a q-value listed first (as if q=1.0 was used 
for the sort but omitted when sending). They also contain a number of 
oddities, such as multiple entries for a language code with differing 
q-values.

NP: Of those 14 odd A-L headers noted above I have UA details on 8 of 
them. All claim to be Firefox but the Gecko dates do not line up with 
other info on those versions (the 11.0 was released some years before 
3.5.9 on the same OS) so the whole input is a bit suspect.


The 5 un-ordered lists have puny-code values with no q-value listed 
after an otherwise normal series of languages. Like so:
  "en-us,en;q=0.5,x-ns1qHkbtrt8Nhv,x-ns2E1e0Nnym7b6"

I have a few cases of a q-value ordered list followed by a wildcard "*" 
with no q-value. The sender is obviously assuming the list is ordered.
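
To make the mismatch concrete, here is a small Python sketch comparing the 
two readings of such a header. Both function names are hypothetical and the 
parsing is deliberately naive (it assumes at most one parameter per entry 
and a well-formed q-value). The only thing taken from the spec is that an 
omitted q-value defaults to q=1.

```python
def rank_by_qvalue(value):
    """Order tags by q-value, highest first (omitted q counts as 1.0)."""
    parsed = []
    for item in value.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0
        name, _, val = params.strip().partition("=")
        if name.strip().lower() == "q":
            q = float(val)
        parsed.append((tag.strip(), q))
    # sorted() is stable, so entries with equal q keep their list order.
    return [tag for tag, _ in sorted(parsed, key=lambda e: -e[1])]


def rank_by_position(value):
    """Order tags purely by where they appear in the list."""
    return [item.partition(";")[0].strip() for item in value.split(",")]


header = "en-us,en;q=0.5,x-ns1qHkbtrt8Nhv,x-ns2E1e0Nnym7b6"
print(rank_by_qvalue(header))
print(rank_by_position(header))
```

Read by q-value, the two trailing puny-code entries outrank "en"; read by 
position they come last, which is presumably what the sender meant. The 
same applies to "en;q=0.5,*", where the wildcard wins under q-value rules.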



Broken down by UA, which I started ~6 months ago at Julien's suggestion, 
I have 54289 distinct UA visiting, of which:
   21756 are not sending A-L header at all
   19621 unique UA are using a single language code with no q-value
   12495 unique UA are using q-values as above.
   8 are sending only wildcard "*" or "*/*"

The remaining ~400 roughly match up with the 491 A-L field-values not 
using q-values. They are older agents (Windows 98, NT, 2k stand out), 
agents sending the same language multiple times (VoilaBot variants and 
Safari there), or agents sending sub-language variants with the generic 
form last, e.g. "en-GB,en", "en-US,en", "en-US,en,*" (Tablets and Mobile 
Safari mostly). They are obviously assuming sorted lists, even as far 
back as the Windows 98 ones.

There are also a few bots sending exactly 2 puny-code entries.


Amos

Received on Monday, 21 January 2013 01:56:51 UTC