- From: Amos Jeffries <squid3@treenet.co.nz>
- Date: Thu, 24 Jan 2013 21:37:24 +1300
- To: ietf-http-wg@w3.org
On 23/01/2013 2:53 a.m., Julian Reschke wrote:
> On 2013-01-22 14:40, Nicholas Shanks wrote:
>> On 17 January 2013 09:14, Julian Reschke wrote:
>>> On 2013-01-17 09:59, Roy T. Fielding wrote:
>>>> than there are servers that implement language negotiation and
>>>> actually want to resolve ties at random.
>>>
>>> They do not "want" to resolve at random; they do so because they have
>>> implemented what the spec says. There's no reason to create an
>>> ordered list structure when the spec says that an unordered list is
>>> sufficient.
>>
>> I think no implication of randomness should be permitted by the
>> specifications.
>> They should instead require that a deterministic process be used, and
>> that, other than requests to services which explicitly exist to
>> provide random results (e.g. Wikipedia's "Random Page" link), the same
>> request should generate the same result providing nothing pertinent to
>> the resource has changed on the server.
>>
>> Someone, I don't recall who, gave the example of a home page loading
>> blog posts via AJAX, where the blog posts are available in two
>> languages. Random selection between the variants, where (q * qs)
>> values are equal for both languages, or are being ignored, would

That would be me. Take a note of the Androids below...

>
> Can you please give an example of clients sending these kind of header
> field values?
>
> Clients that care can provide different qvalues, and as a matter of
> fact, they do.

Uhm. Let's see..... where shall I start?

I think an overview of what happens when agents "care" enough to send q-values, followed by a small sample of the 513 agents I have on record with no q-values at all. Judge for yourself which ones are interpreted better as sorted lists.
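
For anyone who wants to repeat the tie-spotting on the samples that follow, here is a minimal sketch of the idea, not code taken from any real server. It assumes a missing q-value means q=1.0 and uses a stable sort so that header order breaks ties. The function names (`parse_accept_language`, `ties`, `ordered`) are hypothetical, and silently ignoring malformed parameters is an assumption rather than anything the spec mandates.

```python
from collections import defaultdict


def parse_accept_language(header):
    """Return [(tag, q)] in header order; a missing q-value counts as 1.0."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: an unparsable q means "not acceptable"
        entries.append((tag, q))
    return entries


def ties(entries):
    """Map each q-value shared by more than one tag to those tags, in header order."""
    groups = defaultdict(list)
    for tag, q in entries:
        groups[q].append(tag)
    return {q: tags for q, tags in groups.items() if len(tags) > 1}


def ordered(entries):
    """Sort by q descending; sorted() is stable, so equal q keeps header order."""
    return [tag for tag, q in sorted(entries, key=lambda e: -e[1]) if q > 0]


if __name__ == "__main__":
    sample = parse_accept_language("ru, uk;q=0.8, be;q=0.8, en;q=0.7, *;q=0.01")
    print(ties(sample))     # {0.8: ['uk', 'be']}
    print(ordered(sample))  # ['ru', 'uk', 'be', 'en', '*']
```

Treating the list as ordered costs nothing extra here: the stable sort already preserves the sender's order within each group of equal q-values, so no separate tie-break rule is needed.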

For starters I would like to say that, to be completely fair, the majority of agents that I have on record (~54% of unique language:agent pair entries) *do* send q-values properly in accordance with the specification - and that same 54% of unique agent entries are all 'voting' for the list to be ordered. I am presenting this sub-set to show what types of complexity/confusion issues we are introducing when we rely solely on q-values to provide ordering semantics in the list.

WebKit ...

cs, en-us; 0.9, de-de; 0.8, ru-ru; 0.7
  - Mozilla/5.0 (X11; U; Linux; cs-CZ) AppleWebKit/532.4 (KHTML, like Gecko) Arora/0.10.1 Safari/532.4
  + do we consider that a list with q-values or not?
  + notice also how it is a much more "up to date" version than the following...

en;q=1.0, en;q=0.5, zh-cn, zh;q=0.5, en;q=0.5
  - Mozilla/5.0 (SymbianOS/9.2; U; Series60/3.1 NokiaE71-1/300.21.012; Profile/MIDP-2.0 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/413
  + Nokia Symbian and SonyEricsson WebKit/ 4XX-532 derived agents across the board seem to have 1 primary language set at q=1.0 followed by a list of others all sharing q=0.5 or no q-value at all, as seen above.

cs-CZ, en-US
  - Mozilla/5.0 (Linux; U; Android 2.2; cs-cz; HTC Legend Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
  + Starting with WebKit/533 all the mobiles seem to have moved to this 2-language model with something, then "en-US"

da-DK, en-US
  - Mozilla/5.0 (Linux; U; Android 4.0.4; da-dk; GT-P5110 Build/IMM76D) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30

en-us,en
  - Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; Valve Steam Client; ) AppleWebKit/534.1 (KHTML, like Gecko) Chrome/6.0.444.0 Safari/534.1

th-TH, en-US
  - Mozilla/5.0 (Linux; U; Android 4.0.3; th-th; A1 Build/IML74K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30

... and then we have iTunes. A massive "WTF?" going out to the iTunes developers if anyone is reading.

en;q=1.0,fr;q=1.0,de;q=0.9,ja;q=0.9,nl;q=0.9,it;q=0.9,es;q=0.8,pt;q=0.8,pt-PT;q=0.8,da;q=0.7,fi;q=0.7,nb;q=0.7,sv;q=0.7,ko;q=0.6,zh-Hans;q=0.6,zh-Hant;q=0.6,ru;q =0.5,pl;q=0.5,tr;q=0.5,uk;q=0.5,ar;q=0.4,hr;q=0.4,cs;q=0.4,el;q=0.3,he;q=0.3,ro;q=0.3,sk;q=0.3,th;q=0.2,id;q=0.2,ms;q=0.2,en-GB;q=0.1,ca;q=0.1,hu;q=0.1,vi;q=0.1
  - iTunes-iPad/5.1.1 (2; 32GB; dt:74)

en;q=1.0,fr;q=1.0,de;q=0.9,ja;q=0.9,nl;q=0.9,it;q=0.9,es;q=0.8,pt;q=0.8,pt-PT;q=0.8,da;q=0.7,fi;q=0.7,nb;q=0.7,sv;q=0.7,ko;q=0.6,zh-Hans;q=0.6,zh-Hant;q=0.6,ru;q =0.5,pl;q=0.5,tr;q=0.5,uk;q=0.5,ar;q=0.4,hr;q=0.4,cs;q=0.4,el;q=0.3,he;q=0.3,ro;q=0.3,sk;q=0.3,th;q=0.2,id;q=0.2,ms;q=0.2,en-GB;q=0.1,ca;q=0.1,hu;q=0.1,vi;q=0.1
  - iTunes-iPhone/5.0 (4; 16GB)

en;q=1.0,fr;q=1.0,de;q=0.9,ja;q=0.9,nl;q=0.9,it;q=0.9,es;q=0.8,pt;q=0.8,pt-PT;q=0.8,da;q=0.7,fi;q=0.7,nb;q=0.7,sv;q=0.7,ko;q=0.6,zh-Hans;q=0.6,zh-Hant;q=0.6,ru;q =0.5,pl;q=0.5,tr;q=0.5,uk;q=0.5,ar;q=0.4,hr;q=0.4,cs;q=0.4,el;q=0.3,he;q=0.3,ro;q=0.3,sk;q=0.3,th;q=0.2,id;q=0.2,ms;q=0.2,en-GB;q=0.1,ca;q=0.1,hu;q=0.1,vi;q=0.1
  - iTunes-iPhone/4.3.5 (3; 16GB)

... spiders are mostly doing a remarkably good job. At least it looks that way until the q-values get involved.

ja-JP,ja
  - Baiduspider+(+http://www.baidu.jp/spider/)

ja,en
  - Mozilla/5.0 (compatible; Steeler/3.5; http://www.tkl.iis.u-tokyo.ac.jp/~crawler/)

ru, uk;q=0.8, be;q=0.8, en;q=0.7, *;q=0.01
  - Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
  + q=0.8 - Ukrainian or Belarusian?

en-us,en-gb,en;q=0.99,*;q=0.01
  - TosCrawler/Nutch-1.5.1 (http://www.toshiba.co.jp/rdc/about/crawl_info.htm; <dc-crawler at ml dot toshiba dot co dot jp>)
  + q=1.0 - English US or British? (not so much trouble for humans, but for a search engine it might cause indexing trouble).

Don't know if you would call some of the major search engine bots popular or even a "fixable problem"?

I host a translation server so it is likely that these below are from actual users working on text translation.
You know, the kind of person who *really* objects to getting a randomly-wrong language displayed. Also these people are highly knowledgeable about language codes and what they mean, so if they entered these manually it was for a specific reason according to how they or their tool's author interpreted the Accept-Language specs.

Note how the first entries have no q-value and are *sorted* as if they were q=1.0, which, remember, is what the spec says to do when no q-value is supplied: treat it as q=1.0.

ca,ca-ES,es-es;q=0.9,es;q=0.9,en-US;q=0.9,en;q=0.9,es-419;q=0.8,ca-AD;q=0.8,en-gb;q=0.8,de-de;q=0.7,de;q=0.7,ca-CA;q=0.7,cs-CZ;q=0.6,cs;q=0.6,it-it;q=0.6,it;q=0.6,es-CL;q=0.5,en-au;q=0.5,fr-FR;q=0.5,fr;q=0.4,ru-ru;q=0.4,ru;q=0.4,es-x-mtfrom-en;q=0.4,es-ar;q=0.3,ja-JP;q=0.3,ja;q=0.3,pt-PT;q=0.2,pt;q=0.2,do-es;q=0.2,do;q=0.1,es-x-mtfrom-it;q=0.1,nl-nl;q=0.1,nl;q=0.1,en-en;q=0.0
  - Mozilla/5.0 (X11; Linux x86_64; rv:10.0.6) Gecko/20100101 Firefox/10.0.6 Iceweasel/10.0.6
  + q=1.0 - Catalan Valencian or Spanish Catalan?
  + q=0.9 - Spanish or English? Generic or nationalized grammar?
  + q=0.8 - Spanish or Catalan Andorran or English or German or Catalan Valencian?
  + q=0.6 - want to try again with German or Catalan Generic?
  + q=0.5 - Spanish or Australian English or French?
  + q=0.4 - what about French or Russian?
  + q=0.3 - Argentine Spanish or Japanese?
  + q=0.1 - Spanish or Dutch?

de,de-DE,en-US;q=0.9,en;q=0.9,nl-nl;q=0.8,nl;q=0.8,en-gb;q=0.8,ro-RO;q=0.7,ro;q=0.7,fr-FR;q=0.6,fr;q=0.6,de-DE-1901;q=0.5,tr-TR;q=0.5,tr;q=0.5,pl-PL;q=0.4,pl;q=0.4,nl-NL;q=0.3,de-de;q=0.3,de-at;q=0.3,en-us;q=0.2,pl-pl;q=0.2,de;q=0.1,en-us;q=0.1,en;q=0.0
  - Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15
  + q=0.9 - English Generic or US-centric?
  + q=0.8 - Dutch or English?
  + q=0.5 - German or Turkish?
  + q=0.3 - Dutch or German?
  + q=0.2 - English or Polish?
  + q=0.1 - German or English?
  + q=0.1 - oops. Cancel that q=0.9 US English option.
  + q=0.0 - oops. Cancel that q=0.9 generic English option.
  + I skip q=1.0 (none), q=0.7, q=0.6 and q=0.4 because these, while being alternatives sharing a q-value, are in the ISO definitions semantically equivalent aliases for the same language. So any selection algorithm other than if-it-exists is a waste of CPU cycles but not a user problem.

We have only a few agents sending "q=1.0"; by my interpretation of RFC 2616 these few are the "correct" users of q-values when q=1:

en;q=1.0
  - w3m/0.5.2

Also the YoudaoBot spider, with a mix of language codes. It seems to be trying to fetch different translations specifically for some reason.

en-us;q=1.0, es-ve;q=0.5
  - Mozilla/4.1 (U; BREW 3.1.5; en-US; Teleca/Q05A/INT)
  - NetFront/3.5.1 (BREW 5.0.1.2; U; en-us; LG; NetFront/3.5.1/AMB) Sprint LN510 MMP/2.0 Profile/MIDP-2.1 Configuration/CLDC-1.1

There are a few other variations of this "NetFront/" framework from Samsung and LG mobile devices.

The rest (~50 unique agent:language pairs) using q=1.0 somewhere in the A-L header are all WebKit derived agents. We already covered how well they handle q-values.

Still a fair few browser agents around with no q-values.

zh-cn,zh-tw
  - Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1

zh-cn,zh-tw
  - Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3

en,zh,fr,de,it
  - Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.20) Gecko/20081217 Firefox/2.0.0.20 Novarra-Vision/8.0

ru, en-US, en
  - Mozilla/5.0 (compatible; Konqueror/4.4; Linux) KHTML/4.4.5 (like Gecko)

ru, uk, en-US, en
  - Mozilla/5.0 (compatible; Konqueror/4.4; FreeBSD) KHTML/4.4.3 (like Gecko)

HTH
Amos

Received on Thursday, 24 January 2013 08:38:08 UTC