2011/7/5 Florian Rivoal <florianr@opera.com>
>
> The algorithm should probably be something like:
> 1- if you have a lang attribute, use that
> 2- otherwise, if you have a Content-Language http header, use that
> 3- otherwise, if you have a <meta http-equiv="content-language" ...> use
> that
> 4- otherwise, if you have a charset specified in the http headers and that
> charset is specific to a language (shift-jis, BG, big5, EUC-KR... the
> list must be explicit), you're in that language
>

The problem is just that this assumption is clearly false, because
bilingual documents exist. In fact I’d say that it’s worse than that, in
the sense that if a site is still using a national charset, then it’s
likely that even its English-language pages will be encoded in the
national charset.

So this would be a good approximation that probably works much of the
time, but not all of the time.

> 5- same as 4, but with a meta tag, rather than an http header
> 6- otherwise, you don't know
>
>
--
cheers,
-ambrose

Received on Tuesday, 5 July 2011 07:35:33 GMT
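
For illustration only, the six-step fallback quoted above could be sketched roughly as follows. This is a minimal sketch, not anything proposed in the thread: the function name, its parameters, the returned "source" labels, and the contents of the charset table are all assumptions made for the example. The table is deliberately short and explicit, and steps 4 and 5 are labelled as heuristics because, as noted in the reply, a national charset does not guarantee the page is in that language.

```python
from typing import Optional, Tuple

# Hypothetical, deliberately incomplete table of language-specific charsets.
# The thread only says such a list "must be explicit"; these entries are
# assumptions chosen for the example.
CHARSET_LANGUAGE = {
    "shift-jis": "ja",
    "euc-jp": "ja",
    "euc-kr": "ko",
    "big5": "zh-Hant",
    "gb2312": "zh-Hans",
}


def _language_from_charset(charset: Optional[str]) -> Optional[str]:
    """Map a charset label to a language, or None if it is not in the table."""
    if not charset:
        return None
    # Treat "Shift_JIS" and "shift-jis" as the same label.
    key = charset.strip().lower().replace("_", "-")
    return CHARSET_LANGUAGE.get(key)


def guess_language(
    lang_attr: Optional[str] = None,
    http_content_language: Optional[str] = None,
    meta_content_language: Optional[str] = None,
    http_charset: Optional[str] = None,
    meta_charset: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (language, source) following the 1-6 fallback order."""
    # Steps 1-3: explicit language declarations, in priority order.
    declared = (
        (lang_attr, "lang attribute"),
        (http_content_language, "Content-Language header"),
        (meta_content_language, "meta content-language"),
    )
    for value, source in declared:
        if value and value.strip():
            return value.strip(), source

    # Steps 4-5: infer from a language-specific charset. This can be wrong
    # for bilingual or English pages served in a national charset.
    inferred = (
        (http_charset, "http charset (heuristic)"),
        (meta_charset, "meta charset (heuristic)"),
    )
    for charset, source in inferred:
        language = _language_from_charset(charset)
        if language is not None:
            return language, source

    # Step 6: nothing usable was found.
    return None, "unknown"
```

Returning the source alongside the language lets a caller treat the charset-based guesses with less trust than an explicit lang attribute, which is one way to account for the bilingual-document objection.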