- From: Florian Rivoal <florianr@opera.com>
- Date: Tue, 05 Jul 2011 17:07:31 +0900
- To: "Ambrose LI" <ambrose.li@gmail.com>
- Cc: www-style@w3.org
On Tue, 05 Jul 2011 16:35:06 +0900, Ambrose LI <ambrose.li@gmail.com> wrote:

> 2011/7/5 Florian Rivoal <florianr@opera.com>
>
>> The algorithm should probably be something like:
>> 1- if you have a lang attribute, use that
>> 2- otherwise, if you have a Content-Language http header, use that
>> 3- otherwise, if you have a <meta http-equiv="content-language" ...>,
>>    use that
>> 4- otherwise, if you have a charset specified in the http headers and
>>    that charset is specific to a language (shift-jis, BG, big5,
>>    EUC-KR... the list must be explicit), you're in that language
>
> The problem is just that this assumption is clearly false, because
> bilingual documents exist. In fact I’d say that it’s worse than that,
> in the sense that if a site is still using a national charset, then
> it’s likely that even its English-language pages will be encoded in
> the national charset.
>
> So this would be a good approximation that probably works a lot of
> times, but not all of the time.

I agree, there is no way step 4 will work all the time, but I don't think
that it is a problem that it is sometimes wrong: it is a fallback that
only kicks in if the reliable ways were missing.

So the question is not whether or not it is a reliable way to detect the
language. It clearly isn't. The question is: if you detect the language
that way, will language-dependent settings like glyph orientation have a
higher chance of being correct than if we just considered the language
unknown?

I think there is a chance that the answer is yes, but if not, or if it is
impossible to determine, I have no problem dropping this step.

 - Florian
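
For illustration only, the four-step fallback quoted above could be sketched roughly as follows. This is not taken from any specification or implementation: the function name, its parameters, and the charset table are all hypothetical, and the table is deliberately tiny. It reads "BG" as the GB2312 family, which is a guess, and it passes a Content-Language value through unchanged even though a real header may list several languages.

```python
from typing import Optional

# Assumption: an illustrative, incomplete map from language-specific
# charsets to language tags. A real list would have to be explicit and
# agreed upon, as the message says.
CHARSET_LANGUAGE = {
    "shift_jis": "ja",
    "euc-kr": "ko",
    "big5": "zh-Hant",
    "gb2312": "zh-Hans",  # guessed reading of "BG" in the message
}


def guess_language(
    lang_attr: Optional[str] = None,
    http_content_language: Optional[str] = None,
    meta_content_language: Optional[str] = None,
    http_charset: Optional[str] = None,
) -> Optional[str]:
    """Return a language tag, or None if the language stays unknown."""
    # Steps 1-3: the reliable sources, in priority order.
    for candidate in (lang_attr, http_content_language, meta_content_language):
        if candidate and candidate.strip():
            return candidate.strip()
    # Step 4: the unreliable fallback, used only when nothing else is present.
    if http_charset:
        return CHARSET_LANGUAGE.get(http_charset.strip().lower())
    return None
```

With this shape, dropping step 4 amounts to removing the final lookup, so an English page served in a national charset would fall back to "unknown" rather than being labelled with the wrong language.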
Received on Tuesday, 5 July 2011 08:07:45 UTC