- From: Ambrose LI <ambrose.li@gmail.com>
- Date: Tue, 5 Jul 2011 04:22:40 -0400
- To: Florian Rivoal <florianr@opera.com>
- Cc: www-style@w3.org
2011/7/5 Florian Rivoal <florianr@opera.com>:
>
> On Tue, 05 Jul 2011 16:35:06 +0900, Ambrose LI <ambrose.li@gmail.com> wrote:
>
>> 2011/7/5 Florian Rivoal <florianr@opera.com>:
>>
>>> The algorithm should probably be something like:
>>> 1- if you have a lang attribute, use that
>>> 2- otherwise, if you have a Content-Language http header, use that
>>> 3- otherwise, if you have a <meta http-equiv="content-language" ...>, use that
>>> 4- otherwise, if you have a charset specified in the http headers and
>>> that charset is specific to a language (shift-jis, GB, big5, EUC-KR...
>>> the list must be explicit), you're in that language
>>
>> The problem is just that this assumption is clearly false, because
>> bilingual documents exist. In fact I'd say that it's worse than that, in
>> the sense that if a site is still using a national charset, then it's
>> likely that even its English-language pages will be encoded in the
>> national charset.
>>
>> So this would be a good approximation that probably works a lot of the
>> time, but not all of the time.
>
> I agree, there is no way step 4 will work all the time, but I don't think
> that it is a problem that it is sometimes wrong: it is a fallback that
> only kicks in if the reliable ways were missing.

I agree. But my feeling is that most of the time the reliable ways are
missing. I have the habit of using the lang attribute, but even I sometimes
get sloppy and leave the spans out because it is so much work marking up
every single bit of English text. If we are talking about blog comments,
then it's hopeless.

-- 
cheers,
-ambrose
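[Editor's note: the four-step fallback chain Florian proposes can be sketched as below. All function and parameter names are hypothetical, and the charset-to-language table is a small illustrative subset; as Florian says, a real list would have to be explicit and complete.]

```python
# Hypothetical mapping from language-specific charsets to languages.
# Only a few entries are shown; a real implementation would need an
# explicit, agreed-upon list.
CHARSET_TO_LANG = {
    "shift_jis": "ja",
    "euc-jp": "ja",
    "gb2312": "zh",
    "big5": "zh",
    "euc-kr": "ko",
}

def guess_language(lang_attr=None, http_content_language=None,
                   meta_content_language=None, http_charset=None):
    """Return a best-guess language tag, or None if nothing matched."""
    # 1- a lang attribute is the most reliable signal
    if lang_attr:
        return lang_attr
    # 2- otherwise, the Content-Language HTTP header
    if http_content_language:
        return http_content_language
    # 3- otherwise, a <meta http-equiv="content-language" ...> element
    if meta_content_language:
        return meta_content_language
    # 4- last resort: a charset that is specific to one language
    if http_charset:
        return CHARSET_TO_LANG.get(http_charset.lower())
    return None
```

Note that `guess_language(http_charset="Big5")` yields "zh" even for an English-language page served in Big5, which is exactly the failure mode Ambrose describes: step 4 is only a fallback, not a reliable signal.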
Received on Tuesday, 5 July 2011 08:23:07 UTC