Re: [css3-writing-modes] bidi-style resolution of punctuation orientation from Florian Rivoal on 2011-07-05 (www-style@w3.org from July 2011)

From: Florian Rivoal <florianr@opera.com>
Date: Tue, 05 Jul 2011 17:07:31 +0900
To: "Ambrose LI" <ambrose.li@gmail.com>
Cc: www-style@w3.org
Message-ID: <op.vx4zit0c4p7avi@eeeflorian>

On Tue, 05 Jul 2011 16:35:06 +0900, Ambrose LI <ambrose.li@gmail.com> wrote:

> 2011/7/5 Florian Rivoal <florianr@opera.com>
>
>>
>> The algorithm should probably be something like:
>> 1- if you have a lang attribute, use that
>> 2- otherwise, if you have a Content-Language HTTP header, use that
>> 3- otherwise, if you have a <meta http-equiv="content-language" ...> use
>>   that
>> 4- otherwise, if you have a charset specified in the HTTP headers and that
>>   charset is specific to a language (shift-jis, GB, big5, EUC-KR... the
>>   list must be explicit), you're in that language
>>
>
> The problem is just that this assumption is clearly false, because
> bilingual documents exist. In fact I’d say that it’s worse than that, in
> the sense that if a site is still using a national charset, then it’s
> likely that even its English-language pages will be encoded in the
> national charset.
>
> So this would be a good approximation that probably works a lot of times,
> but not all of the time.

I agree, there is no way step 4 will work all the time, but I don't think
it is a problem that it is sometimes wrong: it is a fallback that only
kicks in if the more reliable signals are missing.
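
As a rough illustration, the fallback chain quoted above could be sketched
in Python as follows. This is only a sketch under stated assumptions: the
function name, the parameter names, and the contents of the charset table
are hypothetical, and the language tags chosen for each charset are
illustrative guesses, not a proposed list.

```python
from typing import Optional

# Hypothetical, deliberately incomplete table. The proposal only says the
# list must be explicit; these entries and tags are illustrative assumptions.
CHARSET_TO_LANG = {
    "shift_jis": "ja",
    "big5": "zh-Hant",
    "euc_kr": "ko",
}


def guess_language(
    lang_attr: Optional[str] = None,
    http_content_language: Optional[str] = None,
    meta_content_language: Optional[str] = None,
    http_charset: Optional[str] = None,
) -> Optional[str]:
    """Return a language guess, or None if the language stays unknown."""
    # Steps 1-3: explicit declarations, in priority order. Empty values are
    # treated as missing, which is a simplification made for this sketch.
    for declared in (lang_attr, http_content_language, meta_content_language):
        if declared and declared.strip():
            return declared.strip()

    # Step 4: last-resort guess from a language-specific charset. This can
    # be wrong (for example, English pages served in a national charset).
    if http_charset:
        key = http_charset.strip().lower().replace("-", "_")
        return CHARSET_TO_LANG.get(key)

    return None
```

A charset that is not in the table (UTF-8, for instance) leaves the
language unknown, so step 4 never overrides anything from steps 1 to 3.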

So the question is not whether it is a reliable way to detect the
language. It clearly isn't. The question is: if you detect the language
that way, will language-dependent settings like glyph orientation have a
higher chance of being correct than if we just considered the language
unknown?

I think there is a chance that the answer is yes, but if not, or if it is  
impossible to determine, I have no problem dropping this step.

  - Florian

Received on Tuesday, 5 July 2011 08:07:45 UTC