Re: HTML5 Issue 11 (encoding detection): I18N WG response...

On Oct 12, 2009, at 4:45 AM, Ian Hickson wrote:

>
> On Mon, 12 Oct 2009, Maciej Stachowiak wrote:
>> On Oct 11, 2009, at 12:23 PM, Ian Hickson wrote:
>>>
>>> What phrase best approximates the areas of the world where _today_ UAs
>>> are shipping with a 1252 default encoding?
>>
>> "locales that predominantly use the Latin script"
>
> Given that 1252 is the Latin script, that would seem circular.

No, 1252 is one of many character set encodings for the Latin script;
it is not "the Latin script". But even if I'd said "locales that  
predominantly use Windows-1252", that still would not be circular.  
Such a statement would make Windows-1252 a browser's default Web  
encoding for a locale based on its prevalence as the most popular  
encoding for content in that locale. In other words, A --> B, where A
= "this locale predominantly uses the Windows-1252 encoding for text"  
and B = "a browser makes Windows-1252 the default charset encoding for  
HTML in this locale". I hope it's clear that A != B.

That being said, I get the impression the choice is actually made  
based on whether Windows-1252 covers the alphabet of the primary  
language of the locale, as opposed to purely on pre-existing use, so
that's what I proposed in my extended statement.
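
To make the "covers the alphabet" criterion concrete, here is a minimal
sketch of how one might test it. The function name and the sample letter
sets are hypothetical, chosen only for illustration; they are not taken
from any browser's implementation, and "cp1252" is simply Python's codec
name for Windows-1252.

```python
def covered_by(text: str, encoding: str = "cp1252") -> bool:
    """Return True if every character in `text` is encodable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical, incomplete alphabet samples; assumptions for illustration only.
SAMPLES = {
    "fr": "àâæçéèêëîïôœùûüÿ",
    "de": "äöüßÄÖÜ",
    "pl": "ąćęłńóśźż",
    "ru": "абвгдежз",
}

if __name__ == "__main__":
    for locale, letters in SAMPLES.items():
        print(locale, covered_by(letters))
```

A check like this only says whether the encoding could represent the
language; it says nothing about which encoding existing content in that
locale actually uses.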

>> Or you could say:
>>
>> "locales that predominantly use the Latin script, and whose primary
>> languages are completely or almost completely covered by  
>> Windows-1252."
>
> I'd rather just have an explicit table, if we can.

That sounds like a reasonable approach.

Mozilla's choices for these are publicly available. Safari (and I  
think other WebKit-based browsers, though I'm not positive) uses  
Windows-1252 as the default for everything. We believe this is a bad  
choice, however, for users in some locales (for example, Russian,  
Chinese and Japanese locales). Users in those locales often find they  
need to change the default.
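
As a rough sketch of what an explicit table might look like in code, the
following lookup falls back to Windows-1252 when a locale has no entry.
The table entries are illustrative guesses on my part, not Mozilla's or
any other browser's shipped values, and the names DEFAULT_ENCODINGS,
FALLBACK and default_encoding are hypothetical.

```python
# Hypothetical table: entries are illustrative assumptions, not shipped values.
DEFAULT_ENCODINGS = {
    "ru": "windows-1251",
    "ja": "shift_jis",
    "zh-cn": "gbk",
    "zh-tw": "big5",
}
FALLBACK = "windows-1252"


def default_encoding(locale: str) -> str:
    """Look up a locale, trying progressively shorter tags before falling back."""
    tag = locale.replace("_", "-").lower()
    while tag:
        if tag in DEFAULT_ENCODINGS:
            return DEFAULT_ENCODINGS[tag]
        # Drop the last subtag: "ru-ru" -> "ru" -> "" (loop ends).
        tag, _, _ = tag.rpartition("-")
    return FALLBACK
```

With this table, default_encoding("ru_RU") resolves through "ru", while a
locale with no entry, such as "en-US", gets the Windows-1252 fallback.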

We're willing to converge on a specified behavior for this.

Regards,
Maciej

Received on Monday, 12 October 2009 12:00:57 UTC