Re: Locale/default encoding table

On Oct 14, 2009, at 18:36, Leif Halvard Silli wrote:

> Henri Sivonen On 09-10-14 15.28:
>
>> On Oct 14, 2009, at 06:40, Leif Halvard Silli wrote:
>>> I especially picked the "os_RU" locale because it is situated in
>>> Russia and uses Cyrillic for everything. The Ossetic alphabet seems
>>> to be fully compatible with Windows-1251.
>> In that case, it would probably make sense to ship Windows-1251 as
>> the default for an Ossetian localization.
>
> Then I suppose we agree that Ian's table must not simply say that
> "For all other locales, use Windows-1252 as default", right?

The right rule is: The default should be the (non-UTF-8?)
ASCII-superset encoding that the expected user base of the localization
is most frequently going to encounter as unlabeled.

The rule of defaulting to Windows-1252 when in doubt isn't a bad rule  
even if it may fail for Ossetian. (If you aren't in doubt that it  
would fail for Ossetian, don't apply the "when in doubt" rule.)
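
To make the rule concrete, here is a minimal sketch of a per-locale
lookup with a "when in doubt" fallback. The table entries and the
names LOCALE_DEFAULTS, WHEN_IN_DOUBT and default_encoding are
hypothetical and only illustrate the shape of the rule; they are not
taken from any browser's actual data.

```python
# Hypothetical per-locale defaults for unlabeled content.
# Entries are illustrative assumptions, not real browser data.
LOCALE_DEFAULTS = {
    "ru": "windows-1251",
    "os": "windows-1251",  # Ossetian, as suggested in this thread
}

# Used only when nothing better is known about the locale.
WHEN_IN_DOUBT = "windows-1252"


def default_encoding(locale: str) -> str:
    """Return the fallback encoding for unlabeled pages in a localization."""
    normalized = locale.replace("_", "-").lower()
    # Try the full tag first (e.g. "os-ru"), then the bare language ("os").
    for key in (normalized, normalized.split("-")[0]):
        if key in LOCALE_DEFAULTS:
            return LOCALE_DEFAULTS[key]
    return WHEN_IN_DOUBT
```

With this table, "os_RU" resolves to windows-1251 because there is no
doubt about it, while a locale with no entry (such as "bn-BD" here)
falls through to the windows-1252 default.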

>>> win1252 - bn-BD  - Not Latin: Bengali Bangladesh
>>> win1252 - bn-IN  - Not Latin: Bengali India
>> I don't have data about Bengali Web pages, but if it turns out that
>> most Bengali content is labeled but that users of Bengali-localized
>> browsers also read a lot of unlabeled English content, Windows-1252
>> would make sense as the default.
>
> But isn't English content supported by ASCII, and thus by UTF-8?

English content contains "smart" dashes and quotes.
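
As a rough illustration (the sample bytes are made up for this
example), Windows-1252 encodes curly quotes and the en dash as single
bytes in the 0x80-0x9F range. Those bytes decode fine as Windows-1252
but are not valid UTF-8, so an unlabeled English page containing them
breaks under a UTF-8 default:

```python
# Unlabeled "English" bytes as a Windows-1252 authoring tool might emit them:
# 0x93/0x94 are curly double quotes and 0x96 is an en dash.
raw = b"\x93smart\x94 quotes \x96 and dashes"

# Decoding as Windows-1252 recovers the intended punctuation.
as_1252 = raw.decode("windows-1252")

# Decoding the same bytes as UTF-8 fails, because 0x93 cannot start
# a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Pure ASCII content would decode identically either way; it is this
non-ASCII punctuation that makes the choice of default matter.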

> So *is* there any reason to have UTF-8 as default *anywhere*, other  
> than the motto "yes, let's switch to UTF-8"?

None that I can think of. I'm tentatively considering the Firefox  
localizations that default to UTF-8 to have a bug on this point. I  
guess at some point I'll file bugs on them to either get them changed  
or to discover what I'm missing.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 15 October 2009 13:05:11 UTC