Re: Locale/default encoding table

On Oct 14, 2009, at 06:40, Leif Halvard Silli wrote:

> I especially picked the "os_RU" locale because it is situated in  
> Russia and uses Cyrillic for everything. The ossetic alphabet seems  
> to be fully compatible with Windows 1251.

In that case, it would probably make sense to ship Windows-1251 as the  
default for an Ossetian localization.

> win1252 - bn-BD  - Not Latin: Bengali Bangladesh
> win1252 - bn-IN  - Not Latin: Bengali India

I don't have data about Bengali Web pages, but if it turns out that  
most Bengali content is labeled but that users of Bengali-localized  
browsers also read a lot of unlabeled English content, Windows-1252  
would make sense as the default.

> UTF-8   - cy     - Win1252 doesn't fully cover Welsh

It seems very plausible that users of a Welsh browser UI read a lot of
English content. If it happens that Welsh content is labeled and the  
English content is what's unlabeled, Windows-1252 would make sense as  
the default.

This isn't about which encoding covers the language of the
localization. It is about which encoding is most common among the
unlabeled content that users of a particular localization encounter.
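
To illustrate the distinction, here is a minimal Python sketch of a
decoder that consults a per-localization default only as the last step.
The table entries, the function name and the Windows-1252 catch-all are
assumptions made up for the example, not data from any shipping browser.

```python
# Hypothetical per-localization fallback table. The entries are
# illustrative assumptions, not measured data or real browser settings.
LOCALE_FALLBACK = {
    "os-RU": "windows-1251",
    "cy": "windows-1252",
}


def pick_encoding(http_charset, meta_charset, ui_locale):
    """Return the encoding to decode a page with.

    An explicit label always wins. The locale default is consulted only
    when the page carries no label at all.
    """
    for label in (http_charset, meta_charset):
        if label:
            return label.lower()
    # Last defaulting step: only unlabeled content ever reaches this line.
    # The windows-1252 catch-all is an assumption for the example.
    return LOCALE_FALLBACK.get(ui_locale, "windows-1252")
```

Labeled Welsh pages never reach the last line, so the value stored for
"cy" only affects whatever unlabeled content Welsh users happen to read.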

> Why is it safer for Welsh to use UTF-8 as default.

I rather suspect that UTF-8 isn't the best default for any locale,  
since real UTF-8 content is unlikely to rely on the last defaulting  
step for decoding. I don't know why some Firefox localizations default  
to UTF-8.
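
As a rough illustration of what a wrong fallback costs in either
direction, the following Python sketch decodes the same bytes with the
wrong default. The sample strings are made up for the example, and
errors="replace" merely stands in for how a browser would substitute
U+FFFD for undecodable bytes.

```python
# Unlabeled UTF-8 read with a Windows-1252 default: every non-ASCII
# letter turns into two wrong characters.
welsh = "ŵy"  # sample text with a letter outside Windows-1252
misread_utf8 = welsh.encode("utf-8").decode("cp1252")

# Unlabeled Windows-1252 read with a UTF-8 default: non-ASCII bytes are
# not valid UTF-8 and are replaced with U+FFFD.
english = "café"
misread_1252 = english.encode("cp1252").decode("utf-8", errors="replace")

print(repr(misread_utf8))
print(repr(misread_1252))
```

Which of the two failures matters more for a given localization depends
on which kind of unlabeled content its users actually encounter.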

> Also, again: I took up Belarusian. Why does it have ISO-8859-5 as  
> default?

I filed a bug on this, FWIW. Maybe "why" is answered in the bug report  
in due course:
https://bugzilla.mozilla.org/show_bug.cgi?id=522218

> Do you just trust whatever comes out of Mozilla?

It would be helpful to dig up data on how Microsoft configures IE by
default in various locales. The same goes for Opera, if Opera varies
the default by locale.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 14 October 2009 13:29:15 UTC