Re: Locale/default encoding table

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 15 Oct 2009 16:04:33 +0300
Cc: Ian Hickson <ian@hixie.ch>, Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <304415EA-4638-451D-9F81-86362A07DC9F@iki.fi>
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>

On Oct 14, 2009, at 18:36, Leif Halvard Silli wrote:

> Henri Sivonen On 09-10-14 15.28:
>> On Oct 14, 2009, at 06:40, Leif Halvard Silli wrote:
>>> I especially picked the "os_RU" locale because it is situated in
>>> Russia and uses Cyrillic for everything. The Ossetic alphabet
>>> seems to be fully compatible with Windows 1251.
>> In that case, it would probably make sense to ship Windows-1251 as
>> the default for an Ossetian localization.
> Then I suppose we agree that Ian's table must not simply say that  
> "For all other locales, use Windows 1252 as default", right?

The right rule is: The default should be the (non-UTF-8?) ASCII- 
superset encoding that the expected user base of the localization is  
most frequently going to encounter as unlabeled.

The rule of defaulting to Windows-1252 when in doubt isn't a bad rule  
even if it may fail for Ossetian. (If you aren't in doubt that it  
would fail for Ossetian, don't apply the "when in doubt" rule.)
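
A minimal sketch of how such a table could be consulted, assuming a simple lookup keyed on the language part of the locale. The names LOCALE_DEFAULTS, FALLBACK and default_encoding are hypothetical, and the two entries are only illustrative, not a vetted table:

```python
# Hypothetical per-locale defaults. The entries are illustrative only;
# a real table would need data about unlabeled content for each user base.
LOCALE_DEFAULTS = {
    "ru": "windows-1251",
    "os": "windows-1251",  # Ossetian, as suggested earlier in this thread
}

# The "when in doubt" rule: used only when no better answer is known.
FALLBACK = "windows-1252"


def default_encoding(locale: str) -> str:
    """Return the default encoding for unlabeled pages in this locale."""
    # Accept both "os_RU" and "os-RU" and look at the language subtag only.
    language = locale.replace("_", "-").split("-")[0].lower()
    return LOCALE_DEFAULTS.get(language, FALLBACK)
```

With this sketch, default_encoding("os_RU") gives "windows-1251", while a locale without an entry, such as "bn-BD", gets the Windows-1252 fallback.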

>>> win1252 - bn-BD - Not Latin: Bengali Bangladesh
>>> win1252 - bn-IN - Not Latin: Bengali India
>> I don't have data about Bengali Web pages, but if it turns out
>> that most Bengali content is labeled but that users of
>> Bengali-localized browsers also read a lot of unlabeled English
>> content, Windows-1252 would make sense as the default.
> But isn't English content supported by ASCII, and thus by UTF-8?

English content contains "smart" dashes and quotes.
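
To illustrate with a small sketch: the bytes below are an assumed sample of unlabeled English text saved as Windows-1252, and decodes_as is a hypothetical helper. The curly quotes and the en dash are single bytes in the 0x80-0x9F range, so the text is neither ASCII nor valid UTF-8:

```python
# Assumed sample: English text with curly quotes (0x93, 0x94) and an
# en dash (0x96), as a Windows-1252 authoring tool would save it.
raw = b"\x93smart\x94 quotes \x96 and a dash"


def decodes_as(data: bytes, encoding: str) -> bool:
    """Report whether the bytes decode without error in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Decoding as Windows-1252 yields the intended punctuation:
# U+201C, U+201D (curly quotes) and U+2013 (en dash).
text = raw.decode("windows-1252")

is_ascii = decodes_as(raw, "ascii")   # False: bytes above 0x7F
is_utf8 = decodes_as(raw, "utf-8")    # False: 0x93 cannot start a UTF-8 sequence
```

So a UTF-8 default would show replacement characters for such a page, while a Windows-1252 default shows the punctuation the author meant.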

> So *is* there any reason to have UTF-8 as default *anywhere*, other  
> than the motto "yes, let's switch to UTF-8"?

None that I can think of. I'm tentatively considering the Firefox  
localizations that default to UTF-8 to have a bug on this point. I  
guess at some point I'll file bugs on them to either get them changed  
or to discover what I'm missing.

Henri Sivonen
Received on Thursday, 15 October 2009 13:05:12 UTC
