
Re: Locale/default encoding table

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 14 Oct 2009 02:59:10 +0200
Message-ID: <4AD5225E.9030802@xn--mlform-iua.no>
To: Ian Hickson <ian@hixie.ch>
CC: Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Ian Hickson On 09-10-14 02.11:

> On Tue, 13 Oct 2009, Geoffrey Sneddon wrote:
>> It isn't defined what the two letter code locales are: apparently they 
>> aren't ISO-3166-1 alpha-2 codes, which is just confusing.
>> Can you please define what the two letter codes are, or, ideally, just 
>> use the ISO-3166-1 alpha-2 codes (and state they are and add ref).
> On Tue, 13 Oct 2009, Leif Halvard Silli wrote:
>> Shouldn't it state or link to BCP47?
>> http://www.w3.org/blog/International/2009/10/09/updated_article_language_tags_in_html_an_1
> Done.

Some comments/questions about the table.

FIRSTLY: Does the "locale" "be" mean Belarusian ("be") or 
Belgian (also "be", though it is often written "BE")? The encoding 
is Cyrillic, so you obviously meant Belarusian. But then, why do 
you use the _country tag_ (UA) for Ukraine/Ukrainian? Same with 
"ar". I suppose you meant Arabic, because I don't think Argentina 
requires UTF-8 as default.

Another example: What does "ru" mean? Russian? Or Russia? I 
thought it meant "Russian", until I discovered this lack of 

Most commonly, a _locale_ is tagged using a combination of 
language_country. E.g. "en_US". I would like to see the same thing 
here, in some sort.

Also, it is customary, though not required, to put the country 
subtag in UPPERCASE, and it would be nice to do so here.
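
As a rough sketch of that convention (not taken from the spec; the 
function name `normalize_locale` is hypothetical, and script or variant 
subtags are deliberately ignored), a tag could be split into its language 
and country parts and case-normalized like this:

```python
def normalize_locale(tag):
    """Split a tag such as 'be_by' or 'EN-us' into (language, country).

    Assumption: only a language subtag plus an optional two-letter
    country subtag is handled; anything else in second position is ignored.
    """
    parts = tag.replace("-", "_").split("_")
    language = parts[0].lower()  # language subtag in lowercase
    country = None
    if len(parts) > 1 and len(parts[1]) == 2 and parts[1].isalpha():
        country = parts[1].upper()  # country subtag in UPPERCASE
    return language, country
```

With this, "be_by" and "be-BY" both come out as ("be", "BY"), while a 
bare "ru" yields ("ru", None), which makes the missing country explicit 
rather than guessed.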

I don't know if it could be an idea to make the table like this 
(example for "be_BY"):

Language | Country | BCP47 script info | Default encoding
be       | BY      | suppress-script   | windows-1251

SECONDLY: The table ends by saying "All other locales => Windows 
1252". But I think that is impossible to say. One can define an 
endless range of locales. And, in fact, new locales are "invented" 
often. For example, a new language tag for a dialect/language in 
Estonia was registered not so long ago.

It seems to me that when you say "all others", then you should 
operate with a fixed list of known locales. For example, if 
Firefox was localized for the locale "os_RU" (Ossetic language in 
the Russian Federation), then I don't think it should default to 
"Windows-1252" ...
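
A minimal sketch of what I mean by a fixed list, under stated 
assumptions: the only table row used is the "be_BY" example above, the 
entry ("en", "US") in the known-locale set is purely illustrative, and 
all names are hypothetical. The point is that a locale outside the list 
gets no default at all instead of silently falling back:

```python
# Explicit rows of the table (only the be_BY example from this mail).
DEFAULT_ENCODINGS = {
    ("be", "BY"): "windows-1251",
}

# Assumption: an illustrative fixed list of locales that are KNOWN to
# fall under "all others"; the real list would have to be enumerated.
KNOWN_WINDOWS_1252_LOCALES = {
    ("en", "US"),
}

def default_encoding(language, country):
    """Return the default encoding for a known locale, else None."""
    key = (language.lower(), country.upper())
    if key in DEFAULT_ENCODINGS:
        return DEFAULT_ENCODINGS[key]
    if key in KNOWN_WINDOWS_1252_LOCALES:
        return "windows-1252"
    # Unknown locale (e.g. os_RU): no blanket fallback is assumed.
    return None
```

Here default_encoding("os", "RU") returns None, so whoever adds that 
localization has to make a deliberate choice.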

My other questions/comments must perhaps wait until you have 
clarified what the tags in the table mean.
leif halvard silli
Received on Wednesday, 14 October 2009 00:59:48 UTC
