Re: Locale/default encoding table

Ian Hickson On 09-10-14 05.04:

> On Wed, 14 Oct 2009, Leif Halvard Silli wrote:
>> So, here you *confirm* what I said: the "Mozilla corpus" is only a 
>> bunch of around 75 locales. Quite impressive, but still only 75 possible 
>> ones. Readers of the spec will not know that by "all others" you 
>> referred to that list.
>>
>> It simply isn't possible to say "all others" unless we know which ones 
>> "the others" are. I gave you one possible locale, "os_RU", where win-1252 
>> does not seem meaningful as a default.
> 
> I think it is a mistake to consider what is meaningful and reasonable when 
> examining what actually happens.


Mind your words: the "what happens" that I am examining is, in 
fact, "what Ian writes".

> For instance, if a particular locale has 
> been using browsers built for a similar but not identical locale, then it 
> is likely that the content written by authors in that locale will actually 
> depend on the default encoding of the legacy surrogate locale. There are a 
> number of examples of this in the Mozilla localisations (Henri pointed to 
> a few of them).


I think I have read everything in this thread, but I did not see 
his examples. Where are they?

> So I would be quite surprised if the most useful encoding 
> to list as the default wasn't actually Win1252, despite that being 
> somewhat counter-intuitive.


I deliberately picked the "os_RU" locale because it is situated in 
Russia and uses Cyrillic for everything. The Ossetic alphabet 
seems to be fully compatible with Windows-1251.

>> It may work for an individual user. But, it doesn't sound like someone 
>> offering a localized browser product for Ossetian users inside Russia 
>> would have much success that way.
> 
> It basically depends on what Ossetian users have been using before having 
> a dedicated localised product.

And today they are using Russian. They probably use Ossetian as 
well; they just don't have a localized Firefox.

I cannot see that you have refuted my claim that you need to 
specify exactly those locales for which you think Win1252 should 
be the default.
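To make the "all others" problem concrete, here is a minimal Python 
sketch of a lookup with an explicit list plus a catch-all fallback. 
It is not taken from the spec or from Mozilla. The Welsh and 
Belarusian entries are the ones discussed in this thread; the 
"ru" -> windows-1251 entry and windows-1252 as the "all others" 
value are assumptions for illustration.

```python
# Hypothetical excerpt of a locale -> default-encoding table.
# Only a few entries are shown; "ru" is an assumed entry.
EXPLICIT_DEFAULTS = {
    "be": "iso-8859-5",    # Belarusian, as questioned in this thread
    "cy": "utf-8",         # Welsh
    "ru": "windows-1251",  # Russian (assumed entry)
}

# Assumed catch-all for every locale not listed above ("all others").
FALLBACK = "windows-1252"


def default_encoding(locale: str) -> str:
    """Return the listed default for a locale, else the catch-all."""
    return EXPLICIT_DEFAULTS.get(locale, FALLBACK)


for loc in ["be", "cy", "ru", "os_RU"]:
    print(loc, "->", default_encoding(loc))
```

Nothing in such a lookup tells the reader which locales end up in 
the fallback: "os_RU" silently gets windows-1252, even though it sits 
next to "ru" and uses the same script.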

Also: above, you talked about legacy surrogate locales that are 
similar but not identical. By "similar" you surely mean, at the 
very least, "same script". So, could you explain to me why 
browsers must have the following defaults?

Ian's default  Locale  Script / remark
-------------  ------  ---------------------------------------
win1252        bn-BD   Not Latin: Bengali (Bangladesh)
win1252        bn-IN   Not Latin: Bengali (India)
win1252        el      Not Latin: Greek
win1252        eo      Win1252 doesn't fully cover Esperanto
win1252        mn      Not Latin: ~90% of users write Cyrillic
win1252        mr      Not Latin: Devanagari (Deva) script
win1252        or      Not Latin: Oriya (Orya) script
win1252        ta      Not Latin: Tamil script
win1252        ta-LK   Not Latin: Tamil script (?)
UTF-8          cy      Win1252 doesn't fully cover Welsh

Note that in this nice bunch of "Western demographics" comes, last 
but not least, Welsh! With UTF-8. Welsh, situated as it is in the 
midst of the United Kingdom, the country that spread the English 
alphabet to the world more than any other. Also note that 
Esperanto users default to win1252 ...

Why is it safer for Welsh to default to UTF-8, but not for those 
languages that don't use the Latin script at all? See also 
Andrew's letter [1].

As I understand it, market share is the bible here. So what could 
go wrong if one started to use what look like more reasonable 
defaults for the encodings in that table?

Also, again: I brought up Belarusian. Why does it have ISO-8859-5 
as its default? Do you just trust whatever comes out of Mozilla?

Or perhaps we aren't supposed to take that table very seriously.

[1] http://www.w3.org/mid/4AD4D3F4.5010708@xn--mlform-iua.no
-- 
leif halvard silli

Received on Wednesday, 14 October 2009 03:41:09 UTC