Re: Locale/default encoding table from Leif Halvard Silli on 2009-10-15 (public-html@w3.org from October 2009)

From: Leif Halvard Silli <lhs@malform.no>
Date: Thu, 15 Oct 2009 10:37:20 +0200
To: Andrew Cunningham <andrewc@vicnet.net.au>
CC: Ian Hickson <ian@hixie.ch>, Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <4AD6DF40.9070509@malform.no>

Andrew Cunningham On 09-10-15 00.31:

> Leif Halvard Silli wrote:

   [...]

>>>> So where does Windows 1252 as default for Bengali, Tamil etc fit in 
>>>> here?
  [...]

>> About the "English uptake": Basic English is covered by the ASCII 

   [...]

> although if you look at legacy content in South Asian and South East 
> Asian writing scripts, you will find it's not uncommon for the ASCII 
> characters not to be present in those legacy encodings. Legacy encodings 
> have a limited number of codepoints, and often there wasn't enough space 
> to add ASCII as a subset. But then Firefox doesn't support those legacy 
> encodings.

So Windows 1252 here serves as an encoding that can be exploited; 
it has nothing to do with "Western" in any way whatsoever ...
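
To illustrate what "exploited" means here, a minimal sketch: the bytes of a font-hack page, read as Windows 1252, decode to ordinary Latin characters, and only a custom font makes them look like Bengali. The byte values below are arbitrary and chosen for illustration; they are not taken from www.aajkaal.net or any real font mapping.

```python
# Bytes as they might appear in a font-hack page.
# ASSUMPTION: these values are arbitrary, not a real legacy font mapping.
raw = bytes([0x41, 0x61, 0xE9, 0x93])

# A browser defaulting to Windows 1252 decodes them as Latin-range characters.
decoded = raw.decode("windows-1252")


def in_bengali_block(ch: str) -> bool:
    """True if ch lies in the Unicode Bengali block (U+0980..U+09FF)."""
    return 0x0980 <= ord(ch) <= 0x09FF


# None of the decoded characters is a Bengali code point; the Bengali
# appearance would come entirely from the glyphs of the embedded font.
hack_is_bengali = any(in_bengali_block(c) for c in decoded)

# Real Unicode Bengali text ("Bangla"), by contrast, uses Bengali code points.
real = "\u09ac\u09be\u0982\u09b2\u09be"
real_is_bengali = all(in_bengali_block(c) for c in real)
```

Without the custom font, such a page shows up as Latin gibberish, which is why it breaks in browsers that cannot load the font.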

>> As for hacks: OK, I now tested the Bengali www.aajkaal.net in Windows. 
>> Even there, it only works in Internet Explorer, as far as I could 
>> tell. For all other browsers, then, defaulting to Windows 1252 seems 
>> irrelevant. Unless they also start to support EOT fonts (which I 
>> gather is what makes that page work in IE).
>>
> Actually would work with Netscape 4 as well ;) for some reason still 
> includes PFR files.


EOT is included in Web Fonts. Could www.aajkaal.net start to work 
in Mozilla again? http://www.w3.org/TR/css3-webfonts/

[...]

>>> It's always difficult to explain to end users why web pages in their 
>>> languages don't display correctly in web browsers, or why they can't 
>>> use their language in common web 2.0 or social networking 
>>> environments. By end users, I mean users with limited IT knowledge.
>> And that is why, when possible, the default encoding should be as 
>> "wide" as possible.
> Except browsers have such limited encoding repertoires, catch-22,


So the impatient ones must either use pseudo-Unicode or exploit an 
existing legacy encoding ...

> I 
> suspect that for a lot of new localisation projects in the future, there 
> will be no acceptable fallback legacy encoding.


I now understand why you said "either Windows 1252 or UTF-8" ...

> We're already seeing it 
> in the South Asian Firefox localisations, Burmese and Khmer 
> localisations, some of the work going on in localisation in Africa. The 
> problem seems to actually occur on a continental or sub-continental basis, 
> rather than per language. Any language whose repertoire of necessary 
> characters exists as a subset of a European legacy encoding is fine, so 
> is CJK data. Everything else is in limbo, and there is legacy data out 
> there that is part of that "everything else".

So, it is not only Windows 1252 that gets exploited?

Btw, I tested if Safari for Mac OS X defaults to different 
encodings based on the active locale ... The answer is no. It 
defaults to Windows 1252, regardless.
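
The difference between the two behaviours can be sketched roughly as below. The table entries and function names are hypothetical and only illustrative; they are not the table under discussion in this thread, nor any browser's actual code.

```python
# ASSUMPTION: a tiny, illustrative locale -> default-encoding table.
LOCALE_DEFAULTS = {
    "ru": "windows-1251",
    "ja": "shift_jis",
    "ko": "euc-kr",
    "tr": "windows-1254",
}
FALLBACK = "windows-1252"


def locale_based_default(locale: str) -> str:
    """Pick a default encoding from the primary language subtag of the locale."""
    lang = locale.replace("_", "-").split("-")[0].lower()
    return LOCALE_DEFAULTS.get(lang, FALLBACK)


def fixed_default(locale: str) -> str:
    """The behaviour observed in the Safari test: the locale is ignored."""
    return FALLBACK
```

With such a table, a locale without an entry (say `bn-IN`) still ends up at Windows 1252, which is the case this thread keeps coming back to.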
-- 
leif halvard silli

Received on Thursday, 15 October 2009 08:37:57 UTC