- From: Leif Halvard Silli <lhs@malform.no>
- Date: Thu, 15 Oct 2009 10:37:20 +0200
- To: Andrew Cunningham <andrewc@vicnet.net.au>
- CC: Ian Hickson <ian@hixie.ch>, Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Andrew Cunningham On 09-10-15 00.31:

> Leif Halvard Silli wrote:

[...]

>>>> So where does Windows 1252 as default for Bengali, Tamil etc fit in
>>>> here?

[...]

>> About the "English uptake": Basic English is covered by the ASCII [...]

> although if you look at legacy content in South Asian and South East
> Asian writing scripts, you will find it's not uncommon for the ASCII
> characters not to be present in those legacy encodings. Legacy encodings
> have a limited number of codepoints, and often there wasn't enough space
> to add ASCII as a subset. But then Firefox doesn't support those legacy
> encodings.

So Windows 1252 here serves as an encoding that can be exploited; it has nothing to do with "Western" in any way whatsoever ...

>> As for hacks: OK, I now tested the Bengali www.aajkaal.net in Windows.
>> Even there, it only works in Internet Explorer, as far as I could
>> tell. For all other browsers, then, defaulting to Windows 1252 seems
>> irrelevant. Unless they also start to support EOT fonts (which I
>> gather is what makes that page work in IE).
>>
> Actually it would work with Netscape 4 as well ;) For some reason it
> still includes PFR files.

EOT is included in Web Fonts. Could www.aajkaal.net start to work in Mozilla again?

http://www.w3.org/TR/css3-webfonts/

[...]

>>> It's always difficult to explain to end users why web pages in their
>>> languages don't display correctly in web browsers, or why they can't
>>> use their language in common web 2.0 or social networking
>>> environments. By end users, I mean users with limited IT knowledge.
>>
>> And that is why, when possible, the default encoding should be as
>> "wide" as possible.
>
> Except browsers have such limited encoding repertoires, catch-22,

So the impatient ones must either use pseudo-Unicode or exploit an existing legacy encoding ...

> I suspect that for a lot of new localisation projects in the future,
> there will be no acceptable fallback legacy encoding.
I now understand why you said "either Windows 1252 or UTF-8" ...

> We're already seeing it in the South Asian Firefox localisations,
> Burmese and Khmer localisations, and some of the work going on in
> localisation in Africa. The problem seems to actually occur on a
> continental or sub-continental basis, rather than on a per-language
> basis. Any language whose repertoire of necessary characters exists as
> a subset of a European legacy encoding is fine, and so is CJK data.
> Everything else is in limbo, and there is legacy data out there that is
> part of that "everything else".

So, it is not only Windows 1252 that gets exploited?

Btw, I tested whether Safari for Mac OS X defaults to different encodings based on the active locale ... The answer is no. It defaults to Windows 1252, regardless.

-- leif halvard silli
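As a rough illustration of Andrew's point that a language is "fine" only when its necessary characters form a subset of some legacy encoding, the check can be sketched in Python. This is a minimal sketch under stated assumptions: Python's standard codecs stand in for a browser's encoding repertoire (the two are not the same), and the `fits_encoding` helper and the sample words are hypothetical, chosen only for illustration.

```python
# Sketch: does a sample of text fit entirely inside a given legacy encoding?
# Assumption: Python's built-in codecs ("cp1252" = Windows-1252,
# "shift_jis" = Shift_JIS) stand in for a browser's encoding repertoire.

def fits_encoding(text: str, encoding: str) -> bool:
    """Return True if every character of `text` is encodable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # At least one character has no codepoint in this legacy encoding.
        return False
    return True

# Hypothetical sample words: (language, text, candidate legacy encoding).
samples = [
    ("Norwegian", "blåbærsyltetøy", "cp1252"),
    ("Japanese", "日本語", "shift_jis"),
    ("Bengali", "বাংলা", "cp1252"),
    ("Tamil", "தமிழ்", "cp1252"),
]

for language, text, encoding in samples:
    print(f"{language} in {encoding}: {fits_encoding(text, encoding)}")
```

Run as-is, the Norwegian and Japanese samples report True, while the Bengali and Tamil samples report False for Windows-1252: they fall into the "everything else" category, which is exactly where font hacks that exploit a Latin encoding come in.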
Received on Thursday, 15 October 2009 08:37:57 UTC