Re: Locale/default encoding table

Leif Halvard Silli wrote:
> Andrew Cunningham On 09-10-14 06.46:
>
>>
>> Leif Halvard Silli wrote:
>>
>>> So where does Windows 1252 as default for Bengali, Tamil etc fit in 
>>> here?
>>
>> a rhetorical question?
>
>
> A little bit.
>
> But you *did* in fact say to Henri that it seemed logical if Firefox, 
> for Indian languages, defaults to either *Windows 1252* or UTF-8. If 
> there is a free choice between win 1252 and UTF-8, then Win-1252 would 
> be illogical to select.
>
>
>> maybe because of the dominance of English in Indian web content? And 
>> before the uptake of Unicode for Indian web content, most developers 
>> used hacks?
>
> About the "English uptake": Basic English is covered by the ASCII 
> section of whichever encoding you choose, anyhow.
>
Although, if you look at legacy content in South Asian and South-East 
Asian scripts, you will find it's not uncommon for the ASCII characters 
to be absent from those legacy encodings. Legacy encodings have a limited 
number of code points, and often there wasn't enough space to include 
ASCII as a subset. But then, Firefox doesn't support those legacy 
encodings.
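To make the ASCII point concrete, here is a rough Python sketch that checks whether an encoding keeps every printable ASCII character at its ASCII byte value. Python's standard library does not, to my knowledge, ship codecs for the font-based South Asian encodings discussed here, so IBM EBCDIC (cp037) and UTF-16 stand in as real encodings that do not embed ASCII as a byte-for-byte subset. The function name is hypothetical.

```python
import string


def is_ascii_superset(encoding: str) -> bool:
    """Hypothetical helper: True if every printable ASCII character
    encodes to exactly the same byte it has in ASCII."""
    for ch in string.printable:
        try:
            if ch.encode(encoding) != ch.encode("ascii"):
                return False  # character sits at a different byte value
        except UnicodeEncodeError:
            return False  # character missing from the encoding entirely
    return True


for enc in ("utf-8", "windows-1252", "cp037", "utf-16"):
    print(enc, is_ascii_superset(enc))
# utf-8 True
# windows-1252 True
# cp037 False
# utf-16 False
```

UTF-8 and windows-1252 both pass, which is why "Basic English" survives either default. A legacy encoding that reuses the ASCII byte range for other glyphs fails this test, just like cp037.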

> As for hacks: OK, I now tested the Bengali www.aajkaal.net in Windows. 
> Even there, it only works in Internet Explorer, as far as I could 
> tell. For all other browsers, then, defaulting to Windows 1252 seems 
> irrelevant. Unless they also start to support EOT fonts (which I 
> gather is what makes that page work in IE).
>
Actually, it would work with Netscape 4 as well ;) For some reason the 
site still includes PFR files.

It serves up an EOT version for IE and a PFR version otherwise. That was 
very common for Indian-language websites back in the days of IE4 and 
Netscape 4.

> Are there hacks that work for Firefox? If not, then win 1252 seems 
> like a bad default for Firefox.
>
Yep, but since Firefox does not support any appropriate legacy encoding 
for that language, there is no good default.
>> It's always difficult to explain to end users why web pages in their 
>> languages don't display correctly in web browsers, or why they can't 
>> use their language in common web 2.0 or social networking 
>> environments. By end users, I mean users with limited IT knowledge.
>
> And that is why, when possible, the default encoding should be as 
> "wide" as possible.
Except browsers have such limited encoding repertoires. It's a catch-22: 
I suspect that for a lot of new localisation projects in the future, 
there will be no acceptable fallback legacy encoding. We're already 
seeing it in the South Asian Firefox localisations, the Burmese and Khmer 
localisations, and some of the localisation work going on in Africa. The 
problem seems to actually occur on a continental or sub-continental 
basis, rather than on a per-language basis. Any language whose repertoire 
of necessary characters exists as a subset of a European legacy encoding 
is fine, and so is CJK data. Everything else is in limbo, and there is 
legacy data out there that is part of that "everything else".
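A rough Python sketch of that "subset" test: try to encode sample text in each candidate legacy encoding, and treat the first one that succeeds as a possible fallback. The candidate list and function names are hypothetical, chosen only for illustration; it is not how Firefox actually picks its default.

```python
from typing import Optional

# Hypothetical shortlist of legacy fallback encodings that browsers (and
# Python) do support. There are no entries for Bengali, Burmese or Khmer.
LEGACY_CANDIDATES = ["windows-1252", "windows-1251", "shift_jis", "gb2312"]


def fits(text: str, encoding: str) -> bool:
    """True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def legacy_fallback(text: str) -> Optional[str]:
    """Return the first candidate encoding that can hold text, else None."""
    for enc in LEGACY_CANDIDATES:
        if fits(text, enc):
            return enc
    return None  # "limbo": no supported legacy encoding covers this script


print(legacy_fallback("Ça coûte très cher"))  # windows-1252
print(legacy_fallback("Привет"))              # windows-1251
print(legacy_fallback("日本語"))               # shift_jis
print(legacy_fallback("বাংলা"))                # None
print(legacy_fallback("မြန်မာ"))               # None
```

European and CJK samples each find a home. Bengali and Burmese fall through to None, which is the "no acceptable fallback legacy encoding" situation above.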

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au

Received on Wednesday, 14 October 2009 22:32:19 UTC