Re: HTML5 Issue 11 (encoding detection): I18N WG response...

On Oct 12, 2009, at 16:02, Andrew Cunningham wrote:

> and Vietnamese is a difficult language, since there isn't any one
> dominant legacy encoding; it would need some sort of auto-detect
> mechanism, assuming web browsers actually supported all the key legacy
> encodings, which from memory they don't.


The Vietnamese localization of Firefox defaults to UTF-8 and has no
heuristic detector enabled:
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/vi/toolkit/chrome/global/intl.properties

For comparison, Japanese, Russian and Ukrainian have a heuristic
detector turned on by default:
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/ja/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/ru/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/uk/toolkit/chrome/global/intl.properties

(Korean, Simplified Chinese and Traditional Chinese don't, BTW.)
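A minimal sketch of pulling the two settings compared above out of such a file. It assumes the files are simple key=value properties and that the relevant keys are intl.charset.default and intl.charset.detector, with an empty detector value meaning no heuristic detector is enabled. The sample contents are illustrative, not copied from the linked files.

```python
from __future__ import annotations

# Assumption: intl.charset.default holds the fallback encoding and
# intl.charset.detector names the heuristic detector (empty = disabled).


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and #/! comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without "=" are ignored; continuations are not handled
            props[key.strip()] = value.strip()
    return props


def encoding_summary(text: str) -> tuple[str, str | None]:
    """Return (default charset, detector name or None if no detector is set)."""
    props = parse_properties(text)
    default = props.get("intl.charset.default", "")
    detector = props.get("intl.charset.detector") or None
    return default, detector


# Hypothetical file contents for two locales (illustrative values only).
vi_sample = """
intl.charset.default=UTF-8
intl.charset.detector=
"""
ja_sample = """
intl.charset.default=Shift_JIS
intl.charset.detector=ja_parallel_state_machine
"""

print(encoding_summary(vi_sample))  # ('UTF-8', None)
print(encoding_summary(ja_sample))  # ('Shift_JIS', 'ja_parallel_state_machine')
```

Running the same summary over each locale's file would give the kind of comparison made above.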

Query of interest:
http://mxr.mozilla.org/l10n-mozilla1.9.1/find?string=global%2Fintl.properties&tree=l10n-mozilla1.9.1&hint=

In various Indian locales, the language itself does not use the Latin
alphabet, but the default is still Windows-1252:
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/hi-IN/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/bn-IN/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/gu-IN/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/pa-IN/toolkit/chrome/global/intl.properties
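One way to see the mismatch: Windows-1252 has no code points for Devanagari, Bengali, Gujarati or Gurmukhi, so text in those scripts cannot be encoded in it at all. A quick check in Python, using its cp1252 codec as a stand-in for Windows-1252; the sample words are just examples.

```python
def representable(text: str, encoding: str) -> bool:
    """True if every character of text can be encoded in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Example words in the scripts of the locales listed above.
samples = {
    "hi-IN": "हिन्दी",    # Devanagari
    "bn-IN": "বাংলা",     # Bengali
    "gu-IN": "ગુજરાતી",   # Gujarati
    "pa-IN": "ਪੰਜਾਬੀ",    # Gurmukhi
}

for locale, word in samples.items():
    print(locale, representable(word, "cp1252"))  # False for every sample

print(representable("café", "cp1252"))  # True: Latin text is fine
```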

Note that ISO-8859-1 as the default really means Windows-1252.
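The two encodings differ only in bytes 0x80-0x9F: ISO-8859-1 maps them to C1 control characters, while Windows-1252 maps most of them to printable characters such as curly quotes and the euro sign. A small Python sketch of the difference; effective_encoding is a hypothetical helper that handles only the ISO-8859-1 case, not a complete label table.

```python
data = b"\x93smart quotes\x94 and \x80 sign"

# Strict ISO-8859-1: bytes 0x80-0x9F become invisible C1 control characters.
as_latin1 = data.decode("iso-8859-1")
# Windows-1252: the same bytes become curly quotes and the euro sign.
as_cp1252 = data.decode("cp1252")

print(repr(as_latin1))  # repr shows the C1 controls as \x escapes
print(as_cp1252)        # “smart quotes” and € sign


def effective_encoding(label: str) -> str:
    """Hypothetical helper: treat ISO-8859-1 labels as Windows-1252."""
    normalized = label.strip().lower()
    if normalized in ("iso-8859-1", "iso_8859-1", "latin1"):
        return "windows-1252"
    return normalized


print(data.decode(effective_encoding("ISO-8859-1")))  # same as as_cp1252
```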

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 12 October 2009 13:49:46 UTC