Re: HTML5 Issue 11 (encoding detection): I18N WG response...

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 12 Oct 2009 16:49:02 +0300
To: Andrew Cunningham <andrewc@vicnet.net.au>
Message-Id: <4D87BC55-9185-47F8-BFE2-1FCD88E70B43@iki.fi>
Cc: "Maciej Stachowiak" <mjs@apple.com>, "Ian Hickson" <ian@hixie.ch>, "Leif Halvard Silli" <xn--mlform-iua@xn--mlform-iua.no>, Mark Davis ☕ <mark@macchiato.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, "Richard Ishida" <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "Larry Masinter" <masinter@adobe.com>
On Oct 12, 2009, at 16:02, Andrew Cunningham wrote:

> and Vietnamese is a difficult language, since there isn't any one dominant
> legacy encoding, would need some sort of auto-detect mechanism, assuming
> web browsers actually supported all the key legacy encodings, which from
> memory they don't.


The Vietnamese localization of Firefox defaults to UTF-8 and no  
heuristic detector:
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/vi/toolkit/chrome/global/intl.properties

For comparison, Japanese, Russian and Ukrainian have a heuristic
detector turned on by default:
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/ja/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/ru/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/uk/toolkit/chrome/global/intl.properties

(Korean, Simplified Chinese and Traditional Chinese don't, BTW.)
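
To check a locale's settings without reading each file by eye, something
like the following Python sketch could be used. It assumes the relevant
keys in intl.properties are named intl.charset.default and
intl.charset.detector, and that an empty detector value means no heuristic
detector is enabled. The sample text is illustrative (modelled on the
Vietnamese case described above), not copied from the linked file, and the
function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def charset_settings(text):
    """Return (default_charset, detector); detector is None when empty."""
    props = parse_properties(text)
    # Key names are an assumption about the file format, see lead-in.
    default = props.get("intl.charset.default")
    detector = props.get("intl.charset.detector") or None
    return default, detector


# Illustrative sample only: UTF-8 default, no heuristic detector.
SAMPLE = """\
# comment line
intl.charset.default=UTF-8
intl.charset.detector=
"""

print(charset_settings(SAMPLE))
```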

Query of interest:
http://mxr.mozilla.org/l10n-mozilla1.9.1/find?string=global%2Fintl.properties&tree=l10n-mozilla1.9.1&hint=

In various Indian locales, the language itself does not use the Latin  
alphabet but the default is still Windows-1252:
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/hi-IN/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/bn-IN/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/gu-IN/toolkit/chrome/global/intl.properties
http://mxr.mozilla.org/l10n-mozilla1.9.1/source/pa-IN/toolkit/chrome/global/intl.properties

Note that ISO-8859-1 as the default really means Windows-1252.
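
To illustrate why the distinction matters: the two encodings differ only
in bytes 0x80-0x9F, which are C1 control characters in ISO-8859-1 but
printable punctuation (curly quotes, the euro sign and so on) in
Windows-1252. A minimal Python sketch, where decode_like_browser is a
hypothetical helper and not any browser's actual code:

```python
def decode_like_browser(data, label):
    """Decode bytes, treating an ISO-8859-1 label as Windows-1252.

    Hypothetical helper for illustration only.
    """
    if label.lower() in ("iso-8859-1", "latin1"):
        label = "windows-1252"
    return data.decode(label)


data = b"\x93quoted\x94 \x80"

# Strict ISO-8859-1 yields invisible C1 controls for 0x93, 0x94, 0x80.
strict = data.decode("iso-8859-1")

# The Windows-1252 reading yields curly quotes and the euro sign.
lenient = decode_like_browser(data, "ISO-8859-1")

print(repr(strict))
print(repr(lenient))
```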

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 12 October 2009 13:49:47 GMT
