
Re: HTML5 Issue 11 (encoding detection): I18N WG response...

From: Andrew Cunningham <andrewc@vicnet.net.au>
Date: Tue, 13 Oct 2009 00:12:39 +1000
Message-ID: <6a5c4f5f14df0782de3d9f2759f28fa6.squirrel@mail.vicnet.net.au>
To: "Henri Sivonen" <hsivonen@iki.fi>
Cc: "Andrew Cunningham" <andrewc@vicnet.net.au>, "Maciej Stachowiak" <mjs@apple.com>, "Ian Hickson" <ian@hixie.ch>, "Leif Halvard Silli" <xn--mlform-iua@xn--mlform-iua.no>, "Mark Davis" <mark@macchiato.com>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, "Richard Ishida" <ishida@w3.org>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "Larry Masinter" <masinter@adobe.com>
Thanks Henri, greatly appreciated. Useful data.

It will be interesting to see what the trend is as the localisation
effort builds up steam.

Although it raises the question of what happens with legacy-encoded data
in those languages. With Vietnamese, I'm still seeing bloggers using VNI,
so some content is still being produced in that encoding even today.

Not surprised by Russian, Japanese and Ukrainian: legacy data may be in
a few different encodings, so heuristics make sense.

Also not surprised by the Indian localisations; the default had to be
either UTF-8 or windows-1252. I guess windows-1252 is a logical choice,
since Firefox doesn't really support legacy encodings for Indian
languages, and a good percentage of legacy content in Indian languages
misidentifies itself as iso-8859-1 or windows-1252 and relies on styling.
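
To make that last point concrete, here is a minimal Python sketch. It is
not taken from any real page: the byte string is a made-up stand-in for
Hindi text typed with a legacy font-based encoding, and has_devanagari is
a hypothetical helper. It only shows that such content, decoded as
labelled, holds nothing but Latin code points, so the intended script can
appear only through whatever styling the page applies.

```python
def has_devanagari(text: str) -> bool:
    """Return True if any character lies in the Devanagari block (U+0900-U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


# Made-up bytes standing in for text typed with a legacy font-based encoding.
legacy_bytes = b"fgUnh"

# The page labels itself windows-1252, so a browser decodes it to Latin letters.
as_labelled = legacy_bytes.decode("windows-1252")

# Hindi text stored as real Unicode code points and encoded as UTF-8.
unicode_bytes = "\u0939\u093f\u0928\u094d\u0926\u0940".encode("utf-8")
as_unicode = unicode_bytes.decode("utf-8")

print(has_devanagari(as_labelled))  # False
print(has_devanagari(as_unicode))   # True
```

A search engine or screen reader working from the decoded text sees only
the Latin letters in the first case.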

On Mon, October 12, 2009 23:49, Henri Sivonen wrote:

> The Vietnamese localization of Firefox defaults to UTF-8 and no
> heuristic detector:
> http://mxr.mozilla.org/l10n-mozilla1.9.1/source/vi/toolkit/chrome/global/intl.properties
> For comparison, Japanese, Russian and Ukrainian have a heuristic
> detector turned on by default:
> http://mxr.mozilla.org/l10n-mozilla1.9.1/source/ja/toolkit/chrome/global/intl.properties
> http://mxr.mozilla.org/l10n-mozilla1.9.1/source/ru/toolkit/chrome/global/intl.properties
> http://mxr.mozilla.org/l10n-mozilla1.9.1/source/uk/toolkit/chrome/global/intl.properties
> (Korean, Simplified Chinese and Traditional Chinese don't, BTW.)
> Query of interest:
> http://mxr.mozilla.org/l10n-mozilla1.9.1/find?string=global%2Fintl.properties&tree=l10n-mozilla1.9.1&hint=
> In various Indian locales, the language itself does not use the Latin
> alphabet but the default is still Windows-1252:
> http://mxr.mozilla.org/l10n-mozilla1.9.1/source/hi-IN/toolkit/chrome/global/intl.properties
> http://mxr.mozilla.org/l10n-mozilla1.9.1/source/bn-IN/toolkit/chrome/global/intl.properties
> http://mxr.mozilla.org/l10n-mozilla1.9.1/source/gu-IN/toolkit/chrome/global/intl.properties
> http://mxr.mozilla.org/l10n-mozilla1.9.1/source/pa-IN/toolkit/chrome/global/intl.properties
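
As a rough illustration of what these per-locale settings control, the
Python sketch below picks an encoding for a page that carries no charset
label. The names LocaleSettings, toy_detect and pick_encoding are my own
assumptions for illustration, and the toy detector (accept UTF-8 if it
decodes cleanly, otherwise give up) is only a stand-in: Firefox's real
detectors are language-specific and work differently. The fallback value
for the detector-enabled locale is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocaleSettings:
    fallback: str   # default charset applied to unlabelled pages
    detector: bool  # whether a heuristic detector is enabled


def toy_detect(data: bytes) -> Optional[str]:
    """Toy stand-in for a heuristic detector: accept UTF-8 only if it decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"


def pick_encoding(data: bytes, declared: Optional[str], settings: LocaleSettings) -> str:
    """Choose an encoding: declared label first, then detector, then locale fallback."""
    if declared:
        return declared
    if settings.detector:
        guess = toy_detect(data)
        if guess:
            return guess
    return settings.fallback


# Settings as described in the quoted message.
vi = LocaleSettings(fallback="utf-8", detector=False)
hi_in = LocaleSettings(fallback="windows-1252", detector=False)
# Detector enabled; the fallback value here is an assumption for illustration.
with_detector = LocaleSettings(fallback="windows-1251", detector=True)

utf8_page = "Tiếng Việt".encode("utf-8")
koi8_page = "привет".encode("koi8-r")

print(pick_encoding(utf8_page, None, vi))             # utf-8
print(pick_encoding(utf8_page, None, hi_in))          # windows-1252
print(pick_encoding(utf8_page, None, with_detector))  # utf-8
print(pick_encoding(koi8_page, None, with_detector))  # windows-1251
```

Without a detector, an unlabelled UTF-8 page under a windows-1252
default is simply decoded as windows-1252.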

Andrew Cunningham
Research and Development Coordinator
State Library of Victoria

Received on Monday, 12 October 2009 14:13:19 UTC
