Re: Locale/default encoding table

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 14 Oct 2009 19:24:20 +0200
Message-ID: <4AD60944.2000603@xn--mlform-iua.no>
To: "Phillips, Addison" <addison@amazon.com>
CC: Henri Sivonen <hsivonen@iki.fi>, Ian Hickson <ian@hixie.ch>, Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Phillips, Addison On 09-10-14 16.18:

>> I rather suspect that UTF-8 isn't the best default for any
>> locale, since real UTF-8 content is unlikely to rely on the
>> last defaulting step for decoding. I don't know why some
>> Firefox localizations default to UTF-8.
> 
> Why do you assume that UTF-8 pages are better labeled than
> other encodings? Experience suggests otherwise :-).
> 
> Although UTF-8 is positively detectable and several of us (Mark
> Davis and I, at least) have suggested making UTF-8
> auto-detection a requirement, in fact, unless chardet is used,
> nothing causes unannounced UTF-8 to work any better than any
> other encoding.

A UTF-8 auto-detection requirement would, in effect, lead to two
defaults: UTF-8 as the primary default, and the legacy encodings as
a secondary default.

This sounds like an excellent idea.

This would - I suppose - make it unnecessary to use UTF-8 as the
default for any locale for which legacy encodings exist.

This, in turn, would allow us to be more accurate in picking the
default legacy encoding(s). E.g. for Croatian, it would not be
necessary to have UTF-8 as the default legacy fallback, I suppose.
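
To make the two-default idea concrete, here is a rough sketch in
Python. It is only an illustration under stated assumptions, not what
any browser actually does: the function name, the table and its
entries (windows-1250 for "hr", windows-1251 for "ru") and the
windows-1252 last resort are all my own picks for the example.
Unlabelled bytes are first tried as strict UTF-8; only if that fails
is the locale's legacy encoding used.

```python
from typing import Tuple

# Hypothetical locale -> legacy encoding table; entries are illustrative
# assumptions, not taken from any specification or browser.
LEGACY_FALLBACK = {
    "hr": "windows-1250",
    "ru": "windows-1251",
}

# Assumed last-resort encoding for locales missing from the table.
LAST_RESORT = "windows-1252"


def decode_unlabeled(data: bytes, locale: str) -> Tuple[str, str]:
    """Decode bytes that carry no encoding declaration.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        # First default: strict UTF-8. Invalid byte sequences raise
        # UnicodeDecodeError, which is what makes UTF-8 positively detectable.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Second default: the legacy encoding assumed for this locale.
        legacy = LEGACY_FALLBACK.get(locale, LAST_RESORT)
        # Some byte values are undefined in these code pages, so replace
        # them rather than fail a second time.
        return data.decode(legacy, errors="replace"), legacy
```

Note that pure ASCII input also passes the UTF-8 step, which is
harmless since the legacy encodings in this table agree with ASCII in
that range. With this ordering the per-locale table never needs a
UTF-8 entry.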

> The I18N WG pointed out that for many developing languages and
> locales, the legacy encodings are fragmented and frequently
> font-based, making UTF-8 a better default choice. This is not
> the case for a relatively well-known language such as
> Belarusian or Welsh, but it is the case for many minority and
> developing world languages.

Indeed.
-- 
leif halvard silli
Received on Wednesday, 14 October 2009 17:24:58 GMT
