- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 15 Oct 2009 15:55:14 +0300
- To: "Phillips, Addison" <addison@amazon.com>
- Cc: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Ian Hickson <ian@hixie.ch>, Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
On Oct 14, 2009, at 17:18, Phillips, Addison wrote:

>> I rather suspect that UTF-8 isn't the best default for any locale,
>> since real UTF-8 content is unlikely to rely on the last defaulting
>> step for decoding. I don't know why some Firefox localizations
>> default to UTF-8.
>
> Why do you assume that UTF-8 pages are better labeled than other
> encodings?

Because most of the global browser installed base (including en-US browsers deployed around the world) doesn't default to UTF-8 and ships with chardet off, UTF-8 doesn't work right unless it is labeled or unless the user takes action.

It seems to me that unlabeled UTF-8 could only work out of the box for two reasons:

1) Defaulting to UTF-8 in a given locale lets authors in that locale be sloppy and not label their encodings. (A BOM counts as a label.) In this scenario, it's not about an age-old legacy but about the locale-specific default generating a new legacy. (For this reason, I think it's rather questionable to ship UTF-8-defaulting browsers to any locale.)

2) A heuristic detector that supports UTF-8 is on by default in the locale. However, in the locales where a detector is on by default (Russian, Ukrainian, Japanese), the legacy is well known not to be predominantly UTF-8. (The Swedish localization of Firefox also turns a detector on by default, but that's clearly bogus.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
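For concreteness, the "last defaulting step" being argued about can be sketched as a short Python function. This is only an illustration of the chain the message describes (BOM, then an explicit label, then the locale default); the names are hypothetical and no browser exposes such an API, and real sniffing (e.g. the <meta> prescan) is considerably more involved.

from typing import Optional

def sniff_encoding(data: bytes,
                   declared: Optional[str],
                   locale_default: str) -> str:
    """Pick a decoder roughly the way this thread describes: a BOM
    counts as a label, an explicit label wins next, and only unlabeled
    content falls through to the locale-specific default."""
    # Step 1: byte order mark -- an in-band label.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    # Step 2: an explicit label (HTTP charset parameter or <meta>).
    if declared:
        return declared
    # Step 3: the last defaulting step. Unlabeled UTF-8 only renders
    # correctly where the locale default happens to be UTF-8.
    return locale_default

# Unlabeled UTF-8 bytes fall through to the locale default, so in a
# windows-1252 locale they are mis-decoded (mojibake):
print(sniff_encoding("räksmörgås".encode("utf-8"), None, "windows-1252"))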
Received on Thursday, 15 October 2009 12:58:05 UTC