On Oct 14, 2009, at 17:18, Phillips, Addison wrote:

>> I rather suspect that UTF-8 isn't the best default for any locale,
>> since real UTF-8 content is unlikely to rely on the last defaulting
>> step for decoding. I don't know why some Firefox localizations
>> default to UTF-8.
>
> Why do you assume that UTF-8 pages are better labeled than other
> encodings?

Because most of the global browser installed base (including en-US browsers deployed around the world) doesn't default to UTF-8 and has chardet off by default, UTF-8 doesn't work right unless it is labeled or unless the user takes action.

It seems to me that unlabeled UTF-8 could only work out-of-the-box for two reasons:

1) Defaulting to UTF-8 in a given locale, which lets authors in that locale be sloppy and not label their encodings. (BOM counts as a label.) In this scenario, it's not about an age-old legacy but about the locale-specific default generating a new legacy. (For this reason, I think it's rather questionable to ship UTF-8-defaulting browsers to any locale.)

2) A heuristic detector that supports UTF-8 being on by default in the locale. However, in the locales where a detector is on by default (Russian, Ukrainian, Japanese), the legacy is well known not to be predominantly UTF-8. (The Swedish localization of Firefox also has a detector on by default, but that's clearly bogus.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 15 October 2009 12:58:03 GMT
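
For readers following the thread, the defaulting chain discussed in this message can be sketched roughly as below. This is only an illustration, not any browser's actual code: the function names, the toy detector, the "windows-1252" default, and the exact ordering between a BOM and a declared label are all assumptions made for the example.

```python
import codecs
from typing import Callable, Optional, Tuple

Detector = Callable[[bytes], Optional[str]]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return an encoding if the data starts with a known BOM (a BOM counts as a label)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


def toy_utf8_detector(data: bytes) -> Optional[str]:
    """Hypothetical stand-in for a heuristic detector: non-ASCII bytes that are valid UTF-8."""
    if data.isascii():
        return None
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"


def pick_encoding(
    data: bytes,
    declared: Optional[str] = None,         # e.g. HTTP charset or <meta> (assumed input)
    detector: Optional[Detector] = None,    # None models "chardet off"
    locale_default: str = "windows-1252",   # assumed default for the example
) -> Tuple[str, str]:
    """Return (encoding, step that decided it)."""
    bom = sniff_bom(data)
    if bom:
        return bom, "bom"
    if declared:
        return declared, "label"
    if detector is not None:
        guess = detector(data)
        if guess:
            return guess, "detector"
    # Last defaulting step: only unlabeled, undetected content gets here.
    return locale_default, "locale default"


page = "é".encode("utf-8")  # unlabeled UTF-8 bytes
print(pick_encoding(page))                               # falls through to the locale default
print(pick_encoding(page, detector=toy_utf8_detector))   # rescued only if a detector is on
print(pick_encoding(page, declared="utf-8"))             # labeled content never reaches the default
```

With the detector off and a non-UTF-8 default, the unlabeled page is decoded with the locale default and comes out garbled, which is the situation the message describes for most of the installed base.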