- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 7 Dec 2011 09:45:11 +0200
On Mon, Dec 5, 2011 at 7:42 PM, Leif Halvard Silli <xn--mlform-iua at xn--mlform-iua.no> wrote:
> Last I checked, some of those locales defaulted to UTF-8. (And HTML5
> defines it the same.) So how is that possible? Don't users of those
> locales travel as much as you do? Or do we consider the English locale
> user's as more important? Something is broken in the logics here!

Mozilla grants localizers a lot of latitude here. The defaults you see are not carefully chosen by a committee of encoding strategists doing whole-Web optimization at Mozilla. They are chosen by individual localizers.

Looking at which locales default to UTF-8, I think the most probable explanation is that the localizers mistakenly tried to pick an encoding that fits the language of the localization instead of picking the encoding that's most successful at decoding the unlabeled pages most likely to be read by users of the localization (which means *other-language* pages when the language of the localization doesn't have a pre-UTF-8 legacy).

I think that defaulting to UTF-8 is always a bug, because at the time these localizations were launched, there should have been no unlabeled UTF-8 legacy: up until these locales were launched, no browsers defaulted to UTF-8 (broadly speaking). I think defaulting to UTF-8 is harmful, because it makes it possible for locale-siloed unlabeled UTF-8 content to come into existence (instead of guiding all Web authors always to declare their use of UTF-8 so that the content works with all browser locale configurations).

I have tried to lobby internally at Mozilla for stricter localizer oversight here but have failed. (I'm particularly worried about localizers turning the heuristic detector on by default for their locale when it's not absolutely needed, because that's actually performance-sensitive and less likely to be corrected by the user. Therefore, turning the heuristic detector on may damage the browser's performance reputation.)

(Note that zh-TW seems to be an exception to the general observation that the locale's language has no browser-supported legacy encoding. However, zh-TW enables the universal heuristic encoding detector by default, so the fallback encoding matters less.)

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
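
To illustrate the "locale-siloed" concern above, here is a minimal Python sketch. It is not how any browser is implemented. The function `decode_page` and its parameters are hypothetical names made up for this example, and windows-1252 is only assumed as a typical non-UTF-8 fallback. The sketch shows that an unlabeled UTF-8 page decodes correctly only when the reader's fallback happens to be UTF-8, while a labeled page decodes the same way under either fallback.

```python
from typing import Optional


def decode_page(body: bytes, declared: Optional[str], locale_fallback: str) -> str:
    """Hypothetical helper: use the declared encoding if the page has one,
    otherwise fall back to the default chosen for the browser locale."""
    encoding = declared or locale_fallback
    return body.decode(encoding, errors="replace")


# An author writes a Finnish page in UTF-8.
page = "Hyvää päivää".encode("utf-8")

# Unlabeled: the result depends on which locale build the reader uses.
for fallback in ("utf-8", "windows-1252"):
    print("unlabeled,", fallback, "->", decode_page(page, None, fallback))

# Labeled as UTF-8: the locale fallback is never consulted.
for fallback in ("utf-8", "windows-1252"):
    print("labeled,  ", fallback, "->", decode_page(page, "utf-8", fallback))
```

With the windows-1252 fallback, each UTF-8 "ä" (bytes C3 A4) in the unlabeled page comes out as "Ã¤". An author who only tests in a UTF-8-default locale would never see this.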
Received on Tuesday, 6 December 2011 23:45:11 UTC