Re: Locale/default encoding table

On Oct 14, 2009, at 17:18, Phillips, Addison wrote:

>> I rather suspect that UTF-8 isn't the best default for any locale,
>> since real UTF-8 content is unlikely to rely on the last defaulting
>> step for decoding. I don't know why some Firefox localizations
>> default to UTF-8.
> Why do you assume that UTF-8 pages are better labeled than other  
> encodings?

Because most of the global browser installed base (including en-US
browsers deployed around the world) neither defaults to UTF-8 nor has
chardet on by default, UTF-8 doesn't work right unless it is labeled
or the user takes action.
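To make the failure concrete, here is a minimal sketch of what a
reader sees when an unlabeled, BOM-less UTF-8 page reaches a browser
whose fallback is windows-1252 and whose detector is off. Python's
cp1252 codec stands in for the browser's windows-1252 decoder (an
assumption; the two differ on a few unused bytes, none of which occur
here).

```python
# A UTF-8 page with no charset label and no BOM (illustrative text).
page_bytes = "naïve café".encode("utf-8")

# With no label and the detector off, the browser falls back to the
# locale default; cp1252 approximates windows-1252 here.
rendered = page_bytes.decode("cp1252")

# Each two-byte UTF-8 sequence shows up as two Latin-1-looking chars.
print(rendered)  # naÃ¯ve cafÃ©
```

The same bytes decode correctly once a label (or a BOM) tells the
browser to use UTF-8 instead of the locale default.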

It seems to me that unlabeled UTF-8 could work out of the box in only
two scenarios:

  1) Defaulting to UTF-8 in a given locale, which lets authors in that
locale be sloppy and leave their encodings unlabeled. (A BOM counts as
a label.) In this scenario, the issue is not an age-old legacy; the
locale-specific default generates a new legacy. (For this reason, I
think it's rather questionable to ship UTF-8-defaulting browsers to
any locale.)

  2) A heuristic detector that supports UTF-8 and is on by default in
the locale. However, in the locales where a detector is on by default
(Russian, Ukrainian, Japanese), the legacy is well known not to be
predominantly UTF-8. (The Swedish localization of Firefox also has a
detector on by default, but that's clearly bogus.)

Henri Sivonen

Received on Thursday, 15 October 2009 12:58:05 UTC