- From: Larry Masinter <masinter@adobe.com>
- Date: Wed, 14 Oct 2009 12:22:27 -0700
- To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, "Phillips, Addison" <addison@amazon.com>
- CC: Henri Sivonen <hsivonen@iki.fi>, Ian Hickson <ian@hixie.ch>, Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
I think the latest editor's draft does a good job of describing the tables for default encoding as suggestions rather than normative requirements. I think this is appropriate; there is no normative requirement to support any charset other than UTF-8 and ISO-8859-1/Win-1252, so normatively requiring a more complex auto-detection of charsets that are not supported doesn't make a lot of sense.

The idea that you might reasonably guess the charset of retrieved HTML by looking at the locale of the browser doing the guessing is a very weak and not particularly accurate heuristic. And in situations where different browsers will have different configuration information, the "advantage" that multiple browsers behave similarly isn't very strong anyway.

Larry
--
http://larry.masinter.net

-----Original Message-----
From: public-html-request@w3.org [mailto:public-html-request@w3.org] On Behalf Of Leif Halvard Silli
Sent: Wednesday, October 14, 2009 10:24 AM
To: Phillips, Addison
Cc: Henri Sivonen; Ian Hickson; Geoffrey Sneddon; HTML WG; public-i18n-core@w3.org
Subject: Re: Locale/default encoding table

Phillips, Addison, on 09-10-14 16.18:

>> I rather suspect that UTF-8 isn't the best default for any
>> locale, since real UTF-8 content is unlikely to rely on the
>> last defaulting step for decoding. I don't know why some
>> Firefox localizations default to UTF-8.
>
> Why do you assume that UTF-8 pages are better labeled than
> other encodings? Experience suggests otherwise :-).
>
> Although UTF-8 is positively detectable and several of us (Mark
> Davis and I, at least) have suggested making UTF-8
> auto-detection a requirement, in fact, unless chardet is used,
> nothing causes unannounced UTF-8 to work any better than any
> other encoding.

A UTF-8 auto-detection requirement would lead to two defaults: UTF-8 as one default, and a legacy encoding as a secondary default. This sounds like an excellent idea.
This would - I suppose - make it unnecessary to use UTF-8 as the default for any locale for which legacy encodings exist. This, in turn, would allow us to be more accurate in picking the default legacy encoding(s). E.g. for Croatian, it would not be necessary to have UTF-8 as the default legacy fallback, I suppose.

> The I18N WG pointed out that for many developing languages and
> locales, the legacy encodings are fragmented and frequently
> font-based, making UTF-8 a better default choice. This is not
> the case for a relatively well-known language such as
> Belarusian or Welsh, but it is the case for many minority and
> developing world languages.

Indeed.
--
leif halvard silli
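The two-tier defaulting discussed above - strict UTF-8 detection first, a per-locale legacy encoding second - can be sketched roughly as below. This is a minimal illustration of the idea, not how any browser actually implements encoding sniffing; the function name and the windows-1252 fallback are arbitrary choices for the example.

```python
def decode_utf8_first(raw: bytes, legacy_fallback: str = "windows-1252"):
    """Return (text, encoding): try strict UTF-8, else a legacy fallback.

    UTF-8 is "positively detectable": byte streams produced by legacy
    single-byte encodings are very unlikely to also be well-formed
    UTF-8, so a strict decode failure is a strong signal to fall back.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Secondary default: the locale-chosen legacy encoding
        # (windows-1252 here is purely illustrative).
        return raw.decode(legacy_fallback, errors="replace"), legacy_fallback


# Well-formed UTF-8 decodes as UTF-8; Latin-1-style bytes fall through.
print(decode_utf8_first("naïve".encode("utf-8")))  # ('naïve', 'utf-8')
print(decode_utf8_first(b"\xe9t\xe9"))             # ('été', 'windows-1252')
```

Note that the fallback still differs per locale, which is exactly why the thread argues that making UTF-8 detection mandatory would let the legacy-encoding tables be chosen more accurately.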
Received on Wednesday, 14 October 2009 19:23:27 UTC