- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 14 Oct 2009 22:50:09 +0200
- To: Larry Masinter <masinter@adobe.com>
- CC: "Phillips, Addison" <addison@amazon.com>, Henri Sivonen <hsivonen@iki.fi>, Ian Hickson <ian@hixie.ch>, Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Larry Masinter On 09-10-14 21.22:

> I think the latest editor's draft does a good job of
> describing the tables for default encoding as
> suggestions rather than normative requirements.

Yes, it uses the word "suggest".

> I think this is appropriate; there is no normative
> requirement to support any charset other than UTF8
> and ISO-8859-1/Win-1252, so normatively requiring
> a more complex auto-detection of charsets not
> supported doesn't make a lot of sense.

I thought that the UTF-8 auto-detection requirement proposed by Addison and Mark would only check for UTF-8?

> The idea that you might reasonably guess the
> charset of retrieved HTML by looking at the locale
> of the browser doing the guessing, well, it is
> a very weak and not particularly accurate heuristic.

In my mother tongue, the word for "default" very often seems to be "automatic". However, a "default" is what we automatically get when the automatics are either lacking or have been tried without result. Ian's algorithm just says when in the detection process it is time to give up and fall back to the default.

I think no one proposed that UAs should be required to do any guessing w.r.t. legacy encodings. Instead, we talked about which default encoding a browser for a particular locale should ship with, and how accurate Ian's table is.

A required UTF-8 auto-detection would, however, allow us to separate the concerns better when deciding on the default encoding. If it is reliable, it could perhaps allow us to say, as Henri suggested, that no locale (with legacy encodings) should use UTF-8 as its encoding default.

> And in situations where different browsers will have
> different configuration information, the "advantage"
> that multiple browsers behave similarly isn't very
> strong anyway.

But if we have a table, it should be as correct as possible, and it should not simply "suggest" Win1252 for "all other locales" just like that. That is not simply a suggestion, but a postulate.
> Larry

Leif

> --
> http://larry.masinter.net
>
>
> -----Original Message-----
> From: public-html-request@w3.org [mailto:public-html-request@w3.org] On Behalf Of Leif Halvard Silli
> Sent: Wednesday, October 14, 2009 10:24 AM
> To: Phillips, Addison
> Cc: Henri Sivonen; Ian Hickson; Geoffrey Sneddon; HTML WG; public-i18n-core@w3.org
> Subject: Re: Locale/default encoding table
>
> Phillips, Addison On 09-10-14 16.18:
>
>>> I rather suspect that UTF-8 isn't the best default for any
>>> locale, since real UTF-8 content is unlikely to rely on the
>>> last defaulting step for decoding. I don't know why some
>>> Firefox localizations default to UTF-8.
>> Why do you assume that UTF-8 pages are better labeled than
>> other encodings? Experience suggests otherwise :-).
>>
>> Although UTF-8 is positively detectable and several of us (Mark
>> Davis and I, at least) have suggested making UTF-8
>> auto-detection a requirement, in fact, unless chardet is used,
>> nothing causes unannounced UTF-8 to work any better than any
>> other encoding.
>
> A UTF-8 auto-detection requirement would in effect lead to two
> defaults: UTF-8 as one default, and legacy encodings as a
> secondary default.
>
> This sounds like an excellent idea.
>
> This would, I suppose, make it unnecessary to use UTF-8
> as the default for any locale for which legacy encodings exist.
>
> This, in turn, would allow us to be more accurate in picking the
> default legacy encoding(s). E.g. for Croatian, it would not be
> necessary to have UTF-8 as the default legacy fallback, I suppose.
>
>> The I18N WG pointed out that for many developing languages and
>> locales, the legacy encodings are fragmented and frequently
>> font-based, making UTF-8 a better default choice. This is not
>> the case for a relatively well-known language such as
>> Belarusian or Welsh, but it is the case for many minority and
>> developing world languages.
>
> Indeed.
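The two-tier defaulting discussed in this thread (positive UTF-8 detection first, then a per-locale legacy default) can be sketched roughly as follows. This is a minimal illustration, not Ian's algorithm or any browser's implementation. The locale table, `pick_encoding`, and `looks_like_utf8` are hypothetical names, and the table entries are illustrative assumptions, not the editor's draft table. The only value taken from the thread is windows-1252 as the catch-all for "all other locales". The check relies on the fact that legacy-encoded text containing non-ASCII bytes rarely happens to form valid UTF-8.

```python
# Hypothetical sketch of two-tier encoding defaulting:
#   1. positively detect UTF-8;
#   2. otherwise fall back to a per-locale legacy default.
# ASSUMPTION: LEGACY_DEFAULTS is illustrative only, NOT the HTML5 draft table.

LEGACY_DEFAULTS = {
    "ru": "windows-1251",  # hypothetical entry
    "hr": "windows-1250",  # hypothetical entry
}
CATCH_ALL = "windows-1252"  # the draft's suggestion for "all other locales"


def looks_like_utf8(data: bytes) -> bool:
    """True if data is strictly valid UTF-8 AND contains a non-ASCII byte.

    Pure ASCII decodes identically under UTF-8 and most legacy encodings,
    so it gives no evidence either way and is not counted as UTF-8 here.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in data)


def pick_encoding(data: bytes, locale: str) -> str:
    """Return "utf-8" when positively detected, else the locale's legacy default."""
    if looks_like_utf8(data):
        return "utf-8"
    language = locale.split("-")[0].lower()  # "hr-HR" -> "hr"
    return LEGACY_DEFAULTS.get(language, CATCH_ALL)


if __name__ == "__main__":
    print(pick_encoding("Smørbrød".encode("utf-8"), "nb-NO"))         # utf-8
    print(pick_encoding("Smørbrød".encode("windows-1252"), "nb-NO"))  # windows-1252
    print(pick_encoding("Čaša".encode("windows-1250"), "hr-HR"))      # windows-1250
```

Because UTF-8 is handled by the first tier, the per-locale table only has to choose among legacy encodings. That is the separation of concerns argued for above.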
Received on Wednesday, 14 October 2009 20:50:48 UTC