- From: Phillips, Addison <addison@amazon.com>
- Date: Wed, 14 Oct 2009 10:11:24 -0400
- To: Ian Hickson <ian@hixie.ch>, Andrew Cunningham <andrewc@vicnet.net.au>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- CC: Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, Henri Sivonen <hsivonen@iki.fi>, Maciej Stachowiak <mjs@apple.com>, Mark Davis ☕ <mark@macchiato.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Richard Ishida <ishida@w3.org>, Larry Masinter <masinter@adobe.com>
> Which text?
>
> If you mean the text proposed here:
>
>    http://lists.w3.org/Archives/Public/public-html/2009Aug/1040.html
>
> ...then I discussed it here:
>
>    http://lists.w3.org/Archives/Public/public-html/2009Oct/0281.html

I had not seen your response yet and have not yet digested it fully. I take it you don't like our text? :-)

> > My concern with providing a table is that it preserves, essentially
> > forever, the behavior of browsers in the past.
>
> The behaviour will be preserved whether the spec admits it or not. I see
> no reason to sweep it under the carpet just because we wish the world
> was different.

I think that you are misreading my comment. I *do* think that browsers should preserve their existing behavior, in the sense that they should have a localizable default encoding and also offer the user the ability to override that default. What I'm objecting to is preserving, by fiat and effectively "forever", the *particular* localized choices that exist today.

> > Character encoding distribution is and historically has been evolving.
> > As recently as eight years ago, most browsers did not support proper
> > display of UTF-8. Today, the most common encoding on the Web *is* UTF-8.
> > The localization choices of current vendors--whether well- or
> > ill-conceived--should not necessarily be *normative* guidance embedded
> > in the HTML5 spec for future generations of browser vendors.
>
> This issue is not about what the most common encoding might be.
> This issue is about what the most common encoding *in unlabeled content* is.

Yeah, so? My point is that encoding choices have evolved significantly over very short periods of time, and this is a trend I expect to continue for some time. The most common unlabeled encoding tends to follow the most common labeled encodings for a given audience, because that is how users' browsers are set up. Otherwise, at least for non-ASCII pages, the page displays as mojibake and the author makes some effort to fix it as a result.

> > I think that having a table like this is useful information. But it
> > should be "backwards pointing" and separate from HTML. I'd point out:
> > the I18N WG hosts any number of pages documenting information such as
> > this about browsers. I think we'd be very happy to add this to the
> > collection. It could even be referenced from HTML5. Just don't make it
> > part of the spec... because I know many developers who follow exactly
> > what the spec says. And this is *not* appropriate in this case because
> > the encoding environment is still evolving and because many locales have
> > been disadvantaged in the past.
>
> If by developers you mean authors, the spec is very clear that UTF-8 is
> the only recommendation.

By developers I mean software engineers writing browsers: if you give them a normative table for how to select the localized default encoding, they will feel they have no choice but to implement it for a given locale.

Addison

Received on Wednesday, 14 October 2009 14:12:01 UTC