RE: Locale/default encoding table

From: Phillips, Addison <addison@amazon.com>
Date: Wed, 14 Oct 2009 10:11:24 -0400
To: Ian Hickson <ian@hixie.ch>, Andrew Cunningham <andrewc@vicnet.net.au>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: Geoffrey Sneddon <gsneddon@opera.com>, HTML WG <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, Henri Sivonen <hsivonen@iki.fi>, Maciej Stachowiak <mjs@apple.com>, Mark Davis ☕ <mark@macchiato.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Richard Ishida <ishida@w3.org>, Larry Masinter <masinter@adobe.com>
Message-ID: <C7A5719F1E562149BA9171F58BEE2CA41298281913@EX-IAD6-B.ant.amazon.com>
> Which text?
> 
> If you mean the text proposed here:
> 
>    http://lists.w3.org/Archives/Public/public-html/2009Aug/1040.html
> 
> ...then I discussed it here:
> 
>    http://lists.w3.org/Archives/Public/public-html/2009Oct/0281.html

I had not seen your response yet and have not yet digested it fully. I take it you don't like our text? :-)

> 
> 
> > My concern with providing a table is that it preserves, essentially
> > forever, the behavior of browsers in the past.
> 
> The behaviour will be preserved whether the spec admits it or not. I see
> no reason to sweep it under the carpet just because we wish the
> world was different.

I think that you are misreading my comment. I *do* think that browsers should preserve their existing behavior, in the sense that they should have a localizable default encoding and also offer the user the ability to override that default.

What I'm objecting to is preserving the *particular* localized choices that exist today, by fiat and effectively "forever".
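
To make the distinction concrete, here is a minimal sketch of the behavior described above: a declared encoding wins, then a default configured by the user, then a default chosen by whoever localized the browser. The function name, its parameters, and the table entries are all hypothetical and purely illustrative; they are not taken from any browser or from the proposed spec text.

```python
from typing import Optional

# Hypothetical, illustrative entries only. This is NOT a normative table;
# a vendor or localizer would be free to choose different values.
LOCALE_DEFAULTS = {
    "en": "windows-1252",
    "ja": "shift_jis",
    "ru": "windows-1251",
}


def fallback_encoding(
    declared: Optional[str],
    user_default: Optional[str],
    locale: str,
    last_resort: str = "windows-1252",
) -> str:
    """Pick the encoding used to decode a page (assumed precedence order)."""
    # 1. Labeled content: the page or transport declared an encoding.
    if declared:
        return declared
    # 2. Unlabeled content: the user's own configured default, if any.
    if user_default:
        return user_default
    # 3. Otherwise fall back to the localized default for the UI language.
    language = locale.split("-")[0].lower()
    return LOCALE_DEFAULTS.get(language, last_resort)


print(fallback_encoding(None, None, "ja-JP"))
print(fallback_encoding(None, "utf-8", "ja-JP"))
```

In this sketch the mechanism (steps 1 to 3) is fixed, while the contents of `LOCALE_DEFAULTS` are just data that can change over time.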


> 
> 
> > Character encoding distribution is and historically has been evolving.
> > As recently as eight years ago, most browsers did not support proper
> > display of UTF-8. Today, the most common encoding on the Web *is* UTF-8.
> > The localization choices of current vendors--whether well- or
> > ill-conceived--should not necessarily be *normative* guidance embedded
> > in the HTML5 spec for future generations of browser vendors.
> 
> This issue is not about what the most common encoding might be.
> This issue is about what the most common encoding *in unlabeled content* is.

Yeah, so? My point is that encoding choices have evolved significantly over very short periods of time, and I expect this trend to continue for some time. The most common unlabeled encoding tends to follow the most common labeled encodings for a given audience because that is how users' browsers are set up. Otherwise, at least for non-ASCII pages, the page displays as mojibake and the author makes some effort to fix it as a result.
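
As a small illustration of the mojibake case mentioned above, the following sketch decodes UTF-8 bytes with a mismatched default. The sample text and the choice of windows-1252 as the browser default are assumptions made only for the example.

```python
# An author saves an unlabeled page as UTF-8.
text = "café"
raw = text.encode("utf-8")

# A browser whose default is windows-1252 (assumed here) decodes the
# same bytes with the wrong encoding and shows garbled text.
garbled = raw.decode("windows-1252")
print(garbled)  # cafÃ©

# Decoding with the encoding the author actually used recovers the text.
print(raw.decode("utf-8"))  # café
```

Pure ASCII content would decode identically either way, which is why the mismatch is only visible on non-ASCII pages.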

> 
> 
> > I think that having a table like this is useful information. But it
> > should be "backwards pointing" and separate from HTML. I'd point out:
> > the I18N WG hosts any number of pages documenting information such as
> > this about browsers. I think we'd be very happy to add this to the
> > collection. It could even be referenced from HTML5. Just don't make it
> > part of the spec... because I know many developers who follow exactly
> > what the spec says. And this is *not* appropriate in this case because
> > the encoding environment is still evolving and because many locales have
> > been disadvantaged in the past.
> 
> If by developers you mean authors, the spec is very clear that UTF-8 is
> the only recommendation.

By developers I mean software engineers writing browsers: if you give them a normative table for how to select the localized default encoding, they will feel they have no choice but to implement it for a given locale.

Addison
Received on Wednesday, 14 October 2009 14:12:01 GMT