RE: HTML5 Issue 11 (encoding detection): I18N WG response...

From: Phillips, Addison <addison@amazon.com>
Date: Thu, 20 Aug 2009 00:22:11 -0700
To: Henri Sivonen <hsivonen@iki.fi>
CC: Maciej Stachowiak <mjs@apple.com>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA01ACCE9B99@EX-SEA5-D.ant.amazon.com>
> 
> > I think the world has changed significantly. In the past, setting a
> > default of UTF-8 in your browser produced mainly bad results. But,
> > at least according to some measures [1], UTF-8 is rapidly becoming
> > the most reasonable default encoding on the Web.
> [...]
> > [1] http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html
>
> This shows an uptake in UTF-8, but it proves nothing without data on
> how much is labeled and how much unlabeled. Uptake in labeled UTF-8 is
> awesome but doesn't affect what makes sense as the default processing
> for unlabeled data.

Ah... but this data, I'm told, is based on the encoding *after detection* by Google's crawler, not on the declared encoding.

> 
> > At the same time, I think UTF-8 is more than a politically correct
> > fig leaf. The more standards and implementations stress good
> > choices, the more likely people (users, content authors) are to take
> > them seriously. If you happen to have chosen UTF-8 as an encoding,
> > your pages are more likely to just work. Recommending UTF-8 as a
> > default probably will continue to establish itself as the right
> > choice as time progresses. Remember: this is the "all else fails"
> > result and is exposed to user intervention by nearly all user agents.
> 
> HTML 5 already recommends (labeled) UTF-8 as the default for
> authoring tools.

Yes, which is good, and will promote further growth of UTF-8 as a reasonable default expectation.

But I admit: I'm shouting at the forest fire here. I don't expect UTF-8 to be used as a default by any user agent, at least not imminently. What's key is the rest of the wording. 

What I expect is that, given a few more years (HTML4 has been with us for ten years), the UTF-8 recommendation could take over the world :-).

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.





Received on Thursday, 20 August 2009 07:22:50 GMT
