Re: HTML5 Issue 11 (encoding detection): I18N WG response...

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 20 Aug 2009 10:14:45 +0300
Cc: Maciej Stachowiak <mjs@apple.com>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <1FFBE59A-A2E2-4D00-BD90-07C6D96A2C86@iki.fi>
To: "Phillips, Addison" <addison@amazon.com>

On Aug 20, 2009, at 10:06, Phillips, Addison wrote:

> I think the world has changed significantly. In the past, setting a  
> default of UTF-8 in your browser produced mainly bad results. But,  
> at least according to some measures [1], UTF-8 is rapidly becoming  
> the most reasonable default encoding on the Web.
> [1] http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html

This shows an uptake in UTF-8, but it proves nothing without data on  
how much is labeled and how much unlabeled. Uptake in labeled UTF-8 is  
awesome but doesn't affect what makes sense as the default processing  
for unlabeled data.
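
To illustrate the distinction, here is a minimal sketch (not taken from any browser or from the HTML 5 draft) of why labeled uptake and the unlabeled default are independent. The function names and the choice of windows-1252 as the alternative fallback are assumptions made purely for illustration; real user agents also consider BOMs, meta prescans, locale, and heuristics, none of which is modeled here.

```python
from typing import Optional


def pick_encoding(declared_label: Optional[str], fallback: str) -> str:
    """Hypothetical helper: use the declared label if present, else the fallback.

    The fallback is consulted only when the content carries no label at all.
    """
    if declared_label:
        return declared_label.strip().lower()
    return fallback


def decode_page(body: bytes, declared_label: Optional[str], fallback: str) -> str:
    """Decode a page body with the label if given, otherwise with the fallback."""
    return body.decode(pick_encoding(declared_label, fallback), errors="replace")


# The same bytes, once labeled and once unlabeled.
utf8_bytes = "é".encode("utf-8")

# Labeled content: the fallback setting makes no difference.
labeled_a = decode_page(utf8_bytes, "UTF-8", fallback="windows-1252")
labeled_b = decode_page(utf8_bytes, "UTF-8", fallback="utf-8")

# Unlabeled content: the result depends entirely on the fallback.
unlabeled_legacy = decode_page(utf8_bytes, None, fallback="windows-1252")
unlabeled_utf8 = decode_page(utf8_bytes, None, fallback="utf-8")
```

In this sketch the two labeled results are identical, while the two unlabeled results differ. Growth in labeled UTF-8 therefore says nothing about which fallback serves unlabeled pages best; that depends on what the unlabeled pages actually contain.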

> At the same time, I think UTF-8 is more than a politically correct  
> fig leaf. The more standards and implementations stress good  
> choices, the more likely people (users, content authors) are to take  
> them seriously. If you happen to have chosen UTF-8 as an encoding,  
> your pages are more likely to just work. Recommending UTF-8 as a  
> default probably will continue to establish itself as the right  
> choice as time progresses. Remember: this is the "all else fails"  
> result and is exposed to user intervention by nearly all user agents.

HTML 5 already recommends (labeled) UTF-8 as the default for authoring  

Henri Sivonen
Received on Thursday, 20 August 2009 07:15:30 UTC
