- From: Maciej Stachowiak <mjs@apple.com>
- Date: Thu, 20 Aug 2009 00:25:14 -0700
- To: Henri Sivonen <hsivonen@iki.fi>
- Cc: "Phillips, Addison" <addison@amazon.com>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
On Aug 20, 2009, at 12:14 AM, Henri Sivonen wrote:

> On Aug 20, 2009, at 10:06, Phillips, Addison wrote:
>
>> I think the world has changed significantly. In the past, setting a
>> default of UTF-8 in your browser produced mainly bad results. But,
>> at least according to some measures [1], UTF-8 is rapidly becoming
>> the most reasonable default encoding on the Web.
> [...]
>> [1] http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html
>
> This shows an uptake in UTF-8, but it proves nothing without data on
> how much is labeled and how much unlabeled. Uptake in labeled UTF-8
> is awesome but doesn't affect what makes sense as the default
> processing for unlabeled data.

This is the key point. The relevant statistic for the default encoding is the predominant encoding for unlabeled content. We could do a new study on this, but to the best of my knowledge, UTF-8 is rare.

Also, it's been mentioned that UTF-8 can be heuristically detected without too much effort. If that's the case, then it does not make much sense to make it the fallback after algorithmic detection has failed.

That being said, I agree that the uptake of UTF-8 is awesome, and I think everyone would like to see public Web content move to UTF-8 as much as possible. The only question is how to do this in light of legacy constraints.

 - Maciej

>
>> At the same time, I think UTF-8 is more than a politically correct
>> fig leaf. The more standards and implementations stress good
>> choices, the more likely people (users, content authors) are to
>> take them seriously. If you happen to have chosen UTF-8 as an
>> encoding, your pages are more likely to just work. Recommending
>> UTF-8 as a default probably will continue to establish itself as
>> the right choice as time progresses. Remember: this is the "all
>> else fails" result and is exposed to user intervention by nearly
>> all user agents.
>
> HTML 5 already recommends (labeled) UTF-8 as the default for
> authoring tools.
>
> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/
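
To make the point about heuristic detection concrete, here is a minimal Python sketch of the idea. It is not taken from this thread or from any browser. The function name `sniff_encoding` and the `windows-1252` fallback are assumptions chosen for illustration. The sketch relies on the fact that non-ASCII text in a legacy single-byte encoding only rarely forms valid UTF-8 byte sequences, so a strict decode that succeeds on non-ASCII input is a strong hint that the content really is UTF-8.

```python
def sniff_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess an encoding for unlabeled bytes (illustrative heuristic only).

    `fallback` is a hypothetical locale-dependent legacy default; the
    value used here is an assumption, not a recommendation.
    """
    try:
        # Strict decoding rejects stray continuation bytes, overlong
        # forms, and truncated multi-byte sequences.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return fallback

    # Pure ASCII is valid UTF-8 but carries no evidence either way,
    # so leave the legacy default in place.
    if data.isascii():
        return fallback

    return "utf-8"


if __name__ == "__main__":
    print(sniff_encoding("naïve café".encode("utf-8")))
    print(sniff_encoding("naïve café".encode("windows-1252")))
    print(sniff_encoding(b"plain ascii"))
```

A real detector would also have to cope with input that is cut off in the middle of a multi-byte sequence, for example when only a prefix of the document has arrived. This sketch would misreport such input as the fallback. If a check like this runs first, whatever is left for the final fallback is content that is already known not to be UTF-8, which is the reason given above for not making UTF-8 that fallback.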
Received on Thursday, 20 August 2009 07:25:57 UTC