
Re: HTML5 Issue 11 (encoding detection): I18N WG response...

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 20 Aug 2009 10:27:54 +0300
Cc: Maciej Stachowiak <mjs@apple.com>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <7874BFF9-C7EA-4FE2-814F-A2AEDC38AE94@iki.fi>
To: "Phillips, Addison" <addison@amazon.com>
On Aug 20, 2009, at 10:22, Phillips, Addison wrote:

>>> I think the world has changed significantly. In the past, setting a
>>> default of UTF-8 in your browser produced mainly bad results. But,
>>> at least according to some measures [1], UTF-8 is rapidly becoming
>>> the most reasonable default encoding on the Web.
>> [...]
>>> [1] http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html
>> This shows an uptake in UTF-8, but it proves nothing without data on
>> how much is labeled and how much unlabeled. Uptake in labeled UTF-8 is
>> awesome but doesn't affect what makes sense as the default processing
>> for unlabeled data.
> Ah.... but this data, I'm told, is based on the encoding *after  
> detection* by Google's crawler, not on the declaration.

But does it also exclude pages that have encoding labels? Data about  
the frequency of users hitting unlabeled pages in particular encodings  
is the interesting thing here.

Henri Sivonen
Received on Thursday, 20 August 2009 07:28:39 UTC
