Re: HTML5 Issue 11 (encoding detection): I18N WG response... from Henri Sivonen on 2009-08-20 (public-html@w3.org from August 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 20 Aug 2009 10:27:54 +0300
To: "Phillips, Addison" <addison@amazon.com>
Cc: Maciej Stachowiak <mjs@apple.com>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <7874BFF9-C7EA-4FE2-814F-A2AEDC38AE94@iki.fi>

On Aug 20, 2009, at 10:22, Phillips, Addison wrote:

>>> I think the world has changed significantly. In the past, setting a
>>> default of UTF-8 in your browser produced mainly bad results. But,
>>> at least according to some measures [1], UTF-8 is rapidly becoming
>>> the most reasonable default encoding on the Web.
>> [...]
>>> [1] http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html
>>
>> This shows an uptake in UTF-8, but it proves nothing without data on
>> how much is labeled and how much unlabeled. Uptake in labeled UTF-8 is
>> awesome but doesn't affect what makes sense as the default processing
>> for unlabeled data.
>
> Ah.... but this data, I'm told, is based on the encoding *after  
> detection* by Google's crawler, not on the declaration.


But does it also exclude pages that have encoding labels? Data about  
the frequency of users hitting unlabeled pages in particular encodings  
is the interesting thing here.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 20 August 2009 07:28:39 UTC