Re: HTML5 Issue 11 (encoding detection): I18N WG response...

On Thu, Aug 20, 2009 at 00:27, Henri Sivonen <hsivonen@iki.fi> wrote:

> On Aug 20, 2009, at 10:22, Phillips, Addison wrote:
>
> ...

>> Ah... but this data, I'm told, is based on the encoding *after detection*
>> by Google's crawler, not on the declaration.
>>
>
>
> But does it also exclude pages that have encoding labels? Data about the
> frequency of users hitting unlabeled pages in particular encodings is the
> interesting thing here.


At Google, the encoding label is taken only as a weak signal (a small factor
in the heuristic detection). It is completely overwhelmed by the byte
content analysis. (There are too many unlabeled pages *and mislabeled
pages* for the label to be used as is.)
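
To make "weak signal" concrete, here is a minimal sketch of the general idea.
It is not Google's detector: the candidate list, the weights, and the toy
content_score function are all assumptions made up for illustration. The
declared label adds only a small bonus, so it decides the outcome only when
the bytes themselves are ambiguous.

```python
import codecs

CANDIDATES = ("utf-8", "cp1252")  # hypothetical candidate set
LABEL_WEIGHT = 0.1                # hypothetical: the label is a small factor


def content_score(data: bytes, encoding: str) -> float:
    """Toy stand-in for byte-content analysis (an assumption, not a real detector)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return 0.0  # the bytes are invalid in this encoding
    if encoding == "utf-8" and any(b >= 0x80 for b in data):
        return 1.0  # valid multi-byte UTF-8 rarely happens by accident
    return 0.5      # merely decodes without error: weak evidence


def normalize(label):
    """Map a declared label to Python's codec name, or None if unknown."""
    if not label:
        return None
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def detect(data: bytes, label=None) -> str:
    """Pick the best-scoring candidate; the label only nudges the score."""
    declared = normalize(label)

    def score(enc):
        bonus = LABEL_WEIGHT if declared == enc else 0.0
        return content_score(data, enc) + bonus

    # On a tie, max() keeps the first candidate in CANDIDATES order.
    return max(CANDIDATES, key=score)
```

With these made-up weights, UTF-8 bytes mislabeled as windows-1252 are still
detected as UTF-8, while for pure-ASCII input, where the bytes say nothing,
the label breaks the tie.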

Mark

Received on Thursday, 20 August 2009 14:31:05 UTC