Re: HTML5 Issue 11 (encoding detection): I18N WG response...

From: Mark Davis ⌛ <mark@macchiato.com>
Date: Thu, 20 Aug 2009 07:30:23 -0700
Message-ID: <30b660a20908200730v7ce095cfv5291404bef1501f6@mail.gmail.com>
To: Henri Sivonen <hsivonen@iki.fi>
Cc: "Phillips, Addison" <addison@amazon.com>, Maciej Stachowiak <mjs@apple.com>, "public-html@w3.org" <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
On Thu, Aug 20, 2009 at 00:27, Henri Sivonen <hsivonen@iki.fi> wrote:

> On Aug 20, 2009, at 10:22, Phillips, Addison wrote:
>
> ...

>> Ah.... but this data, I'm told, is based on the encoding *after detection*
>> by Google's crawler, not on the declaration.
>>
>
>
> But does it also exclude pages that have encoding labels? Data about the
> frequency of users hitting unlabeled pages in particular encodings is the
> interesting thing here.


At Google, the encoding label is taken only as a weak signal (a small factor
in the heuristic detection). It is completely overwhelmed by the byte
content analysis. (There are too many unlabeled pages *and mislabeled
pages* for the label to be used as is.)
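
To illustrate the general idea of a label acting as a small factor next to
byte content analysis, here is a toy sketch. It is not Google's detector:
the candidate list, the weights, and the function names are all
hypothetical, and "content analysis" is reduced to checking whether the
bytes decode cleanly.

```python
from typing import Optional

# Hypothetical candidates with made-up content-evidence weights.
# Stricter encodings get a higher weight when the bytes decode cleanly,
# because a clean decode is stronger evidence for them.
CONTENT_WEIGHT = {"utf-8": 1.0, "shift_jis": 0.95, "windows-1252": 0.5}

# Assumed value: the declared label is only a small factor.
LABEL_WEIGHT = 0.1


def content_score(data: bytes, encoding: str) -> float:
    """Crude byte-content score: the encoding's weight if data decodes, else 0."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return 0.0
    return CONTENT_WEIGHT[encoding]


def detect_encoding(data: bytes, label: Optional[str] = None) -> str:
    """Return the candidate with the best combined content + label score."""
    scores = {}
    for encoding in CONTENT_WEIGHT:
        score = content_score(data, encoding)
        # The label only nudges a candidate the bytes already allow.
        if score > 0 and label is not None and label.lower() == encoding:
            score += LABEL_WEIGHT
        scores[encoding] = score
    return max(scores, key=scores.get)
```

With these assumed weights, a page whose bytes are UTF-8 but whose label
says windows-1252 is still detected as UTF-8; the label only changes the
result when the byte evidence is nearly tied, as with pure ASCII content.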

Mark
Received on Thursday, 20 August 2009 21:09:39 UTC