Re: Charset usage data from Philip Taylor on 2008-03-07 (public-html@w3.org from March 2008)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Fri, 07 Mar 2008 15:33:19 +0000
To: HTML WG <public-html@w3.org>
Message-ID: <47D1603F.1060401@cam.ac.uk>

Philip Taylor wrote:
> http://philip.html5.org/data/charsets.html
> 
> [...] 
> 
> The encoding sniffing algorithm works significantly better with 1024 
> bytes (finds 92% of charsets) than with 512 (finds 82%).

I've now updated that page to use a version of the encoding sniffing 
algorithm that should match the current spec, and added 
http://philip.html5.org/data/encoding-detection.svg to show how its 
effectiveness varies with the number of bytes of content it examines.
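
As a rough illustration of why the byte limit matters, here is a minimal
Python sketch. It is not the spec's prescan (which is a byte-by-byte
state machine); it only approximates it with a regular expression, and
the names sniff_charset, detection_rate and META_CHARSET_RE are
hypothetical, chosen for this example only.

```python
import re

# Approximation only: find "charset=..." inside a <meta ...> tag.
# The real prescan in the spec handles comments, attributes and
# quoting much more carefully than this regex does.
META_CHARSET_RE = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:\-]+)",
    re.IGNORECASE,
)


def sniff_charset(content: bytes, limit: int = 1024):
    """Return the charset declared within the first `limit` bytes, or None."""
    match = META_CHARSET_RE.search(content[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


def detection_rate(pages, limit):
    """Fraction of pages whose declaration is found within `limit` bytes."""
    if not pages:
        return 0.0
    found = sum(1 for page in pages if sniff_charset(page, limit) is not None)
    return found / len(pages)


# Two made-up pages: one declares its charset immediately, the other
# only after roughly 600 bytes of leading comment.
page_early = b'<html><head><meta charset="utf-8">'
page_late = (
    b"<html><head><!-- " + b"x" * 600 + b" -->"
    b'<meta http-equiv="Content-Type" '
    b'content="text/html; charset=ISO-8859-1">'
)
```

With these two made-up pages, a 512-byte limit misses the late
declaration while a 1024-byte limit finds both, which is the kind of
difference the graph is meant to show across real pages.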

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Friday, 7 March 2008 15:33:31 UTC