Re: Auto-detect and encodings in HTML5

Jungshik SHIN (신정식) wrote:
> There are some web sites with meta tags deeply buried ( > 512 bytes from the
> beginning). Webkit even has a layout test for this (currently, it scans the
> first 1024 bytes).
> 
> By no means am I happy with those web pages. So, I agree with you on this
> except that I'm not sure about requiring the meta charset declaration to be
> inside <head>.

Some possibly relevant data for this: 
http://philip.html5.org/data/encoding-detection.svg shows how many bytes 
have to be read before HTML5's <meta> charset sniffing algorithm finds 
an answer, based on 130K pages downloaded from dmoz.org (with a heavy 
American/European bias). (http://philip.html5.org/data/charsets.html has 
other charset data.)
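
For anyone who wants to try this kind of measurement on their own pages,
here is a rough sketch in Python. It is not the HTML5 prescan algorithm
and not the code behind the graph above: it is a simplified regex
approximation that ignores comments, scripts, BOMs and HTTP headers. The
names META_CHARSET and bytes_needed_for_charset, and the default limit of
1024 bytes, are made up for illustration.

```python
import re

# Simplified pattern (an assumption, not the spec's prescan): find a
# charset=... value inside a <meta ...> tag, covering both
# <meta charset="x"> and the http-equiv/content form. The lookahead
# requires a terminator so a name cut off by truncation is not reported.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)(?=[\"'\s;/>])",
    re.IGNORECASE,
)


def bytes_needed_for_charset(data: bytes, limit: int = 1024):
    """Look for a meta charset within the first `limit` bytes.

    Returns (charset, offset), where offset is the position just past the
    charset name, or (None, None) if nothing is found within the limit.
    """
    match = META_CHARSET.search(data[:limit])
    if match is None:
        return None, None
    return match.group(1).decode("ascii").lower(), match.end(1)


# A page whose <meta> sits more than 512 bytes from the start.
page = b"<html><head>" + b" " * 600 + b'<meta charset="utf-8"></head>'

found_512 = bytes_needed_for_charset(page, limit=512)
found_1024 = bytes_needed_for_charset(page, limit=1024)
```

With the sample page above, the 512-byte scan finds nothing, while the
1024-byte scan finds utf-8 at an offset between 512 and 1024, which is
the kind of deeply buried declaration Jungshik describes.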

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Thursday, 28 May 2009 07:11:16 UTC