Re: Auto-detect and encodings in HTML5

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Thu, 28 May 2009 08:10:38 +0100
Message-ID: <4A1E38EE.8080807@cam.ac.uk>
To: "Jungshik SHIN (신정식)" <jshin1987+w3@gmail.com>
CC: Erik van der Poel <erikv@google.com>, Travis Leithead <Travis.Leithead@microsoft.com>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>, Richard Ishida <ishida@w3.org>, Ian Hickson <ian@hixie.ch>, Chris Wilson <Chris.Wilson@microsoft.com>, Harley Rosnow <Harley.Rosnow@microsoft.com>, Simon Montagu <smontagu@smontagu.org>, ap@webkit.org

Jungshik SHIN (신정식) wrote:
> There are some web sites with meta tags deeply buried (more than 512 bytes
> from the beginning). WebKit even has a layout test for this (currently, it
> scans the first 1024 bytes).
> 
> I'm by no means happy with those web pages. So I agree with you on this,
> except that I'm not sure about requiring the meta charset declaration to be
> inside <head>.

Some possibly relevant data for this: 
http://philip.html5.org/data/encoding-detection.svg shows how many bytes 
have to be read before HTML5's <meta> charset sniffing algorithm finds 
an answer, based on 130K pages downloaded from dmoz.org (with a heavy 
American/European bias). (http://philip.html5.org/data/charsets.html has 
other charset data.)
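
For anyone who wants to take a similar measurement on their own pages, here is
a rough sketch of the idea. It is not the script behind the data above and it
is not the HTML5 prescan algorithm: it assumes a plain regular expression is a
good enough stand-in, and the names META_CHARSET and bytes_needed_for_charset
are made up for illustration. It reports how many bytes of a page must be read
before a <meta> charset label has been seen in full.

```python
import re

# Simplified stand-in for the HTML5 <meta> prescan (an assumption, not the
# spec's byte-by-byte state machine): it ignores comments, scripts and other
# edge cases, and matches both <meta charset=...> and the http-equiv form.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def bytes_needed_for_charset(data, limit=None):
    """Return (label, bytes_read), or (None, None) if no label is found.

    bytes_read is the offset just past the end of the charset label, i.e.
    the shortest prefix of the page that contains the whole label.
    limit, if given, restricts the search to the first `limit` bytes.
    """
    window = data if limit is None else data[:limit]
    match = META_CHARSET.search(window)
    if match is None:
        return None, None
    return match.group(1).decode("ascii").lower(), match.end(1)


if __name__ == "__main__":
    # A page whose declaration sits more than 512 bytes from the beginning.
    page = b"<html><head>" + b" " * 600 + b'<meta charset="utf-8"></head>'
    print(bytes_needed_for_charset(page, limit=512))
    print(bytes_needed_for_charset(page, limit=1024))
```

With a declaration buried like this, a 512-byte window misses it and a
1024-byte window finds it, which is the case described in the quoted text.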

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Thursday, 28 May 2009 07:11:14 GMT
