- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Fri, 29 May 2009 01:13:34 +0100
- To: Leif Halvard Silli <lhs@malform.no>
- CC: John Cowan <cowan@ccil.org>, "www-international@w3.org" <www-international@w3.org>
Leif Halvard Silli wrote:
> John Cowan On 09-05-28 23.08:
>> Leif Halvard Silli scripsit:
>>
>>> <meta name="Title" charset="Beagle Kennel van der Liniehoeve">
>>
>> Well, this does say "charset" rather than "content".
>
> Yes, currently HTML doesn't have any @charset attribute. @charset is
> only a new invention of the HTML 5 draft.

(It's newly specified in HTML 5, but it's been supported by the major
web browsers for practically forever.)

> if I read the data correctly, then the HTML 5 draft algorithm that
> Philip used, was unable to decode the correct charset info in the
> _first_ meta element.

I looked for the first charset in a <meta content>, and independently
looked for the first <meta charset>, so that particular page was
counted in both of those columns of the table. The "sniffer" column is
the one that matched the algorithm in HTML 5, which stops after finding
the first thing that looks like a charset specification, and for this
page it reported windows-1252.

> Measured against HTML 4, there seems to be _several_ errors in the
> analysis/findings that is presented on that page. For instance, roughly
> all the pages mentioned under the following fragment seems to have OK
> charset info in their meta elements (and there are many other examples
> of the same) - despite Philip's page saying there were errors:
>
> http://philip.html5.org/data/charsets.html#charset-en

Most of those pages are sending HTTP headers like "Content-Type:
text/html; charset=en" - the HTML has nothing to do with it. They're
marked as 'invalid' because "en" is not a known character encoding.
('invalid' in that data just means the page's bytes couldn't be decoded
with the specified encoding.)

--
Philip Taylor
pjt47@cam.ac.uk
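For illustration only, here is a rough Python sketch of the two checks described in the message: taking the first charset-looking value from a <meta> tag, and treating a page as 'invalid' when its bytes can't be decoded with the specified encoding. The helper names sniff_first_charset() and is_invalid() are hypothetical; this is not the code that produced the charsets.html data. The regex sniffer is much cruder than the HTML 5 draft's prescan (no attribute tokenizing, no comment skipping, no special handling of unrecognized labels), so it need not agree with the "sniffer" column. Decoding relies on Python's own codecs, whose tables (e.g. for windows-1252) are not identical to what browsers use.

```python
import re

# Hypothetical, simplified pattern: matches "charset=" inside the first <meta ...>
# tag that has one, whether it comes from <meta charset> or <meta content>.
# It only approximates the HTML 5 prescan and is NOT the spec's algorithm.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([^\s\"';>]+)",
    re.IGNORECASE,
)


def sniff_first_charset(head: bytes, limit: int = 1024):
    """Return the first charset label found in a <meta> tag, or None.

    Like the sniffer described above, it stops at the first match and
    does not look any further, even if that label is bogus.
    """
    match = _META_CHARSET.search(head[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii", "replace").lower()


def is_invalid(body: bytes, label: str) -> bool:
    """'invalid' in the sense used above: the bytes can't be decoded.

    bytes.decode raises LookupError for an unknown label (e.g. "en")
    and UnicodeDecodeError when the bytes don't fit the encoding.
    """
    try:
        body.decode(label)
    except (LookupError, UnicodeDecodeError):
        return True
    return False
```

With an HTTP header like "Content-Type: text/html; charset=en", is_invalid(body, "en") is True for any body, since Python has no codec named "en"; the contents of the HTML don't matter.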
Received on Friday, 29 May 2009 00:14:16 UTC