Re: Auto-detect and encodings in HTML5

From: Leif Halvard Silli <lhs@malform.no>
Date: Fri, 29 May 2009 05:15:03 +0200
Message-ID: <4A1F5337.60705@malform.no>
To: Philip Taylor <pjt47@cam.ac.uk>
CC: John Cowan <cowan@ccil.org>, "www-international@w3.org" <www-international@w3.org>
Philip Taylor On 09-05-29 02.13:
> Leif Halvard Silli wrote:
>> John Cowan On 09-05-28 23.08:
>>> Leif Halvard Silli scripsit:
>>>
>>>> <meta name="Title" charset="Beagle Kennel van der Liniehoeve">
>>>
>>> Well, this does say "charset" rather than "content".
>>
>> Yes, currently HTML doesn't have any @charset attribute. @charset is 
>> only a new invention of the HTML 5 draft.
> 
> (It's newly specified in HTML 5, but it's been supported by the major 
> web browsers for practically forever.)

Interesting how few pages used it, though. I really don't 
know if speccing it makes anything any clearer for anyone.

>> if I read the data correctly, then the HTML 5 draft algorithm that 
>> Philip used, was unable to decode the correct charset info in the 
>> _first_ meta element.
> 
> I looked for the first charset in a <meta content>, and independently 
> looked for the first <meta charset>, so that particular page was counted 
> in both of those columns of the table. The "sniffer" column is the one 
> that matched the algorithm in HTML 5, which stops after finding the 
> first thing that looks like a charset specification, and for this page 
> it reported windows-1252.
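
To make the three columns concrete, here is a rough sketch of the counting described above. It is not Philip's actual script and not the HTML 5 prescan algorithm itself, only a regex-based approximation. The names `classify` and `survey` and the sample page are hypothetical, and the order of the two meta elements in the sample is an assumption.

```python
import re

# Simplified patterns; a real prescan works on bytes and handles
# unquoted attributes, comments, and other edge cases.
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
CHARSET_ATTR = re.compile(r"""\bcharset\s*=\s*["']([^"']*)["']""", re.IGNORECASE)
CONTENT_ATTR = re.compile(r"""\bcontent\s*=\s*["']([^"']*)["']""", re.IGNORECASE)
CHARSET_IN_CONTENT = re.compile(r"""charset\s*=\s*([^\s;"']+)""", re.IGNORECASE)


def classify(tag):
    """Return ('charset', value), ('content', value), or None for one <meta> tag."""
    attr = CHARSET_ATTR.search(tag)
    if attr:
        return ("charset", attr.group(1).strip())
    content = CONTENT_ATTR.search(tag)
    if content:
        inner = CHARSET_IN_CONTENT.search(content.group(1))
        if inner:
            return ("content", inner.group(1))
    return None


def survey(html):
    """Collect the first <meta content> charset, the first <meta charset>,
    and the first of either in document order (the 'sniffer' value)."""
    found = {"content": None, "charset": None, "sniffer": None}
    for tag in META_TAG.findall(html):
        hit = classify(tag)
        if hit is None:
            continue
        kind, value = hit
        if found[kind] is None:
            found[kind] = value
        if found["sniffer"] is None:
            # The sniffer stops at the first thing that looks like a charset.
            found["sniffer"] = value
    return found


# Hypothetical page: element order is assumed, not taken from the real site.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">\n'
    '<meta name="Title" charset="Beagle Kennel van der Liniehoeve">'
)
result = survey(page)
```

With this sample, the page shows up under both "meta content" and "meta charset", while the sniffer value comes from whichever element appears first.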

... maybe I just don't understand the presentation: the caption 
of the table says: "Number of pages declaring encoding (% decoded 
without errors)". About the "beagle" page in particular, the 
different columns say:

HTTP: U0; meta content: 0; Sniffer: 0; meta charset: 1 (0%);

?
-- 
leif halvard silli
Received on Friday, 29 May 2009 03:15:41 GMT
