Re: Auto-detect and encodings in HTML5

John Cowan On 09-05-28 23.08:
> Leif Halvard Silli scripsit:
>
>   
>> <meta name="Title" charset="Beagle Kennel van der Liniehoeve">
>>     
>
> Well, this does say "charset" rather than "content".
>   

Yes, HTML currently has no @charset attribute on the meta element; 
@charset there is a new invention of the HTML 5 draft. (Maybe this page 
tries to document how often the _current_ use of this illegal attribute 
happens to be correct, charset-wise?)

What I meant to point out, though, was that since the debate was about 
how deep into the page one should sniff, this page had a correct 
"charset tag" as the first element of the <head> element. It can't 
get any better than that, can it? That it also has this meta element 
further down in the <head>, meaningless in every sense [except the 
HTML 5 sense], does not matter to the issue that was debated, I think.

But if I read the data correctly, the HTML 5 draft algorithm that 
Philip used was unable to decode the correct charset info in the 
_first_ meta element. I wonder why.
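
To make the ordering point concrete, here is a rough sketch of a meta 
sniffer in Python. It is not the HTML 5 draft algorithm and not what 
Philip ran. The function names, the 1024-byte limit and the use of 
Python's codec registry as the list of known encodings are all my own 
assumptions. The sample page is invented too: only its second meta 
element is taken from this thread, and the first one is a guess at 
what a correct "charset tag" looks like.

```python
import codecs
import re

# Hypothetical helpers for illustration only; none of this is from the draft.
META_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
ATTR_RE = re.compile(rb"""([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""")
CHARSET_IN_CONTENT_RE = re.compile(rb"""charset\s*=\s*["']?([^\s"';]+)""", re.IGNORECASE)


def known_label(raw):
    """Return the label as lowercase text if Python has such a codec, else None.

    Assumption: Python's codec registry stands in for a real table of
    encoding labels, which it only approximates.
    """
    try:
        label = raw.decode("ascii").strip()
        codecs.lookup(label)
    except (UnicodeDecodeError, LookupError):
        return None
    return label.lower()


def sniff_meta_charset(data, limit=1024):
    """Scan the first `limit` bytes and return the first usable charset label."""
    for meta in META_RE.finditer(data[:limit]):
        attrs = {}
        for m in ATTR_RE.finditer(meta.group(0)):
            value = m.group(2) or m.group(3) or m.group(4) or b""
            attrs.setdefault(m.group(1).lower(), value)  # first occurrence wins

        candidate = None
        if b"charset" in attrs:
            # HTML 5 draft style: <meta charset="...">
            candidate = attrs[b"charset"]
        elif attrs.get(b"http-equiv", b"").lower() == b"content-type":
            # HTML 4 style: charset parameter inside the content attribute
            found = CHARSET_IN_CONTENT_RE.search(attrs.get(b"content", b""))
            if found:
                candidate = found.group(1)

        if candidate is not None:
            label = known_label(candidate)
            if label:
                return label  # stop at the first meta that names a known encoding
    return None


# Invented sample: a correct declaration first, the odd meta further down.
SAMPLE = (
    b"<html><head>\n"
    b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">\n'
    b"<title>Kennel</title>\n"
    b'<meta name="Title" charset="Beagle Kennel van der Liniehoeve">\n'
    b"</head><body></body></html>"
)

print(sniff_meta_charset(SAMPLE))  # iso-8859-1
```

In this sketch the later meta element never gets a say, because the 
scan stops at the first meta that names a known encoding. If the first 
meta is removed, the bogus charset value is simply rejected and the 
result is None.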

Measured against HTML 4, there seem to be _several_ errors in the 
analysis/findings presented on that page. For instance, nearly 
all the pages listed under the following fragment seem to have OK 
charset info in their meta elements (and there are many other examples 
of the same), despite Philip's page saying there were errors:

http://philip.html5.org/data/charsets.html#charset-en

I don't know if this represents errors in the HTML 5 algorithm [back 
then], or if Philip just wasn't critical enough of the errors he 
believed his analysis tool had found. (There are some who see any 
error in currently deployed HTML as a justification for HTML 5.)

But maybe I just don't understand what the page is trying to say.
-- 
leif halvard silli

Received on Thursday, 28 May 2009 23:54:06 UTC