Re: XHTML vs. <meta>-only encoding declarations

* Terje Bless wrote:
>Right, and this is why I consider it a bug. IMO when served as text/html we
>should not pay attention to the XML Declaration; except in the case where it
>is the only source of encoding information (in which case it is a useful
>heuristic but should generate a warning).

Or maybe an error if the document contains octet sequences that aren't
valid UTF-8 sequences...
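
A minimal sketch of such a check, assuming Python and a hypothetical helper
name (first_invalid_utf8_offset is not taken from any validator's code). It
only reports where strict UTF-8 decoding first fails:

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for illustration: it relies on Python's strict
    UTF-8 decoder and reports where decoding first fails.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte.
        return exc.start
    return None
```

A caller could turn a non-None result into an error rather than a warning
when the XML declaration was the only source of encoding information.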

>>XHTML user agents must ignore the meta element and HTML user agents
>>must ignore the XML declaration,
>
>Hmmm. I can't recall these two requirements from anywhere. Care to cite me a
>reference for them?

XHTML defers to XML 1.0 for encoding detection, and XML 1.0 clearly states
that the XML processor must assume UTF-8 or UTF-16 in the absence of an XML
declaration; XHTML user agents must thus behave as if they ignore the meta
element.
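
To illustrate, here is a simplified sketch of the XML-side decision, assuming
Python. The function name and regular expression are hypothetical, and the
sketch deliberately ignores external information (such as an HTTP charset
parameter) and UTF-16 without a byte order mark. Note that a meta element is
never consulted:

```python
import re

# Matches an encoding pseudo-attribute in an XML declaration at the very
# start of the document (ASCII-compatible encodings only; a simplification).
_XML_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_encoding(data: bytes) -> str:
    """Guess the encoding an XML processor would use for `data`.

    Hypothetical helper: checks a byte order mark, then the XML
    declaration, and otherwise falls back to UTF-8.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: the document has to be UTF-8.
    return "utf-8"
```

With this logic a document carrying only
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
is still treated as UTF-8.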

HTML user agents are strictly speaking allowed to use the "processing
instruction that looks similar to an XML declaration" to determine the
encoding, but not with the same semantics, i.e., the meta element and
encoding information specified by the referring resource (charset
attribute on <a> for example) take precedence; that's rather close to
ignoring it.
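
The precedence on the HTML side can be sketched like this, assuming Python.
The function and parameter names are hypothetical; the order of the first
three sources follows HTML 4.01 (HTTP charset parameter, then meta, then the
charset attribute on the referring element), with the XML-declaration
look-alike used only as a last-resort heuristic:

```python
from typing import Optional, Tuple


def pick_html_encoding(
    http_charset: Optional[str] = None,
    meta_charset: Optional[str] = None,
    link_charset: Optional[str] = None,
    xml_decl_encoding: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Return (source, encoding) for the highest-priority source present.

    Hypothetical helper: each argument is the encoding label found in
    that source, or None if the source is absent.
    """
    candidates = [
        ("http", http_charset),
        ("meta", meta_charset),
        ("link", link_charset),
        # Only reached when nothing else declares an encoding.
        ("xml-declaration", xml_decl_encoding),
    ]
    for source, value in candidates:
        if value:
            return source, value.lower()
    return None
```

For example, pick_html_encoding(meta_charset="ISO-8859-1",
xml_decl_encoding="UTF-8") selects the meta value, which is the sense in
which an HTML user agent comes close to ignoring the declaration.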

>>.... Maybe I should bring this issue up to the HTML WG?
>
>I'm not sure what good it would do.

I suppose they either know or can agree on how conforming user agents
must or should behave and whether the document is to be considered
well-formed/valid/conforming/strictly-conforming/whatever...

>OTOH, if we can get unambiguous specs on this it would make my life soooo much
>easier, and would let us tell a much more compelling story to web developers.

We'll see...

Received on Friday, 4 July 2003 23:00:36 UTC