Re: XHTML vs. <meta>-only encoding declarations

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

>>>>Granted XML rules suggest UTF-8 or -16 for this case, but as it's
>>>>served as text/html -- e.g. Appendix C rules -- these do not, IMO,
>>>>apply.
>>>
>>>But the MarkUp Validator honors the XML declaration in such
>>>documents...
>>
>>So it does, but I'm inclined to consider this a bug where text/html
>>documents are concerned. And note that it only considers an explicitly
>>given encoding from the XML Declaration and does not apply XML
>>defaulting rules here.
>
>That's inconsistent as it is inconsistent to honor both, the meta
>element and the XML declaration, they are mutually exclusive.

Right, and this is why I consider it a bug. IMO, when served as text/html we
should not pay attention to the XML Declaration, except in the case where it
is the only source of encoding information (in which case it is a useful
heuristic but should generate a warning).
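
To make the proposed policy concrete, here is a rough sketch in Python. It is
not the validator's code: the function name, the regexes, and the choice to
let an HTTP charset parameter win over the meta element are all assumptions
made for illustration. It expects an already-normalized media type such as
"text/html".

```python
import re

# Hypothetical patterns for this sketch only; real-world sniffing is messier.
XML_DECL_RE = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)
META_RE = re.compile(
    rb'<meta\s[^>]*?charset\s*=\s*["\']?([A-Za-z][A-Za-z0-9._-]*)',
    re.IGNORECASE,
)


def pick_encoding(media_type, http_charset, body):
    """Return (encoding or None, list of warning strings)."""
    warnings = []
    xml_m = XML_DECL_RE.match(body)
    meta_m = META_RE.search(body)
    xml_enc = xml_m.group(1).decode("ascii") if xml_m else None
    meta_enc = meta_m.group(1).decode("ascii") if meta_m else None

    if media_type != "text/html":
        # Out of scope here: XML media types follow the XML rules instead.
        return (http_charset or xml_enc), warnings

    # Assumption: an explicit HTTP charset takes precedence over the document.
    if http_charset:
        return http_charset, warnings
    # For text/html, the meta element is consulted and the XML Declaration
    # is ignored whenever any other source of encoding information exists.
    if meta_enc:
        return meta_enc, warnings
    # Last resort: the XML Declaration is the only hint, so use it as a
    # heuristic and warn about it.
    if xml_enc:
        warnings.append(
            "encoding taken from XML Declaration in a text/html document"
        )
        return xml_enc, warnings
    return None, warnings
```

With both a meta element and an XML Declaration present, this sketch returns
the meta element's encoding and no warning; with only the XML Declaration it
returns that encoding plus one warning.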

Unfortunately, in 0.6.x, we don't have the requisite smarts about
Content-Types and what they mean to be able to handle this distinction. In 0.7
we do, so it will likely be smarter about this (at least potentially, as I
still need to figure out what behaviour constitutes "smart" for all possible
Content-Types ;D).


>XHTML user agents must ignore the meta element and HTML user agents
>must ignore the XML declaration,

Hmmm. I can't recall these two requirements from anywhere. Care to cite me a
reference for them?


>.... Maybe I should bring this issue up to the HTML WG?

I'm not sure what good it would do. The underlying problem is that, due to
Appendix C and other bits of XHTML 1.0 Rec, the text/html MIME Content-Type is
used in an ambiguous way to indicate incompatible object classes. In fact,
SGML vs. XML is impossible to resolve reliably (you need to guess; perhaps
guess very reliably, but still a guess) without out-of-band hinting in the
Content-Type (cf. Hixie's comment example).
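
A small sketch of why this ends up being a guess, again in Python and again
purely illustrative (the function names and the two sniffing checks are my
own assumptions, not anything from the validator or the specs): an XML media
type settles the question out-of-band, while text/html forces us to peek at
the content and flag the result as a guess.

```python
def markup_family(content_type):
    """Classify a Content-Type header value as 'xml', 'ambiguous' or 'unknown'."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    if media_type == "text/html":
        # Could be HTML (SGML) or Appendix C style XHTML; the type cannot say.
        return "ambiguous"
    return "unknown"


def guess_family(content_type, body):
    """Return (family, guessed); guessed is True when we had to sniff."""
    family = markup_family(content_type)
    if family != "ambiguous":
        return family, False
    # Hypothetical heuristics: look for an XML Declaration or the XHTML
    # namespace near the start of the document.
    head = body.lstrip()[:1024]
    looks_xml = head.startswith(b"<?xml") or b"http://www.w3.org/1999/xhtml" in head
    return ("xml" if looks_xml else "sgml"), True
```

Anything returned with guessed set to True is exactly the "perhaps very
reliable, but still a guess" case.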

Exactly what steps need to be taken depends on whether your two requirements above
are supported by the current language in the relevant Recommendations, but in
either case there is quite a bit of cleanup that needs to be done; and some of
it in places (e.g. the HTML 4.01 Rec) where I think it would be hard to effect
change at this juncture.

Mainly the problem is that of whether text/html is SGML or XML, and Appendix C
of XHTML 1.0 forces us to treat it as "a bit of both, really".


OTOH, if we can get unambiguous specs on this it would make my life soooo much
easier, and would let us tell a much more compelling story to web developers.

- -- 
If you believe that will stop spammers, you're sadly misled. Rusty hooks,
rectally administered fuel oil enemas, and the gutting of their machines,
*that* stops spammers!                                         -- Saundo

-----BEGIN PGP SIGNATURE-----
Version: PGP SDK 3.0.2

iQA/AwUBPwYVMaPyPrIkdfXsEQIyFwCdFvoQKE8RCyGXy00h0VWkMyS5N/oAniT1
arzM8mx5/OcO13JqkVffkGAq
=GVUr
-----END PGP SIGNATURE-----

Received on Friday, 4 July 2003 20:00:53 UTC