Re: flakey charset detection

On Wed, 4 Dec 2002, Karl Dubost wrote:

> Could you give an URI of your document?

Quite.

> >p.s. Given that it's XHTML, I find the fact that it even _tried_
> >      using the META element to be worrisome ... that means that
> >      parsing this document as XML could give different results,
> >      which breaks all XHTML goals I ever heard.  Not that I've
> >      tracked XHTML recently, but this seems like trouble.

Was that XHTML served as HTML or XML?

> I put an XHTML 1.0 document encoded as UTF-8
> http://www.w3.org/QA/2002/12/xhtml-utf-8.html
> 
> without Meta or XML Declaration,

> HEAD http://www.w3.org/QA/2002/12/xhtml-utf-8.html
> 200 OK
> Date: Wed, 04 Dec 2002 16:55:58 GMT
> Content-Type: text/html; charset=iso-8859-1

By serving it as text/html, you're telling us HTML rules apply, including
the charset you sent with it.  More specifically, the Appendix C (HTML
compatibility) rules for XHTML apply, which (as Hixie has demonstrated)
lead to unavoidable contradictions.


> BUT It validates with the wrong encoding.

Erm, it validates correctly as iso-8859-1 under the above rules.
Your example is valid - even if it doesn't mean what it should -
as latin1!
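
To illustrate why nothing complains (a minimal sketch, not what the
validator actually runs; the sample string is made up): iso-8859-1
assigns a character to every byte value, so UTF-8 bytes always decode
cleanly as latin1. They just come out as the wrong characters.

```python
# Hypothetical sample text; any non-ASCII string shows the same effect.
original = "café"

# The bytes as the author saved them (UTF-8).
raw = original.encode("utf-8")

# What a consumer sees when it trusts "charset=iso-8859-1".
# ISO-8859-1 maps every byte 0x00-0xFF to a character, so this never fails.
as_latin1 = raw.decode("iso-8859-1")

# Decoding with the encoding the author actually used recovers the text.
as_utf8 = raw.decode("utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
```

The latin1 reading is one character longer than the original, because
the two-byte UTF-8 sequence for the accented letter becomes two separate
latin1 characters.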

> An XML declaration is not required in all XML documents; however 
> XHTML document authors are strongly encouraged to use XML 
> declarations in all their documents. Such a declaration is required 
> when the character encoding of the document is other than the default 
> UTF-8 or UTF-16 and no encoding was determined by a higher-level 
> protocol.

In this instance, an encoding was determined by HTTP, which
presumably counts as a "higher-level protocol" in the above.
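
A rough sketch of that precedence, following only the passage quoted
above: an HTTP charset parameter wins, then an XML declaration, then
the UTF-8 default. The function names and the regular expression are my
own assumptions for illustration; the sketch ignores UTF-16 and BOM
sniffing, and it is not how any particular validator is implemented.

```python
import re
from email.message import Message

# Simplified pattern for an XML declaration's encoding pseudo-attribute.
# Assumes the document starts with ASCII-compatible bytes.
XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased by the email package


def effective_encoding(content_type, body):
    """Pick an encoding: HTTP header first, XML declaration second."""
    charset = http_charset(content_type)
    if charset:
        return charset  # the "higher-level protocol" has spoken
    match = XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # default named in the quoted passage (UTF-16 ignored)


# The response headers quoted earlier in this thread:
print(effective_encoding("text/html; charset=iso-8859-1", b"<html/>"))
```

With the Content-Type from the HEAD response above, the header settles
the question before the document body is even consulted.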

-- 
Nick Kew

Received on Wednesday, 4 December 2002 14:32:56 UTC