Re: flakey charset detection

From: Nick Kew <nick@webthing.com>
Date: Wed, 4 Dec 2002 19:32:52 +0000 (GMT)
To: Karl Dubost <karl@w3.org>
cc: David Brownell <david-b@pacbell.net>, www-validator@w3.org
Message-ID: <Pine.LNX.4.21.0212041913170.1100-100000@jarl.webthing.com>

On Wed, 4 Dec 2002, Karl Dubost wrote:

> Could you give an URI of your document?

Quite.

> >p.s. Given that it's XHTML, I find the fact that it even _tried_
> >      using the META element to be worrisome ... that means that
> >      parsing this document as XML could give different results,
> >      which breaks all XHTML goals I ever heard.  Not that I've
> >      tracked XHTML recently, but this seems like trouble.

Was that XHTML served as HTML or XML?

> I put an XHTML 1.0 document encoded as UTF-8
> http://www.w3.org/QA/2002/12/xhtml-utf-8.html
> 
> without Meta or XML Declaration,

> HEAD http://www.w3.org/QA/2002/12/xhtml-utf-8.html
> 200 OK
> Date: Wed, 04 Dec 2002 16:55:58 GMT
> Content-Type: text/html; charset=iso-8859-1

By serving it as text/html, you're telling us HTML rules apply, including
the charset you sent with it.  More specifically, the Appendix C rules
for XHTML 1.0 apply, which (as Hixie has demonstrated) lead to unavoidable
contradictions.


> BUT It validates with the wrong encoding.

Erm, it validates correctly as iso-8859-1 under the above rules.
Your example is valid as latin1, even if it doesn't mean what it
should!
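
To illustrate why mislabelled UTF-8 can slip through: every byte value
maps to some character in iso-8859-1, so decoding never fails at the byte
level; the text just comes out wrong. A minimal Python sketch, where the
sample string "café" is my own assumption and not taken from the test page:

```python
# Sample text is hypothetical; it is not the content of the page discussed.
text = "caf\u00e9"                    # "café"
raw = text.encode("utf-8")            # b'caf\xc3\xa9': e-acute becomes two bytes

# Decoding as iso-8859-1 cannot raise: all 256 byte values are defined.
as_latin1 = raw.decode("iso-8859-1")  # each UTF-8 byte becomes one character

# Same bytes, different characters: the document "doesn't mean what it should".
mismatch = as_latin1 != text

# Every possible byte decodes under iso-8859-1 without error.
all_bytes_ok = len(bytes(range(256)).decode("iso-8859-1")) == 256
```

Whether the resulting characters are all permitted in a given document
type is a separate question that this sketch does not check.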

> An XML declaration is not required in all XML documents; however 
> XHTML document authors are strongly encouraged to use XML 
> declarations in all their documents. Such a declaration is required 
> when the character encoding of the document is other than the default 
> UTF-8 or UTF-16 and no encoding was determined by a higher-level 
> protocol.

In this instance, an encoding was determined by HTTP, which
presumably counts as a "higher-level protocol" in the above.
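
The precedence implied by that paragraph can be sketched roughly as
follows. This is an illustration under stated assumptions, not the
validator's actual code: the function name effective_encoding and the
regular expression are hypothetical, byte-order marks and META elements
are ignored, and the fallback is simply taken to be UTF-8.

```python
import re
from email.message import Message

# Hypothetical pattern for the encoding pseudo-attribute of an XML declaration.
XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def effective_encoding(content_type, body):
    """Return (encoding, source) for a response; a sketch, not a full algorithm."""
    # 1. A charset parameter from the higher-level protocol (HTTP) wins.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()   # lower-cased, or None if absent
    if charset:
        return charset, "http"
    # 2. Otherwise, look for an XML declaration at the start of the body.
    match = XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower(), "xml-declaration"
    # 3. Otherwise, assume the XML default (UTF-16 detection is omitted here).
    return "utf-8", "default"

# The case from this thread: HTTP says iso-8859-1, the body says nothing.
result = effective_encoding("text/html; charset=iso-8859-1", b"<html/>")
```

With the headers quoted above, the HTTP charset is found first, so the
absence of an XML declaration never comes into play.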

-- 
Nick Kew
Received on Wednesday, 4 December 2002 14:32:56 GMT
