Re: flakey charset detection

At 8:01 -0800 2002-12-04, David Brownell wrote:
>I recently validated a xhtml 1.0 page that used to validate just fine, and
>instead, I got a message that said things like:

Could you give us the URI of your document?


>p.s. Given that it's XHTML, I find the fact that it even _tried_
>      using the META element to be worrisome ... that means that
>      parsing this document as XML could give different results,
>      which breaks all XHTML goals I ever heard.  Not that I've
>      tracked XHTML recently, but this seems like trouble.

I have put up an XHTML 1.0 document encoded as UTF-8
http://www.w3.org/QA/2002/12/xhtml-utf-8.html

with neither a meta element nor an XML declaration: XHTML 1.0 is XML,
and an XML document encoded as UTF-8 doesn't need to declare its
encoding.
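To illustrate why no declaration is needed, here is a rough sketch of how an XML parser might pick an encoding when it looks only at the bytes and ignores HTTP headers. It is a simplification of the autodetection described in XML 1.0 Appendix F (UCS-4 and EBCDIC cases are skipped), and the function name `xml_default_encoding` is hypothetical.

```python
import codecs
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def xml_default_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding from the bytes alone.

    Simplified from XML 1.0 Appendix F; HTTP headers are ignored here.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: the document must be UTF-8.
    return "utf-8"
```

For a document like the test page above (no BOM, no XML declaration), this falls through to the UTF-8 default.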

And it validates perfectly:
http://validator.w3.org/check?uri=http://www.w3.org/QA/2002/12/xhtml-utf-8.html

The only problem I see is that the validator does the right thing and
respects the HTTP header information:

HEAD http://www.w3.org/QA/2002/12/xhtml-utf-8.html
200 OK
Date: Wed, 04 Dec 2002 16:55:58 GMT
Content-Type: text/html; charset=iso-8859-1


BUT it validates with the wrong encoding: the server declares
iso-8859-1, while the file is actually UTF-8. So the validator doesn't
check whether the document is sent with the right encoding. I guess in
some cases that's a bit tricky to detect.
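One reason it is tricky: every byte sequence is valid iso-8859-1, so a mislabelled UTF-8 file never fails to decode. A validator could at most apply a heuristic like the sketch below, which flags non-ASCII bytes that happen to decode cleanly as UTF-8. This is an assumption about how such a check might work, not something the validator does; the name `charset_mismatch_hint` is hypothetical, and the heuristic can misfire on rare Latin-1 text that is also valid UTF-8.

```python
from typing import Optional

def charset_mismatch_hint(data: bytes, declared: str) -> Optional[str]:
    """Hypothetical heuristic: warn when the declared charset looks wrong."""
    declared = declared.lower()
    if declared in ("utf-8", "utf8"):
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return "declared utf-8 but bytes are not valid UTF-8"
        return None
    if data.isascii():
        # Pure ASCII reads the same under UTF-8 and iso-8859-1: nothing to flag.
        return None
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # Not UTF-8, so the declared single-byte charset is plausible.
    return f"declared {declared} but non-ASCII bytes decode cleanly as UTF-8"
```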


I have the feeling, though I may be wrong, that the validator should not
validate it :) but even that is not certain. :)

From http://www.w3.org/TR/xhtml1/#docconf:
An XML declaration is not required in all XML documents; however 
XHTML document authors are strongly encouraged to use XML 
declarations in all their documents. Such a declaration is required 
when the character encoding of the document is other than the default 
UTF-8 or UTF-16 and no encoding was determined by a higher-level 
protocol.
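The precedence implied by that paragraph (a charset from the higher-level protocol first, then the XML declaration, then the UTF-8 default) can be sketched roughly as follows. It is simplified: it ignores BOM/UTF-16 detection and the HTML meta element, and the function names are hypothetical; only `email.message.Message.get_content_charset` is a real standard-library call.

```python
from email.message import Message
from typing import Optional

def http_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent

def effective_encoding(content_type: str, xml_decl_encoding: Optional[str]) -> str:
    """Hypothetical helper: higher-level protocol > XML declaration > UTF-8."""
    charset = http_charset(content_type)
    if charset:
        return charset
    if xml_decl_encoding:
        return xml_decl_encoding.lower()
    return "utf-8"
```

With the headers shown for the test page, `effective_encoding("text/html; charset=iso-8859-1", None)` gives `iso-8859-1`: the HTTP header wins, which is why the validator reads the UTF-8 file as iso-8859-1.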


-- 
Karl Dubost / W3C - Conformance Manager
           http://www.w3.org/QA/

      --- Be Strict To Be Cool! ---

Received on Wednesday, 4 December 2002 12:34:03 UTC