Re: flakey charset detection

On Wed, 4 Dec 2002, Karl Dubost wrote:
> 
> I put up an XHTML 1.0 document encoded as UTF-8:
> http://www.w3.org/QA/2002/12/xhtml-utf-8.html
> 
> without a meta element or XML declaration, because XHTML 1.0 is an XML
> document, and an XML document encoded as UTF-8 doesn't need the encoding
> information.

Well, text/html doesn't default to UTF-8, and the HTML WG has in fact said
that text/html should always be handled as HTML and not XHTML (yes, the
validator does not comply with that).


> The only problem I see is that the validator does the right job and
> respects the HTTP header information:
> 
>    HEAD http://www.w3.org/QA/2002/12/xhtml-utf-8.html
>    200 OK
>    Date: Wed, 04 Dec 2002 16:55:58 GMT
>    Content-Type: text/html; charset=iso-8859-1
> 
> BUT it validates with the wrong encoding. So the validator doesn't
> check whether the document is sent with the right encoding. But I guess
> in some cases it's a bit tricky to detect.

I don't fully understand what you mean by this, but the charset parameter
of the HTTP Content-Type header overrides the rules in the XML spec, so
that document is to be treated as ISO-8859-1 and not UTF-8.
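
As a rough illustration of why a mislabelled document is hard to catch,
here is a small Python sketch. The sample string and variable names are
made up for the example and have nothing to do with the validator's code.
The point is only that every byte sequence is a valid ISO-8859-1 string,
so decoding UTF-8 bytes under that label never raises an error; it just
produces different characters.

```python
# Hypothetical sample text; any non-ASCII character would do.
utf8_bytes = "café".encode("utf-8")            # b'caf\xc3\xa9'

# Decoding under the label from the HTTP header "succeeds", but the
# two-byte UTF-8 sequence for "é" comes out as two separate characters.
as_latin1 = utf8_bytes.decode("iso-8859-1")    # 'cafÃ©'

# ISO-8859-1 assigns a character to all 256 byte values, so there is
# no input on which this decode can fail.
all_bytes_ok = len(bytes(range(256)).decode("iso-8859-1")) == 256

# The opposite mistake is detectable: ISO-8859-1 bytes for "é" are
# not well-formed UTF-8.
try:
    "café".encode("iso-8859-1").decode("utf-8")
    reverse_detected = False
except UnicodeDecodeError:
    reverse_detected = True
```

So a checker can notice ISO-8859-1 data labelled as UTF-8, but not the
case discussed here without resorting to heuristics.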

See XML 1.0 appendix F.2 and RFC 2854 sections 2 and 6.
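
A minimal sketch of that priority order, in Python. The function names
and the "default" parameter are my own invention for illustration, and the
sketch deliberately ignores byte-order marks and UTF-16 autodetection. The
UTF-8 fallback is the XML rule only; as noted above, it does not apply to
text/html.

```python
import re
from typing import Optional


def charset_from_content_type(header: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    m = re.search(r'charset\s*=\s*"?([^\s;"]+)', header, re.IGNORECASE)
    return m.group(1).lower() if m else None


def encoding_from_xml_declaration(body: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, if any."""
    m = re.match(
        rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        body,
    )
    return m.group(1).decode("ascii").lower() if m else None


def effective_encoding(content_type: str, body: bytes,
                       default: str = "utf-8") -> str:
    """External (HTTP) information wins, then the XML declaration,
    then the assumed default for XML processing."""
    return (charset_from_content_type(content_type)
            or encoding_from_xml_declaration(body)
            or default)


# The case from this thread: the header names ISO-8859-1, so that wins
# even though the document has no declaration of its own.
label = effective_encoding("text/html; charset=iso-8859-1", b"<html/>")
```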

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 4 December 2002 16:46:10 UTC