
Re: flakey charset detection

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 4 Dec 2002 21:46:07 +0000 (GMT)
To: Karl Dubost <karl@w3.org>
Cc: David Brownell <david-b@pacbell.net>, "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <Pine.LNX.4.21.0212042132420.7980-100000@dhalsim.dreamhost.com>

On Wed, 4 Dec 2002, Karl Dubost wrote:
> I put an XHTML 1.0 document encoded as UTF-8
> http://www.w3.org/QA/2002/12/xhtml-utf-8.html
> without Meta or XML Declaration, because XHTML 1.0 is an XML
> document, so an XML document encoded as UTF-8 doesn't need the encoding
> information.

Well, text/html doesn't default to UTF-8, and the HTML WG has in fact said
that text/html should always be handled as HTML and not XHTML (yes, the
validator is not complying with that).

> The only problem I see is that the validator does the right job and
> respects the HTTP header information
>    HEAD http://www.w3.org/QA/2002/12/xhtml-utf-8.html
>    200 OK
>    Date: Wed, 04 Dec 2002 16:55:58 GMT
>    Content-Type: text/html; charset=iso-8859-1
> BUT it validates with the wrong encoding. So the validator doesn't
> check whether the document is sent with the right encoding. But I guess in
> some cases it's a bit tricky to detect.
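
To illustrate why a mislabelled encoding is tricky to detect, here is a
minimal Python sketch. The sample string and the helper name
looks_like_utf8 are hypothetical, not anything the validator uses. It shows
that UTF-8 bytes always decode "successfully" as ISO-8859-1, so a decode
error cannot flag the mismatch; at best a heuristic can warn that the bytes
also happen to be well-formed UTF-8.

```python
# Hypothetical sample text; any string with a non-ASCII character works.
text = "caf\u00e9"
utf8_bytes = text.encode("utf-8")

# ISO-8859-1 maps every byte value to a character, so this never raises,
# even though the result is not the text the author wrote.
as_latin1 = utf8_bytes.decode("iso-8859-1")


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data contains non-ASCII bytes and is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid in both encodings, so it tells us nothing.
    return any(b >= 0x80 for b in data)
```

A checker could use such a heuristic to issue a warning, but not an error: a
genuine ISO-8859-1 document can, in rare cases, also be valid UTF-8.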

I don't fully understand what you mean by this, but the HTTP header
overrides the rules in the XML spec so that document is encoded as
ISO-8859-1 and not UTF-8.

See XML section F.2 and RFC 2854 sections 2 and 6.
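
The precedence described above can be sketched in Python as follows. This is
an illustration under stated assumptions, not the validator's code: the
function name effective_encoding is made up, the XML fallback only handles a
BOM or an ASCII-compatible XML declaration, and the text/html case without a
charset parameter is deliberately left unmodelled (it returns None), since
it is not governed by the XML defaults.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Matches an encoding pseudo-attribute in an XML declaration at the start
# of the document (ASCII-compatible encodings only; a simplification).
XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_encoding(content_type: str, body: bytes) -> Optional[str]:
    """Return the encoding a consumer should use, or None if not modelled."""
    msg = Message()
    msg["Content-Type"] = content_type

    # 1. A charset parameter in the HTTP header overrides in-document rules.
    http_charset = msg.get_content_charset()
    if http_charset:
        return http_charset

    # 2. text/html without a charset: not covered by XML defaults
    #    (assumption: this sketch leaves the case undecided).
    if msg.get_content_type() == "text/html":
        return None

    # 3. Otherwise fall back to XML-style detection: BOM, then declaration.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()

    # 4. No external or internal label: the XML default.
    return "utf-8"
```

For the document in question, the header value
"text/html; charset=iso-8859-1" hits step 1, so the body is never consulted.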

Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 4 December 2002 16:46:10 UTC
