Re: Wrong character encoding detect for XHTML from Karl Ove Hufthammer on 2001-12-07 (www-validator@w3.org from December 2001)

From: Karl Ove Hufthammer <huftis@bigfoot.com>
Date: Fri, 07 Dec 2001 14:49:01 +0100
To: www-validator@w3.org
Cc: mimasa@w3.org
Message-Id: <9uqksr.3vu4e01.1@ID-99504.news.dfncis.de>

2001-12-07 13:55:30, Masayasu Ishikawa <mimasa@w3.org>:

> No.  You used 'text/html' for your document,

It *also* happens when 'text/xml' is used:
<URL: http://home.no.net/huftis/kritikk/false-encoding.xml >

The validator here says:
'Detected Character Encoding: iso-8859-1'

It should (according to RFC 3023) say:
'Detected Character Encoding: US-ASCII'
and complain because the document contains octet sequences not
valid in the 'US-ASCII' encoding.

But even when 'text/html' is used, the document is not
a conformant (I won't use the word 'valid' here) XHTML 1.0
or 1.1 document. Both standards say of the XML declaration:

        Such a declaration is required when the character
        encoding of the document is other than the default
        UTF-8 or UTF-16.

The user should (IMHO) be made aware of this on the validation
results page.
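
As a rough sketch of the kind of check I mean (not the validator's
actual code; the function name and the regular expression are my own
assumptions, and the pattern only handles ASCII-compatible encodings),
something like this could decide whether to show a warning:

```python
import re

# Hypothetical pattern: an XML declaration at the very start of the
# document that carries an encoding pseudo-attribute.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*\bencoding\s*=\s*["\'][A-Za-z][A-Za-z0-9._-]*["\']'
)

def missing_encoding_declaration(body: bytes) -> bool:
    """Return True if the document needs an encoding declaration but has none."""
    # A UTF-16 byte order mark means one of the two default encodings.
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return False
    # An explicit encoding declaration satisfies the requirement.
    if _ENCODING_DECL.match(body):
        return False
    # No declaration: the octets must then be well-formed UTF-8.
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

A document containing ISO-8859-1 octets and no declaration would make
this return True, which is the case where the results page could warn.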

> Note that even for 'text/xml', UTF-8 is not the default.
> As defined in section 3.1 of RFC 3023 [4], the default
> charset value for the 'text/xml' media type is US-ASCII.

You're right. When no charset parameter is sent, 'US-ASCII' is
the default for 'text/xml', even if a different encoding is
specified in the XML declaration.
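
To illustrate the RFC 3023 rule for 'text/xml' discussed above, here is
a minimal sketch (the function names are hypothetical, and it assumes
the charset label is one Python's codec registry knows): the charset
parameter wins if present, otherwise 'us-ascii' applies, and the XML
declaration is never consulted.

```python
from email.message import Message

def effective_charset(content_type: str) -> str:
    """Charset for a text/xml entity, following RFC 3023 section 3.1."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param returns None when the header has no charset parameter.
    charset = msg.get_param("charset")
    if charset is None:
        return "us-ascii"  # default for text/xml; the XML declaration is ignored
    return str(charset).lower()

def check_body(content_type: str, body: bytes) -> str:
    """Report whether the octets are valid in the effective charset."""
    charset = effective_charset(content_type)
    try:
        body.decode(charset)
    except UnicodeDecodeError as err:
        return f"{charset}: invalid octet at offset {err.start}"
    return f"{charset}: ok"
```

With a bare 'text/xml' header, a document containing octets above 0x7F
is reported as invalid US-ASCII, whatever its XML declaration claims.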

-- 
Karl Ove Hufthammer

Received on Friday, 7 December 2001 09:04:24 UTC