Re: Wrong character encoding detect for XHTML

2001-12-07 13:55:30, Masayasu Ishikawa <mimasa@w3.org>:

> No.  You used 'text/html' for your document,

It *also* happens when 'text/xml' is used:
<URL: http://home.no.net/huftis/kritikk/false-encoding.xml >

The validator here says:
'Detected Character Encoding: iso-8859-1'

It should (according to RFC 3023) say:
'Detected Character Encoding: US-ASCII'
and complain because the document contains octet sequences not
valid in the 'US-ASCII' encoding.
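
To illustrate the kind of check I mean, here is a rough Python sketch
(not the validator's actual code; the function name and the sample
bytes are made up for this example). It lists every octet that cannot
be part of a US-ASCII document:

```python
def non_ascii_octets(body: bytes):
    """Return (offset, octet) pairs for every octet above 0x7F.

    US-ASCII is a 7-bit encoding, so any octet with the high bit set
    is invalid when the document must be read as US-ASCII.
    """
    return [(offset, octet) for offset, octet in enumerate(body) if octet > 0x7F]


# Hypothetical sample: ISO-8859-1 bytes for a-ring (0xE5) and ae (0xE6).
sample = b"<p>bl\xe5b\xe6r</p>"

for offset, octet in non_ascii_octets(sample):
    print(f"offset {offset}: octet 0x{octet:02X} is not valid in US-ASCII")
```

A validator could report these offsets instead of silently falling
back to iso-8859-1.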

But even when 'text/html' is used, the document is not
a conformant (I won't use the word 'valid' here) XHTML 1.0
or 1.1 document. Both standards say of the XML declaration:

        Such a declaration is required when the character
        encoding of the document is other than the default
        UTF-8 or UTF-16.

The user should (IMHO) be made aware of this on the validation
results page.

> Note that even for 'text/xml', UTF-8 is not the default.
> As defined in section 3.1 of RFC 3023 [4], the default
> charset value for the 'text/xml' media type is US-ASCII.

You're right. For 'text/xml' served without a charset parameter,
'US-ASCII' is actually the default even if a different encoding
is specified in the XML declaration.
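
As I read RFC 3023, the rule for 'text/xml' could be sketched like
this in Python. The function name and its second parameter are my own
invention, and the sketch deliberately covers 'text/xml' only:

```python
from email.message import Message


def effective_charset(content_type_header, xml_decl_encoding=None):
    """Return the charset to assume for a 'text/xml' entity.

    The charset parameter of the Content-Type header wins; without
    it the default is US-ASCII. xml_decl_encoding (the encoding named
    in the XML declaration) is accepted only to show it is ignored.
    Returns None for media types this sketch does not cover.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    if msg.get_content_type() != "text/xml":
        return None
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        return charset.lower()
    return "us-ascii"


print(effective_charset("text/xml", xml_decl_encoding="iso-8859-1"))
print(effective_charset('text/xml; charset="UTF-8"', xml_decl_encoding="iso-8859-1"))
```

The first call corresponds to my test document: no charset parameter,
so the encoding named in the XML declaration has no effect.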

-- 
Karl Ove Hufthammer

Received on Friday, 7 December 2001 09:04:24 UTC