- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Fri, 07 Dec 2001 21:55:30 +0900 (JST)
- To: huftis@bigfoot.com
- Cc: www-validator@w3.org
Karl Ove Hufthammer <huftis@bigfoot.com> wrote:

> Testcase:
> <URL: http://home.no.net/huftis/kritikk/false-encoding.html >
>
> This document is an XHTML 1.1 document with no XML declaration.
> No 'charset' parameter is sent by HTTP, therefore, the document
> uses the character encoding 'UTF-8' (the default for all
> X(HT)ML documents).

No. You used 'text/html' for your document, so RFC 2854 [1] applies.
The 'text/html' media type registration itself doesn't define a
default value for the charset parameter, and as noted in "6. Charset
default rules" of RFC 2854, RFC 2616 [2] section 3.7.1 defines that
"media subtypes of the 'text' type are defined to have a default
charset value of 'ISO-8859-1'" (for better or worse). Section 5.2.2 of
the HTML 4 spec [3] further says that "[i]n practice, this
recommendation has proved useless ... Therefore, user agents must not
assume any default value for the "charset" parameter".

Note that even for 'text/xml', UTF-8 is not the default. As defined in
section 3.1 of RFC 3023 [4], the default charset value for the
'text/xml' media type is US-ASCII.

Both RFC 2854 and RFC 3023 name UTF-8 as a recommended (not a default)
value, but more importantly, both RFCs *strongly* recommend adding an
explicit charset parameter to avoid confusion.

[1] http://www.rfc-editor.org/rfc/rfc2854.txt
[2] http://www.rfc-editor.org/rfc/rfc2616.txt
[3] http://www.w3.org/TR/html4/charset.html#h-5.2.2
[4] http://www.rfc-editor.org/rfc/rfc3023.txt

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
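For illustration only, the default rules described above can be written down as a small lookup. This is a rough sketch, not what the validator actually does: the function name `nominal_charset` and the table `SPEC_DEFAULTS` are made up for this example, and the table covers only the two media types discussed in this message. Keep in mind that HTML 4 tells user agents not to rely on the 'text/html' default at all.

```python
from email.message import Message

# Defaults as described in this message; hypothetical table, not exhaustive.
SPEC_DEFAULTS = {
    "text/html": "ISO-8859-1",  # RFC 2616 section 3.7.1, default for 'text' subtypes
    "text/xml": "US-ASCII",     # RFC 3023 section 3.1
}


def nominal_charset(content_type):
    """Return (charset, source) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # An explicit charset parameter always takes precedence over any default.
    explicit = msg.get_param("charset")
    if explicit is not None:
        return explicit, "explicit"
    # Otherwise fall back to the default given by the relevant RFC, if any.
    default = SPEC_DEFAULTS.get(msg.get_content_type())
    if default is not None:
        return default, "spec default"
    return None, "unknown"


if __name__ == "__main__":
    for header in ("text/html", "text/xml", "text/html; charset=UTF-8"):
        print(header, "->", nominal_charset(header))
```

In none of the cases without an explicit parameter does the result come out as UTF-8, which is why both RFCs push for sending the charset parameter explicitly.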
Received on Friday, 7 December 2001 07:55:37 UTC