- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Mon, 04 Mar 2002 11:30:22 +0900 (JST)
- To: www-validator@w3.org
"Sean B. Palmer" <sean@mysterylights.com> wrote: > The W3C Validator begs for a character encoding even when the page in > question is being validated as XML, and the XML declaration is > missing. According to the XML specification, if a declaration is > missing, then the encoding is either UTF-8 (and possibly its subset, > US-ASCII) or UTF-16. That depends on the media type used. The above rule in XML 1.0 only applies "in the absence of information provided by an external transport protocol (e.g. HTTP or MIME)". According to RFC 3023, if an entity is received with the charset parameter omitted, the default charset value is "us-ascii" in the case of "text/xml", and there's no default value in the case of "application/xml" (thus the default rule in XML 1.0 applies). In both cases, the charset parameter in the HTTP Content-Type response header takes precedence. > I know that there is much room for debate in this area (given section > 6 in RFC 2854)... but it seems to me that the validator should be able > to gague the character encoding of an XHTML document without an XML > declaration. If you serve an XHTML document as "text/html", then I strongly recommend to never rely on the default rule (which is extremely messy) and always provide an explicit charset information. As a side note, it seems erroneous for the validator to NOT report well-formedness error when a UTF-8 document that does include characters above Basic Latin range is served as "text/xml" without an explicit charset parameter. Regards, -- Masayasu Ishikawa / mimasa@w3.org W3C - World Wide Web Consortium
Received on Sunday, 3 March 2002 21:30:27 UTC