- From: Bertilo Wennergren <bertilow@hem.passagen.se>
- Date: Wed, 4 Oct 2000 22:43:52 +0200
- To: "Masayasu Ishikawa" <mimasa@w3.org>, <www-validator@w3.org>, <XHTML-L@egroups.com>
Masayasu Ishikawa (in <www-validator@w3.org>):

> While the validator's script needs to be updated to handle UTF-16
> correctly (nsgmls can handle UTF-16 if it is configured appropriately,
> and indeed the above page is validated if I run nsgmls locally), you
> have to configure your Web server to add a correct charset parameter
> to the Content-Type HTTP response header, i.e.
>
> Content-Type: text/html; charset=UTF-16
>
> Your server only sends
>
> Content-Type: text/html
>
> and it cannot be handled correctly even if the validator can handle
> UTF-16. RFC 2616, "3.7.1 Canonicalization and Text Defaults" says:
>
> The "charset" parameter is used with some media types to define the
> character set (section 3.4) of the data. When no explicit charset
> parameter is provided by the sender, media subtypes of the "text"
> type are defined to have a default charset value of "ISO-8859-1" when
> received via HTTP. Data in character sets other than "ISO-8859-1" or
> its subsets MUST be labeled with an appropriate charset value. See
> section 3.4.1 for compatibility problems.
>
> cf. http://www.ietf.org/rfc/rfc2616.txt

What about XHTML (and other XML document types)? According to XML
rules such a doc, without an explicit encoding declaration, should
be taken as UTF-8 or UTF-16 (automatically detected). Do we have
a clash between two different rule sets here?

Does it matter if XHTML is served as "text/xml" or "text/html"?
Would the rules for encodings, http versus in-doc declarations,
be different?

If the http charset parameter says one thing, and the in-doc
declaration says another thing, which one should take precedence?

According to the XHTML spec encoding info in an XML declaration
takes precedence over meta-element charset info, but does it win
over true http charset info as well?

The current practice is to let meta charset info win over true
http charset info, which might be in violation of the rules. This
is confusing already. Bringing in XML declarations (and the default
encoding when there is no XML declaration, or when there is no
encoding attribute in the XML declaration) makes this even more
confusing.

I've been wondering about this for a long time. I'd like to find
clear rules based on understandable logic, but I haven't found
that yet. Any hope?

#####################################################################
Bertilo Wennergren
<http://purl.oclc.org/net/bertilo>
<bertilow@hem.passagen.se>
#####################################################################
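
To make the possible clash concrete, here is a rough Python sketch that only collects the encoding labels a response can carry (the HTTP charset parameter, the RFC 2616 default for "text" types, the XML declaration, and a byte order mark) without deciding which one wins, since that is exactly the open question. All function names are hypothetical, the byte order mark check ignores UTF-32, and the XML declaration check is a simple pattern match, not a real XML parser.

```python
import codecs
import re
from email.message import Message

# Matches an encoding pseudo-attribute in an XML declaration at the very start.
XML_DECL = re.compile(
    r"^<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased; None when absent


def bom_encoding(body):
    """Guess UTF-8 or UTF-16 from a byte order mark (UTF-32 not considered)."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    return None


def xml_decl_encoding(body):
    """Return the encoding named in the XML declaration, or None."""
    bom = bom_encoding(body)
    if bom == "utf-16":
        # The utf-16 codec reads the byte order mark and drops it.
        head = body[:512].decode("utf-16", errors="ignore")
    else:
        if bom == "utf-8":
            body = body[len(codecs.BOM_UTF8):]
        # latin-1 maps every byte to a character, so this never fails.
        head = body[:256].decode("latin-1")
    match = XML_DECL.match(head)
    return match.group(1).lower() if match else None


def encoding_candidates(content_type, body):
    """Collect every encoding label found; no precedence is applied."""
    declared = http_charset(content_type)
    is_text = content_type.strip().lower().startswith("text/")
    return {
        "http": declared,
        # RFC 2616 3.7.1 default, relevant only when no charset was sent.
        "http_default": "iso-8859-1" if declared is None and is_text else None,
        "xml_declaration": xml_decl_encoding(body),
        "bom": bom_encoding(body),
    }


# A UTF-16 XHTML document served without a charset parameter.
body = '<?xml version="1.0" encoding="UTF-16"?><html/>'.encode("utf-16")
found = encoding_candidates("text/html", body)
print(found)
```

For this input the HTTP side yields only the ISO-8859-1 default, while both the byte order mark and the XML declaration say UTF-16, which is the disagreement the questions above are about.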
Received on Wednesday, 4 October 2000 16:44:06 UTC