- From: <Peter_Constable@sil.org>
- Date: Sun, 3 Nov 2002 06:34:35 -0600
- To: www-international@w3.org
I'm curious about this extract from an appendix to the XHTML spec:

<quote>
C.9. Character Encoding

Historically, the character encoding of an HTML document is either specified by a web server via the charset parameter of the HTTP Content-Type header, or via a meta element in the document itself. In an XML document, the character encoding of the document is specified on the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>). In order to portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers. If this is not possible, a document that wants to set its character encoding explicitly must include both the XML declaration an encoding declaration and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />). In XHTML-conforming user agents, the value of the encoding declaration of the XML declaration takes precedence.
</quote>

This is said to be informative, yet the quoted text says, "...a document that wants to set its character encoding explicitly *must* include both the XML declaration an encoding declaration and a meta http-equiv statement..." (emphasis added). How can an informative portion of the document say that something *must* be done?

The bigger question is what really should or does happen. This issue was brought to my attention when I discovered that IE 6 would not interpret a certain xhtml doc in terms of UTF-8 unless we added the http-equiv statement, even though UTF-8 was explicitly declared as the encoding in the XML declaration. (It was assuming either 8859-1 or cp1252, I forget which.) It seems to me that this was a bug on the part of IE -- if it's interpreting an XML doc, it should pay attention to the encoding declared in the XML declaration.
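
To make the expected behaviour concrete, here is a minimal Python sketch of the precedence C.9 describes for XHTML-conforming user agents: the encoding in the XML declaration wins, and the meta http-equiv statement is consulted only when no XML declaration encoding is present. The function name `declared_encoding` and both regular expressions are hypothetical, written only for illustration. The sketch assumes the document starts in an ASCII-compatible encoding (so it does not handle UTF-16 or byte order marks) and it ignores the HTTP Content-Type header entirely.

```python
import re
from typing import Optional

# Hypothetical pattern: encoding pseudo-attribute inside an XML declaration
# that sits at the very start of the document.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

# Hypothetical pattern: charset parameter inside a
# <meta http-equiv="Content-type" content="..."> element.
_META = re.compile(
    rb'<meta[^>]+http-equiv\s*=\s*["\']content-type["\'][^>]*'
    rb'content\s*=\s*["\'][^"\']*charset=([A-Za-z0-9._-]+)',
    re.IGNORECASE,
)


def declared_encoding(doc: bytes) -> Optional[str]:
    """Return the encoding a document declares for itself, or None.

    The XML declaration takes precedence over the meta element,
    following the last sentence of the quoted C.9 text.
    """
    xml_match = _XML_DECL.search(doc)
    if xml_match:
        return xml_match.group(1).decode("ascii")
    meta_match = _META.search(doc)
    if meta_match:
        return meta_match.group(1).decode("ascii")
    return None
```

With a document that declares UTF-8 in the XML declaration and ISO-8859-1 in the meta element, this returns "UTF-8". The behaviour I saw in IE 6 looked as if the first branch were missing.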

In general, it seems to me that stronger statements should be made in the spec: XHTML is an XML application, and thus user agents must conform to the XML spec, implying that an encoding specified in the XML declaration *must* be observed -- and that this statement can be made normatively rather than just informatively.

Am I missing something? Or is this being worked on further in the draft for version 2?

- Peter

---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>

Received on Sunday, 3 November 2002 07:36:02 UTC