- From: Richard Ishida <ishida@w3.org>
- Date: Mon, 21 Jul 2003 16:37:05 +0100
- To: "'Masayasu Ishikawa'" <mimasa@w3.org>, <Steven.Pemberton@cwi.nl>
- Cc: "Richard Ishida" <ishida@w3.org>, "'Martin Duerst'" <duerst@w3.org>, <public-i18n-geo@w3.org>
Hello Masayasu, Steven, In the GEO Guidelines we are trying to clarify how charset information should be declared for international html and xhtml 1.0 pages (probably only served as text/html, though that hasn't yet been discussed). Here is some of the baseline information I'm working from. Speaking of the XML declaration, the XHTML 1.0 spec says "Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol." I am assuming that a 'higher-level protocol' includes the server sending charset information in the http header. But I am not clear whether this includes any meta charset declaration. I assume that the meta statement does work in nearly all user agents for xhtml served as text/html. And later, when talking about HTML compatability... "For compatibility with these types of legacy browsers, you may want to avoid using processing instructions and XML declarations. Remember, however, that when the XML declaration is not included in a document, the document can only use the default character encodings UTF-8 or UTF-16." The two statements from the spec don't seem to be completely aligned - the second is much more restrictive than the first because it doesn't make allowances for non-utf-8 declarations in higher level protocols. I'm assuming that this is an error, and if I'm right, proposing that this should be addressed as an erratum. I note in addition, it seems the accepted wisdom amongst content authors that, because the xml declaration/prolog causes some browsers to fail to display content and causes modern versions of IE to display in quirks mode, it is best omitted when serving xhtml content as text/html, - even when coding for web standards compliance. See, eg http://www.webstandards.org/learn/reference/prolog_problems.html Later in the specification it says: "In order to portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers. If this is not possible, a document that wants to set its character encoding explicitly must include both the XML declaration an encoding declaration and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />)" In GEO's guidelines http://www.w3.org/International/geo/html-tech/#ri20030218.131104811 we want to point the way to better standards compliance and future-proof approaches for our readers, but also take into account the realities of current authoring techniques so as not to alienate them. What I get from the above for people who want to serve non-ascii characters on a text/html xhtml1 page is: 1. serve the charset information in the http header whenever possible 2. if the browser cannot get the charset info from the http header you must (and even if it can you probably should) (a) use both xml declaration and meta declaration, or (b) serve as utf-8/16 3. if you don't want to use the xml declaration but can serve the charset info in the http header you can use whatever encoding you like 4. if you don't want to use the xml declaration and don't serve the charset info in the http header you must only use utf-8/16 I have a feeling that actually you could use the meta statement to get around 4, couldn't you? I also suspect that you should still use the meta declaration for 3. If you simply serve text without any http header or declaratory information, but as utf-8, I'm not sure all user agents will actually recognise the encoding, will they? I'm currently inclined to simply recommend the following for xhtml1/html: A. serve the charset information in the http header whenever possible/practical B. for xhtml1 (served as text/html) use the xml declaration unless you have a good reason not to (eg. To avoid quirks mode on IE) C. always use the meta charset declaration Could you please give us some pointers on the above to help us develop set of practical recommendations for the guidelines? Thanks, RI ============ Richard Ishida W3C tel: +44 1753 480 292 http://www.w3.org/International/ http://www.w3.org/People/Ishida/
Received on Monday, 21 July 2003 11:37:38 UTC