- From: Kurosaka, Teruhiko <Teruhiko.Kurosaka@iona.com>
- Date: Wed, 14 May 2003 11:38:06 -0700
- To: "Public-I18n-Ws (E-mail)" <public-i18n-ws@w3.org>
A few weeks ago, I sent a question to www-international@w3.org under this subject. The question was whether, when sending XML over HTTP, it is legal for the charset parameter of the HTTP Content-Type header to name a different encoding than the one given in the encoding declaration of the XML declaration, and if so, which encoding should be applied in interpreting the XML packet.

Chris Lilley <mailto:chris@w3.org> replied to this posting, and I quote his reply at the bottom. He essentially says:

(1) Best practice is to use the media type application/xml without a charset parameter.
(2) Currently, having conflicting declarations is legal, and the charset declared in the HTTP header should be used.
(3) He agrees that this is a bad practice that should be prohibited.

I wonder whether the members of the WS Task Force agree with this opinion, and whether we need to take any further action.

Quotes from Chris' reply follow:

----------------------------------------------------------------------
KT> (1) Is this legal?

Unfortunately yes. It's a really bad idea, because the message immediately becomes not well formed as soon as the http headers go away.

KT> (2) If it is legal, which declaration is supposed to win? I.e. should
KT> the contents be in UTF-8 encoding or Shift_JIS encoding in this
KT> example?

The http headers win.
----------------------------------------------------------------------
KT> Could you be so kind to quote the relevant sections of XML and HTTP spec ?
KT> XML spec does not seem to address this situation to me.

The XML spec defers to the mime registration for the XML media type.

http://www.ietf.org/rfc/rfc3023.txt

It is in that specification that the precedence is defined (and other unfortunate things, such as a mandatory default of US-ASCII when no charset is provided in the HTTP, regardless of what the XML encoding declaration says). This is very bad.

As a member of the TAG I find this very broken, architecturally speaking. Tim Bray agrees, and I have proposed wording in the architecture document that spells this out.
There are known problems with charset in the text/* media types, such as a mandatory fallback to text/plain;charset=us-ascii. The solution is to deprecate text/xml and have a charset-free application/xml, using the nicely defined xml mechanism to declare the encoding in all circumstances, rather than dragging the problems from text/* into the hitherto unaffected other media types.

KT> Anyway, shouldn't this practice be explicitly forbidden for any types of
KT> contents (HTML, XML etc.) that have their own mechanism of encoding
KT> identification?

Yes, of course it should. I am glad that you agree.
----------------------------------------------------------------------
----
T. "Kuro" Kurosaka, Internationalization Architect
IONA Technologies, Santa Clara, CA USA / +1 408 350-9684
Visit i18n.iona.com for up-to-date i18n information.
(IONA Internal)
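
P.S. To make the precedence described in the quoted reply concrete, here is a minimal Python sketch of how a receiver might pick the encoding under RFC 3023. The function names, the sample header values, and the simplified sniffing of the XML declaration are assumptions made for illustration; they are not taken from any specification text or existing library, and the sketch only handles ASCII-compatible declared encodings.

```python
import re
from email.message import Message

# Simplified matcher for an encoding declaration at the start of the body.
# Assumption: the declared encoding is ASCII-compatible, so the declaration
# itself can be read as bytes.
XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def parse_content_type(header_value):
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_param("charset")


def effective_charset(content_type_header, body):
    """Hypothetical helper: which encoding applies to an XML body over HTTP."""
    media_type, charset = parse_content_type(content_type_header)

    # A charset parameter in the HTTP header wins over the XML declaration.
    if charset:
        return charset.lower()

    # text/xml with no charset: mandatory default of US-ASCII,
    # regardless of what the XML encoding declaration says.
    if media_type == "text/xml":
        return "us-ascii"

    # application/xml with no charset: fall back to the XML mechanism
    # (byte order mark, then encoding declaration, then UTF-8).
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = XML_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else "utf-8"
```

With a body that declares encoding="UTF-8", a header of text/xml; charset=Shift_JIS selects Shift_JIS, a bare text/xml selects US-ASCII, and only a bare application/xml lets the XML declaration decide, which is why the charset-free application/xml is the recommended practice.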
Received on Wednesday, 14 May 2003 14:38:13 UTC