- From: Chris Lilley <chris@w3.org>
- Date: Tue, 29 Apr 2003 13:47:43 +0200
- To: "Kurosaka, Teruhiko" <Teruhiko.Kurosaka@iona.com>
- CC: "Www-International (E-mail)" <www-international@w3.org>
On Tuesday, April 29, 2003, 3:24:57 AM, Teruhiko wrote:

KT> I came across an article that shows an example of a SOAP message
KT> in its List 1:
KT> http://www.atmarkit.co.jp/fxml/tanpatsu/21websvc/websvc02.html
KT> (The article is in Japanese but the List 1 contains only ASCII text
KT> except one line within <m:GoodsName> element.)

KT> In this example, the HTTP level header says the contents is
KT> in UTF-8:
KT> Content-Type: application/soap-xml; charset="utf-8"

KT> But the XML document which is the contents of this HTTP request
KT> claims that the contents is in Shift_JIS as in:
KT> <?xml version="1.0" encoding="shift_jis"?>

KT> I am puzzled. Does anyone know:
KT> (1) Is this legal?

Unfortunately, yes. It's a really bad idea, because the message
becomes not well formed as soon as the HTTP headers go away.

KT> (2) If it is legal, which declaration is supposed to wins? I.e. should
KT> the contents be in UTF-8 encoding or Shift_JIS encoding in this
KT> example?

The HTTP headers win.

The only corner case where this sort of thing could be generated is
when an XML-unaware program has converted the content from one encoding
to another and somehow knows enough to convey this to the server in
some undocumented, server-defined way, but does not know enough to
convey it to XML processors in the documented, well-defined way: by
updating the encoding declaration. To support this use case, the HTTP
headers are defined to override the encoding declaration in the XML.

This is not so much of a problem for transient, over-the-wire
information such as SOAP messages. It is much more of a problem for
other, longer-lived XML content, which is frequently processed on the
server side from the local filestore, and also processed on the client
side, for example when saved and looked at later. In both of these
situations there is no HTTP header information, and the
self-describing nature of XML is compromised: the XML is not well
formed!
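
To illustrate the precedence rule described above, here is a minimal
Python sketch. The function names are hypothetical, the regular
expression only handles ASCII-compatible encodings, and the final
UTF-8 fallback is a simplification (it ignores byte order marks and
UTF-16), so treat it as an illustration rather than a conforming
implementation.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the body. Assumes an ASCII-compatible encoding such as
# UTF-8 or Shift_JIS.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param strips the quotes from values such as charset="utf-8".
    return msg.get_param("charset")


def charset_from_xml_decl(body):
    """Return the encoding named in the XML declaration, or None."""
    match = _XML_DECL.match(body)
    return match.group(1).decode("ascii") if match else None


def effective_charset(content_type, body):
    """HTTP charset wins; otherwise the XML declaration; otherwise UTF-8."""
    return (
        charset_from_content_type(content_type)
        or charset_from_xml_decl(body)
        or "utf-8"
    )


if __name__ == "__main__":
    body = b'<?xml version="1.0" encoding="shift_jis"?><doc/>'
    # With the header from the article, the HTTP charset is used.
    print(effective_charset('application/soap-xml; charset="utf-8"', body))
    # Once the header is gone (e.g. the file was saved), only the
    # declaration is left, and it now names a different encoding.
    print(effective_charset("application/soap-xml", body))
```

The second call shows the problem: the same bytes are interpreted
under a different encoding as soon as the HTTP header is no longer
available.
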

Of course, the correct solution is not to put duplicate and
contradictory encoding information in the HTTP headers, but rather to
say that programs which make XML content not well formed are broken
and should be fixed.

KT> T. "Kuro" Kurosaka
KT> Internationalization Architect
KT> teruhiko.kurosaka@iona.com
KT> -------------------------------------------------------
KT> IONA Technologies
KT> 2350 Mission College Blvd. Suite 650
KT> Santa Clara, CA 95054
KT> Tel: (408) 350 9684/9500
KT> Fax: (408) 350 9501
KT> -------------------------------------------------------
KT> Making Software Work Together TM

--
Chris
mailto:chris@w3.org

Received on Tuesday, 29 April 2003 07:48:08 UTC