RE: Can HTTP content-type charset disagree with its contents XML encoding?

From: Kurosaka, Teruhiko <Teruhiko.Kurosaka@iona.com>
Date: Tue, 29 Apr 2003 10:35:53 -0700
Message-ID: <D4A5CCF30A322D4A80FCF05A8BAC8D7562A3A0@AMERWEST-EMS1.IONAGLOBAL.COM>
To: "Chris Lilley" <chris@w3.org>
Cc: "Www-International (E-mail)" <www-international@w3.org>

Chris,
Thank you for your reply.

Could you be so kind as to quote the relevant sections of the XML and HTTP specs?
The XML spec does not seem to me to address this situation.

Anyway, shouldn't this practice be explicitly forbidden for any type of
content (HTML, XML, etc.) that has its own mechanism for encoding
identification?
-kuro


> -----Original Message-----
> From: Chris Lilley [mailto:chris@w3.org]
> KT> I came across an article that shows an example of a SOAP message
> KT> in its List 1:
> KT> http://www.atmarkit.co.jp/fxml/tanpatsu/21websvc/websvc02.html
> KT> (The article is in Japanese but the List 1 contains only ASCII text
> KT> except one line within <m:GoodsName> element.)
> 
> KT> In this example, the HTTP level header says the content is
> KT> in UTF-8:
> 
> KT> Content-Type: application/soap-xml; charset="utf-8"
> 
> KT> But the XML document which is the content of this HTTP request
> KT> claims that the content is in Shift_JIS as in:
> KT>  <?xml version="1.0" encoding="shift_jis"?>
> 
> KT> I am puzzled.  Does anyone know:
> 
> KT> (1) Is this legal?
> 
> Unfortunately yes. It's a really bad idea, because the message
> immediately becomes not well formed as soon as the http headers go
> away.
> 
> KT> (2) If it is legal, which declaration is supposed to win? I.e. should
> KT> the content be in UTF-8 encoding or Shift_JIS encoding in this
> KT> example?
> 
> The http headers win.
> 
> The only corner case where this sort of thing could be generated is
> when an xml-unaware program has converted the encoding from one to
> another, and somehow knows enough to convey this to the server in some
> undocumented, server defined way but does not know enough to convey it
> to the xml processors in a documented, well defined way by updating
> the encoding declaration.
> 
> And to support this use case, the HTTP headers are defined to override
> the encoding declaration in the XML.
> 
> Not so much of a problem with transient, over the wire information
> such as SOAP messages, but much more of a problem for other, longer
> lived xml information which is frequently processed on the server
> side, from the local filestore, and also processed on the client
> side, for example saved and looked at later. In both these situations
> there is no http header information and the self-describing nature of
> XML is compromised - the XML is not well formed!
> 
> Of course, the correct solution is to not put duplicate and
> contradictory encoding information in the http headers, but rather to
> say that programs which make xml content not well formed are broken and
> should be fixed.
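
To make the precedence rule discussed above concrete, here is a minimal
Python sketch. It is an illustration under stated assumptions, not text from
either spec: the function names, the UTF-8 fallback, the sample element
content and the urn:example namespace are all hypothetical. It picks the
HTTP charset when one is present, falls back to the XML encoding declaration
otherwise, and then shows what happens to the same bytes once the header is
no longer available.

```python
import re
from email.message import Message
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the body. Assumption: the body is in an ASCII-compatible encoding,
# so the declaration can be read as ASCII bytes (this is not true for UTF-16).
_XML_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def encoding_from_xml_declaration(body: bytes) -> Optional[str]:
    """Return the lowercased encoding named in the XML declaration, or None."""
    match = _XML_DECL_ENCODING.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def effective_encoding(content_type: str, body: bytes) -> str:
    """HTTP charset wins; otherwise use the XML declaration.

    The final UTF-8 fallback is a simplification made for this sketch.
    """
    return (
        charset_from_content_type(content_type)
        or encoding_from_xml_declaration(body)
        or "utf-8"
    )


# Hypothetical message shaped like the one in the article: the bytes on the
# wire are UTF-8, but the XML declaration claims Shift_JIS.
text = (
    '<?xml version="1.0" encoding="shift_jis"?>'
    '<m:GoodsName xmlns:m="urn:example">\u5546\u54c1</m:GoodsName>'
)
body = text.encode("utf-8")
header = 'application/soap-xml; charset="utf-8"'

# With the header available, the receiver decodes the bytes correctly.
with_header = body.decode(effective_encoding(header, body))

# Saved to disk, only the declaration remains, and it names the wrong
# encoding. errors="replace" keeps the sketch from raising on invalid bytes.
declared = encoding_from_xml_declaration(body)
without_header = body.decode(declared, errors="replace")

print(with_header == text)     # header-driven decoding round-trips
print(without_header == text)  # declaration-driven decoding does not
```

The second comparison is the situation described for saved or server-side
files: a processor that trusts the declaration either misreads the
non-ASCII characters or rejects the document outright.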
Received on Tuesday, 29 April 2003 13:36:12 GMT
