- From: Addison Phillips [wM] <aphillips@webmethods.com>
- Date: Wed, 14 May 2003 15:22:42 -0400
- To: "Kurosaka, Teruhiko" <Teruhiko.Kurosaka@iona.com>, "Public-I18n-Ws \(E-mail\)" <public-i18n-ws@w3.org>
Hi Kuro,

We can discuss it at our next meeting, although I would suggest that W3C-I18N Core TF would probably be a more appropriate venue.

Speaking strictly for myself, I agree that this is really and truly broken as a design. In point of fact, I'm not sure that the HTTP header actually wins in practice, since in at least some cases the XML parser/processor gets the bytestream separate from the HTTP transfer mechanism. Mis- or unlabeled content that isn't pre-converted from a bytestream to a character representation survives this and then probably the XML declaration "wins". Of course, the fact that they conflict at all is a problem.

In terms of Web services, though, this isn't generally a problem. The media type for a SOAP message is commonly 'application/soap+xml'. I quote from the SOAP 1.2 Primer:

  When placing SOAP messages in HTTP bodies, the HTTP Content-type header must be chosen as "application/soap+xml". (The optional charset parameter, which can take the value of "utf-8" or "utf-16", is shown in this example, but if it is absent the character set rules for freestanding [XML 1.0] apply to the body of the HTTP request.)

Which is located here:

http://www.w3.org/TR/2003/PR-soap12-part0-20030507/#L26866

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.

+1 408.962.5487 (phone)
+1 408.210.3569 (mobile)
-------------------------------------------------
Internationalization is an architecture.
It is not a feature.

Chair, W3C-I18N-WG Web Services Task Force
To participate see http://www.w3.org/International/ws

> -----Original Message-----
> From: public-i18n-ws-request@w3.org
> [mailto:public-i18n-ws-request@w3.org]On Behalf Of Kurosaka, Teruhiko
> Sent: Wednesday, May 14, 2003 2:38 PM
> To: Public-I18n-Ws (E-mail)
> Subject: Re: Can HTTP content-type charset disagree with its
> contents XML encoding?
>
> A few weeks ago, I sent out a question to www-international@w3.org
> under this subject.
> The question was, when sending out XML over HTTP,
> whether it is legal to put a different encoding in HTTP
> Content-Type; charset=
> than that in the encoding attribute of the XML declaration, and
> if so, which
> encoding should be applied in interpreting the XML packet.
>
> To this posting, Chris Lilley <mailto:chris@w3.org> replied,
> which I quote at the bottom.
> He essentially says
> (1) Best practice is to use the media type application/xml
> without charset attribute
> (2) Currently, having conflicting declarations is legal and the
> charset declared in HTTP
> header should be used.
> (3) He agrees this is a bad practice that should be prohibited.
>
> I wonder if the members of the WS Task Force agree with this opinion,
> and if we need to take any further action.
>
>
> Quotes from Chris' reply follow:
> ----------------------------------------------------------------------
> KT> (1) Is this legal?
>
> Unfortunately yes. It's a really bad idea, because the message
> immediately becomes not well formed as soon as the HTTP headers go
> away.
>
> KT> (2) If it is legal, which declaration is supposed to win? I.e. should
> KT> the contents be in UTF-8 encoding or Shift_JIS encoding in this
> KT> example?
>
> The HTTP headers win.
> ----------------------------------------------------------------------
> KT> Could you be so kind as to quote the relevant sections of XML
> and HTTP spec ?
> KT> XML spec does not seem to address this situation to me.
>
> The XML spec defers to the MIME registration for the XML media type.
> http://www.ietf.org/rfc/rfc3023.txt
> It is in that specification that the precedence is defined (and other
> unfortunate things, such as a mandatory default of US-ASCII when no
> charset is provided in the HTTP, regardless of what the XML encoding
> declaration says).
>
> This is very bad. As a member of the TAG I find this very broken,
> architecturally speaking.
> Tim Bray agrees, and I have proposed wording
> in the architecture document that spells this out.
>
> There are known problems with charset in the text/* media types, such
> as a mandatory fallback to text/plain;charset=us-ascii. The solution
> is to deprecate text/xml and have a charset-free application/xml,
> using the nicely defined XML mechanism to declare the encoding in all
> circumstances, rather than dragging the problems from text/* into the
> hitherto unaffected other media types.
>
> KT> Anyway, shouldn't this practice be explicitly forbidden for
> any types of
> KT> contents (HTML, XML etc.) that have their own mechanism of encoding
> KT> identification?
>
> Yes, of course it should. I am glad that you agree
> ----------------------------------------------------------------------
>
> ----
> T. "Kuro" Kurosaka, Internationalization Architect
> IONA Technologies, Santa Clara, CA USA / +1 408 350-9684
> Visit i18n.iona.com for up-to-date i18n information. (IONA Internal)
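
The precedence described in this thread can be sketched in a few lines of Python. This is only an illustration of the RFC 3023 rules as summarized above, not a conforming implementation: the function names (parse_content_type, declared_encoding, effective_encoding) are hypothetical, and the declaration sniffing assumes an ASCII-compatible body with no byte order mark, so UTF-16 detection is deliberately left out.

```python
import re

def parse_content_type(header):
    """Split a Content-Type value into (media_type, charset or None)."""
    parts = [p.strip() for p in header.split(";")]
    media_type = parts[0].lower()
    charset = None
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()
    return media_type, charset

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the body. Assumption: the body is in an ASCII-compatible encoding.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(body):
    """Return the encoding named in the XML declaration, or None."""
    m = _DECL.match(body)
    return m.group(1).decode("ascii").lower() if m else None

def effective_encoding(content_type, body):
    """Pick the encoding a receiver would apply under the rules above."""
    media_type, charset = parse_content_type(content_type)
    if charset:
        # The HTTP charset parameter wins, even over a conflicting
        # XML declaration.
        return charset
    if media_type.startswith("text/"):
        # text/* with no charset falls back to US-ASCII, regardless of
        # what the XML declaration says.
        return "us-ascii"
    # application/xml and application/*+xml with no charset: the rules
    # for freestanding XML apply (declaration, else UTF-8 in this sketch).
    return declared_encoding(body) or "utf-8"

body = b'<?xml version="1.0" encoding="Shift_JIS"?><doc/>'
print(effective_encoding("text/xml; charset=utf-8", body))
print(effective_encoding("text/xml", body))
print(effective_encoding("application/soap+xml", body))
```

With the conflicting UTF-8 / Shift_JIS labels from Kuro's question, only the charset-free application/* case lets the XML declaration decide, which is the behavior Chris recommends relying on.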
Received on Wednesday, 14 May 2003 15:22:46 UTC