- From: Addison Phillips [wM] <aphillips@webmethods.com>
- Date: Mon, 22 Sep 2003 16:50:02 -0400
- To: "WWW-Tag" <www-tag@w3.org>, ietf-xml-mime@imc.org
- Cc: "Addison Phillips" <aphillips@webmethods.com>
Hi Martin, et al,

Having tracked the thread down and read it, I feel I can contribute something to it. This is a common and not very fun problem that our implementations encounter frequently with XML documents transmitted to our system by other products.

SOAP 1.2 recommends the use of application/soap+xml as the media type (although it is not required, see section 7.1.4 of [SOAP12-PART2], it is pretty close to a requirement for HTTP). Noah is correct that charset is optional. In the absence of charset, the application/*xml types default to the encoding embedded in the XML document itself, which I think is generally seen to be the desirable way to go. SOAP implementations earlier than 1.2 use various media types, including text/xml, depending on the transport, etc.

The problem with changing RFC 3023 is that there are a number of implementations out there that adhere to the exact letter of the involved RFCs (3023/2045/2046/etc.). I seem to recall that there are implementations that require the charset parameter or that forcibly filter the data to ASCII (converting all 8th-bit bytes to the '?' character), and thus there are many implementations that, to get the right results with these, forcibly emit charset parameters. Therefore, unless absolutely forbidden, implementations would still have to support the use of charset with both media types. And I don't see how we can forbid the use of the charset parameter given the need for interoperability with extant sensitive systems.

It would be nice if text/xml could be modified, since it is quite common to get un-charset-labeled content that really is NOT US-ASCII. Since one can always detect that a data stream is not US-ASCII, it has always seemed a bit odd to me that the RFCs require the data to be destroyed when there is clear evidence that one is losing something.
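To make the data-loss point concrete, here is a minimal Python sketch of the two behaviours mentioned above: checking whether a byte stream can be pure US-ASCII, and the lossy "8th-bit bytes become '?'" filtering. The function names and the sample payload are hypothetical illustrations, not taken from any RFC or product.

```python
def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte has its 8th bit set, so the stream cannot be pure US-ASCII."""
    return any(b > 0x7F for b in data)


def filter_to_ascii(data: bytes) -> bytes:
    """Mimic the lossy filter described above: replace every 8th-bit byte with '?' (0x3F)."""
    return bytes(b if b <= 0x7F else 0x3F for b in data)


# Hypothetical unlabeled payload that is really UTF-8, not US-ASCII.
payload = "<name>Dürst</name>".encode("utf-8")

print(has_non_ascii(payload))    # the evidence that something would be lost
print(filter_to_ascii(payload))  # what a strict US-ASCII receiver would keep
```

The check is a single pass over the bytes, which is why a receiver always has the evidence in hand before it discards anything.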
I understand the reasoning, but I think there is a difference between saying that omission of a charset parameter invites data corruption (e.g. the MIME or XML processor is not required to look at the XML content and thus MAY use US-ASCII to interpret the data) and saying that omission requires it (e.g. the MIME or XML processor is required to interpret the data using US-ASCII). Perhaps we should focus on the semantics of charset not being present, instead of focusing on forbidding/requiring charset itself.

Consider this paragraph of RFC3023:

<snip>
Conformant with [RFC2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii"[ASCII]. In cases where the XML MIME entity is transmitted via HTTP, the default charset value is still "us-ascii".
</snip>

This could be changed to something more friendly, like:

<snip>
If a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MAY attempt to detect the charset from the XML content itself. Such detection MUST follow the requirements of section 4.3.3 of [XML]. MIME and XML processors that do not attempt or are unable to detect the charset using this requirement must use US-ASCII (or UTF-8????)... etc. and so forth...
</snip>

This allows receivers the leeway to detect errant senders (while leaving errant senders of text/xml as non-conforming). This seems like a reasonable compromise to me.

[SOAP12-PART2] http://www.w3.org/TR/2003/PR-soap12-part2-20030507

Just my two cents.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
432 Lakeside Drive, Sunnyvale, CA, USA
+1 408.962.5487 (office)
+1 408.210.3569 (mobile)
mailto:aphillips@webmethods.com

Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International/ws

Internationalization is an architecture.
It is not a feature.
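P.S. For concreteness, the relaxed wording suggested above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the detection is a simplified subset of what section 4.3.3 of [XML] and its autodetection appendix describe (three byte order marks plus an ASCII-compatible encoding declaration; UTF-32 and EBCDIC are not handled), and the fallback is US-ASCII because the UTF-8 alternative is still an open question in the proposed text.

```python
import re
from typing import Optional

# Byte order marks checked first (simplified; UTF-32 BOMs are not distinguished).
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

# Encoding declaration inside an XML declaration at the very start of the entity.
_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_xml_charset(body: bytes) -> Optional[str]:
    """Best-effort detection from the XML content itself; None if nothing is found."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


def choose_charset(charset_param: Optional[str], body: bytes, detect: bool = True) -> str:
    """Pick the charset for a text/xml entity under the proposed (not current) rule."""
    if charset_param:              # an explicit charset parameter always wins
        return charset_param.lower()
    if detect:                     # the proposed MAY: look at the XML content
        sniffed = sniff_xml_charset(body)
        if sniffed:
            return sniffed
    return "us-ascii"              # no detection attempted, or detection failed
```

Calling `choose_charset(None, body, detect=False)` reproduces the current RFC 3023 behaviour for text/xml, so a strict receiver remains conformant under the proposed wording while a more forgiving one may sniff.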
> -----Original Message-----
> From: Martin Duerst [mailto:duerst@w3.org]
> Sent: Friday, September 19, 2003 11:05 AM
> To: Addison Phillips
> Subject: Fwd: Re: Requesting a revision of RFC3023
>
>
> Hello Addison,
>
> This is from two lists (ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>).
> Re SOAP, I guess you might have some answer. If yes, can you send it to
> those lists or to me for forwarding?
>
> Regards, Martin.
>
>
> >To: MURATA Makoto <murata@hokkaido.email.ne.jp>
> >Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
> >Subject: Re: Requesting a revision of RFC3023
> >From: noah_mendelsohn@us.ibm.com
> >Date: Fri, 19 Sep 2003 11:04:11 -0400
> >Sender: owner-ietf-xml-mime@mail.imc.org
> >List-Archive: <http://www.imc.org/ietf-xml-mime/mail-archive/>
> >List-ID: <ietf-xml-mime.imc.org>
> >List-Unsubscribe: <mailto:ietf-xml-mime-request@imc.org?body=unsubscribe>
> >
> >Murata Makoto writes:
> >
> > >> I believe that SOAP implementations use the
> > >> charset parameter. If we remove the charset
> > >> parameter, we will make them non-conformant.
> >
> >This is not my area of expertise, but I note that the HTTP binding [1]
> >provided by SOAP 1.2 Recommendation uses an application/soap+xml media
> >type, a definition of which is at [2] (I believe it is working its way
> >through the formal registration process.) My reading is that the
> >definition lists charset as optional, and makes clear that its proper use
> >is to be found in RFC 3023.
> >
> >I am not aware of what typical implementations of SOAP 1.1 or SOAP 1.2 are
> >doing, but the 1.2 spec at least seems to list it as optional. Again, I'm
> >not expert in this stuff and am not offering an opinion, but I thought the
> >links might be helpful.
> >
> >[1]http://www.w3.org/TR/soap12-part2/#soapinhttp
> >[2] http://www.w3.org/TR/soap12-part2/#ietf-draft
> >
> >------------------------------------------------------------------
> >Noah Mendelsohn Voice: 1-617-693-4036
> >IBM Corporation Fax: 1-617-693-8676
> >One Rogers Street
> >Cambridge, MA 02142
> >------------------------------------------------------------------
> >
> >
> >MURATA Makoto <murata@hokkaido.email.ne.jp>
> >Sent by: www-tag-request@w3.org
> >09/19/03 08:10 AM
> >
> > To: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
> > cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
> > Subject: Re: Requesting a revision of RFC3023
> >
> >
> >On Fri, 19 Sep 2003 03:50:11 +0200
> >Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> >
> > > You want to change something that has been STRONGLY RECOMMENDED for over
> > > five years to (ideally) MUST NOT just because it could cause trouble
> > > when used improperly or with broken implementations. Today I am good
> > > with web standards if I use the charset parameter, tommorow I am bad
> > > with web standards if I do. What's next on #W3C? Use tables for layout
> > > because people could get CSS wrong and old browsers get some CSS wrong?
> > > I don't think this leads anywhere.
> >
> >I believe that SOAP implementations use the charset parameter. If we
> >remove the charset parameter, we will make them non-conformant.
> >
> >Cheers,
> >
> >--
> >MURATA Makoto <murata@hokkaido.email.ne.jp>
> >
Received on Monday, 22 September 2003 16:52:57 UTC