- From: Addison Phillips [wM] <aphillips@webmethods.com>
- Date: Thu, 29 May 2003 17:25:36 -0700
- To: "Kurosaka, Teruhiko" <Teruhiko.Kurosaka@iona.com>, "Public-I18n-Ws \(E-mail\)" <public-i18n-ws@w3.org>
Hi Kuro,

First, the intent is that all SOAP messages should be encoded using UTF-8 or UTF-16. That doesn't mean that they actually are or that utf-8/utf-16 are the only valid values here. You *can* create SOAP messages in another encoding.

The application/soap+xml type is defined here:

http://www.w3.org/TR/2003/PR-soap12-part2-20030507/#ietf-draft

Which refers to here:

http://www.ietf.org/rfc/rfc3023.txt (see Page 9).

Which says (and I quote):

<quot>
Although listed as an optional parameter, the use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML MIME entity. The charset parameter can also be used to provide protocol-specific operations, such as charset-based content negotiation in HTTP. "utf-8" [RFC2279] and "utf-16" [RFC2781] are the recommended values, representing the UTF-8 and UTF-16 charsets, respectively. These charsets are preferred since they are supported by all conforming processors of [XML].

If an application/xml entity is received where the charset parameter is omitted, no information is being provided about the charset by the MIME Content-Type header. Conforming XML processors MUST follow the requirements in section 4.3.3 of [XML] that directly address this contingency. However, MIME processors that are not XML processors SHOULD NOT assume a default charset if the charset parameter is omitted from an application/xml entity.
</quot>

It is a bad idea generally to create SOAP documents that don't use Unicode because there is no guarantee that the SOAP Processor supports anything but Unicode.

The Content-Type of application/soap+xml is NOT a text/* type for a reason. For one thing it gets you out of the ASCII default.

A Content-Transfer-Encoding is generally applied to data external to the actual content encoding and content processing. You are probably familiar with MIME, which is used for transporting non-ASCII data.
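
As a rough illustration of the precedence described in the RFC 3023 passage quoted above, a receiver could pick the charset along the following lines. This is only a sketch: the function name `effective_charset` is hypothetical, and the fallback is a simplified stand-in for section 4.3.3 of [XML] (it checks a byte order mark and an ASCII-readable encoding declaration, then defaults to UTF-8; it does not handle UTF-32 or BOM-less UTF-16).

```python
import re
from email.message import Message

# Matches an ASCII-readable XML declaration such as
# <?xml version="1.0" encoding="Shift_JIS"?> at the start of the body.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_charset(content_type, body):
    """Return the charset to use when decoding an XML MIME entity.

    Hypothetical helper: an explicit charset parameter wins; otherwise
    fall back to a simplified form of the XML auto-detection rules.
    """
    # 1. An explicit charset parameter on the Content-Type is authoritative.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset:
        return charset

    # 2. No parameter: look at the entity itself, starting with a BOM.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"

    # 3. Then an encoding declaration, if one is present.
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()

    # 4. With neither a BOM nor a declaration, XML assumes UTF-8.
    return "utf-8"
```

For example, `effective_charset("application/soap+xml; charset=utf-16", body)` returns the header value without inspecting the body at all, while a bare `application/soap+xml` leaves the decision to the XML rules rather than to any ASCII default.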
Generally speaking, in both HTTP and SMTP the actual SOAP document arrives as an attachment and is MIME encoded. You don't have to deal with this, as it is part of the transport stack.

Best Regards,

Addison

> -----Original Message-----
> From: public-i18n-ws-request@w3.org
> [mailto:public-i18n-ws-request@w3.org]On Behalf Of Kurosaka, Teruhiko
> Sent: Thursday, May 29, 2003 4:44 PM
> To: Public-I18n-Ws (E-mail)
> Subject: charset use in SOAP 1.2
>
> I'm browsing the final spec of SOAP 1.2 Primer
> http://www.w3.org/TR/2003/PR-soap12-part0-20030507/
> and I have two questions.
>
> Near
> http://www.w3.org/TR/2003/PR-soap12-part0-20030507/#L26866
> (thank you, Addison, for the pointer)
> it reads:
>
> > When placing SOAP messages in HTTP bodies, the HTTP Content-type header
> > must be chosen as "application/soap+xml". (The optional charset parameter,
> > which can take the value of "utf-8" or "utf-16", is shown in this example,
> > but if it is absent the character set rules for freestanding [XML 1.0] apply to
> > the body of the HTTP request.)
>
> The note in the paren seems to be saying that only UTF-8 and UTF-16
> are the valid encodings that can be used in SOAP (over HTTP). Does anyone
> know if this is their intent, and if so, why?
>
> On the other hand, SOAP-over-SMTP examples
> http://www.w3.org/TR/2003/PR-soap12-part0-20030507/#Example14
> http://www.w3.org/TR/2003/PR-soap12-part0-20030507/#Example31
> (This is actually for Example 15.)
> do not use the charset attribute in the Content-Type header. According to
> Section 5.2 of RFC 2045
> http://www.ietf.org/rfc/rfc2045.txt?number=2045
> lack of charset info means US-ASCII. Perhaps one can argue that the US-ASCII
> default only applies to text/* content, and application/soap+xml has a
> different rule, which I may have overlooked.
>
> But I also notice that both examples lack a Content-Transfer-Encoding header.
> Section 6.1 of RFC 2045 also mentions that lack of Content-Transfer-Encoding
> header means 7-bit channel. Yet the body part in both examples contains
> non-ASCII characters in the <n:name> element, which probably requires 8bit
> channel or some sort of transfer encoding.
>
> Is this an oversight that we should point out, or am I misunderstanding
> something?
>
> T. "Kuro" Kurosaka
> Internationalization Architect
> teruhiko.kurosaka@iona.com
> -------------------------------------------------------
> IONA Technologies
> 2350 Mission College Blvd. Suite 650
> Santa Clara, CA 95054
> Tel: (408) 350 9684/9500
> Fax: (408) 350 9501
> -------------------------------------------------------
> Making Software Work Together TM
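
The 7-bit question raised in the quoted message can be made concrete with a small check. The sketch below is an assumption-laden illustration, not part of any SOAP or MIME library: `is_7bit_safe` and `transfer_encode` are hypothetical names, the test applies only a simplified reading of the 7bit rules in RFC 2045 (no NUL, no octets above 127, lines of at most 998 octets), and base64 is just one possible transfer encoding.

```python
import base64


def is_7bit_safe(body):
    """Return True if the bytes could travel over a 7bit channel.

    Simplified check: rejects NUL and any octet above 127, and requires
    CRLF-delimited lines of at most 998 octets.
    """
    if any(octet == 0 or octet > 127 for octet in body):
        return False
    return all(len(line) <= 998 for line in body.split(b"\r\n"))


def transfer_encode(body):
    """Pick a Content-Transfer-Encoding value and encode the body for it."""
    if is_7bit_safe(body):
        return "7bit", body
    # Non-ASCII octets (e.g. UTF-8 text) need an encoding such as base64.
    return "base64", base64.encodebytes(body)


# Hypothetical payload with a non-ASCII character in the <n:name> element.
payload = "<n:name>J\u00f3gvan</n:name>".encode("utf-8")
encoding, encoded = transfer_encode(payload)
print(encoding)
```

A body that is pure ASCII passes through unchanged with `7bit`; a UTF-8 body containing non-ASCII characters, like the one in the SMTP examples discussed above, would need a transfer encoding or an 8bit-capable channel.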
Received on Thursday, 29 May 2003 20:25:38 UTC