Revised (actually, "replaced" is a better word) I-001 and I-002.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
       "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>No title</title>
  <meta name="generator" content="amaya 8.1a, see http://www.w3.org/Amaya/" />
</head>

<body>

<div class="div3">
<h4><a id="S-001"></a></h4>

<h3>Unicode and Legacy Encodings in SOAP Transactions.</h3>

<p>SOAP transactions rely on being able to exchange data in a consistent,
mutually understandable way. The character encoding of the SOAP message, and
the way that encoding is communicated between senders and receivers, are what
make this exchange reliable. Because all XML <a href="#XML">[XML]</a>
processors must be able to read entities in both the UTF-8
<a href="#RFC2279">[RFC2279]</a> and UTF-16 <a href="#RFC2781">[RFC2781]</a>
encodings, using UTF-8 or UTF-16 guarantees character encoding
interoperability at the SOAP layer. The Character Model for the World Wide Web
<a href="#CHARMOD">[CHARMOD]</a> document discusses these considerations and
provides guidelines.</p>
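<p>As a rough illustration of why UTF-8 and UTF-16 are safe choices, the
following Python sketch serializes the same text with the standard-library
<code>xml.dom.minidom</code> module in each encoding and parses the bytes back
without any out-of-band charset information. The helper
<code>round_trip</code> and the element name <code>greeting</code> are
hypothetical and stand in for a real SOAP envelope.</p>
<pre>
```python
from xml.dom import minidom


def round_trip(text: str, encoding: str) -> str:
    """Serialize text in `encoding`, then parse it back with no external charset hint."""
    doc = minidom.Document()
    root = doc.createElement("greeting")  # hypothetical element name
    root.appendChild(doc.createTextNode(text))
    doc.appendChild(root)
    # With an encoding argument, toxml() returns bytes and writes a matching
    # XML declaration; for UTF-16 the codec also emits a byte order mark.
    data = doc.toxml(encoding=encoding)
    # The XML parser works out UTF-8 or UTF-16 from the bytes themselves.
    parsed = minidom.parseString(data)
    return parsed.documentElement.firstChild.data


sample = "caf\u00e9 \u6f22\u5b57"  # Latin and CJK characters
for enc in ("utf-8", "utf-16"):
    assert round_trip(sample, enc) == sample
```
</pre>
<p>No agreement beyond "UTF-8 or UTF-16" is needed for the receiver to read
either message correctly.</p>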

<p>If you are using SOAP 1.1 with the Content-Type text/xml, then the charset
parameter MUST be supplied to ensure correct interoperability, because the
default charset for text/xml is us-ascii. If you are using SOAP 1.2, then the
media type is application/soap+xml. If the charset parameter is omitted from
application/soap+xml, the SOAP document is examined for its encoding using the
rules provided in XML. In all cases, the charset parameter in the media type
takes precedence over any encoding declared in the XML that forms the SOAP
document. Please refer to RFC3023, XML 1.0, and RFC2045/2046 for more
information.</p>
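<p>The precedence rules above can be sketched in Python as follows. This is a
simplified illustration, not a complete implementation: the function names
<code>sniff_xml_encoding</code> and <code>effective_encoding</code> are
hypothetical, the fallback detection covers only byte order marks and an
ASCII-compatible encoding declaration (a subset of XML 1.0 Appendix F, omitting
UCS-4, EBCDIC, and UTF-16 without a byte order mark), and the returned names
are Python codec names. The Content-Type header is parsed with the standard
<code>email.message.Message</code> class.</p>
<pre>
```python
import re
from email.message import Message

# encoding="..." or encoding='...' inside an XML declaration (ASCII-compatible bytes).
_ENCODING_DECL = re.compile(rb'encoding\s*=\s*["\x27]([A-Za-z][A-Za-z0-9._-]*)["\x27]')


def sniff_xml_encoding(data: bytes) -> str:
    """Simplified XML 1.0 Appendix F detection: byte order mark first, then declaration."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"  # UTF-8 with a byte order mark
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"  # Python's utf-16 codec consumes the byte order mark
    decl_end = data.find(b"?>")
    # b"\x3c" is the less-than byte that opens the XML declaration.
    if data.startswith(b"\x3c?xml") and decl_end != -1:
        match = _ENCODING_DECL.search(data, 0, decl_end)
        if match:
            return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default: no byte order mark and no encoding declaration


def effective_encoding(content_type: str, body: bytes) -> str:
    """Return the codec used to decode `body`, given its Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None when absent
    if charset:
        return charset  # the media-type charset always takes precedence
    if msg.get_content_type() == "text/xml":
        return "us-ascii"  # RFC 3023 default for text/xml (SOAP 1.1)
    return sniff_xml_encoding(body)  # e.g. application/soap+xml (SOAP 1.2)
```
</pre>
<p>For example, a text/xml body without a charset parameter resolves to
us-ascii even if its XML declaration names UTF-8, which is why SOAP 1.1 senders
must always supply the parameter.</p>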

<p><strong>Scenario C:</strong> A SOAP sender sends a request in a legacy
(non-Unicode) encoding that the receiver does not support. The receiving SOAP
processor should fail and may return a fault.</p>
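<p>A receiver-side check for this scenario might look like the following
Python sketch. The set <code>SUPPORTED</code>, the function
<code>decode_request</code>, and the exception <code>EncodingFault</code> are
all hypothetical; a real SOAP stack would map the failure to its own fault
mechanism. <code>codecs.lookup</code> is used only to normalize aliases such as
"UTF8" to a canonical codec name.</p>
<pre>
```python
import codecs

# Hypothetical configuration: the encodings this receiver is prepared to accept.
SUPPORTED = {"utf-8", "utf-16"}


class EncodingFault(Exception):
    """Hypothetical stand-in for returning a SOAP fault to the sender."""


def decode_request(body: bytes, charset: str) -> str:
    """Decode a request body, failing if its charset is not supported here."""
    try:
        name = codecs.lookup(charset).name  # normalizes aliases such as "UTF8"
    except LookupError:
        name = None  # charset unknown to this runtime altogether
    if name not in SUPPORTED:
        raise EncodingFault(f"unsupported request encoding: {charset}")
    return body.decode(name)  # strict decoding; malformed bytes also raise
```
</pre>
<p>Failing explicitly here is preferable to guessing an encoding, which could
silently corrupt the request.</p>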

<p><strong>Scenario D:</strong> A SOAP processor receives and processes a
request and returns a result. The response uses a character encoding that the
original sender does not support, so the sender cannot process the response.
This is an unrecoverable error. SOAP users should agree in advance on the set
of encodings that will be used in their transactions. Ideally, all
transactions will use a Unicode encoding such as UTF-8, since all XML
processors are required to handle this encoding.</p>
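<p>One way a responder could honor such an agreement is sketched below in
Python. The tuple <code>AGREED_ENCODINGS</code> and the function
<code>encode_response</code> are hypothetical; the sketch assumes the response
text carries no XML declaration of its own, so the charset parameter it
returns is the only statement of the encoding and cannot conflict with one in
the document.</p>
<pre>
```python
import codecs

# Hypothetical encodings both parties agreed on in advance, in order of preference.
AGREED_ENCODINGS = ("utf-8", "utf-16")


def encode_response(xml_text: str, preferred: str = "utf-8") -> tuple[bytes, str]:
    """Encode a response body (without an XML declaration) in an agreed encoding."""
    name = codecs.lookup(preferred).name  # raises LookupError for unknown names
    if name not in AGREED_ENCODINGS:
        name = AGREED_ENCODINGS[0]  # never answer in an encoding the sender may lack
    body = xml_text.encode(name)  # strict: raises rather than dropping characters
    return body, f"application/soap+xml; charset={name}"
```
</pre>
<p>Falling back to UTF-8 rather than to the requester's legacy encoding keeps
the response readable by any conforming XML processor.</p>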

<p><strong>Scenario E:</strong> Some encodings contain characters that are
not included in Unicode, or rely on Private Use characters. SOAP messages that
contain these problematic characters may fail intermittently or produce
unexpected results. These characters should be avoided wherever possible;
where they cannot be, the parties should agree on the charset in advance.</p>
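<p>Private Use characters survive decoding, so they can be flagged after a
message has been converted to Unicode. The Python sketch below (the function
name <code>private_use_code_points</code> is hypothetical) relies on the fact
that every Private Use code point, in the BMP and in planes 15 and 16, has the
Unicode general category "Co". Characters with no Unicode equivalent at all
cannot be caught this way, because they already fail during decoding.</p>
<pre>
```python
import unicodedata


def private_use_code_points(text: str) -> list[str]:
    """List the Private Use characters (general category "Co") found in text."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"]
```
</pre>
<p>A receiver could log or reject messages for which this list is non-empty,
unless the parties have agreed on how those code points are to be
interpreted.</p>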

<p><strong>Scenario F:</strong> A SOAP processor receives a SOAP message
whose encoding declaration does not match its actual encoding. The processor
should fail (according to the rules in RFC3023 and in XML) and may return a
fault.</p>
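<p>In Python, strict decoding with the declared encoding is a simple way to
detect many such mismatches. The sketch below is illustrative only; the
function name <code>decode_strict</code> is hypothetical, and raising
<code>ValueError</code> stands in for whatever fault the processor would
return.</p>
<pre>
```python
def decode_strict(body: bytes, declared: str) -> str:
    """Decode with the declared encoding and fail loudly if the bytes do not fit it."""
    try:
        return body.decode(declared)  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError(f"body is not valid {declared}: {exc.reason}") from exc
```
</pre>
<p>This check is not exhaustive. ISO-8859-1 bytes labeled as UTF-8 are
usually rejected, but every byte sequence is valid ISO-8859-1, so UTF-8 bytes
labeled as ISO-8859-1 decode without error into the wrong characters (for
example, "\u00e9" becomes "\u00c3\u00a9").</p>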

<p><strong>Scenario G:</strong> A SOAP processor receives and processes a
SOAP message. The processor invokes an agent (the actual service), which uses
a legacy encoding. Data may be lost or corrupted when it is transcoded between
the receiving SOAP processor and the agent. The transaction may appear to
succeed even though the data is corrupted.</p>

<p><em>Example G:</em> A Web service for "insert new record" is created for a
relational database that uses Latin-1 as its encoding. The new record sent by
the sender consists entirely of kanji characters. The invocation of the
service succeeds, even though every kanji character is converted to the
substitution character (generally a question mark, "?"). The failure may not
be detectable except by inspecting the resulting data.</p>
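<p>The silent substitution in Example G is easy to reproduce in Python: the
"replace" error handler writes "?" for each character Latin-1 cannot
represent. The sketch below contrasts that with strict encoding, which turns
the hidden data loss into an error the service can report. The function name
<code>encode_for_agent</code> is hypothetical; a real service would encode at
whatever boundary hands data to the legacy back end.</p>
<pre>
```python
def encode_for_agent(value: str, encoding: str = "latin-1") -> bytes:
    """Encode a value for a legacy back end, refusing silent substitution."""
    # errors="strict" raises UnicodeEncodeError instead of writing "?" characters
    return value.encode(encoding, errors="strict")


kanji = "\u6f22\u5b57"  # two kanji characters
lossy = kanji.encode("latin-1", errors="replace")  # what a permissive transcoder stores
print(lossy)  # b'??'
try:
    encode_for_agent(kanji)
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```
</pre>
<p>Failing at the transcoding step lets the SOAP processor return a fault
instead of reporting success for a record that no longer contains the
original data.</p>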
</div>

<div class="div3">
<p>Note that the XML Japanese Profile <a href="#XML-JP">[XML-JP]</a> explains
that legacy encodings such as Shift_JIS cannot provide complete
interoperability in information interchange, because platforms differ in the
mapping tables they use for this and similar encodings.</p>
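<p>Python ships two of these mapping tables as separate codecs, which makes
the problem easy to observe: <code>shift_jis</code> follows the JIS X 0208
mapping, while <code>cp932</code> follows the Microsoft variant. The helper
<code>decode_variants</code> below is hypothetical; it simply decodes the same
bytes with each table.</p>
<pre>
```python
import unicodedata


def decode_variants(raw: bytes, encodings=("shift_jis", "cp932")) -> dict[str, str]:
    """Decode the same bytes with several Shift_JIS-family mapping tables."""
    return {enc: raw.decode(enc) for enc in encodings}


wave = b"\x81\x60"  # a single double-byte character in Shift_JIS
for enc, text in decode_variants(wave).items():
    print(enc, f"U+{ord(text):04X}", unicodedata.name(text))
```
</pre>
<p>With CPython's codec tables, <code>shift_jis</code> yields U+301C WAVE DASH
while <code>cp932</code> yields U+FF5E FULLWIDTH TILDE, so two systems that both
claim "Shift_JIS" can disagree about the same byte sequence.</p>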

</div>

</body>
</html>

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility

432 Lakeside Drive, Sunnyvale, CA, USA
+1 408.962.5487 (office)  +1 408.210.3569 (mobile)
mailto:aphillips@webmethods.com

Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International/ws 

Internationalization is an architecture. 
It is not a feature.

Received on Wednesday, 8 October 2003 13:57:13 UTC