RE: encoding and charset from Toman_Vojtech@emc.com on 2009-12-11 (public-xml-processing-model-wg@w3.org from December 2009)

From: <Toman_Vojtech@emc.com>
Date: Fri, 11 Dec 2009 08:51:06 -0500
To: <public-xml-processing-model-wg@w3.org>
Message-ID: <997C307BEB90984EBE935699389EC41C4F3AEE@CORPUSMX70C.corp.emc.com>

Norm,

Regarding your question about encoding/missing charset (as discussed on
the last conference call), I also wonder whether the new multipart tests
are correct.

The tests specify the multipart bodies for making the request like this:

<c:body content-type="text/plain" encoding="utf-8" description="Some descriptive text">Hello World</c:body>

I wonder if using utf-8 in the @encoding attribute makes any sense. As
far as I understand, c:body/@encoding controls how to decode the c:body
data *before* formulating the request, and not which encoding to use
when sending the data. Section 7.1.10.2 says:

"The encoding attribute controls the decoding of the element content for
formulating the body. A value of base64 indicates the element's content
is a base64 encoded string whose byte stream should be sent as the
message body"
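
To make that reading concrete, here is a minimal Python sketch of how a
processor might turn c:body content into the bytes of the message body.
The function name body_bytes, its charset parameter, and the handling of
a missing @encoding are assumptions made for illustration only; the
quoted text defines just the base64 case.

```python
import base64
from typing import Optional


def body_bytes(content: str, encoding: Optional[str], charset: str = "utf-8") -> bytes:
    """Hypothetical helper: derive the message-body bytes from c:body content."""
    if encoding == "base64":
        # The case the quoted text describes: the content is a base64 string,
        # and the decoded byte stream is what gets sent.
        return base64.b64decode(content)
    if encoding is None:
        # Assumption: with no @encoding, the characters are serialized using
        # the charset of the content type.
        return content.encode(charset)
    # Assumption: any other value (such as "utf-8") names no decoding step
    # that can be applied to a string of characters.
    raise ValueError(f"unsupported c:body encoding: {encoding!r}")


decoded = body_bytes("SGVsbG8gV29ybGQ=", "base64")
plain = body_bytes("Hello World", None)
```

Under these assumptions both calls yield the same eleven bytes, while
encoding="utf-8" has no obvious decoding step to perform.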

So, if I understand the above correctly, I don't see how specifying
utf-8 as the encoding can work. I mean, c:body always contains a
sequence of characters, and these characters have already been decoded
(by the parser) using the encoding of the owner XML document. What would
be the meaning of:

<?xml version="1.0" encoding="iso-8859-1"?>
...
<c:body content-type="text/plain" encoding="utf-8">Hello World</c:body>
...

?
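
The point about the parser can be checked with a small Python sketch
using the standard xml.etree.ElementTree module. The document below is a
cut-down variant of the example above; the non-ASCII content "café" is
substituted for "Hello World" purely so that the byte-level difference
is visible, and c:body is made the root element for brevity.

```python
import xml.etree.ElementTree as ET

# A document stored as ISO-8859-1 bytes, as in the example above.
# "caf\u00e9" replaces "Hello World" only to make the encodings differ.
doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<c:body xmlns:c="http://www.w3.org/ns/xproc-step" '
    'content-type="text/plain" encoding="utf-8">caf\u00e9</c:body>'
).encode("iso-8859-1")

body = ET.fromstring(doc)

# The parser has already decoded the bytes using the document's own
# encoding; what the step sees is a sequence of characters, not bytes.
text = body.text
declared = body.get("encoding")

# The byte 0xE9 exists only in the serialized document. The parsed
# content carries no trace of ISO-8859-1 or of any other encoding.
on_disk_has_latin1_byte = b"\xe9" in doc
utf8_bytes = text.encode("utf-8")
```

By the time c:body/@encoding is consulted, there is nothing left to
decode as utf-8: the value could at most describe how to serialize the
characters again, which is not what the attribute is defined to control.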

Regards,
Vojtech

Received on Friday, 11 December 2009 13:51:49 UTC