encoding, charset, and SOAP

On: http://lists.w3.org/Archives/Public/www-forms/2007Mar/0060.html

There is nothing in the spec that explicitly states what @encoding should
do; it is defined as:

"Optional attribute specifying an encoding for serialization. The default
is "UTF-8"."

I interpret this as meaning that @encoding defines the character set of the
data being serialised, and that is all: there is no requirement to inform
the server of that fact.
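Read that way, @encoding only decides how the instance data is turned into
bytes. A minimal Python sketch of that interpretation, assuming the
instance has already been serialised to a string; the function name
serialise and the dict-of-headers return value are hypothetical:

```python
def serialise(instance_xml: str, mediatype: str, encoding: str = "UTF-8"):
    """Hypothetical helper illustrating the reading of @encoding above.

    @encoding (default "UTF-8") only picks the byte encoding of the body;
    the Content-Type header is the mediatype exactly as authored, with no
    charset derived from @encoding added to it.
    """
    body = instance_xml.encode(encoding)
    headers = {"Content-Type": mediatype}
    return body, headers


body, headers = serialise("<data>café</data>", "application/xml", "UTF-16")
print(headers)  # {'Content-Type': 'application/xml'}
```

The server receives UTF-16 bytes but is told nothing about it, which is
exactly the gap the rest of this message is about.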

A processor that sends the value of @encoding along with the submitted data
is providing added-value behaviour, not behaviour that conformance requires.

For SOAP, the spec states: "If the submission mediatype contains a charset
MIME parameter, then it is appended to the application/soap+xml MIME type."
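The quoted SOAP rule amounts to copying one parameter across. A hedged
Python sketch covering only the sentence quoted above (it says nothing
about what happens when no charset is given); the function name is
hypothetical, and it assumes no ';' appears inside quoted parameter values:

```python
def soap_content_type(submission_mediatype: str) -> str:
    """Build the SOAP Content-Type, copying a charset from the mediatype if present."""
    # MIME parameters follow the first ';'; parameter names are case-insensitive.
    for param in submission_mediatype.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return "application/soap+xml; charset=" + value.strip()
    return "application/soap+xml"


print(soap_content_type("application/soap+xml; charset=UTF-16"))
# application/soap+xml; charset=UTF-16
print(soap_content_type("application/soap+xml"))
# application/soap+xml
```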

The existence of specified behaviour for SOAP, alongside the absence of any
specification regarding @encoding, suggests that a charset given in the
mediatype must win.

So, there are two reasonable options:
a) continue to append @encoding as charset, but first check whether the
mediatype already contains a charset.
b) stop appending @encoding as charset altogether.
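Options a) and b) differ only in how the Content-Type header is built. A
minimal Python sketch; the function names are hypothetical, the mediatype
is assumed to be a plain "type/subtype; name=value" string with no ';'
inside quoted values, and the encoding argument stands in for @encoding
with its "UTF-8" default:

```python
def has_charset(mediatype: str) -> bool:
    """True if the mediatype already carries a charset parameter."""
    # Parameters follow the first ';'; MIME parameter names are case-insensitive.
    params = mediatype.split(";")[1:]
    return any(p.split("=", 1)[0].strip().lower() == "charset" for p in params)


def content_type_option_a(mediatype: str, encoding: str = "UTF-8") -> str:
    """Option a): append @encoding as charset unless the author already set one."""
    if has_charset(mediatype):
        return mediatype  # the mediatype's charset wins
    return f"{mediatype}; charset={encoding}"


def content_type_option_b(mediatype: str, encoding: str = "UTF-8") -> str:
    """Option b): never append @encoding; send the mediatype unchanged.

    encoding is accepted but unused, so a) and b) stay interchangeable.
    """
    return mediatype


print(content_type_option_a("application/xml"))
# application/xml; charset=UTF-8
print(content_type_option_a("application/xml; charset=ISO-8859-1", "UTF-16"))
# application/xml; charset=ISO-8859-1
```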

I think a) is preferable.

Either of these raises the issue that a conflict between @encoding and a
mediatype charset could occur, so a third option is possible:

c) append @encoding as charset; if the mediatype and @encoding specify
different charsets, ignore the mediatype's charset.
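Option c) replaces rather than checks. Again a hedged Python sketch with a
hypothetical function name and the same parsing assumptions as for a) and
b): any charset in the mediatype is dropped and the @encoding value is
appended in its place.

```python
def content_type_option_c(mediatype: str, encoding: str = "UTF-8") -> str:
    """Option c): @encoding always supplies the charset; a mediatype charset is ignored."""
    base, *params = [p.strip() for p in mediatype.split(";")]
    # Keep every non-empty parameter except charset (names are case-insensitive).
    kept = [p for p in params if p and p.split("=", 1)[0].strip().lower() != "charset"]
    return "; ".join([base, *kept, f"charset={encoding}"])


print(content_type_option_c("application/xml; charset=ISO-8859-1"))
# application/xml; charset=UTF-8
```

Because encoding falls back to "UTF-8", the call above discards the
author's ISO-8859-1 even though @encoding was never set explicitly.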

However, since @encoding has a default value, this pretty much translates
as "ignore any charset defined by the mediatype": if that charset is wrong,
it will be discarded, and if it is right, it will already be there.

Then again, if form authors really want to do something like send a UTF-8
document but tell the server that it is UTF-16, should implementers or spec
authors really care?

So, I believe that a) is still the preferable option.



-- 

<http://webbackplane.com/paul-butcher>

Received on Thursday, 12 June 2008 14:37:23 UTC