Re: Opaque data, XML, and SOAP

From: Ray Whitmer <rayw@netscape.com>
Date: Fri, 28 Feb 2003 08:44:37 -0800
Message-ID: <3E5F91F5.90408@netscape.com>
To: "John J. Barton" <John_Barton@hpl.hp.com>
CC: Don Box <dbox@microsoft.com>, xml-dist-app@w3.org

John J. Barton wrote:

> Don,
>   Your note is nicely written and in one critical regard I agree with
> you 100%: binary data is not a SOAP issue but an issue for XML to
> solve generally.

Why XML?  Being a universal object model was never the purpose of XML.  
XML uses URIs to reference things.  Neither XML nor its predecessors 
were designed to be a universal envelope, there are obvious reasons why 
it should not be one, and if the SOAP designers wanted that, they should 
have chosen another envelope format, one designed to deal with binary 
data.

There is no good reason to place large binary data into the infoset 
representation, such as DOM.  Placing very small binary data into the 
infoset, while not a performance concern, still violates the open nature 
and structure of XML.  It is clear that it serves the purposes of anyone 
who wants to wrap proprietary binary structure and call it open XML.  
Referencing from a binary envelope clarifies this separation.  For a 
number of reasons which have been expressed repeatedly, a binary 
envelope would have been a better choice for the SOAP envelope for those 
who want to transmit binary data.  When it was brought up, the argument 
was that SOAP with attachments was somehow a much superior design for 
transmitting data.  Big surprise that now we discover it is not and has 

Placing huge binary data into the XML breaks the model and the 
associated tools.  That defeats the stated purposes for using XML in the 
first place, because it becomes unreasonable to use many XML tools and 
processing models on it.  XML features, standardized tools such as DOM, 
processing models, and so on are far from ideal for dealing with large 
binary data that is encapsulated.  If it is not encapsulated, then it is 
not an XML issue, since URIs tie XML to whatever else it may be.

It is now not a SOAP issue because the SOAP standard refused to deal 
with it, limiting its own scope of applicability in the process.  Once 
the spec reached W3C under the name of XML Protocol, attempts to fix 
this fundamental problem at the SOAP envelope level were ruled out of 
scope.  If the original SOAP designers left it out because they thought 
it should go in the XML, IMO it is another clear sign that they 
misunderstood XML and the surrounding tools.

Thus, we get SOAP with attachments, which adds another layer to drill 
down through, complicates organization and optimization, and perhaps 
makes it difficult to deploy existing tools for SOAP.

>    Extending XInclude as the way to specify the connection between XML
> and binary data would be close to SwA.  That is, an XML document using
> XInclude would look quite a bit like the SwA XML. If extending XInclude
> makes a specification easier, great, but I don't see that it changes
> the problem all that much.
>    Specifically you still have to come up with a wire format standard.
> You say that you can have "multiple serialization formats, effectively
> unifying multiple messaging technologies", but that cannot be true.
> Multiple serialization formats means fragmentation in an open system,
> not unification.  W3C has to pick one, not punt.

I agree with this 100%.  SOAP was the specification that should have 
picked one, because that is where the deficiency is.  If W3C picks one, 
it will I hope be for the SOAP specification, and I hope it does not 
mandate encapsulation inside of XML, which is generally counter to the 
purposes of markup languages.  W3C had an activity for envelopes, which 
was abandoned.  You really gain nothing by making the envelope be XML.  
Even MIME is a much better envelope than XML, and you might instead 
choose ZIP, etc. depending upon the anticipated needs.
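
To make the MIME point concrete, here is a minimal sketch, assuming 
Python's standard email package, of a multipart/related envelope in 
which the XML part only references the binary part by URI.  The element 
names and the Content-ID value are hypothetical and come from no SOAP 
or SwA schema; the binary part is base64 transfer-encoded only because 
that is this library's default.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Opaque binary payload; it never enters the XML infoset.
binary = bytes(range(256))

# Hypothetical XML body: it carries only a cid: URI pointing at the data.
xml = '<order><photo href="cid:photo1@example.invalid"/></order>'

# The envelope is MIME, not XML.
envelope = MIMEMultipart("related")
envelope.attach(MIMEText(xml, "xml", "utf-8"))

part = MIMEApplication(binary)  # application/octet-stream by default
part.add_header("Content-ID", "<photo1@example.invalid>")
envelope.attach(part)

# Serialize to the wire format, then parse it back as a receiver would.
wire = envelope.as_bytes()
parsed = message_from_bytes(wire)
xml_part, bin_part = parsed.get_payload()

recovered_xml = xml_part.get_payload(decode=True).decode("utf-8")
recovered = bin_part.get_payload(decode=True)
```

An XML tool handed recovered_xml sees a small, ordinary document, and 
the bytes are fetched through the envelope only when the reference is 
followed.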

>    And I cannot understand what possible value one can derive from your
> last point: "Finally (and perhaps most importantly), ALL SOAP messages
> can be represented in pure text".  What does "pure text" mean?  16 bit
> Unicode?  So what? Is this any more significant than saying it's "pure
> binary"? If I attach an image to a SOAP message and send it to you,
> what pure-text processing can you do to the mixed message?

If it is 16-bit Unicode (aka UTF-16), then the article was completely 
wrong when it claimed that it was such an efficient mechanism, because 
an extra 8 bits are wasted on every ASCII character.
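
The arithmetic can be checked with a short sketch.  The sample element 
is hypothetical; it stands in for any ASCII-only markup, including 
base64 text, and UTF-8 is assumed as the 8-bit comparison.

```python
# Hypothetical ASCII-only fragment carrying a base64 payload.
sample = '<m:photo xmlns:m="urn:example">aGVsbG8=</m:photo>'

utf8_len = len(sample.encode("utf-8"))       # 1 byte per ASCII character
utf16_len = len(sample.encode("utf-16-le"))  # 2 bytes per character, no BOM

# Extra bits spent per character by choosing UTF-16 over UTF-8.
extra_bits_per_char = (utf16_len - utf8_len) * 8 // len(sample)
print(utf8_len, utf16_len, extra_bits_per_char)
```

For ASCII-only content the UTF-16 form is exactly twice the size of the 
UTF-8 form, which is the extra 8 bits per character.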

>   I want to reiterate that the application developer will see a
> unified DOM no matter what is done with XML/binary mixing at the
> messaging layer. The API they use will have to deal with binary
> because the bits are not text.

The proper model which exposes this will need to be a unified Web 
Message Object Model.  It is not a Document Object Model, or even a SOAP 
Object Model, because it goes beyond where these two specifications 
define themselves to stop -- at the infoset, just as a SOAP with 
Attachments model based on MIME or DIME goes way beyond the infoset.

Ray Whitmer
Received on Friday, 28 February 2003 11:44:11 UTC