Re: Opaque data, XML, and SOAP

From: John J. Barton <John_Barton@hpl.hp.com>
Date: Fri, 28 Feb 2003 10:01:09 -0800
Message-Id: <>
To: rayw@netscape.com (Ray Whitmer)
Cc: Don Box <dbox@microsoft.com>, xml-dist-app@w3.org


    Sorry I wasn't clear.  I certainly don't believe that XML needs to
do anything about a universal object model.  I was only agreeing with
Box et al. that mechanisms for working with a mixture of XML and binary
are not limited to SOAP.

   If I am an application writer and I am writing code to traverse the
data structure returned by parsing XML, then I will encounter some
references to non-XML data.  Surely this must be true!  It happens in
HTML all the time, with references to GIF and JPEG images.  I don't see
how referencing non-XML data could violate XML's open nature.  And if the
tools cannot handle references to binary data, then what are they good for?

   We should have a standard that allows a sender to combine XML with some
non-XML data in a way that a receiver can parse the XML and access the
non-XML data.  It shouldn't matter if the XML is SOAP or not.
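As a rough illustration of the kind of mechanism meant here, the sketch
below uses Python's standard `email` package to combine an XML part and a
binary part in one multipart/related MIME message.  The XML points at the
binary part through a `cid:` URI, roughly the approach SOAP with Attachments
takes.  The `<order>`/`<photo href=...>` vocabulary and the Content-ID value
are hypothetical, and the sketch omits the extra multipart/related parameters
(such as `type` and `start`) that a real SwA message would carry.  Nothing in
it depends on the XML being SOAP.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical Content-ID, for illustration only.
PHOTO_CID = "photo1@example.org"


def build_message(image_bytes: bytes) -> bytes:
    """Sender: combine an XML root part with one binary part."""
    # The XML refers to the binary part by a cid: URI instead of embedding it.
    # The <order>/<photo> element names are a hypothetical vocabulary.
    xml_text = f'<order><photo href="cid:{PHOTO_CID}"/></order>'
    msg = EmailMessage()
    msg.set_content(xml_text, subtype="xml")  # becomes text/xml
    # add_related() turns the message into multipart/related, keeping the
    # XML as the first part and attaching the binary data as a second part.
    msg.add_related(image_bytes, maintype="image", subtype="gif",
                    cid=f"<{PHOTO_CID}>")
    return msg.as_bytes()


def resolve_references(raw: bytes) -> dict:
    """Receiver: parse the XML part and resolve its cid: references."""
    msg = message_from_bytes(raw, policy=policy.default)
    root_part, *other_parts = msg.iter_parts()
    # Index the non-XML parts by Content-ID, without the angle brackets.
    by_cid = {str(p["Content-ID"]).strip("<>"): p for p in other_parts}
    resolved = {}
    for element in ET.fromstring(root_part.get_content()).iter():
        href = element.get("href", "")
        if href.startswith("cid:"):
            # get_content() returns the decoded bytes of a non-text part.
            resolved[href] = by_cid[href[len("cid:"):]].get_content()
    return resolved
```

A receiver calling `resolve_references(build_message(data))` gets back a
dict mapping `cid:photo1@example.org` to the original bytes: it parses the
XML with an ordinary XML parser and reaches the non-XML data only through
the reference.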


At 08:44 AM 2/28/2003 -0800, Ray Whitmer wrote:
>John J. Barton wrote:
>>   Your note is nicely written, and in one critical regard I agree with
>>you 100%: binary data is not a SOAP issue but an issue for XML to solve
>>generally.
>Why XML?  That was never the purpose of XML: to be a universal object
>model.  XML uses URIs to reference things.  Neither XML nor its
>predecessors were designed to be a universal envelope; there are obvious
>reasons why XML should not be one, and if the SOAP designers wanted a
>universal envelope, they should have chosen another envelope format that
>is designed to deal with binary data.
>There is no good reason to place large binary data into the infoset
>representation, such as DOM.  Placing very small binary data into the
>infoset, while not a performance concern, still violates the open nature
>and structure of XML.  It clearly serves the purposes of anyone who wants
>to wrap a proprietary binary structure and call it open XML.  Referencing
>the parts from a binary envelope makes this separation clear, and for a
>number of reasons that have been expressed repeatedly, a binary envelope
>would have been a better choice of SOAP envelope for those who want to
>transmit binary data.  When this was brought up, the argument was that
>SOAP with Attachments was somehow a far superior design for transmitting
>data.  Big surprise that now we discover it is not and has problems.
>Placing huge binary data into the XML breaks the model and the associated
>tools, defeating the stated purposes for using XML in the first place and
>making it unreasonable to apply many XML tools and processing models to
>it.  XML features, standardized tools such as DOM, processing models, and
>so on are far from ideal for dealing with large encapsulated binary data.
>If the binary data is not encapsulated, then it is not an XML issue,
>since URIs tie XML to whatever else it may be.
>It is now not a SOAP issue because the SOAP standard refused to deal with
>it, limiting its own scope of applicability in the process.  Once the
>spec reached W3C under the name XML Protocol, attempts to fix this
>fundamental problem at the SOAP envelope level were ruled out of scope.
>If the original SOAP designers left it out because they thought it
>belonged in XML, IMO that is another clear sign that they misunderstood
>XML and the surrounding tools.
>Thus we get SOAP with Attachments, which adds another layer to drill down
>through, complicates organization and optimization, and may make it
>difficult to deploy existing tools for SOAP.
>>    Extending XInclude as the way to specify the connection between XML
>>and binary data would be close to SwA.  That is, an XML document using
>>XInclude would look quite a bit like the SwA XML.  If extending XInclude
>>makes a specification easier, great, but I don't see that it changes the
>>problem all that
>>    Specifically, you still have to come up with a wire format standard.
>>You say that you can have "multiple serialization formats, effectively
>>unifying multiple messaging technologies", but that cannot be true.
>>Multiple serialization formats means fragmentation in an open system, not
>>unification.  W3C has to pick one, not punt.
>I agree with this 100%.  SOAP was the specification that should have
>picked one, because that is where the deficiency is.  If W3C picks one, I
>hope it will be for the SOAP specification, and I hope it does not mandate
>encapsulation inside XML, which generally runs counter to the purposes of
>markup languages.  W3C had an activity for envelopes, which was
>abandoned.  You really gain nothing by making the envelope XML.  Even MIME
>is a much better envelope than XML, and you might instead choose ZIP,
>etc., depending upon the anticipated needs.
>>    And I cannot understand what possible value one can derive from your
>>last point: "Finally (and perhaps most importantly), ALL SOAP messages
>>can be represented in pure text".  What does "pure text" mean?  16-bit
>>Unicode?  So what?  Is this any more significant than saying it's "pure
>>binary"?  If I attach an image to a SOAP message and send it to you, what
>>pure-text processing can you do to the mixed
>If it is 16-bit Unicode (aka UTF-16), then the article was completely
>wrong when it claimed that it was such an efficient mechanism, because an
>extra 8 bits are wasted on every ASCII character.
>>   I want to reiterate that the application developer will see a unified
>>DOM no matter what is done with XML/binary mixing at the messaging layer.
>>The API they use will have to deal with binary because the bits are not
>>text.
>The proper model for exposing this will need to be a unified Web Message
>Object Model.  It is not a Document Object Model, or even a SOAP Object
>Model, because it goes beyond the point where those two specifications
>define themselves to stop (the infoset), just as a SOAP with Attachments
>model based on MIME or DIME goes well beyond the infoset.
>Ray Whitmer

John J. Barton          email:  John_Barton@hpl.hp.com
MS 1U-17  Hewlett-Packard Labs
1501 Page Mill Road              phone: (650)-236-2888
Palo Alto CA  94304-1126         FAX:   (650)-857-5100
Received on Friday, 28 February 2003 13:01:24 UTC
