RE: Opaque data, XML, and SOAP

From: Don Box <dbox@microsoft.com>
Date: Fri, 28 Feb 2003 22:37:20 -0800
Message-ID: <1057787630.IAA22192@phantom.w3.org>
To: "George Datuashvili" <gdatuashvili@siebel.com>, <xml-dist-app@w3.org>

I agree that the current state of the practice in web service stacks makes ANY large-message processing difficult. Having parts of a message exposed as byte[] instead of java.io.InputStream/System.IO.Stream boxes people into some pretty nasty architectural corners. However, I think that's true whether the 2GB octet sequence appears before or after the final end tag.
The problems you point out are all due to choices made by particular SOAP stacks. We were certainly guilty of this "load the entire message and then start processing" model in V1 of .NET. I think we're on track (as are other developers in this space) to move to the more scalable "cached header/streamed body" approach that the SOAP processing model more or less implies.
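
As a rough illustration of the "cached header/streamed body" idea, here is a minimal Java sketch using the JDK's StAX pull parser. It is an assumption-laden example, not how any particular SOAP stack works: the class and method names are hypothetical, only the header block names are cached (a real stack would keep the whole blocks), and the namespace is that of the final SOAP 1.2 Recommendation.

```java
import java.io.InputStream;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: cache the headers, then hand the body over unparsed.
class HeaderThenBodySketch {
    // Namespace of the final SOAP 1.2 Recommendation (an assumption for this sketch).
    static final String SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope";

    // Records the local name of each header block, then returns the reader
    // positioned on the Body start tag. Nothing inside the Body has been pulled yet.
    static XMLStreamReader readHeaders(InputStream in, List<String> headerBlocks) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int depth = 0;
            boolean inHeader = false;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    boolean soap = SOAP12_NS.equals(r.getNamespaceURI());
                    if (depth == 2 && soap && "Header".equals(r.getLocalName())) {
                        inHeader = true;
                    } else if (depth == 2 && soap && "Body".equals(r.getLocalName())) {
                        return r; // stop here; the caller pulls the body incrementally
                    } else if (inHeader && depth == 3) {
                        headerBlocks.add(r.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 2) {
                        inHeader = false;
                    }
                    depth--;
                }
            }
            throw new IllegalStateException("no SOAP Body found");
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Advances to the next start or end tag and returns its local name.
    static String nextElement(XMLStreamReader r) {
        try {
            r.nextTag();
            return r.getLocalName();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the split is that routing and header processing can finish before any of the (possibly very large) body content has been pulled from the reader.
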
Due to the nature of incremental deployment over the Internet, we wind up living with wire protocols roughly 2-10 times longer than with the initial implementation techniques. For that reason, I'd hate to see sub-optimal techniques used in early implementations cause us to live with the wrong message model for 10-20 years. I, for one, was hoping that SOAP/1.2 would be solid enough to last that long.


From: George Datuashvili [mailto:gdatuashvili@siebel.com]
Sent: Fri 2/28/2003 6:09 PM
To: xml-dist-app@w3.org
Subject: RE: Opaque data, XML, and SOAP

In my past experience, most of the real problems did not sprout from
the 1/3 inflation rate of base64 encoding. Sure, extra data transfer is
not ideal, but it doesn't kill the scalability of an architecture in the
long run.
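
For scale, the inflation is easy to quantify: padded base64 emits 4 characters for every 3 octets. A minimal Java sketch of the arithmetic follows; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Base64;

// Hypothetical helper, not part of any SOAP stack.
class Base64Overhead {
    // Padded base64 emits 4 characters per 3-octet group, rounding the last group up.
    static long encodedLength(long rawOctets) {
        return 4 * ((rawOctets + 2) / 3);
    }

    // Fraction of extra characters relative to the raw size (approaches 1/3).
    static double overhead(long rawOctets) {
        return (double) (encodedLength(rawOctets) - rawOctets) / rawOctets;
    }

    public static void main(String[] args) {
        long twoGb = 1L << 31; // the 2GB octet sequence discussed in this thread
        System.out.println(encodedLength(twoGb));
        System.out.println(overhead(twoGb));
        // Cross-check the formula against the JDK encoder on a small input.
        int actual = Base64.getEncoder().encodeToString(new byte[10]).length();
        System.out.println(actual == encodedLength(10));
    }
}
```

For a 2GB (2^31 octet) blob this gives 2,863,311,532 characters, roughly one third more than the raw size.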

Neither did the real problems come from the fact that encoding/decoding is
slower than a bitwise memory copy. Sure, it is slow, but if the processing
code is not just passing data through with a bitwise copy but actually doing
something with it (such as a server building a .jpg on the fly, or
loading/saving a .doc to the filesystem), then the overhead of encode/decode
is typically pretty small compared to whatever else is being done with the data at the

The real problems typically came from the lack of good streaming and
chunking support in web service invocation frameworks. I think SwA was
initially successful because implementations typically had good
streaming support. For example, if I have the following element in the
SOAP body:

<ReportCustomerProblem xmlns='http://company.com/support'>
  <Comment>Full system dump from recent meltdown</Comment>

The last thing I want to see is the following service interface:

        void ReportCustomerProblem (String customer, String comment, Byte[] blob)

Yet this is what I get with popular (all?) web services frameworks if
Blob is base64Binary. Not only will this choke the server, but it might
outright fail to work, as the system dump size can easily approach the
addressable space. To me, the problem of base64 encoding space/time
overhead is microscopic in comparison to the lack of good streaming
frameworks. So instead I would love to get a stream that can incrementally
pull data off the wire as requested by a ReadBytes() method:

        void ReportCustomerProblem (String customer, String comment, NiceReadOnlyByteStream blob)
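
A stream-based endpoint along these lines might look like the following Java sketch. It assumes the blob arrives as base64 text and uses the JDK's java.util.Base64 decoder, which can wrap an InputStream and decode lazily as bytes are read. The class name, the sink parameter, and the returned byte count are all hypothetical additions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical sketch of a stream-based endpoint; not any framework's API.
class StreamingBlobSketch {
    // Wraps base64 text from the wire so octets are decoded only as they are read.
    static InputStream decodeLazily(InputStream base64Text) {
        return Base64.getDecoder().wrap(base64Text);
    }

    // Receives the blob as a stream and copies it to a sink through a small
    // fixed buffer, so memory use does not grow with the size of the dump.
    static long reportCustomerProblem(String customer, String comment,
                                      InputStream blob, OutputStream sink) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = blob.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total; // number of decoded octets written to the sink
    }
}
```

With this shape the handler never holds more than one buffer of the dump in memory, whereas the Byte[] signature forces the whole decoded blob to be materialized before the method is even called.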

SOAP-Attachments implementations kind of approach this model, since a
stream can typically be obtained from magic request contexts, although
attachment models definitely have many of the problems described in the
article (for example, system dumps will need to be included in the
encryption).

Fixing SwA problems by embedding external data in the infoset one way or
another doesn't really solve the problem of streaming invocation. In
fact, there seem to be two big problems for vendors:

1. There is no developer-friendly "strongly-typed" model that would
combine message deserialization with inline endpoint action processing.

2. New developments in web services specifications might require
frameworks and actors to parse full SOAP envelopes before actual service
implementations get any chance to start request processing.

As long as those two problems exist, users will probably prefer to use
out-of-infoset data passing.


-----Original Message-----
From: Don Box [mailto:dbox@microsoft.com]
Sent: Wednesday, February 26, 2003 10:46 PM
To: xml-dist-app@w3.org
Subject: Opaque data, XML, and SOAP

A few of us have spent some time thinking about the problem space and
wrote down our thoughts in this area:


Received on Saturday, 1 March 2003 01:37:53 UTC
