
RE: Opaque data, XML, and SOAP

From: Don Box <dbox@microsoft.com>
Date: Sat, 8 Mar 2003 16:01:17 -0800
Message-ID: <57EF69AF56D92148984EDA31740829450486D86E@RED-MSG-10.redmond.corp.microsoft.com>
To: "John J. Barton" <John_Barton@hpl.hp.com>, "Anne Thomas Manes" <anne@manes.net>, <xml-dist-app@w3.org>

> -----Original Message-----
> From: John J. Barton [mailto:John_Barton@hpl.hp.com]
> Sent: Friday, March 07, 2003 1:59 PM
> To: Anne Thomas Manes; Don Box; xml-dist-app@w3.org
> 
> At 12:29 PM 3/7/2003 -0500, Anne Thomas Manes wrote:
> 
> >I certainly prefer the idea of using XInclude better than SwA or
> >WS-Attachments. But why not just use base64? It's really the better way
> >to go.
> 
> Inline binary and base64 is fine for short objects in general purpose
> machines. But I think that a lot of XML+binary uses will be small XML
> messages containing instructions and large binary data objects sent from
> or to special purpose machines.
> 
> Using inline storage for the binary has many drawbacks.  It's not robust
> to partial transmission: you don't have any usable information until all
> the binary has been transmitted and the end of the XML has been reached.

That may be true for DOM users, but for developers using streaming XML
parsers (e.g., Xerces/SAX, System.Xml.XmlTextReader) this isn't an
issue. 
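
To make the point concrete, here is a minimal sketch in Python. It uses
the standard library's XMLPullParser as a stand-in for Xerces/SAX or
XmlTextReader; the element names (message, copies, image) and the
elements_seen helper are made up for illustration. The message is cut off
partway through the base64 payload, yet the element that precedes the
payload is already available to the application.

```python
from xml.etree.ElementTree import XMLPullParser


def elements_seen(chunks):
    """Feed text chunks to a pull parser; return (tag, text) for every
    element whose end tag has arrived so far."""
    parser = XMLPullParser(events=("end",))
    seen = []
    for chunk in chunks:
        parser.feed(chunk)
        # read_events() yields only what is parseable from the data fed so far.
        for _event, elem in parser.read_events():
            seen.append((elem.tag, elem.text))
    return seen


# Hypothetical message, truncated in the middle of the binary payload.
partial = [
    "<message><copies>2</copies>",
    "<image>/9j/4AAQSkZJRgABAQ",  # transmission stops here
]
print(elements_seen(partial))
```

The instruction element is usable even though neither the end of the
binary nor the end of the XML has been reached.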

Also, receivers that want to guard against malformed messages before
processing will need to buffer the entire message no matter which
approach is used, as such an app needs to guard against a malformed MIME
part just as it needs to guard against a missing XML end tag.


> It is not easy for limited memory devices: you must buffer the input,
> count it, then reallocate to process the binary.

Again, this is if one uses the DOM. Streaming XML parsers are the norm
nowadays.
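
As a rough sketch of what a streaming consumer can do, the Python below
decodes base64 text as it arrives, in whatever chunk sizes the parser
hands over, and writes the bytes straight to a sink. The function name
and the chunking are assumptions for illustration, not code from either
stack. Between chunks it holds at most three undecoded characters, so
there is no need to buffer, count, and reallocate.

```python
import base64
import io


def decode_base64_stream(text_chunks, sink):
    """Decode base64 text that arrives in arbitrary chunks, writing the
    decoded bytes to sink as soon as complete 4-character groups exist."""
    pending = ""
    for chunk in text_chunks:
        # Drop line breaks and other whitespace, which base64 text may contain.
        pending += "".join(chunk.split())
        usable = len(pending) - len(pending) % 4
        sink.write(base64.b64decode(pending[:usable]))
        pending = pending[usable:]  # fewer than 4 characters carried over
    if pending:
        raise ValueError("truncated base64 input")


# Usage: feed 7-character pieces, as a parser's text callbacks might.
payload = bytes(range(256)) * 4
encoded = base64.b64encode(payload).decode("ascii")
chunks = [encoded[i:i + 7] for i in range(0, len(encoded), 7)]
out = io.BytesIO()
decode_base64_stream(chunks, out)
print(out.getvalue() == payload)
```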

> It requires more complex sending software to embed the binary: you
> probably will shuffle the bits through application layer code for no
> purpose.

Experience with Apache AXIS and .NET Framework hasn't borne this out.
It's pretty easy to send an opaque blob in either stack - they both
handle the base64 automatically (and interoperably).

> The main argument against base64 is the pointless 30% increase in bits.

Is that 30% increase any more or less pointless than the 100%+ increase
in bits one pays for using XML 1.0 instead of a binary serialization
format?
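
For reference, the base64 figure itself is easy to check: the encoding
emits 4 characters for every 3 input bytes, so the growth is about one
third, in line with the rough 30% quoted above. A small Python sketch
(base64_overhead is a name made up for this example):

```python
import base64


def base64_overhead(n_bytes):
    """Return the fractional size increase from base64-encoding n_bytes
    bytes of data (zero bytes are used; the content does not matter)."""
    raw = bytes(n_bytes)
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw) - 1.0


# A payload of a few Mbytes, as in the camera example.
print(round(base64_overhead(3_000_000), 3))
```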

> The CPU cost of encoding is also pointless but then the processor is
> mostly idle anyway.

And again, is the CPU cost of base64 encoding more or less pointless
than the CPU cost of running an XML 1.0 parser?

> Consider for example a camera sending an image.  The XML will be a few
> kbytes and might be fixed in ROM; the binary will be a few Mbytes.  If
> the receiver is a printer, inline/base64 vs outline/jpeg could be the
> difference between success and failure.

This assumes one buffers everything in memory. Is that typically the
architecture used when developing for a limited memory device? 

> And here the costs are all in the design: we know that outlined/binary
> solutions are feasible and efficient.  We just need to pick one.

I hope that our paper made clear that abandoning the infoset as the
data model for messages has considerable design costs - looking at this
problem in isolation, within a handful of SOAP stacks sans WSDL support,
is myopic in my opinion.

> And of course each individual developer does not care about 30% increase
> or doubled memory requirements.  But systems designers have to consider
> the aggregate impact of poor protocol decisions.

There are those who argue that the choice of the Infoset and/or XML 1.0
was a poor protocol decision. However, that's what we've got. 

DB
Received on Saturday, 8 March 2003 19:01:24 GMT
