RE: Opaque data, XML, and SOAP

Don,

    Whether a stream or a DOM parser is used, the important
question is whether the receiver can predict the size of the
binary chunk without buffering it.  Furthermore, one needs to
ensure that the XML preceding the binary chunk carries enough
information to give intelligible error feedback should processing
the binary fail.  Both of these are straightforward when the control
information (XML) completely precedes any binary, as in MIME
packaging.  I suppose that ordering could be introduced into XML,
but I wonder whether the complexity one adds by doing so is more or
less than that of a non-XML-based solution.
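A minimal sketch of that ordering, assuming a hypothetical framing
(not SwA, DIME, or any standard): one line of XML control information
whose hypothetical `part` element declares the payload `type` and
`length`, followed by exactly that many raw bytes.  The receiver
learns the size before any binary arrives, copies it through in
fixed-size chunks, and can still say what it was receiving if the
stream is cut short.

```python
import io
import xml.etree.ElementTree as ET

CHUNK = 4096  # read granularity; an arbitrary choice for this sketch

def send(stream, content_type, payload):
    """Write a one-line XML control header declaring the length, then the raw bytes."""
    header = ET.Element("part", {"type": content_type, "length": str(len(payload))})
    stream.write(ET.tostring(header) + b"\n")  # tostring() returns bytes by default
    stream.write(payload)

def receive(stream, sink):
    """Parse the header first, then copy exactly `length` bytes to `sink` chunk by chunk."""
    header = ET.fromstring(stream.readline())
    remaining = int(header.get("length"))
    while remaining > 0:
        chunk = stream.read(min(CHUNK, remaining))
        if not chunk:
            # The control information already arrived, so the error can be specific.
            raise ValueError(f"truncated {header.get('type')} part: "
                             f"{remaining} of {header.get('length')} bytes missing")
        sink.write(chunk)
        remaining -= len(chunk)
    return header.attrib
```

The receiver never holds more than `CHUNK` bytes of the payload at
once; `sink` could equally be a file or a print engine.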

   As for the efficiency of XML+binary vs XML+base64, the tradeoffs
vary with the use cases one imagines.

If the control information and data are similar in size, I agree with your
comments below.  In fact, I would argue for either all binary or all XML in
such cases, so that one gets either performance or simple tools without
compromising the other.
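To put a rough number on the "all binary or all XML" choice, here is
a small illustration with made-up data: the same 1000 six-digit
integers packed as 32-bit binary and written as XML text.  The element
names `a` and `v` are hypothetical, and the ratio depends entirely on
the data and markup chosen.

```python
import struct

values = list(range(100000, 101000))  # 1000 six-digit integers (illustrative data)

# "All binary": little-endian 32-bit signed integers, 4 bytes each.
binary = struct.pack(f"<{len(values)}i", *values)

# "All XML": the same numbers as text elements under one root element.
xml = ("<a>" + "".join(f"<v>{v}</v>" for v in values) + "</a>").encode("ascii")

print(len(binary), len(xml))  # 4000 13007
```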

But if the control information is much smaller than the data, e.g. 20k vs 2MB
(or 2GB), then the size of the control information is not important, but
expanding the binary has a direct impact.  It is for this case that mixing XML
and binary has value, e.g. for image, audio, and video transfers.
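The arithmetic for the 20k/2MB case: standard base64 (RFC 4648) emits
4 ASCII characters for every 3 input bytes, so the expansion is about
33% (MIME's 76-character line breaks add a little more).  The sketch
assumes padded base64 with no line breaks.

```python
import base64

def base64_len(n: int) -> int:
    """Length of padded base64 output, without line breaks, for n input bytes."""
    return 4 * ((n + 2) // 3)

control = 20_000    # bytes of XML control information (20k)
data = 2_000_000    # bytes of binary payload (2MB)

extra = base64_len(data) - data
print(extra, extra / control)  # 666668 33.3334
```

So the bytes added by encoding the payload are roughly 33 times the
size of the entire control block, which is why the control
information's size hardly matters here.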

So base64 has limited value: I would rather encode into ASCII for simplicity or
solve the XML+binary problem for large data sets.

John.


At 04:01 PM 3/8/2003 -0800, Don Box wrote:

> > -----Original Message-----
> > From: John J. Barton [mailto:John_Barton@hpl.hp.com]
> > Sent: Friday, March 07, 2003 1:59 PM
> > To: Anne Thomas Manes; Don Box; xml-dist-app@w3.org
> >
> > At 12:29 PM 3/7/2003 -0500, Anne Thomas Manes wrote:
> >
> > >I certainly like the idea of using XInclude better than SwA or
> > >WS-Attachments. But why not just use base64? It's really the better
> > >way to go.
> >
> > Inline binary and base64 are fine for short objects in general-purpose
> > machines. But I think that a lot of XML+binary uses will be small XML
> > messages containing instructions and large binary data objects sent
> > from or to special-purpose machines.
> >
> > Using inline storage for the binary has many drawbacks. It's not
> > robust to partial transmission: you don't have any usable information
> > until all the binary has been transmitted and the end of the XML has
> > been reached.
>
>That may be true for DOM users, but for developers using streaming XML
>parsers (e.g., Xerces/SAX, System.Xml.XmlTextReader) this isn't an
>issue.
>
>Also, receivers that want to guard against malformed messages before
>processing will need to buffer the entire message no matter which
>approach is used, as such an app needs to guard against a malformed
>MIME part just as it needs to guard against a missing XML end tag.
>
>
> > It is not easy for limited-memory devices: you must buffer the input,
> > count it, then reallocate to process the binary.
>
>Again, this is only the case if one uses the DOM. Streaming XML parsers
>are the norm nowadays.
>
> > It requires more complex sending software to embed the binary: you
> > will probably shuffle the bits through application-layer code for no
> > purpose.
>
>Experience with Apache AXIS and .NET Framework hasn't borne this out.
>It's pretty easy to send an opaque blob in either stack - they both
>handle the base64 automatically (and interoperably).
>
> > The main argument against base64 is the pointless 30% increase in bits.
>
>Is that 30% increase any more or less pointless than the 100%+ increase
>in bits one pays for using XML 1.0 instead of a binary serialization
>format?
>
> > The CPU cost of encoding is also pointless, but then the processor is
> > mostly idle anyway.
>
>And again, is the CPU cost of base64 encoding more or less pointless
>than the CPU cost of running an XML 1.0 parser?
>
> > Consider, for example, a camera sending an image. The XML will be a
> > few kbytes and might be fixed in ROM; the binary will be a few Mbytes.
> > If the receiver is a printer, inline/base64 vs outline/jpeg could be
> > the difference between success and failure.
>
>This assumes one buffers everything in memory. Is that typically the
>architecture used when developing for a limited memory device?
>
> > And here the costs are all in the design: we know that outlined/binary
> > solutions are feasible and efficient. We just need to pick one.
>
>I hope our paper made clear that abandoning the Infoset as the data
>model for messages has considerable design costs; looking at this
>problem in isolation, in terms of a handful of SOAP stacks without WSDL
>support, is myopic in my opinion.
>
> > And of course each individual developer does not care about a 30%
> > increase or doubled memory requirements. But systems designers have
> > to consider the aggregate impact of poor protocol decisions.
>
>There are those who argue that the choice of the Infoset and/or XML 1.0
>was a poor protocol decision. However, that's what we've got.
>
>DB
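
Don's point that a streaming parser need not buffer inline base64 can
be sketched with Python's standard `xml.sax` incremental parser.  The
`data` element name and the chunk sizes are illustrative assumptions.
The handler decodes complete 4-character base64 groups as text arrives
and writes the bytes to a sink, so the decoded payload is never held in
memory as a whole.  Unlike length-prefixed framing, though, the receiver
still does not learn the payload size until the end tag arrives.

```python
import base64
import xml.sax

class Base64Sink(xml.sax.ContentHandler):
    """Decodes base64 text inside <data> incrementally, writing bytes to `sink`."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink
        self.inside = False
        self.pending = ""  # leftover base64 characters (fewer than 4) between callbacks

    def startElement(self, name, attrs):
        if name == "data":
            self.inside = True

    def characters(self, content):
        if not self.inside:
            return
        self.pending += "".join(content.split())  # ignore line breaks and spaces
        usable = len(self.pending) - len(self.pending) % 4
        if usable:
            self.sink.write(base64.b64decode(self.pending[:usable]))
            self.pending = self.pending[usable:]

    def endElement(self, name):
        if name == "data":
            self.inside = False
            if self.pending:
                raise ValueError("truncated base64 group at end of <data>")

def stream_decode(chunks, sink):
    """Feed the message to the parser piece by piece, as it would arrive off the wire."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(Base64Sink(sink))
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
```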

______________________________________________________
John J. Barton          email:  John_Barton@hpl.hp.com
http://www.hpl.hp.com/personal/John_Barton/index.htm
MS 1U-17  Hewlett-Packard Labs
1501 Page Mill Road              phone: (650)-236-2888
Palo Alto CA  94304-1126         FAX:   (650)-857-5100

Received on Monday, 10 March 2003 16:19:26 UTC