Re: Opaque data, XML, and SOAP

On Mon, 10 Mar 2003 13:19:10 -0800
"John J. Barton" <John_Barton@hpl.hp.com> wrote:
>     Whether a stream or DOM parser is used, the important
> question is whether the receiver can predict the size of the
> binary chunk without buffering it.  Furthermore one needs to
> insure that the XML preceding the binary chunk has sufficient
> information to give intelligible error feedback should processing
> the binary fail.  Both of these are straightforward when the control
> information (XML) completely precedes any binary as in the MIME
> packaging.  I suppose that ordering could be introduced into XML
> but I wonder if the complexity one adds by doing so is more or
> less than a non-XML based solution.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="mimeType">
    <xs:simpleContent>
      <xs:extension base="xs:base64Binary">
        <xs:attribute name="content-length" type="xs:integer" use="required" />
        <xs:attribute name="content-type" type="mimeContentType" use="required" />
        <xs:anyAttribute namespace="##other" processContents="lax" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:simpleType name="mimeContentType">
    <xs:restriction base="xs:string">
      <xs:pattern value="(text|image|application|audio|video|model|x-.+)/.+" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
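
To make the size-prediction point concrete, here is a minimal Python sketch of a streaming receiver.  It assumes several things the schema doesn't say: an element named "attachment" of type mimeType (hypothetical; the schema declares no elements), and that content-length counts decoded bytes rather than base64 characters.  The class name AttachmentHandler is likewise made up.  A SAX parser hands over the attributes before any character data, so the buffer can be sized up front:

```python
import base64
import xml.sax


class AttachmentHandler(xml.sax.ContentHandler):
    """Reads size and type from attributes before any payload arrives."""

    def __init__(self):
        super().__init__()
        self.declared_length = None
        self.content_type = None
        self.payload = None   # allocated once the declared length is known
        self.written = 0
        self._pending = ""    # base64 chars not yet forming a 4-char group
        self._inside = False

    def startElement(self, name, attrs):
        if name == "attachment":  # hypothetical element name
            # Assumption: content-length is the decoded size in bytes.
            self.declared_length = int(attrs["content-length"])
            self.content_type = attrs["content-type"]
            self.payload = bytearray(self.declared_length)
            self._inside = True

    def characters(self, content):
        if not self._inside:
            return
        # Drop whitespace, then decode only complete 4-character groups;
        # the parser may split character data at arbitrary points.
        self._pending += "".join(content.split())
        usable = len(self._pending) - len(self._pending) % 4
        self._store(base64.b64decode(self._pending[:usable]))
        self._pending = self._pending[usable:]

    def endElement(self, name):
        if name == "attachment":
            self._inside = False
            if self._pending:
                raise ValueError("truncated base64 data")
            if self.written != self.declared_length:
                raise ValueError("content-length mismatch")

    def _store(self, chunk):
        end = self.written + len(chunk)
        if end > self.declared_length:
            raise ValueError("payload exceeds declared content-length")
        self.payload[self.written:end] = chunk
        self.written = end


sample = b"hello, world"
doc = ('<attachment content-length="%d" content-type="text/plain">%s</attachment>'
       % (len(sample), base64.b64encode(sample).decode("ascii")))
handler = AttachmentHandler()
xml.sax.parseString(doc.encode("utf-8"), handler)
print(handler.content_type, bytes(handler.payload))
```

A real receiver would want to cap content-length before allocating, and would hand decoded chunks on to whatever consumes them rather than hold the whole payload.  The sketch only shows that size and type are known before the first base64 character arrives.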

The mimeType definition doesn't allow composite types, of course.  The pattern could be made simpler still, with just .+/.+, but new top-level types are (by design) rare, while sub-types are relatively easy to add.  More headers can be added as namespaced attributes.  This makes processing pretty straightforward (more so than plain base64, because it provides both a content type and a content length).  If it thrills you, you then get this:

<xs:complexType name="mimeCompositeType">
  <xs:choice minOccurs="1" maxOccurs="unbounded">
    <xs:element name="simplePart" type="mimeType" />
    <xs:element name="complexPart" type="mimeCompositeType" />
  </xs:choice>
  <xs:attribute name="content-type" type="mimeCompositeContentType" use="required" />
</xs:complexType>

<xs:simpleType name="mimeCompositeContentType">
  <xs:restriction base="xs:string">
    <xs:pattern value="(multipart|message)/.+" />
  </xs:restriction>
</xs:simpleType>

It doesn't do a completely outstanding job of verifying content types, but it's close enough for gummint work, eh?  And could prolly be improved without great effort.

Which gives you MIME messages in XML envelopes.  Just a <snicker /> in the global namespace ....
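
For a feel of how loose the content-type patterns are, here is a small Python sketch.  The two expressions are taken from the schema fragments above, with the parentheses in the composite one balanced; the classify function is made up for illustration.  XML Schema patterns match the whole value, so re.fullmatch is the nearest Python equivalent (the two regex dialects differ in small ways that don't matter here):

```python
import re

# Patterns from the schema fragments; XML Schema anchors them implicitly.
SIMPLE = re.compile(r"(text|image|application|audio|video|model|x-.+)/.+")
COMPOSITE = re.compile(r"(multipart|message)/.+")


def classify(content_type):
    """Hypothetical helper: which of the two schema types would accept this?"""
    if SIMPLE.fullmatch(content_type):
        return "simple"
    if COMPOSITE.fullmatch(content_type):
        return "composite"
    return "invalid"


for value in ("image/png", "multipart/mixed", "text/",
              "chemical/x-pdb", "image/not a real subtype!!"):
    print(value, "->", classify(value))
```

The last value gets through as a simple type, since .+ takes anything after the slash, and "chemical/x-pdb" is rejected only because its top-level type isn't in the list.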

Amy!

> 
>    As for the efficiency of XML+binary vs XML+base64, the tradeoffs
> vary with the use cases one imagines.
> 
> If the control information and data are similar in size, I agree with your
> comments below.  In fact I would argue for either all binary or all XML for
> such cases so one gets either performance or simple tools without
> compromising the other value.
> 
> But if the control information is much smaller than the data, eg 20k vs 2MB
> (or 2GB) then the size of the control information is not important but
> expanding the binary has a direct impact.  It is for this case that mixing
> XML and binary has value, eg for image, audio, and video transfers.
> 
> So base64 has limited value: I would rather encode into ASCII for simplicity or
> solve the XML+binary problem for large data sets.
> 
> John.
> 
> 
> At 04:01 PM 3/8/2003 -0800, Don Box wrote:
> 
> > > -----Original Message-----
> > > From: John J. Barton [mailto:John_Barton@hpl.hp.com]
> > > Sent: Friday, March 07, 2003 1:59 PM
> > > To: Anne Thomas Manes; Don Box; xml-dist-app@w3.org
> > >
> > > At 12:29 PM 3/7/2003 -0500, Anne Thomas Manes wrote:
> > >
> > > >I certainly prefer the idea of using XInclude better than SwA or
> > > >WS-Attachments. But why not just use base64? It's really the better
> > > >way to go.
> > >
> > > Inline binary and base64 is fine for short objects in general purpose
> > > machines.  But I think that a lot of XML+binary uses will be small XML
> > > messages containing instructions and large binary data objects sent
> > > from or to special purpose machines.
> > >
> > > Using inline storage for the binary has many drawbacks.  It's not
> > > robust to partial transmission: you don't have any usable information
> > > until all the binary has been transmitted and the end of the XML has
> > > been reached.
> >
> >That may be true for DOM users, but for developers using streaming XML
> >parsers (e.g., Xerces/SAX, System.Xml.XmlTextReader) this isn't an
> >issue.
> >
> >Also, receivers that want to guard against malformed messages before
> >processing will need to buffer the entire message no matter which
> >approach is used, as such an app needs to guard against a malformed MIME
> >part just as it needs to guard against a missing XML end tag.
> >
> >
> > > It is not easy for limited memory devices: you must buffer the input,
> > > count it, then reallocate to process the binary.
> >
> >Again, this is if one uses the DOM. Streaming XML parsers are the norm
> >nowadays.
> >
> > > It requires more complex sending software to embed the binary: you
> > > probably will shuffle the bits through application layer code for no
> > > purpose.
> >
> >Experience with Apache AXIS and .NET Framework hasn't borne this out.
> >It's pretty easy to send an opaque blob in either stack - they both
> >handle the base64 automatically (and interoperably).
> >
> > > The main argument against base64 is the pointless 30% increase in
> > > bits.
> >
> >Is that 30% increase any more or less pointless than the 100%+ increase
> >in bits one pays for using XML 1.0 instead of a binary serialization
> >format?
> >
> > > The CPU cost of encoding is also pointless but then the processor is
> > > mostly idle anyway.
> >
> >And again, is the CPU cost of base64 encoding more or less pointless
> >than the CPU cost of running an XML 1.0 parser?
> >
> > > Consider for example a camera sending an image.  The XML will be a few
> > > kbytes and might be fixed in ROM; the binary will be a few Mbytes.  If
> > > the receiver is a printer inline/base64 vs outline/jpeg could be the
> > > difference between success and failure.
> >
> >This assumes one buffers everything in memory. Is that typically the
> >architecture used when developing for a limited memory device?
> >
> > > And here the costs are all in the design: we know that outlined/binary
> > > solutions are feasible and efficient.  We just need to pick one.
> >
> >I hope that our paper had made clear that abandoning the infoset as the
> >data model for messages has considerable design costs - looking at this
> >problem in the isolation of a handful of SOAP stacks sans WSDL support
> >is myopic in my opinion.
> >
> > > And of course each individual developer does not care about 30%
> > > increase or doubled memory requirements.  But systems designers have
> > > to consider the aggregate impact of poor protocol decisions.
> >
> >There are those who argue that the choice of the Infoset and/or XML 1.0
> >was a poor protocol decision. However, that's what we've got.
> >
> >DB
> 
> ______________________________________________________
> John J. Barton          email:  John_Barton@hpl.hp.com
> http://www.hpl.hp.com/personal/John_Barton/index.htm
> MS 1U-17  Hewlett-Packard Labs
> 1501 Page Mill Road              phone: (650)-236-2888
> Palo Alto CA  94304-1126         FAX:   (650)-857-5100
> 


-- 
Amelia A. Lewis
Architect, TIBCO/Extensibility, Inc.
alewis@tibco.com

Received on Monday, 10 March 2003 16:45:23 UTC