RE: Issues with Packaging Application Payloads from Laird A Popkin on 2000-10-18 (xml-dist-app@w3.org from October 2000)

From: Laird A Popkin <laird@io.com>
Date: Wed, 18 Oct 2000 08:21:37 -0500 (CDT)
To: James Snell <jsnell@lemoorenet.com>
cc: xml-dist-app@w3.org
Message-ID: <Pine.LNX.4.21.0010180753060.12857-100000@fnord.io.com>

It seems to me that the problem with transporting arbitrary XML content in
XML messages is that there is no way to easily encapsulate arbitrary XML
within XML and validate it.

The strategies I can think of are:

1) Include the "body" as arbitrary data in a CDATA section, terminated by
]]>. This has the problem that the body cannot contain the termination
string, because CDATA sections do not nest in XML. Thus, if I try to send
a message that contains ]]>, the parser ends the body early and treats the
remainder of the body as if it were part of the wrapper, which will of
course not be valid. The only fix for this that I can see is changing the
definition of XML's handling of the ]]> terminator to nest, which would
make parsing a bit trickier. That being said, this strategy allows the
body to have its own DTD, since it's an independent XML document from the
wrapper document. This means that the receiving application would need to
spawn a new parser to process the contained message, which (IMO) is a nice
separation of logical layers. Of course, the message would need to contain
enough information to tell the application which data handler (XML or not,
and if XML which event handler to use).

2) The other strategy is to base64 encode the contained message, which
guarantees that it cannot contain the termination string (]]>). To process
this, the receiving parser would hand the entire lump of text to the
application, which would decode the string and spawn a new parser to
process it. As above, the application would need to have enough
information to decide which handler to apply to the data. This works
pretty well, aside from the overhead of encoding/decoding, and the
aesthetic issue of making the XML messages unreadable by humans.

3) Include the "body" as content type ANY, so it has to be well formed but
can't be validated, and can't contain arbitrary data due to nested
terminators (as in case 1).
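
To make the termination problem from strategy 1 concrete, here is a minimal
Python sketch. The envelope and body element names and both helper functions
are made up for illustration; they are not taken from SOAP or ICE. It wraps
a payload in a CDATA section and checks whether the result still parses.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(payload: str) -> str:
    """Wrap a payload in a CDATA section inside a hypothetical envelope."""
    return "<envelope><body><![CDATA[" + payload + "]]></body></envelope>"


def parses(document: str) -> bool:
    """Return True if the document is well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A payload with no terminator inside it is carried without trouble.
plain_payload = "<item>plain</item>"

# A payload that itself uses CDATA contains ]]>, which closes the wrapper's
# CDATA section early; the rest is then read as wrapper markup.
nested_payload = "<item><![CDATA[raw]]></item>"

print(parses(wrap_in_cdata(plain_payload)))
print(parses(wrap_in_cdata(nested_payload)))
```

The first wrapped document parses; the second does not, because the parser
sees the payload's closing tag while it is still inside the wrapper's body.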

Personally, I think the best answer would be for XML to nest terminators,
but getting that changed is, at best, a long shot. And being unable to
validate content is unacceptable (IMO), so that leaves base64
encoding. This is what we chose on the ICE Authoring Group, along with the
option of including content by reference in order to avoid encoding (for
large binary objects accessible via HTTP or FTP).
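
The base64 approach might look roughly like the Python sketch below. This is
not the ICE wire format: the envelope layout, the encoding attribute, and
the pack/unpack helpers are assumptions made up for the example. The point
is only that the encoded body cannot contain ]]> or any markup, and that the
receiver runs a second, independent parse on the decoded text.

```python
import base64
import xml.etree.ElementTree as ET


def pack(payload_xml: str) -> str:
    """Base64-encode a payload and place it in a hypothetical envelope."""
    encoded = base64.b64encode(payload_xml.encode("utf-8")).decode("ascii")
    return '<envelope><body encoding="base64">' + encoded + "</body></envelope>"


def unpack(envelope_xml: str) -> ET.Element:
    """Parse the envelope, decode the body, and parse the payload separately."""
    outer = ET.fromstring(envelope_xml)
    encoded = outer.find("body").text
    inner_xml = base64.b64decode(encoded).decode("utf-8")
    # Second parser run: the payload is its own document, so it could carry
    # its own DTD and be handed to its own handler.
    return ET.fromstring(inner_xml)


# A payload that uses CDATA itself, which breaks the CDATA-wrapping strategy.
payload = "<item><![CDATA[raw]]></item>"
inner = unpack(pack(payload))
print(inner.tag, inner.text)
```

The cost is the one noted above: the body is opaque to a human reader and
roughly a third larger than the original payload.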

On Tue, 17 Oct 2000, James Snell wrote:
... snip ...
 
> Encoding XML is another issue, obviously... SOAP declares that the envelope
> can be used to encapsulate arbitrary namespace qualified XML but does not
> declare exactly how to go about making sure that that "arbitrary XML" is
> valid in any way.

> Another question that I must ask:  is it the intention of this working group
> to not only define the PACKAGING structure of the XML Protocol but also a
> standard API for implementing that PACKAGING as has been done with XML and
> the DOM?  Or is this working group only going to focus on the XML Packaging?

Good scope question. It's easier to focus on the data structure, but the
advantages of a standard API, a la SAX and DOM, are pretty obvious. My
instinct would be to focus on the message format first, and then once
that's well understood, perhaps with some implementations running, we'd be
in a position to distill a standard API. IMO, of course.

> - James
> 
> 
>   -----Original Message-----
>   From: xml-dist-app-request@w3.org [mailto:xml-dist-app-request@w3.org]On
> Behalf Of Andrew Layman
>   Sent: Tuesday, October 17, 2000 6:03 PM
>   To: xml-dist-app@w3.org
>   Subject: RE: Issues with Packaging Application Payloads
> 
> 
>   Thanks for the thoughtful mail.  I have some ideas on some of the points
> you raise, though I don't suggest that these are exhaustive.
> 
>   3.    If I understand you, the performance problem you cite is more an
> issue of the suitability of a DOM-based message handler for large messages.
> Whether SOAP, or some other use of XML, or even if using MIME or something
> else, if an application processes large messages by first parsing them into
> a tree or other buffer and then examining the contents, that will be more
> expensive than some more streamlined techniques.

I agree -- the advantages of event-based programming models for processing
large volumes of data (e.g. SAX) are pretty obvious. IMO, in addition to
being more efficient, event-based models are more robust and easier to
implement. (And I've implemented several XML protocol stacks over the last
few years).
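
As a small illustration of the event-based style, here is a sketch using
Python's standard xml.sax module. The ItemCounter handler, the count_items
helper, and the "item" element name are invented for the example. The
handler reacts to each start tag as it arrives and never builds a tree, so
memory use does not grow with the size of the message.

```python
import xml.sax


class ItemCounter(xml.sax.ContentHandler):
    """Count <item> start tags as events arrive, keeping no document tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items(xml_bytes: bytes) -> int:
    """Run a SAX parse over the bytes and return the number of items seen."""
    handler = ItemCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count


print(count_items(b"<feed><item/><item/><other/></feed>"))
```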

>   I think that one of the themes running through the above comments is that
> I'm trying to separate the issues that relate to the protocol from other
> issues that relate to good application or API design.  The protocol spec is
> more like a spec for XML than a spec for a browser.  There are better and
> poorer browsers, and many differences of taste, and I expect that there will
> be better and poorer application support libraries.

I agree completely.
 
>   By the way, thank you for the kind compliments on SOAP.
Received on Wednesday, 18 October 2000 09:21:40 UTC