- From: Laird A Popkin <laird@io.com>
- Date: Wed, 18 Oct 2000 08:21:37 -0500 (CDT)
- To: James Snell <jsnell@lemoorenet.com>
- cc: xml-dist-app@w3.org
It seems to me that the problem with transporting arbitrary XML content in XML messages is that there is no easy way to encapsulate arbitrary XML within XML and still validate it. The strategies I can think of are:

1) Include the "body" as arbitrary data in a CDATA section, terminated by ]]>. This has the problem that the body cannot contain the termination string, because the terminator does not nest in XML. Thus, if I try to send a message that contains ]]>, the parser ends the body early and treats the remainder of the body as if it were part of the wrapper, which will of course not be valid. The only fix for this that I can see is changing XML's handling of the ]]> terminator so that it nests, which would make parsing a bit trickier. That being said, this strategy allows the body to have its own DTD, since it's an XML document independent of the wrapper document. This means that the receiving application would need to spawn a new parser to process the contained message, which (IMO) is a nice separation of logical layers. Of course, the message would need to contain enough information to tell the application which data handler to use (XML or not, and if XML, which event handler).

2) The other strategy is to base64 encode the contained message, which guarantees that it cannot contain the termination string (]]>). To process this, the receiving parser would hand the entire lump of text to the application, which would decode the string and spawn a new parser to process it. As above, the application would need enough information to decide which handler to apply to the data. This works pretty well, aside from the overhead of encoding/decoding and the aesthetic issue of making the XML messages unreadable by humans.

3) Include the "body" as content type ANY, so it has to be well formed but can't be validated, and can't contain arbitrary data due to nested terminators (as in case 1).
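To make the difference between strategies 1 and 2 concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree and base64 modules. The wrapper element names (message, body), the encoding attribute, and the sample payload are hypothetical and not taken from SOAP, ICE, or any other spec.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical payload: a well-formed XML document that itself contains "]]>".
payload = '<order><note><![CDATA[5 > 3]]></note></order>'


def wrap_cdata(inner: str) -> str:
    """Strategy 1: embed the payload verbatim in a CDATA section."""
    return '<message><body><![CDATA[' + inner + ']]></body></message>'


def wrap_base64(inner: str) -> str:
    """Strategy 2: base64 encode the payload before embedding it."""
    encoded = base64.b64encode(inner.encode('utf-8')).decode('ascii')
    return '<message><body encoding="base64">' + encoded + '</body></message>'


def unwrap_base64(wrapper: str) -> ET.Element:
    """Parse the wrapper, decode the body, then run a second parse on it."""
    root = ET.fromstring(wrapper)
    decoded = base64.b64decode(root.find('body').text).decode('utf-8')
    return ET.fromstring(decoded)


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well formed according to the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == '__main__':
    # The payload's own "]]>" closes the wrapper's CDATA section early,
    # so the remainder is parsed as wrapper markup and rejected.
    print('CDATA wrapper well formed:', parses(wrap_cdata(payload)))

    # The base64 alphabet cannot spell "]]>", so the wrapper parses and
    # the decoded payload can be handed to its own parser.
    inner = unwrap_base64(wrap_base64(payload))
    print('decoded root element:', inner.tag)
```

The second parse in unwrap_base64 is the "spawn a new parser" step; a real receiver would also need something like the encoding attribute to decide which handler to apply.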
Personally, I think the best answer would be for XML to nest terminators, but getting that changed is, at best, a long shot. And being unable to validate content is unacceptable (IMO), so that leaves base64 encoding. This is what we chose on the ICE Authoring Group, along with the option of including content by reference in order to avoid encoding (for large binary objects accessible via HTTP or FTP).

On Tue, 17 Oct 2000, James Snell wrote:

... snip ...

> Encoding XML is another issue, obviously... SOAP declares that the envelope
> can be used to encapsulate arbitrary namespace qualified XML but does not
> declare exactly how to go about making sure that that "arbitrary XML" is
> valid in any way.
> Another question that I must ask: is it the intention of this working group
> to not only define the PACKAGING structure of the XML Protocol but also a
> standard API for implementing that PACKAGING as has been done with XML and
> the DOM? Or is this working group only going to focus on the XML Packaging?

Good scope question. It's easier to focus on the data structure, but the advantages of a standard API, a la SAX and DOM, are pretty obvious. My instinct would be to focus on the message format first, and then once that's well understood, perhaps with some implementations running, we'd be in a position to distill a standard API. IMO, of course.

> - James
>
>
> -----Original Message-----
> From: xml-dist-app-request@w3.org [mailto:xml-dist-app-request@w3.org]On
> Behalf Of Andrew Layman
> Sent: Tuesday, October 17, 2000 6:03 PM
> To: xml-dist-app@w3.org
> Subject: RE: Issues with Packaging Application Payloads
>
>
> Thanks for the thoughtful mail. I have some ideas on some of the points
> you raise, though I don't suggest that these are exhaustive.
>
> 3. If I understand you, the performance problem you cite is more an
> issue of the suitability of a DOM-based message handler for large messages.
> Whether SOAP, or some other use of XML, or even if using MIME or something
> else, if an application processes large messages by first parsing them into
> a tree or other buffer and then examining the contents, that will be more
> expensive than some more streamlined techniques.

I agree -- the advantages of event-based programming models (e.g. SAX) for processing large volumes of data are pretty obvious. IMO, in addition to being more efficient, event-based models are more robust and easier to implement. (And I've implemented several XML protocol stacks over the last few years.)

> I think that one of the themes running through the above comments is that
> I'm trying to separate the issues that relate to the protocol from other
> issues that relate to good application or API design. The protocol spec is
> more like a spec for XML than a spec for a browser. There are better and
> poorer browsers, and many differences of taste, and I expect that there will
> be better and poorer application support libraries.

I agree completely.

> By the way, thank you for the kind compliments on SOAP.
Received on Wednesday, 18 October 2000 09:21:40 UTC