- From: Mark A. Jones <jones@research.att.com>
- Date: Thu, 19 Oct 2000 13:56:36 -0400
- To: xml-dist-app@w3.org
> RE: Issues with Packaging Application Payloads
>
> From: HUGHES,MARK (Non-HP-FtCollins,ex1) (mark_hughes@non.hp.com)
> Date: Wed, Oct 18 2000
>
> Then there's #4, the *RIGHT WAY* to do this, which is:
> A) Before inserting your arbitrary text into your XML wrapper, run it
> through a filter that replaces & with &amp;, < with &lt;, and > with &gt;.
> B) Before handing arbitrary text back to the user, run it through a filter
> that replaces &lt; with <, &gt; with >, and &amp; with &.
>
> Voila, the problem is solved. You don't have the byte bloat of base64,
> you don't have the limitation of not including ]]> in CDATA, and you don't
> have to mess up validation.
>
> XML is 8-bit clean (through UTF-8/16), so you can even send binary this
> way (though admittedly, at 50% bloat for 128-255, as compared to the 33%
> bloat of base64).
>
> It's easy. It's nigh-perfect. Why would anyone NOT do this?
>
> --
> <a href="http://kuoi.asui.uidaho.edu/~kamikaze/"> Mark Hughes </a>
>

Basically, the two approaches to packaging are delimiting and
byte/character counting. Some protocols send a byte count and then that
many bytes. The drawback for dynamically generated data is that you
don't know the byte count in advance.

XML CDATA, MIME's boundary strings, and SMTP's dot termination are all
examples of delimiting. Delimiting schemes typically allow you to
explicitly escape embedded delimiters (e.g., dot-stuffing in SMTP or the
backslash character in many programming languages). Others, like CDATA,
force you to split the content into concatenated sections to break up
the would-be delimiter/terminator.

Mark's approach is the typical one taken with XML application payloads
in SOAP. It takes care of embedded CDATA delimiters, which become
"]]&gt;". Encoding and decoding can be done on the fly for dynamically
generated content. It does not solve the byte bloat issue with binary
data, which still must undergo encoding/decoding.
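As a rough illustration of filters A and B from the quoted message, here is a minimal Python sketch. It uses the standard library's xml.sax.saxutils.escape and unescape as the two filters; the <payload> wrapper element and the helper names (wrap_payload, unwrap_payload, escape_stream) are hypothetical, chosen only for this example, and are not part of SOAP or of any XP draft.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

PREFIX, SUFFIX = "<payload>", "</payload>"  # hypothetical wrapper element


def wrap_payload(text):
    """Filter A: escape &, < and > before placing text inside the wrapper."""
    return PREFIX + escape(text) + SUFFIX


def unwrap_payload(wrapped):
    """Filter B: strip the fixed wrapper and reverse the three replacements."""
    if not (wrapped.startswith(PREFIX) and wrapped.endswith(SUFFIX)):
        raise ValueError("not a wrapped payload")
    return unescape(wrapped[len(PREFIX):-len(SUFFIX)])


def escape_stream(chunks):
    """Encode on the fly: each chunk is escaped independently, because the
    replacement works character by character."""
    for chunk in chunks:
        yield escape(chunk)


original = "if a < b && b > c: emit(']]>')"
wrapped = wrap_payload(original)

# An ordinary XML parser undoes the escaping while reading the wrapper.
parsed_text = ET.fromstring(wrapped).text

# Escaping chunk by chunk gives the same result as escaping the whole text.
streamed = "".join(escape_stream(["a < b", " && ", "b > c"]))
```

Note that only the encoding side is shown chunk by chunk. Decoding a stream needs a little more care, since a chunk boundary could fall in the middle of an entity such as &amp;.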
There would be two approaches to using a SAX-style parser:

1) Use two applications (or a recursive application) -- one for the
   XML protocol processing and a separate one for the application data
   after the replacements have taken place. There may be distinct
   advantages to this arrangement in terms of re-establishing an
   appropriate execution environment.

2) Have the elements that contain the &lt;/&gt;/&amp; encodings directly
   flag the parser's CDATA handler, tokenization and parsing routines to
   do the decoding and recursive parsing into the CDATA section (as
   though it weren't quoted) while obeying the CDATA terminator (as
   though it were quoted). [essentially building in a meta-interpretation
   feature to the SAX parsing model] With this approach, interpretation
   could proceed incrementally without having to hit the end of CDATA,
   do substitutions, and explicitly invoke the XML parser on the decoded
   content. This approach also allows a more fluid interaction of
   document features (lexical scoping, id processing, etc.) between the
   XP and application data environments, but it isn't clear whether this
   is good or bad. It would also mean revising/extending existing SAX
   parsers, which might be problematic.

Mark A. Jones
AT&T Labs - Research
Shannon Laboratory
Room A201
180 Park Ave.
Florham Park, NJ 07932-0971

email: jones@research.att.com
phone: (973) 360-8326
fax: (973) 360-8970
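For concreteness, approach 1 above might look something like the following Python sketch, which uses the standard xml.sax module. The <envelope> and <payload> element names, the two handler classes, and parse_envelope are all hypothetical, invented for this illustration. The outer parse handles the protocol wrapper and collects the decoded payload text; a second, separate parse then treats that text as the application's own XML document.

```python
import xml.sax
from xml.sax.saxutils import escape


class EnvelopeHandler(xml.sax.ContentHandler):
    """Outer (protocol) handler: collects the text inside <payload>."""

    def __init__(self):
        super().__init__()
        self._in_payload = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "payload":
            self._in_payload = True

    def endElement(self, name):
        if name == "payload":
            self._in_payload = False

    def characters(self, content):
        # The parser has already turned &lt; &gt; &amp; back into < > &.
        # Text may arrive in several pieces, so accumulate it.
        if self._in_payload:
            self._chunks.append(content)

    @property
    def payload(self):
        return "".join(self._chunks)


class ElementNameCollector(xml.sax.ContentHandler):
    """Inner (application) handler: records element names as they open."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_envelope(envelope_xml):
    """Run the protocol parse, then a separate parse of the decoded payload."""
    outer = EnvelopeHandler()
    xml.sax.parseString(envelope_xml.encode("utf-8"), outer)
    inner = ElementNameCollector()
    xml.sax.parseString(outer.payload.encode("utf-8"), inner)
    return outer.payload, inner.names


app_xml = '<order id="7"><item>a &amp; b</item></order>'
envelope = "<envelope><payload>" + escape(app_xml) + "</payload></envelope>"
payload, names = parse_envelope(envelope)
```

In this sketch the inner parse starts only after the whole payload has been collected, which is the cost that approach 2 is meant to avoid. The second parser also starts with a fresh state, so nothing from the envelope's document context leaks into the application data.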
Received on Thursday, 19 October 2000 13:56:39 UTC