- From: Wolfgang Hoschek <whoschek@lbl.gov>
- Date: Mon, 22 Nov 2004 13:37:24 -0800
- To: Aleksander Slominski <aslom@cs.indiana.edu>
- Cc: public-xml-binary@w3.org, xml-dev@lists.xml.org
Alex, see comments inline below...

On Nov 22, 2004, at 1:01 PM, Aleksander Slominski wrote:

> Wolfgang Hoschek wrote:
>
>> This is to announce the nux-1.0beta2 release
>> (http://dsd.lbl.gov/nux/).
>>
>> Nux is a small, straightforward, and surprisingly effective
>> open-source extension of the XOM XML library.
>>
> hi Wolfgang,
>
> the natural question is: how does it compare to XBIS?

Among other things, we also benchmarked with the test xml files that
come with XBIS (thanks to Dennis Sosnoski for the great work - much
appreciated). It would be interesting to directly compare performance
with XBIS, but so far we have not done so, for two reasons:

- XBIS currently does not work with XOM (it misses some XMLReader
  features/properties that XOM requires).
- XBIS measures performance from and to SAX event streams. bnux
  measures performance from XOM documents to byte arrays, and back.
  bnux includes XOM tree walking, tree building, and the inherent XOM
  XML wellformedness checks, which is significantly more expensive
  (and also more useful, since it measures delivering data from/to a
  large number of real-world applications, rather than low-level SAX
  apps).

In other words, the benchmarking methodology is different. It would
not be an apples-to-apples comparison. Might still be interesting,
though.

> can it be divorced from XOM?

The concept is applicable to any DOM-like tree model and probably any
infoset based model. The implementation is specific to XOM.

>> Features include:
>>
>> • Seamless W3C XQuery and XPath support for XOM, through Saxon.
>> • Efficient and flexible pools and factories for XQueries, XSL
>>   Transforms, as well as Builders that validate against various
>>   schema languages, including W3C XML Schemas, DTDs, RELAX NG,
>>   Schematron, etc.
>> • Serialization and deserialization of XOM XML documents to and
>>   from an efficient and compact custom binary XML data format
>>   (bnux format), without loss or change of any information.
>> • For simple and complex continuous queries and/or transformations
>>   over very large or infinitely long XML input, a convenient
>>   streaming path filter API combines full XQuery support with
>>   straightforward filtering.
>> • Glue for integration with JAXB and for queries over ill-formed
>>   HTML.
>> • Well documented API. Ships in a jar file that weighs just 60 KB.
>>
>> Changelog:
>>
>> XOM serialization and deserialization performance is more than good
>> enough for most purposes. However, for particularly stringent
>> performance requirements this release adds "bnux", an option for
>> lightning-fast binary XML serialization and deserialization.
>
> did you compare BNUX and XBIS performance?

See above.

>> Contrasting bnux with XOM:
>>
>> • Serialization speedup: 2-7 (10-35 MB/s vs. 5 MB/s)
>> • Deserialization speedup: 4-10 (20-50 MB/s vs. 5 MB/s)
>> • XML data compression factor: 1.5 - 4
>>
>> For a detailed discussion and background see
>> http://dsd.lbl.gov/nux/api/nux/xom/binary/BinaryXMLCodec.html
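As a concrete illustration, a bnux round trip from application code
looks roughly like this (just a sketch against the codec API described
in the javadoc linked above; the input file name is a placeholder, and
the javadoc has the exact signatures):

    import nu.xom.Builder;
    import nu.xom.Document;
    import nux.xom.binary.BinaryXMLCodec;

    public class BnuxRoundTrip {
        public static void main(String[] args) throws Exception {
            // Build a XOM tree from ordinary textual XML
            // (the file name is just a placeholder).
            Document doc = new Builder().build(new java.io.File("data/soap.xml"));

            BinaryXMLCodec codec = new BinaryXMLCodec();

            // XOM tree --> compact bnux byte array (the int selects
            // the optional ZLIB compression level; 0 = none).
            byte[] bnux = codec.serialize(doc, 0);

            // bnux byte array --> XOM tree, without loss or change
            // of any information.
            Document doc2 = codec.deserialize(bnux);

            System.out.println(doc2.toXML().equals(doc.toXML()));
        }
    }

Everything on either side of the byte array is an ordinary XOM tree,
which is exactly what the benchmark numbers above measure.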
> XOM is a tree model, so how do you do streaming - is it by streaming
> partial XOM tree construction/deconstruction when you access data
> (overriding endElement() in NodeFactory) and manually keeping
> detach()-ing nodes, or just letting them be GCed?

Currently we do not do streaming. The bnux serialization algorithm is
a three-pass batch algorithm, hence buffer-oriented, not
stream-oriented. It has a throughput profile with short critical
paths, rather than a low latency profile with long critical paths,
rendering it ideal for large volumes of small to medium-sized XML
documents, and impractical for individual documents that do not fit
into main memory. The bnux deserialization algorithm is a single pass
algorithm, and could in theory be streamed through a NodeFactory, but
the current implementation does not do so. The serialization
algorithm could be restructured to be a single pass algorithm at the
expense of compression; performance would probably be roughly the
same. Turning the single pass algorithm into a chunked streaming
algorithm using "pages" would be possible but complicated, probably
reducing performance. We have not tried it, though.

> what are the use cases for nux: what do you plan to use it for?

The algorithm is primarily intended for tightly coupled
high-performance systems exchanging large volumes of XML data over
networks, as well as for compact main memory caches and for
short-term storage as BLOBs in backend databases or files (e.g.
"session" data with limited duration).

> are the use cases related to XML Binary Characterization
> <http://www.w3.org/TR/xbc-use-cases/>?

They might fit into that "diverse" bag-of-things as well...

> i am a bit disappointed that scientific requirements are completely
> omitted from the XBC use cases - the closest i could find is
> http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips over the
> whole issue of how to transfer an array of doubles without changing
> endianness ...

I may be wrong, but conversion of doubles to strings and back seems
to be the main CPU drain here, rather than byte swapping. Try doing
this for billions of floats, gulp. Hence one would need to ship
arrays of doubles in IEEE floating point representation or native
format to avoid string conversions, perhaps most appropriately as an
"attachment" according to the various related standards out there.
When working with a binary representation, one could also extend
DOM-like APIs in somewhat counter-intuitive ways, with subclasses
like DoubleArrayText, converting from double to IEEE floating point
and back, or similar.
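To make that concrete: such a (purely hypothetical) DoubleArrayText
subclass would boil down to a helper along these lines - raw IEEE 754
bytes in a fixed byte order, with no Double.toString() or
Double.parseDouble() anywhere on the critical path:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // Sketch only: encode/decode a double[] as raw IEEE 754 bytes,
    // avoiding the expensive string conversions altogether.
    public final class DoubleArrayCodec {

        public static byte[] encode(double[] values) {
            ByteBuffer buf = ByteBuffer.allocate(values.length * 8);
            buf.order(ByteOrder.BIG_ENDIAN); // fixed wire byte order
            buf.asDoubleBuffer().put(values);
            return buf.array();
        }

        public static double[] decode(byte[] bytes) {
            // assumes bytes.length is a multiple of 8
            double[] values = new double[bytes.length / 8];
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            buf.order(ByteOrder.BIG_ENDIAN);
            buf.asDoubleBuffer().get(values);
            return values;
        }
    }

A receiver whose native byte order differs still swaps bytes, but that
is cheap compared to the string conversions.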
> we did a lot of work in the past related to XML performance (at
> Indiana University and Binghamton) and are very concerned that
> whatever binary XML gets characterized/standardized in the W3C will
> be of not much use for scientific computing and grids ...

You would need strong advocates/evangelists, it seems.

Regards,
Wolfgang.

Received on Tuesday, 23 November 2004 04:04:05 UTC