Re: binary XML API and scientific use cases [Re: [xml-dev] [ANN] nux-1.0beta2 release]

The XOM/bnux work is very interesting for several reasons.

Aleksander Slominski wrote:

>
> Wolfgang Hoschek wrote:
>
>> This is to announce the nux-1.0beta2 release (http://dsd.lbl.gov/nux/).
>>
>> Nux is a small, straightforward, and surprisingly effective 
>> open-source extension of the  XOM XML library.
>>
> hi Wolfgang,
>
> the natural question is: how does it compare to XBIS?
>
> can it be divorced from XOM?
>
> in particular LGPL and Apache/BSD are not compatible (it seems nux is 
> under BSD and XOM under LGPL ...).

I don't see why you make this statement.  Something that is LGPL'd can 
be used as a subsystem of any other kind of system, including commercial 
or BSD licensed software.  BSD licensed software can generally be used 
in any circumstance, including commercial integration, with very few 
requirements, mainly consisting of notice and not suing the author.

>>     •     Seamless W3C XQuery and XPath support for XOM, through Saxon.
>>     •     Efficient and flexible pools and factories for XQueries, 
>> XSL Transforms, as well as Builders that validate against various 
>> schema languages, including W3C XML Schemas, DTDs, RELAX NG, 
>> Schematron, etc.
>>     •     Serialization and deserialization of XOM XML documents to 
>> and from  an efficient and compact custom binary XML data format 
>> (bnux format), without loss or change of any information.
>>     •     For simple and complex continuous queries and/or 
>> transformations over very large or infinitely long XML input, a 
>> convenient streaming path filter API combines full XQuery support 
>> with straightforward filtering.
>>     •     Glue for integration with JAXB and for queries over 
>> ill-formed HTML.
>>     •     Well documented API. Ships in a jar file that weighs just 
>> 60 KB.
>>
>> Changelog:
>>
>> XOM serialization and deserialization performance is more than good 
>> enough for most purposes. However, for particularly stringent 
>> performance requirements this release adds "bnux", an option for 
>> lightning-fast binary XML serialization and deserialization. 
>
> Features include:
>
> did you compare BNUX and XBIS performance?
>
>> Contrasting bnux with XOM:
>>
>>     •     Serialization speedup: 2-7 (10-35 MB/s vs. 5 MB/s)
>>     •     Deserialization speedup: 4-10 (20-50 MB/s vs. 5 MB/s)
>>     •     XML data compression factor: 1.5 - 4
>>
>> For a detailed discussion and background see 
>> http://dsd.lbl.gov/nux/api/nux/xom/binary/BinaryXMLCodec.html
>>
> XOM is tree model so how do you do streaming - it by streaming partial 
> XOM tree construction/deconstruction when you access data (overriding 
> |endElement()| in |NodeFactory|) and manually keep detach-ing() nodes 
> or just letting them to be GCed?

My reading was that they do not do streaming, only complete object 
handling.  This is common for many processing models, as streaming isn't 
called for there in the way it would be for transformation or printing 
of very long documents.

>
> what are use cases for nux: what do you plan to use it for?
>
> are use cases related to XML Binary Characterization 
> <http://www.w3.org/TR/xbc-use-cases/>?
>
> i am a bit disappointed that scientific requirements are completely 
> omitted from XBC use cases - the closest i could find is 
> http://www.w3.org/TR/xbc-use-cases/#FPenergy but it skips over whole 
> issue how to transfer array of doubles without changing endianness ...

I have proposed to the group recently that I create one or more use 
cases that cover supercomputing, grid processing, and sensor networks.  
Your observation seems to validate that point.  I would be happy to 
incorporate anything you could provide.  My company builds and maintains 
Linux supercomputers and I have present and past experience with 
grid-like processing, so I have some resources and contacts.
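
To make the endianness point concrete, here is a minimal sketch in Java 
of sending an array of doubles in the sender's byte order and letting 
the receiver swap only if it has to.  The wire layout (a one-byte order 
flag, then a four-byte big-endian count, then the raw doubles) and the 
class name DoubleArrayCodec are assumptions made up for this example; 
they are not taken from bnux, XBIS, or any XBC document.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical codec, for illustration only.
final class DoubleArrayCodec {

    // Assumed layout: [1 byte order flag][4 byte big-endian count][count * 8 bytes of doubles].
    static byte[] encode(double[] values, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 8 * values.length);
        // Flag: 1 = little-endian payload, 0 = big-endian payload.
        buf.put((byte) (order == ByteOrder.LITTLE_ENDIAN ? 1 : 0));
        // The count is always written big-endian (ByteBuffer's default order).
        buf.putInt(values.length);
        // Switch to the sender's order so the payload needs no conversion on its side.
        buf.order(order);
        // The view starts at the current position and inherits the current byte order.
        buf.asDoubleBuffer().put(values);
        return buf.array();
    }

    static double[] decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        ByteOrder order = buf.get() == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        int count = buf.getInt();
        // Read the payload in whatever order the sender declared.
        buf.order(order);
        double[] values = new double[count];
        buf.asDoubleBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        double[] samples = {1.5, -2.25, Math.PI};
        // A sender would normally pass ByteOrder.nativeOrder() here.
        byte[] wire = encode(samples, ByteOrder.nativeOrder());
        double[] back = decode(wire);
        System.out.println(java.util.Arrays.equals(samples, back));
    }
}
```

The design choice is "receiver makes right": two machines with the same 
byte order exchange the array without touching it, and only a mismatched 
pair pays for the swap.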

> we did lot of work in past related to XML performance (in Indiana 
> University and Binghamton) and are very concerned that whatever binary 
> XML will be characterized/standardized in W3C will be of no much use 
> for scientific computing and grids ...

Could you provide links to or details of any of this work?  How do you 
think that XML, especially a binary characterized XML, should relate to 
HDF5?

>
> thanks,
>
> alek
>

sdw

-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw

Received on Monday, 22 November 2004 22:11:28 UTC