Re: [xml-dev] Call for fast XML implementations from Jeff Rafter on 2006-03-26 (public-exi@w3.org from March 2006)

From: Jeff Rafter <lists@jeffrafter.com>
Date: Sat, 25 Mar 2006 22:35:04 -0800
To: Robin Berjon <robin.berjon@expway.fr>
CC: XML Developers List <xml-dev@lists.xml.org>, W3C EXI Public <public-exi@w3.org>
Message-ID: <44263618.9030404@jeffrafter.com>
I think we need a bit more in the area of use cases. XML parsers can 
clearly be tuned for speed while remaining conformant, at the cost of 
being much less useful. For instance, if you just want a 
well-formedness check you can get a pretty fast parser. If you want it 
to report anything back to you, that will slow it down. 
Reporting methodology is important. Under SAX, a large amount of 
work must be done before the information is passed to the 
application, and this work is often folded into the lexer. If 
certain features will never be used, the corresponding reporting 
facilities could be removed. Additionally, *fast* is clearly relative. 
If you are only parsing XML Test Suite (XML-TS) files, then in general 
any parser that relies on name caching or name hashing will perform 
more slowly than it would on a 12 GB database-to-XML dump full of 
repeating record and field names. Likewise, XML parsers that maintain 
useful error information (e.g. position) will be slower than less 
informative models.
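To make the name-caching point concrete, here is a minimal Python sketch. `NameCache` and `hit_rate` are hypothetical illustrations, not taken from any particular parser. In a real parser a cache hit saves allocating and copying the name; this sketch only counts hits, so it shows where a cache can pay off, not actual timings.

```python
class NameCache:
    """Hypothetical name cache: maps each raw element/attribute name to one shared object."""

    def __init__(self):
        self._names = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, raw):
        # A hit returns the previously stored name; hashing and comparison still cost time.
        cached = self._names.get(raw)
        if cached is not None:
            self.hits += 1
            return cached
        # A miss pays for the lookup *and* for inserting a new entry.
        self.misses += 1
        self._names[raw] = raw
        return raw


def hit_rate(names):
    """Fraction of name lookups served from the cache."""
    cache = NameCache()
    for name in names:
        cache.lookup(name)
    return cache.hits / len(names)


# Test-suite style input (assumed): mostly distinct names, so the cache rarely helps.
test_suite_like = [f"elem{i}" for i in range(1000)]
# Database-dump style input (assumed): the same few record/field names over and over.
dump_like = ["record", "id", "name", "price"] * 250

print(hit_rate(test_suite_like))  # 0.0
print(hit_rate(dump_like))        # 0.996
```

With mostly distinct names every lookup is pure overhead, while the repetitive dump is served almost entirely from the cache.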

Of course, this summary is less than empirical; it comes more from 
parsers I have tested in the trenches.

As an aside, are your tests focused only on UTF-8/16?

Cheers,
Jeff Rafter

Robin Berjon wrote:
> Dear all,
> 
> it's an interesting coincidence that, at the same time the EXI WG 
> was busy drafting this very email, xml-dev was discussing faster XML 
> parsers. We're sorry to say that we don't have money to offer, though we 
> can offer warm fuzzies and the chance to get your implementation before 
> the eyes of XML-related decision-makers (or at least influencers) from 
> a fair number of companies.
> 
> As you may know, the Efficient XML Interchange WG has been building 
> a framework for measuring various aspects of efficient XML formats. The 
> goal is not only to compare efficient formats with one another but also 
> to compare them with XML, in order to demonstrate the need for such a 
> format with the nitty-gritty detail that has been requested of us. This 
> matters because these measurements will constitute a go/no-go point: if 
> the efficient formats are not efficient enough, the EXI WG would be 
> closed down without producing an efficient XML format.
> 
> Needless to say, performing this comparison against the sluggish 
> XML parsers out there, of which there is no shortage, would hardly prove 
> satisfying. We therefore plan to use the fastest XML parsers that we can 
> lay our hands on, and this is where you can help us. While the WG does 
> have a fair bit of experience looking for faster XML parsers, we 
> do not claim perfect, all-encompassing knowledge of the 
> options that may be available and that we might have overlooked in the 
> past few years. Furthermore, a non-negligible number of the XML parsers 
> that are pitched as faster than the rest achieve that performance by 
> cutting corners, which generally makes them non-conformant; we cannot 
> consider that a valid approach.
> 
> Therefore we solicit input from the community regarding fast XML 
> parsers. Which one(s) would you pick if you had to tear through XML 
> documents at warp speed? What is your experience with them in terms of 
> conformance? What would be your best bet if you wanted to kill the EXI 
> effort in its tracks?
> 
> We will naturally accept any and all information that we get our hands 
> on, but in order to be able to make the best use of the information and 
> possibly to avoid being swamped, there are some aspects that we would 
> like to see alongside parser recommendations:
> 
>  * Some form of conformance statement. It needs to pass the XML Test 
> Suite (http://www.w3.org/XML/Test/).
>  * We need to be able to actually measure it. This entails that if the 
> code is not publicly available, we'll need a way to work out how we can 
> get a copy of the code to run it in our test system. This doesn't 
> necessarily mean making a copy of it available to all members of the WG, 
> but at least to the W3C staff so that they can run the tests.
>  * If you wish to be extra helpful, you can also include the small 
> amount of code and configuration required to run the parser within our 
> framework, which is built on top of Japex (https://japex.dev.java.net/). 
> If you're interested, we'd be delighted to help you get started.
> 
> Thank you very much in advance, we look forward to your input.
> 
> --Robin Berjon, on behalf of the EXI WG
>    Senior Research Scientist
>    Expway, http://expway.com/
Received on Sunday, 26 March 2006 06:38:04 UTC