- From: Vogelheim, Daniel <daniel.vogelheim@siemens.com>
- Date: Fri, 13 Oct 2006 19:39:38 +0200
- To: <public-exi@w3.org>
- Cc: "Tatu Saloranta" <tsaloranta@gmail.com>
Hello Tatu,

Many thanks for taking an interest in the work of the EXI working group. You wrote:

> Apologies if this has been asked earlier, but after reading the
> published draft, I noticed that comparisons seemed to only include
> parsers expected to be faster than the commonly used one. I can
> understand the desire to keep the number of implementations measured
> limited, but I was hoping that in addition to the "best of the best", a
> couple of the most commonly used parsers (like Xerces-J) could also be
> included.

Our measurements include both the JAXP parser (i.e. the standard JDK parser, whatever that happens to be) and an optimized parser. The rationale for using the optimized parser as the reference in the measurements note is that, since EXI will introduce a new and optimized XML serialization format, we need to prove that the performance improvement actually derives from the new format, rather than merely from improved implementation techniques. One way to achieve this is to compare the respective best of breed.

A slightly different way of putting it: a performance-oriented developer would almost certainly use an optimized implementation before considering a change in the underlying format. Our paper answers the question of how much more such a developer could expect to gain by changing to the EXI format instead.

> My own selfish motivation is that this would also allow me to compare
> the relative performance of the Java XML parser I am mostly working on
> (Woodstox), even if I couldn't get access to (or have time to get ones
> written in other languages) the fastest ones included in the EXI
> experiments. For example, observing that the performance difference
> between Xerces and Woodstox appears to be somewhere between 20 - 40%
> would allow me to infer approximate ratios to faster parsers.

The actual performance differential varies greatly between test documents and use cases.
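To make the kind of measurement concrete, a minimal harness in the spirit of such a test suite might count complete document parses per second with the standard JAXP parser and aggregate per-document rates with a harmonic mean. This is only an illustrative sketch; the class name, the sample documents, and the timing interval are all invented here and are not the working group's actual suite:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class TpsSketch {

    // Count complete parses of one document over a fixed interval;
    // return the parse rate in transactions (document parses) per second.
    static double measureTps(SAXParser parser, byte[] doc, long intervalNanos)
            throws Exception {
        long start = System.nanoTime();
        long parses = 0;
        while (System.nanoTime() - start < intervalNanos) {
            parser.parse(new ByteArrayInputStream(doc), new DefaultHandler());
            parses++;
        }
        return parses / ((System.nanoTime() - start) / 1e9);
    }

    // Harmonic mean: n / sum(1/x_i). Unlike the arithmetic mean,
    // it is dominated by the slowest documents.
    static double harmonicMean(double[] xs) {
        double sumInv = 0;
        for (double x : xs) sumInv += 1.0 / x;
        return xs.length / sumInv;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative documents only, not real test data.
        byte[][] docs = {
            "<doc><a>1</a><b>2</b></doc>".getBytes(StandardCharsets.UTF_8),
            "<doc>mostly character data, very little markup</doc>"
                .getBytes(StandardCharsets.UTF_8),
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        double[] tps = new double[docs.length];
        for (int i = 0; i < docs.length; i++) {
            tps[i] = measureTps(parser, docs[i], 200_000_000L); // 0.2 s per doc
        }
        System.out.printf("harmonic mean: %.1f tps%n", harmonicMean(tps));
    }
}
```

A character-data-heavy document with a low parse rate pulls the harmonic mean down sharply, which is why a few pathological cases can skew the aggregate figure.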
From a slightly older test run: the (harmonic) mean over a large range of documents lists 8.6 tps for JAXP and 10.2 tps for XALS, respectively. (tps: transactions per second; the test suite measures and reports results as the number of complete document parses over a given time.) However, those results are skewed by a number of pathological test cases which consist almost entirely of character data, with very little actual XML in them. When browsing the individual test cases I regularly see a factor of 1.5x, and there are a number of test cases where we see a factor of 2x. (Over real-world test data, representing various use cases.)

> So, is there a chance that one or two of the most commonly used (if
> not fastest) compliant XML parsers could also be included, for
> baselining purposes?

I suspect we won't be incorporating additional parsers into the test suite at this stage, but I would hope that by including JAXP parsing we already meet your requirement. I will leave it to the paper's editors to decide exactly how the data will be presented. I assume that we will publish more detailed test data once we have more complete and stable measurement results, so that you could do your own analysis. Unfortunately, I do not yet know when, how much, or in which form.

Tatu, I hope this helps. Please let me (and the list) know if you have additional questions.

Sincerely,
Daniel Vogelheim
Received on Monday, 16 October 2006 01:20:05 UTC