RE: [ANN] XSDBench XML Schema Benchmark 1.0.0 released from noah_mendelsohn@us.ibm.com on 2006-10-18 (xmlschema-dev@w3.org from October 2006)

From: <noah_mendelsohn@us.ibm.com>
Date: Wed, 18 Oct 2006 10:51:20 -0400
To: "'Boris Kolpackov'" <boris@codesynthesis.com>
Cc: "Michael Kay" <mike@saxonica.com>, xmlschema-dev@w3.org
Message-ID: <OFB370D39D.D2E70669-ON8525720B.004E84B0-8525720B.00519AED@lotus.com>

I'm pretty sure I announced this last year when we published, but those of 
you who are into the details of XML and Schema performance may be 
interested in some of the papers we have published on our project 
called XML Screamer.  I'd point you in particular to our paper at WWW 2006 
[1], but also to related publications at XML 2005 [2] and in the IBM 
Systems Journal [3].  For the conference presentations, I'd encourage you 
to look at the full papers as well as the slides.  The main focus of our 
work was to evaluate strategies that integrate the parsing, 
schema-validation and deserialization of XML, but along the way we also 
documented some of our experiences relating to careful benchmarking, etc. 
A few highlights of our conclusions:

* APIs matter a lot, and so do input encodings.   For example, if you've 
got UTF-8 input and traditional SAX output, your performance is almost 
surely limited by the implied UTF-16 conversions for the strings.  Note 
that expat-based parsers tend to use their own APIs, which do not tend to 
require such conversions.  There's nothing wrong with that.  On the 
contrary, it's a good thing, but it can be a mistake to attribute the 
differences entirely to approaches to validation.   So, be very careful 
quoting comparisons between expat and other parsers.  You may just be 
measuring the performance of their APIs.  Indeed, there's no one correct 
API for all benchmarks.   You have to build benchmarks that model the 
environment you care about.  In particular, some applications need the 
full infoset, while many Web Services-style applications deserialize into 
business objects anyway. If you benchmark the latter using either SAX- or 
expat-style APIs you're wasting time compared to the ideal, which is to 
directly deserialize into the structures the application wants.  If your 
application wants the full infoset, then something like expat may be 
dandy.

* Benchmarking details really matter.  That goes right down to, at least 
on some occasions, the cache architecture of the particular machine you've 
chosen.  We actually measured differences between two fairly similar (but 
not identical) Intel-based Thinkpads, and traced a 30% difference to the 
fact that some of our arrays just happened to have a stride that put them 
all in the same cache line -- but only on the one model Thinkpad.  So, 
it's a really good idea to run benchmarks on lots of different models of 
CPU (Pentium 4 vs. Centrino vs. Core, for example) and boxes (Thinkpad T21 
vs. T40 vs. Intel server) to see whether the performance ratios are 
consistent between runs.  Often they will be.  Occasionally something odd 
will completely mask what's going on at the XML level.  Similarly, if your 
real application is going to parse lots of documents, it can be 
appropriate to throw out the timings of your first few times through a 
test loop.  Doing that can better model the steady state performance, 
particularly if you've done a good job of making your parser small so it 
fits in the processor cache.  Caches are LOTS faster than RAM on modern 
processors.

* As Michael Kay has said, there is no one single benchmark that models 
what different applications will need.  Putting lots of different 
constructs into one test case will tend to give you some sort of weighted 
average, but may also eliminate optimizations that would otherwise have 
been possible.  It may make the whole test not fit in cache where a 
smaller one would have.  The only real answer is lots of tests at 
different sizes, with different mixes of markup vs. text, different 
schemas etc.  Then you can do a sensitivity analysis to see what a given 
parser does well, what it doesn't, and where the timings aren't stable.

* We found it useful to pick a processor family, in our case Pentiums, and 
to quote results in MBytes/sec/GHz.  In other words, to normalize to a 
1GHz processor.  We tested lots of Intel Pentiums, Xeons, etc.  (but not 
Core and Core Duo, which weren't out yet.)  We found results to be almost 
completely linear with processor speed, and even across CPU models, with 
one exception:  Centrinos were, for our purposes, somewhat faster per GHz 
than other Pentiums.  Obviously other architectures like SPARC or Power 
will give totally different throughput per Hz, but may be comparable 
within their own families (modulo cache architectures, etc.)

* For what it's worth, we measured expat to be on the order of 
12 MBytes/sec/GHz (i.e. 12 MBytes/sec on a 1GHz Pentium).  The xsdbench 
results at [4] seem to show 9 MBytes/sec for expat on a 1GHz Pentium III, 
which is pretty close, and may also include overhead for validation.  So, 
that all looks nicely consistent.  For comparison, XML Screamer did 
parsing, XML Schema validation and deserialization into UTF-16 SAX events 
at a median performance of 1.9x expat (which used its own UTF-8 API) 
across a number of test cases.  I don't have the exact numbers 
handy (they're in the paper and I'm skimming our slides), but that should 
make XML Screamer about 22+ MBytes/sec/GHz doing SAX.  When we went 
directly to business objects, which is much lower overhead than SAX, our 
speed went up to be just under 3x expat, or something like 
35 MBytes/sec/GHz.
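
As a rough illustration of two of the points above (throwing out the first 
few passes through a test loop, and quoting throughput normalized to a 1GHz 
processor), here is a minimal Python sketch.  It is not code from XML 
Screamer or XSDBench.  The function names, the default run counts, and the 
choice of 1 MByte = 10^6 bytes are all assumptions made for illustration, 
and `parse` stands in for whatever parser entry point you are measuring.

```python
import statistics
import time


def steady_state_seconds(parse, document, runs=20, warmup=5):
    """Time parse(document) `runs` times and return the median of the
    timings left after discarding the first `warmup` iterations.

    `parse` is a hypothetical callable standing in for the parser under
    test; `runs` and `warmup` are illustrative defaults, not values taken
    from the papers.
    """
    if runs <= warmup:
        raise ValueError("runs must be greater than warmup")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        parse(document)
        timings.append(time.perf_counter() - start)
    # Drop the warm-up iterations so cold-cache effects do not dominate.
    return statistics.median(timings[warmup:])


def mbytes_per_sec_per_ghz(num_bytes, seconds, clock_ghz):
    """Throughput normalized to a 1GHz processor.

    Assumes 1 MByte = 10**6 bytes; the message does not say which
    definition was used.
    """
    return (num_bytes / 1e6) / seconds / clock_ghz


# Arithmetic behind the estimates quoted in the message, taking expat at
# roughly 12 MBytes/sec/GHz.
EXPAT_MBYTES_PER_SEC_PER_GHZ = 12.0
sax_estimate = round(1.9 * EXPAT_MBYTES_PER_SEC_PER_GHZ, 1)
objects_upper_bound = round(3.0 * EXPAT_MBYTES_PER_SEC_PER_GHZ, 1)
```

With these inputs `sax_estimate` comes out at 22.8, which matches the "22+" 
figure, and `objects_upper_bound` at 36.0, which is consistent with "just 
under 3x" giving something like 35.  Using the median rather than the mean 
of the remaining timings is a design choice in this sketch, meant to reduce 
the influence of an occasional slow iteration.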

The papers explain many of the techniques we used.  I should emphasize 
that XML Screamer was a prototype.  We did the work starting in 2001, and 
the code has been untouched for several years.  It would not be 
particularly convenient to resurrect it to run new tests at this point, or 
to get the clearances or to do the packaging we would have to do to make 
the code available for public use.  It was a research project.  I do hope 
the information in the papers is useful, both in terms of exploring some 
issues relating to benchmarking, and in terms of explaining the techniques 
we used to achieve high performance.

Noah

[1] http://www2006.org/programme/item.php?id=5011
[2] http://www.idealliance.org/proceedings/xml05/abstracts/paper246.HTML
[3] http://www.research.ibm.com/journal/sj/452/perkins.html
[4] http://www.codesynthesis.com/projects/xsdbench/results/2006-10-16-02/

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Wednesday, 18 October 2006 14:51:44 UTC