
RE: [ANN] XSDBench XML Schema Benchmark 1.0.0 released

From: <noah_mendelsohn@us.ibm.com>
Date: Wed, 18 Oct 2006 10:51:20 -0400
To: "'Boris Kolpackov'" <boris@codesynthesis.com>
Cc: "Michael Kay" <mike@saxonica.com>, xmlschema-dev@w3.org
Message-ID: <OFB370D39D.D2E70669-ON8525720B.004E84B0-8525720B.00519AED@lotus.com>

I'm pretty sure I announced this last year when we published, but those of 
you who are into the details of XML and Schema performance may be 
interested in some of the papers we published last year on our project 
called XML Screamer.  I'd point you in particular to our paper at WWW 2006 
[1], but also to related publications at XML 2005 [2] and in the IBM 
Systems Journal [3].  For the conference presentations, I'd encourage you 
to look at the full papers as well as the slides.  The main focus of our 
work was to evaluate strategies that integrate the parsing, 
schema-validation and deserialization of XML, but along the way we also 
documented some of our experiences relating to careful benchmarking, etc. 
A few highlights of our conclusions:

* APIs matter a lot, and so do input encodings.   For example, if you've 
got UTF-8 input and traditional SAX output, your performance is almost 
surely limited by the implied UTF-16 conversions for the strings.  Note 
that expat-based parsers tend to use their own APIs, which typically do not 
require such conversions.  There's nothing wrong with that.  On the 
contrary, it's a good thing, but it can be a mistake to attribute the 
differences entirely to approaches to validation.   So, be very careful 
quoting comparisons between expat and other parsers.  You may just be 
measuring the performance of their APIs.  Indeed, there's no one correct 
API for all benchmarks.   You have to build benchmarks that model the 
environment you care about.  In particular, some applications need the 
full infoset, while many Web Services-style applications deserialize into 
business objects anyway. If you benchmark the latter using either SAX- or 
expat-style APIs you're wasting time compared to the ideal, which is to 
deserialize directly into the structures the application wants.  If your 
application wants the full infoset, then something like expat may be a 
better model.
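The contrast between the two targets can be sketched in a few lines using Python's stdlib expat binding. This is purely illustrative (not the XML Screamer code); the `Order` class and element names are invented, but the shape of the two paths is the point: one builds the full infoset as a tree, the other deserializes straight into the object the application actually wants.

```python
# Sketch: full-infoset parse vs. direct deserialization into a business
# object, both over Python's stdlib expat binding.  Order/id/qty are
# illustrative names, not from any real schema.
import xml.parsers.expat
from xml.dom import minidom

DOC = b'<order><id>42</id><qty>7</qty></order>'

class Order:                       # the "business object" the app wants
    __slots__ = ('id', 'qty')

def parse_to_object(data):
    """Deserialize straight into Order, skipping any intermediate tree."""
    order, text = Order(), []
    p = xml.parsers.expat.ParserCreate()
    def start(name, attrs):
        text.clear()               # new element: reset accumulated text
    def chars(d):
        text.append(d)
    def end(name):
        if name in Order.__slots__:
            setattr(order, name, int(''.join(text)))
    p.StartElementHandler = start
    p.CharacterDataHandler = chars
    p.EndElementHandler = end
    p.Parse(data, True)
    return order

full_tree = minidom.parseString(DOC)   # the full-infoset route, for contrast
obj = parse_to_object(DOC)
print(obj.id, obj.qty)                 # -> 42 7
```

A benchmark built around `minidom.parseString` and one built around `parse_to_object` are measuring genuinely different workloads, which is the trap the paragraph above warns about.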

* Benchmarking details really matter.  That goes right down to, at least 
on some occasions, the cache architecture of the particular machine you've 
chosen.  We actually measured differences between two fairly similar (but 
not identical) Intel-based Thinkpads, and traced a 30% difference to the 
fact that some of our arrays just happened to have a stride that put them 
all in the same cache line -- but only on the one model Thinkpad.  So, 
it's a really good idea to run benchmarks on lots of different models of 
CPU (Pentium 4 vs. Centrino vs. Core, for example) and boxes (Thinkpad T21 
vs. T40 vs. Intel server) to see whether the performance ratios are 
consistent between runs.  Often they will be.  Occasionally something odd 
will completely mask what's going on at the XML level.  Similarly, if your 
real application is going to parse lots of documents, it can be 
appropriate to throw out the timings of your first few times through a 
test loop.  Doing that can better model the steady state performance, 
particularly if you've done a good job of making your parser small so it 
fits in the processor cache.  Caches are LOTS faster than RAM on modern 
machines.
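The warm-up discipline described above can be sketched as follows. This is a generic pattern, not code from the project: time many passes, discard the first few so caches and other machine state settle, and report a steady-state figure. The document, iteration counts, and parse function are all placeholder choices.

```python
# Sketch: throw away the first few timings of a test loop so the reported
# number reflects steady state, not cold caches.  All parameters illustrative.
import time
import xml.parsers.expat

DOC = b'<root>' + b'<item a="v">some text</item>' * 1000 + b'</root>'
WARMUP, RUNS = 3, 10

def parse_once(data):
    p = xml.parsers.expat.ParserCreate()
    p.Parse(data, True)

timings = []
for i in range(WARMUP + RUNS):
    t0 = time.perf_counter()
    parse_once(DOC)
    elapsed = time.perf_counter() - t0
    if i >= WARMUP:                    # discard warm-up passes
        timings.append(elapsed)

mbytes = len(DOC) / 1e6
median = sorted(timings)[len(timings) // 2]
print(f"steady-state median: {mbytes / median:.1f} MB/s")
```

Quoting the median of the retained runs, rather than the mean, also blunts the effect of the occasional outlier pass.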

* As Michael Kay has said, there is no one single benchmark that models 
what different applications will need.  Putting lots of different 
constructs into one test case will tend to give you some sort of weighted 
average, but may also eliminate optimizations that would otherwise have 
been possible.  It may make the whole test not fit in cache where a 
smaller one would have.  The only real answer is lots of tests at 
different sizes, with different mixes of markup vs. text, different 
schemas etc.  Then you can do a sensitivity analysis to see what a given 
parser does well, what it doesn't, and where the timings aren't stable.
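One cheap way to build the kind of test matrix this paragraph calls for is to generate documents parametrically and sweep the parameters. A minimal sketch, with arbitrary illustrative sizes and ratios:

```python
# Sketch: generate test documents across a grid of element counts and
# text lengths, so a sensitivity analysis can vary markup/text mix and
# total size independently.  The grid values are arbitrary.
def make_doc(n_elements, text_len):
    body = ''.join(f'<e{i % 10}>{"x" * text_len}</e{i % 10}>'
                   for i in range(n_elements))
    return f'<root>{body}</root>'.encode()

test_matrix = [(n, t) for n in (10, 1000) for t in (0, 5, 500)]
for n, t in test_matrix:
    doc = make_doc(n, t)
    # ... time the parser under test on `doc` here and record
    # (n, t, MB/s) for the sensitivity analysis ...
    print(n, t, len(doc))
```

Each (size, mix) cell gets its own timing, so a parser that is fast on markup-heavy documents but slow on text-heavy ones shows up clearly instead of being averaged away.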

* We found it useful to pick a processor family, in our case Pentiums, and 
to quote results in MBytes/sec/GHz.  In other words, to normalize to a 
1GHz processor.  We tested lots of Intel Pentiums, Xeons, etc. (but not 
Core and Core Duo, which weren't out yet).  We found results to be almost 
completely linear with processor speed, and even across CPU models, with 
one exception:  Centrinos were, for our purposes, somewhat faster per GHz 
than other Pentiums.  Obviously other architectures like SPARC or Power 
will give totally different throughput per Hz, but may be comparable 
within their own families (modulo cache architectures, etc.)
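The normalization itself is just a division, but writing it out makes the comparison explicit. The figures below are illustrative, not measurements:

```python
# Sketch of the MBytes/sec/GHz normalization: divide measured throughput
# by clock speed so boxes of different speeds within one processor family
# are directly comparable.  Example figures are made up.
def normalized_throughput(mbytes_per_sec, clock_ghz):
    return mbytes_per_sec / clock_ghz

# 24 MB/s measured on a 2.0 GHz Pentium normalizes to 12 MB/s/GHz,
# directly comparable to 12 MB/s measured on a 1.0 GHz box.
print(normalized_throughput(24.0, 2.0))   # -> 12.0
```

As the paragraph notes, the division is only meaningful within a family whose per-cycle behavior is roughly uniform; comparing MB/s/GHz across, say, Pentium and Power would conflate clock speed with architecture.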

* For what it's worth, we measured expat to be on the order of 
12 MBytes/sec/GHz (i.e. 12 MBytes/sec on a 1GHz Pentium).  The xsdbench 
results at [4] seem to show 9 MBytes/sec for expat on a 1GHz Pentium III, 
which is pretty close, and may also include overhead for validation.  So, 
that all looks nicely consistent.  For comparison, XML Screamer did 
parsing, XML Schema validation and deserialization into UTF-16 sax events 
at a median performance of 1.9x expat (which used its own UTF-8 API) 
averaged over a number of test cases.  I don't have the exact numbers 
handy (they're in the paper and I'm skimming our slides), but that should 
make XML Screamer about 22+ MBytes/sec/GHz doing SAX.  When we went 
directly to business objects, which is much lower overhead than SAX, our 
speed went up to just under 3x expat, or something like 
35 MBytes/sec/GHz.

The papers explain many of the techniques we used.  I should emphasize 
that XML Screamer was a prototype.  We did the work starting in 2001, and 
the code has been untouched for several years.  It would not be 
particularly convenient to resurrect it to run new tests at this point, or 
to get the clearances or to do the packaging we would have to do to make 
the code available for public use.  It was a research project.  I do hope 
the information in the papers is useful, both in terms of exploring some 
issues relating to benchmarking, and in terms of explaining the techniques 
we used to achieve high performance.


[1] http://www2006.org/programme/item.php?id=5011
[2] http://www.idealliance.org/proceedings/xml05/abstracts/paper246.HTML
[3] http://www.research.ibm.com/journal/sj/452/perkins.html
[4] http://www.codesynthesis.com/projects/xsdbench/results/2006-10-16-02/

Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
Received on Wednesday, 18 October 2006 14:51:44 UTC
