- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 25 Feb 2008 20:43:55 -0500
- To: www-tag@w3.org
- Cc: Rob Cameron <cameron@cs.sfu.ca>, peter_haggar@lotus.com
One aspect of the ongoing debate about the pros and cons of standardizing "Efficient XML Interchange" (EXI) involves this question: if XML isn't running fast enough for some purpose now, how much of that slowness is inherent in XML itself, and how much is attributable to lack of optimization of the processing software? Our XML Screamer work has often been cited as one point of reference as to how fast XML 1.0 can be parsed, validated, and processed.

Attached is a note that Rob Cameron of Simon Fraser University just sent to xml-dev. He reports on an open-source XML implementation that uses the SIMD (parallel) features of modern processor chips to run several times faster than XML Screamer. So, in addition to being a really nice piece of work, this seems to confirm that the performance we reported for XML Screamer is probably conservative. Also, Rob has done his work in an open source implementation, which should make it easier for others to check and reproduce his results.

For what it's worth, my personal conclusion has been and is that speed alone is in most cases not enough to justify standardizing something like EXI: if there is enough demand for a system that combines compression with speed, then there may indeed be justification for standardizing EXI, but my intuition is that we should look primarily to use cases where size as well as raw speed is important.
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

----- Forwarded by Noah Mendelsohn/Cambridge/IBM on 02/25/2008 04:14 PM -----

Rob Cameron <cameron@cs.sfu.ca>
02/25/2008 08:13 AM

To: xml-dev@lists.xml.org
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: [xml-dev] XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams

I am pleased to announce the availability of parabix-0.40, a high-performance XML parsing engine prototype that can parse text-oriented XML documents on commodity processors at over 200MB/sec per processor GHz and data-oriented XML documents at speeds approaching that. At this point, this includes correct parsing of correct documents and dispatch to markup action routines using an in-line API for XML (ilax). As the parabix stack is built out to incorporate validation and object creation, I am expecting overall performance above 100MB/sec/GHz. With linear speed-up on multicore processors and other improvements, 1000MB/sec/GHz is foreseeable.

By way of comparison, XML Screamer (Kostoulas et al., WWW 2006) performs parsing, validation and business object creation on commodity processors at the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz), a substantial increase over the cited rate of 2.5-6 MB/sec/GHz for traditional validating parsers. This is very good performance for traditional character-at-a-time parsing, taking advantage of a collection of techniques such as optimization across layers and schema-based customization. As a benchmark, 100 MB/sec/GHz is cited as the limit on throughput achievable for a simple character-at-a-time scanning loop.

My research is investigating the development of very high-speed text processing based on a fundamentally new approach: using parallel bit streams to represent character data and the SIMD processor capabilities of commodity CPUs to process these bit streams.
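As a rough illustration of the parallel bit stream idea (this is not parabix code, and all function names here are hypothetical), the Python sketch below transposes a byte string into eight bit streams, one per bit position, and then locates every '<' with a handful of bitwise operations over the whole input at once. Python's arbitrary-precision integers stand in for SIMD registers; a real implementation would process fixed-size blocks with SIMD instructions.

```python
def transpose_to_bit_streams(data: bytes) -> list:
    """Return 8 integers; bit i of streams[k] equals bit k of data[i]."""
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list, value: int, length: int) -> int:
    """Return a bit stream with bit i set wherever data[i] == value.

    Each of the 8 steps is one bitwise operation applied to every
    character position simultaneously (illustrative only).
    """
    mask = (1 << length) - 1
    result = mask
    for k in range(8):
        if (value >> k) & 1:
            result &= streams[k]          # this bit must be 1
        else:
            result &= ~streams[k] & mask  # this bit must be 0
    return result


def positions(stream: int) -> list:
    """List the indices of the set bits in a bit stream."""
    out = []
    i = 0
    while stream:
        if stream & 1:
            out.append(i)
        stream >>= 1
        i += 1
    return out


doc = b"<a><b>text</b></a>"
streams = transpose_to_bit_streams(doc)
left_angles = match_byte(streams, ord("<"), len(doc))
print(positions(left_angles))  # [0, 3, 10, 14]
```

The point of the representation is that the work in match_byte does not depend on looking at characters one at a time: each bitwise operation covers as many character positions as the register (here, the integer) holds.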
I have first applied these techniques to the problem of UTF-8 to UTF-16 transcoding, to achieve end-to-end speed-up of 3X to 25X compared with standard iconv and similar implementations. The open source implementation of u8u16 is available at http://u8u16.costar.sfu.ca/ and the results have just been presented to ACM PPoPP 2008 in Salt Lake City.

Parabix (parallel bit streams for XML) is a research prototype that is nevertheless being designed to become the basis for a full XML processing stack. The working code repository is now available as an open source code base under OSL 3.0.

http://parabix.costar.sfu.ca/

I am hoping to accelerate development of parabix technology through the open source model as well as continuing the academic research project with a team of graduate students who are coming up to speed. I have also created a spin-off company to oversee commercial development of the technology.

However, in the context of discussion of XML performance issues and the next ten years of development of XML technology, I think that the work is sufficiently well advanced to support the following advice:

Do not assume that XML processing performance is inherently limited by the nature of present-day character-at-a-time parsing technology. Intraregister and intrachip parallelism hold out a realistic promise of dramatic performance improvement on commodity processors.

--
Robert D. Cameron, Ph.D.
Professor of Computing Science, Simon Fraser University
President and CTO, International Characters, Inc.
Received on Tuesday, 26 February 2008 01:43:23 UTC