- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 19 Oct 2006 18:15:59 -0400
- To: "Michael Kay" <mike@saxonica.com>
- Cc: "'Boris Kolpackov'" <boris@codesynthesis.com>, xmlschema-dev@w3.org
Michael Kay writes:

> I would also question whether the performance benefits are
> worth the loss of
> an architectural boundary between components (the parser and
> the validator)
> that ought to be kept separate from a software engineering
> perspective. In
> general, my experience of systems development says that you
> tend to regret
> doing such things within a couple of years.

One of the really interesting aspects of working on our project has been to understand how different the mindsets are in different communities.

We viewed our project as an experiment in compilation. I think it's fair to say that the level-breaking approach we took is not only common in the compiler community, it's fundamental to the performance that we all today expect out of compiled languages. While the design of the languages may be very carefully layered, when you get to the low-level optimizations most compilers do, they indeed cross just the sorts of boundaries that are analogous to the ones we break. So, for example, the compiler will notice that the same intermediate value that's used in some assignment expression is useful in the computation of an array index or loop variable. Indeed, compilers will integrate such optimizations across variables or expressions that are explicit in the source, as well as temporaries that don't exist as abstractions at all outside the compiler. They will take a statement that's inside a loop and move just some of its logic outside. If you look at the actual generated assembler code from a modern compiler and compare it to the source, it's amazing how much integration there's been across what you'd consider separate layers.

Similarly, if you look at really high-performance networking systems, there is a degree of layering in the code, but the cooperation across the layers is often rather subtle. A real example from about 20 years ago: when you are implementing something like NFS (a network filesystem), you wind up with packets that contain various levels of internet headers (IP, whatever) and also, commonly, the image of a disk block. The impractically slow implementations of such systems are carefully layered, with the network code separate from the filesystem. The result is almost invariably a memory-to-memory copy of the rather large disk blocks across the layers. The implementations that work efficiently very carefully integrate the management of filesystem buffers with the management of network buffers, even though the two are otherwise at completely separate layers of abstraction. Typically, the network system can grab a filesystem buffer and hand a pointer to it directly to the network hardware (yet another layer), so that the disk block can be read or written directly off the wire. The ratio between 4K bytes copied and 0 bytes copied can be quite compelling when there are lots of packets.

As with most compilers, all the tricks we play in our XML Screamer project are hidden in our code generators and runtime libraries. As with most aggressive optimizing compilers, this stuff is not for the faint of heart, and it's not worth doing if you value clean architecture above performance. There is a lot of work that suggests that to get the highest performance, whether in network systems or in compilers, you need to integrate across layers.
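To make that concrete, here is a hand-worked sketch in C (my own illustration, not output from our tools) of the kind of rewriting an optimizing compiler does behind the programmer's back:

    /* As the programmer wrote it: the loop bound and the element address
       are recomputed, in separate source expressions, on every iteration. */
    void scale(double *a, int n, double f)
    {
        for (int i = 0; i < n * 4; i++)
            a[n * 4 - 1 - i] = a[n * 4 - 1 - i] * f;
    }

    /* Roughly what the optimizer makes of it: the invariant n * 4 is hoisted
       out of the loop, and the shared address computation becomes a temporary
       that never existed as an abstraction in the source. */
    void scale_opt(double *a, int n, double f)
    {
        int bound = n * 4;                   /* loop-invariant code motion   */
        for (int i = 0; i < bound; i++) {
            double *p = &a[bound - 1 - i];   /* common subexpression, reused */
            *p = *p * f;
        }
    }

The two functions compute exactly the same thing; the second simply ignores the expression boundaries the programmer wrote down, which is analogous to what we do across the parser/validator boundary.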
Indeed, in our paper we quote from one of my favorite works on the subject, a 20-year-old paper by Richard Watson [1], in which he writes: "a common mistake is to take a layered design as a requirement for a correspondingly layered implementation." That paper was about the optimization of network protocols, but I think the advice applies here as well. It was big news in the networking community then. It's interesting, but not surprising, that the XML community still comes from a different mindset. The result is that only occasionally does off-the-shelf XML software deliver the performance that modern hardware architectures are capable of.

Many optimizing compilers come very close to the ideal in the code they generate, but what they go through to do it is quite complex and messy. One of the purposes of our project was to see what would happen if we brought a compiler-writer's mindset to the implementation of XML software.

None of the above is saying that one should in general do the integration we did. Our project shows that you have a choice: if you're willing to do the sorts of optimizations that compilers do, you can run faster. They are indeed tricky at times, and if you don't need the speed, you can avoid them. On the other hand, as with most optimizing compilers, once someone with an urgent need pays to get them debugged, everyone benefits. Using a compiler is easy; it's writing and debugging it that's hard!

Noah

[1] Watson, R.W., and Mamrak, S.A. Gaining efficiency in transport services by appropriate design and implementation choices. ACM Transactions on Computer Systems (TOCS), v.5 n.2, p.97-120, May 1987.

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Michael Kay" <mike@saxonica.com>
Sent by: xmlschema-dev-request@w3.org
10/19/2006 05:33 PM

To: "'Boris Kolpackov'" <boris@codesynthesis.com>, <noah_mendelsohn@us.ibm.com>
cc: <xmlschema-dev@w3.org>
Subject: RE: [ANN] XSDBench XML Schema Benchmark 1.0.0 released

> That must have been some pretty tight integration of XML
> parsing and schema-based validation. For example when you
> validate, say a float, as an element value then you have to
> look for both legal float characters as well as '<'.
...
> Also I tend to believe that most existing parsers don't have
> this architecture.

I would also question whether the performance benefits are worth the loss of
an architectural boundary between components (the parser and the validator)
that ought to be kept separate from a software engineering perspective. In
general, my experience of systems development says that you tend to regret
doing such things within a couple of years.

Michael Kay
http://www.saxonica.com/
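P.S. Boris's example quoted above, validating a float while watching for '<', is a good illustration of the sort of integration we're discussing. A much-simplified, hand-written sketch in C (not the code we actually generate, and ignoring exponents, INF, NaN, and surrounding whitespace) of a scanner that lexes and validates element content in a single pass over a NUL-terminated buffer might look like this:

    #include <ctype.h>    /* isdigit */
    #include <stdbool.h>  /* bool    */
    #include <stddef.h>   /* size_t  */

    /* Returns true if buf[*pos..] holds a (simplified) well-formed float
       followed by '<'; leaves *pos at the '<' so the caller can go on to
       parse the end tag without rescanning the content. */
    static bool scan_float_content(const char *buf, size_t *pos)
    {
        size_t i = *pos;
        bool seen_digit = false, seen_dot = false;

        if (buf[i] == '+' || buf[i] == '-')
            i++;                               /* optional sign           */

        for (;;) {
            char c = buf[i];
            if (c == '<')                      /* end of element content  */
                break;
            if (isdigit((unsigned char)c)) {
                seen_digit = true;
                i++;
            } else if (c == '.' && !seen_dot) {
                seen_dot = true;
                i++;
            } else {
                return false;                  /* fails both lexing and
                                                  datatype validation     */
            }
        }

        if (!seen_digit)
            return false;
        *pos = i;                              /* caller resumes at '<'   */
        return true;
    }

A layered implementation would instead copy the character content out, hand the string to a separate datatype validator, and scan it a second time; the integrated version touches each byte once.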
Received on Thursday, 19 October 2006 22:16:14 UTC