- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 7 Apr 2005 10:31:18 -0400
- To: "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
- Cc: Andrew Layman <andrewl@microsoft.com>, "'Don Box'" <dbox@microsoft.com>, "Rice, Ed (HP.com)" <ed.rice@hp.com>, Paul Cotton <pcotton@microsoft.com>, www-tag@w3.org, klawrenc@us.ibm.com, haggar@us.ibm.com
Len Bullard writes:

> HTTP needed no formal analysis nor test cases.
> HTML needed no formal analysis nor test cases.
> SOAP needed no formal analysis nor test cases.
> The proof was the use and the rapid deployment
> with the exception of the third item which is
> so far, unproven but the market is patient.

With respect, I don't think the measure of success for HTTP, HTML or SOAP was primarily performance. If it were, I would have thought the community would have wanted to get quite a bit of shared experience with benchmarks and performance models before agreeing to standardization.

> The FastInfoset approach has been privately
> benchmarked and proven to be workable in much the
> same way as the cases given above. Since faster
> performance is a customer requirement and not a
> theoretical issue, customers can go to the
> innovators who provide the necessary technology.
> That would be, in this case, Sun. They are of
> course, possibly willing to license that
> technology to their partner in Redmond which has
> slower and late to market technology to assist
> them in coming to market.

I am aware that Sun has done Fast Infoset benchmarks. Having spent nearly 4 years leading teams doing high-performance XML implementations, I can tell you that any such benchmarks have to be run with great care. You need to do things like laying out your buffers to match your likely access patterns, since that affects processor cache hit ratios. And yes, those can make a very noticeable difference. You also need to choose the appropriate text-based parsers against which to compare. For example, Xerces has many wonderful characteristics that make it the right choice for many purposes, but it is nowhere near the fastest parser you can write for many important high-performance applications. I'm not implying that Sun has or hasn't done a good job on these things, but in any case it's healthy to have publicly available tests that can be reproduced and studied.

In the particular case of Fast Infoset, my understanding is that there were two flavors. One was a schema-dependent implementation that relied on agreement between sender and receiver as to the format of the document. Tag information was sent only in cases like <choice> where sender and receiver could not presume what was to be inferred. That's an interesting design point, but it loses many of XML's appealing self-describing characteristics. I suspect it will prove more problematic as we start to do more work on versioning and extensibility, and as we see more applications exchanging information for which there is only partial agreement on the layout.

I understand there was another embodiment of Fast Infoset that sent a full infoset, though I'm still unclear on whether it depended on type information. Could it, for example, distinguish the following two instances?

  <e xsi:type="xsd:integer">123</e>
  <e xsi:type="xsd:integer">00123</e>

To be a true Infoset implementation usable in SOAP, for example, you must be able to distinguish the above. Note that the usual digital signatures on these will be different.

Are there published benchmarks of both of the above? Running in which sorts of applications? Throwing SAX events? Deserializing to JAX-RPC? All of these things make a difference. That's why we need public discussion and debate, based on benchmarks that not only yield good numbers, but that can be evaluated by the community to ensure that they accurately reflect realistic usage patterns.
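To be concrete about what I mean by a reproducible benchmark, here is the sort of minimal harness I have in mind. This is my own sketch, not Sun's code or anyone's published test: the document under test, the parser JAXP happens to select, the decision to reuse one parser instance, and the warm-up and iteration counts are all assumptions that a published benchmark would need to state, and ideally vary.

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

// Minimal SAX throughput harness: parse one in-memory document many
// times with a handler that discards every event, and report MB/s.
// All policy choices (warm-up, iteration count, parser reuse) are
// arbitrary assumptions made for illustration.
public class SaxThroughput {
    public static void main(String[] args) throws Exception {
        File f = new File(args[0]);                  // document under test
        byte[] doc = new byte[(int) f.length()];
        FileInputStream in = new FileInputStream(f); // read it fully into
        int off = 0;                                 // memory so disk I/O
        while (off < doc.length) {                   // is excluded
            off += in.read(doc, off, doc.length - off);
        }
        in.close();

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler sink = new DefaultHandler();  // no-op event sink

        int warmup = 200, runs = 2000;               // arbitrary choices
        for (int i = 0; i < warmup; i++) {           // let the JIT settle
            parser.parse(new ByteArrayInputStream(doc), sink);
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {             // one reused parser
            parser.parse(new ByteArrayInputStream(doc), sink);
        }
        long millis = System.currentTimeMillis() - start;

        double mbParsed = (doc.length * (double) runs) / (1024.0 * 1024.0);
        System.out.println(mbParsed / (millis / 1000.0)
                           + " MB/s, SAX events only, no-op handler");
    }
}

The same harness pointed at a binary decoder exposing a comparable event stream, and variants whose handlers build JAX-RPC-style objects rather than discarding events, would begin to give numbers the rest of us could actually check.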
Are both of the Fast Infoset approaches deemed to be of much higher performance than text, or only the schema-dependent one?

Also, while I introduced the mention somewhat jokingly in my intro to Andrew's and Don's work, with enough expertise you can actually do some semi-formal performance models of these things. It depends on knowing a lot about how your systems and languages run, but in my experience people who build high-performance implementations over a number of years develop fairly good intuitions about where the time is going. For example, knowing the performance characteristics of your UTF-8 to UTF-16 conversion routines can be a really useful predictor of lower bounds on the performance of certain implementations. It's usually quite easy to add up on a whiteboard how many such conversions, and of what length, will be done in various situations. Likewise for hashtable lookups, string pool accesses, etc. I'd feel better if I saw more such things discussed quantitatively in the community that's recommending a Binary XML standard.
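To illustrate the kind of whiteboard arithmetic I mean, here is a toy estimate. Every number in it is an assumption invented for illustration, not a measurement; the point is only the shape of the calculation, which anyone can redo with figures for their own transcoder, symbol table, and documents.

// Back-of-the-envelope lower bound on parse time for one document,
// built only from assumed per-operation costs.  All figures are made
// up for illustration; substitute measurements from your own system.
public class WhiteboardEstimate {
    public static void main(String[] args) {
        double docBytes     = 100 * 1024;  // assumed 100 KB document
        double charFraction = 0.60;        // assumed share that is character data
        double utf8to16MBps = 80.0;        // assumed UTF-8 -> UTF-16 throughput
        double elementCount = 2000;        // assumed number of elements
        double nsPerLookup  = 150;         // assumed cost of one name lookup

        // Time spent just transcoding the character data.
        double transcodeMs = (docBytes * charFraction)
                             / (utf8to16MBps * 1024 * 1024) * 1000;

        // Time spent just looking up element names (start tag + end tag).
        double lookupMs = elementCount * 2 * nsPerLookup / 1e6;

        System.out.println("UTF-8 to UTF-16 alone: about " + transcodeMs + " ms");
        System.out.println("name lookups alone:    about " + lookupMs + " ms");
        // An implementation that must do both steps cannot run faster than
        // their sum; a format that lets you skip one of them changes the
        // bound, which is exactly what a model like this makes visible.
    }
}

It is crude, but it is the sort of thing that can be debated in public, which privately run benchmark numbers cannot.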
In summary, I think it is important to have a public debate about quantitative performance issues, preferably based on carefully run and reproducible benchmarks.

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Bullard, Claude L (Len)" <len.bullard@intergraph.com>
04/07/2005 09:13 AM

To: "'Don Box'" <dbox@microsoft.com>, "Rice, Ed (HP.com)" <ed.rice@hp.com>, noah_mendelsohn@us.ibm.com, www-tag@w3.org
cc: Andrew Layman <andrewl@microsoft.com>, Paul Cotton <pcotton@microsoft.com>
Subject: RE: Andrew Layman and Don Box Analysis of XML Optimization Techniques

HTTP needed no formal analysis nor test cases. HTML needed no formal analysis nor test cases. SOAP needed no formal analysis nor test cases. The proof was the use and the rapid deployment with the exception of the third item which is so far, unproven but the market is patient.

The FastInfoset approach has been privately benchmarked and proven to be workable in much the same way as the cases given above. Since faster performance is a customer requirement and not a theoretical issue, customers can go to the innovators who provide the necessary technology. That would be, in this case, Sun. They are of course, possibly willing to license that technology to their partner in Redmond which has slower and late to market technology to assist them in coming to market.

len

Received on Thursday, 7 April 2005 14:31:38 UTC