- From: David Orchard <dorchard@bea.com>
- Date: Thu, 7 Apr 2005 08:00:39 -0700
- To: <noah_mendelsohn@us.ibm.com>, "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
- Cc: "Andrew Layman" <andrewl@microsoft.com>, "Don Box" <dbox@microsoft.com>, "Rice, Ed (HP.com)" <ed.rice@hp.com>, "Paul Cotton" <pcotton@microsoft.com>, <www-tag@w3.org>, <klawrenc@us.ibm.com>, <haggar@us.ibm.com>
My position remains the same as articulated in BEA's position paper [1] for
the binary interchange workshop, particularly in the "How To Measure
Candidate Solutions" section, bullet 4.c ("Measurable benefit (properties)
from benchmarks") in the "Recommendations" section, and expressed further at
the workshop. My position also remains the same on the importance of the
architectural properties of self-description and extensibility, also
articulated in our paper. I find it disappointing that a year and a half
after we recommended that serious, normalized benchmarks be done to provide
data for a rigorous comparison of the architectural properties of the
various solutions, I am back to making the same recommendations.

Cheers,
Dave

[1] http://www.w3.org/2003/08/binary-interchange-workshop/26-bea-BinaryXMLWS.pdf

> -----Original Message-----
> From: www-tag-request@w3.org [mailto:www-tag-request@w3.org] On Behalf Of
> noah_mendelsohn@us.ibm.com
> Sent: Thursday, April 07, 2005 7:31 AM
> To: Bullard, Claude L (Len)
> Cc: Andrew Layman; 'Don Box'; Rice, Ed (HP.com); Paul Cotton;
> www-tag@w3.org; klawrenc@us.ibm.com; haggar@us.ibm.com
> Subject: RE: Andrew Layman and Don Box Analysis of XML Optimization
> Techniques
>
> Len Bullard writes:
>
> > HTTP needed no formal analysis nor test cases.
> > HTML needed no formal analysis nor test cases.
> > SOAP needed no formal analysis nor test cases.
> > The proof was the use and the rapid deployment
> > with the exception of the third item which is
> > so far, unproven but the market is patient.
>
> With respect, I don't think the measure of success for HTTP, HTML or SOAP
> was primarily performance. If it were, I would have thought the community
> would have wanted to get quite a bit of shared experience with benchmarks
> and performance models before agreeing to standardization.
>
> > The FastInfoset approach has been privately
> > benchmarked and proven to be workable in much the
> > same way as the cases given above.
> > Since faster performance is a customer requirement and not a
> > theoretical issue, customers can go to the innovators who provide
> > the necessary technology.
>
> > That would be, in this case, Sun. They are of course possibly
> > willing to license that technology to their partner in Redmond,
> > which has slower and late-to-market technology, to assist them in
> > coming to market.
>
> I am aware that Sun has done FastInfoset benchmarks. Having spent nearly
> four years leading teams doing high-performance XML implementations, I
> can tell you that any benchmarks have to be run with great care. You
> need to do things like laying out your buffers in patterns that match
> your likely usage patterns, since that affects processor cache hit
> ratios. And yes, those can make a very noticeable difference. You also
> need to choose the appropriate text-based parsers against which to
> compare. For example, Xerces has many wonderful characteristics that
> make it the right choice for many purposes, but it is nowhere near the
> fastest parser you can write for many important high-performance
> applications. I'm not implying that Sun has or hasn't done a good job on
> these things, but as with many things, it's healthy to have publicly
> available tests that can be reproduced and studied.
>
> In the particular case of FastXML, my understanding is that there were
> two flavors. One was a schema-dependent implementation that relied on
> agreement between sender and receiver as to the format of the document.
> Tag information was sent only in cases like <choice>, where sender and
> receiver could not presume what was to be inferred. That's an
> interesting design point, but it loses many of XML's appealing
> characteristics of self-description. I suspect that it will prove more
> problematic as we start to do more work on versioning and extensibility,
> and as we see more applications exchanging information for which there
> is only partial agreement on the layout.
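[Editor's note: the trade-off described above can be seen in miniature. The
sketch below is not from the thread; it is an illustrative comparison, in
Python, of a self-describing XML document against a schema-dependent binary
encoding in which sender and receiver agree out of band that the payload is
two little-endian 32-bit integers. The element names and layout are invented
for the example.]

```python
import struct
import xml.etree.ElementTree as ET

# Self-describing XML: tag names travel with the data, so a receiver
# with no prior agreement can still navigate the document.
xml_doc = "<point><x>3</x><y>4</y></point>"

# Schema-dependent binary: both sides agree the payload is two int32s,
# so only the values are sent -- far more compact, but meaningless
# without the out-of-band agreement, and harder to version or extend.
binary_doc = struct.pack("<ii", 3, 4)

root = ET.fromstring(xml_doc)
x, y = struct.unpack("<ii", binary_doc)

# Same information content...
assert (int(root.find("x").text), int(root.find("y").text)) == (x, y)
# ...but 8 bytes on the wire instead of 31.
print(len(binary_doc), "bytes binary vs", len(xml_doc), "bytes XML")
```

The compactness is real, but so is the cost: every field the schema does not
anticipate has nowhere to go, which is exactly the versioning and
extensibility concern raised above.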
> I understand there was another embodiment of FastXML that sent a full
> infoset, though I'm still unclear on whether it depended on type
> information: whether, for example, it could distinguish the following
> two instances:
>
> <e xsi:type="xsd:integer">123</e>
> <e xsi:type="xsd:integer">00123</e>
>
> To be a true Infoset implementation usable in SOAP, for example, you
> must be able to distinguish the above. Note that the usual digital
> signatures on these will be different.
>
> Are there published benchmarks of both of the above? Running in which
> sorts of applications? Throwing SAX events? Deserializing to JAX-RPC?
> All of these things make a difference. That's why we need public
> discussion and debate, based on benchmarks that not only yield good
> numbers, but that can be evaluated by the community to ensure that they
> accurately reflect what are likely to be realistic usage patterns. Are
> both of the FastXML approaches deemed to be of much higher performance
> than text, or only the schema-dependent one?
>
> Also, while I introduced the mention somewhat jokingly in my intro to
> Andrew's and Don's work, with enough expertise you can actually do some
> semi-formal performance models of these things. It depends on knowing a
> lot about how your systems and languages run, but in my experience,
> people who build high-performance implementations over a number of
> years develop fairly good intuitions about where time is going. For
> example, knowing the performance characteristics of your UTF-8 to
> UTF-16 conversion routines can be a really useful predictor of lower
> bounds on the performance of certain implementations. It's usually
> quite easy to add up on a whiteboard how many such conversions, and of
> what length, will be done in various situations. Likewise for hashtable
> lookups, string pool accesses, etc. I'd feel better if I saw more such
> things discussed quantitatively in the community that's recommending a
> Binary XML standard.
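[Editor's note: the back-of-envelope performance model described above can be
sketched in a few lines. The following is not from the thread; it is a
hedged example, using only Python's standard library, of measuring the raw
cost of the UTF-8 to UTF-16 conversion that a text parser must perform
before any higher-level processing. The payload size and repetition count
are arbitrary.]

```python
import timeit

# A representative text payload of roughly 100 KB of markup.
payload = ("<item>hello, world</item>" * 4000).encode("utf-8")

# Time the decode-to-str plus re-encode to UTF-16, the normalization
# step many parsers pay per character of input.
reps = 200
secs = timeit.timeit(
    lambda: payload.decode("utf-8").encode("utf-16-le"),
    number=reps,
)
mb_per_sec = (len(payload) * reps / 2**20) / secs
print(f"UTF-8 -> UTF-16 throughput: about {mb_per_sec:.0f} MB/s here")
```

Dividing a document's size by this throughput gives the kind of whiteboard
lower bound described above: no parser that must normalize its input into
UTF-16 strings can run faster than the conversion itself, whatever the wire
format.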
>
> In summary, I think it is important to have a public debate about
> quantitative performance issues, preferably based on carefully run and
> reproducible benchmarks.
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
>
> "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
> 04/07/2005 09:13 AM
>
> To: "'Don Box'" <dbox@microsoft.com>, "Rice, Ed (HP.com)"
> <ed.rice@hp.com>, noah_mendelsohn@us.ibm.com, www-tag@w3.org
> cc: Andrew Layman <andrewl@microsoft.com>, Paul Cotton
> <pcotton@microsoft.com>
> Subject: RE: Andrew Layman and Don Box Analysis of XML Optimization
> Techniques
>
> HTTP needed no formal analysis nor test cases.
> HTML needed no formal analysis nor test cases.
> SOAP needed no formal analysis nor test cases.
> The proof was the use and the rapid deployment
> with the exception of the third item which is
> so far, unproven but the market is patient.
>
> The FastInfoset approach has been privately benchmarked and proven
> to be workable in much the same way as the cases given above. Since
> faster performance is a customer requirement and not a theoretical
> issue, customers can go to the innovators who provide the necessary
> technology.
>
> That would be, in this case, Sun. They are of course possibly willing
> to license that technology to their partner in Redmond, which has
> slower and late-to-market technology, to assist them in coming to
> market.
>
> len
Received on Thursday, 7 April 2005 15:01:26 UTC