- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 7 Apr 2005 10:31:18 -0400
- To: "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
- Cc: Andrew Layman <andrewl@microsoft.com>, "'Don Box'" <dbox@microsoft.com>, "Rice, Ed (HP.com)" <ed.rice@hp.com>, Paul Cotton <pcotton@microsoft.com>, www-tag@w3.org, klawrenc@us.ibm.com, haggar@us.ibm.com
Len Bullard writes:
> HTTP needed no formal analysis nor test cases.
> HTML needed no formal analysis nor test cases.
> SOAP needed no formal analysis nor test cases.
> The proof was the use and the rapid deployment
> with the exception of the third item which is
> so far, unproven but the market is patient.
With respect, I don't think the measure of success for HTTP, HTML or SOAP
was primarily performance. If it were, I would have expected the community
to want quite a bit of shared experience with benchmarks and performance
models before agreeing to standardization.
> The FastInfoset approach has been privately
> benchmarked and proven to be workable in much the
> same way as the cases given above. Since faster
> performance is a customer requirement and not a
> theoretical issue, customers can go to the
> innovators who provide the necessary technology.
> That would be, in this case, Sun. They are of
> course, possibly willing to license that
> technology to their partner in Redmond which has
> slower and late to market technology to assist
> them in coming to market.
I am aware that Sun has done FastInfoset benchmarks. Having spent nearly
four years leading teams doing high-performance XML implementations, I can
tell you that such benchmarks have to be run with great care. You need to
do things like laying out your buffers to match your likely usage
patterns, since that affects processor cache hit ratios, and yes, those
effects can make a very noticeable difference. You also need to choose
the appropriate text-based parsers against which to compare. For example,
Xerces has many wonderful characteristics that make it the right choice
for many purposes, but it is nowhere near the fastest parser you can write
for many important high-performance applications. I'm not implying that
Sun has or hasn't done a good job on these things, but as with many
things, it's healthy to have publicly available tests that can be
reproduced and studied.
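To give a sense of what "run with great care" means in practice, here is a
minimal sketch of the kind of harness I have in mind, using the stock JAXP
SAX parser. The in-memory buffer, warm-up pass, and iteration counts are
illustrative choices of mine, not a prescription:

    import java.io.ByteArrayInputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.helpers.DefaultHandler;

    public class ParseBench {
        public static void main(String[] args) throws Exception {
            // Load the document into memory once so the measurement is not
            // dominated by disk I/O and every run sees the same buffer layout.
            byte[] doc = Files.readAllBytes(Paths.get(args[0]));

            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DefaultHandler handler = new DefaultHandler();

            // Warm-up: let the JIT compile the hot paths before timing.
            for (int i = 0; i < 1000; i++) {
                parser.parse(new ByteArrayInputStream(doc), handler);
            }

            int runs = 10_000;
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                parser.parse(new ByteArrayInputStream(doc), handler);
            }
            long elapsed = System.nanoTime() - start;
            System.out.printf("%.1f us/parse%n", elapsed / 1000.0 / runs);
        }
    }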
In the particular case of FastXML, my understanding is that there were two
flavors. One was a schema-dependent implementation that relied on
agreement between sender and receiver as to the format of the document.
Tag information was sent only in cases like <choice> where sender and
receiver could not otherwise infer what was coming. That's an interesting
design point, but it loses many of XML's appealing self-description
characteristics. I suspect that it will prove more problematic as we
start to do more work on versioning and extensibility, and as we see more
applications exchanging information for which there is only partial
agreement on the layout. I understand there was another embodiment of
FastXML that sent a full infoset, though I'm still unclear on whether it
depended on type information: whether, for example, it could distinguish
the following two instances:
<e xsi:type="xsd:integer">123</e>
<e xsi:type="xsd:integer">00123</e>
To be a true Infoset implementation usable in SOAP, for example, you must
be able to distinguish the above. Note that the usual digital signatures
on these will be different.
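To make that concrete, here is a minimal sketch. It uses plain SHA-256 over
the raw UTF-8 bytes rather than the full XML Signature canonicalization
pipeline, but either way the differing lexical forms produce differing
octets, so the digests differ:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class LexicalDigest {
        static String sha256(String s) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) hex.append(String.format("%02x", b));
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            // Both elements carry the same typed value (the integer 123) but
            // different lexical forms, so the bytes being signed differ.
            String a = "<e xsi:type=\"xsd:integer\">123</e>";
            String b = "<e xsi:type=\"xsd:integer\">00123</e>";
            System.out.println(sha256(a));
            System.out.println(sha256(b));  // differs from the digest above
        }
    }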
Are there published benchmarks of both of the above? Running in which
sorts of applications? Throwing SAX events? Deserializing to JAX-RPC?
All of these things make a difference. That's why we need public
discussion and debate, based on benchmarks that not only yield good
numbers, but that can be evaluated by the community to ensure that they
accurately reflect realistic usage patterns. Are both of the FastXML
approaches deemed to be of much higher performance than text, or only the
schema-dependent one?
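For the SAX case, the measurement target can be as simple as a handler that
counts the events it receives. Something like the following sketch (the
class name is mine, purely illustrative) can be dropped into a harness like
the one sketched earlier, and doubles as a check that two encodings deliver
the same event stream to the application:

    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    // Counts the SAX events a parse delivers: a lightweight benchmark
    // target for "throwing SAX events", and a sanity check that a binary
    // decoder hands the application the same events as a text parser.
    public class CountingHandler extends DefaultHandler {
        long startElements, characterChunks, charCount;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            startElements++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            characterChunks++;
            charCount += length;
        }
    }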
Also, while I mentioned it somewhat jokingly in my introduction to
Andrew's and Don's work, with enough expertise you can actually do some
semi-formal performance models of these things. It depends on knowing a
lot about how your systems and languages run, but in my experience people
who build high-performance implementations over a number of years develop
fairly good intuitions about where the time is going. For example, knowing
the performance characteristics of your UTF-8 to UTF-16 conversion
routines can be a really useful predictor of lower bounds on the
performance of certain implementations. It's usually quite easy to add up
on a whiteboard how many such conversions, and of what length, will be
done in various situations. Likewise for hashtable lookups, string pool
accesses, etc. I'd feel better if I saw more such things discussed
quantitatively in the community that's recommending a Binary XML
standard.
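As one illustration of the kind of number you would plug into such a
whiteboard estimate, here is a rough sketch that measures the UTF-8 to
UTF-16 conversion rate on a given machine. The 64 KB all-ASCII payload and
the iteration counts are placeholders of mine, not measured claims:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class Utf8ConversionCost {
        public static void main(String[] args) {
            // Illustrative payload: 64 KB of ASCII character data; the size
            // is a stand-in for whatever your messages actually carry.
            byte[] utf8 = new byte[64 * 1024];
            Arrays.fill(utf8, (byte) 'a');

            // Warm-up so the JIT has compiled the decoder before timing.
            for (int i = 0; i < 1000; i++) new String(utf8, StandardCharsets.UTF_8);

            int runs = 10_000;
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                // UTF-8 bytes -> UTF-16 String: the conversion whose cost
                // bounds any parser that hands the application Java strings.
                String s = new String(utf8, StandardCharsets.UTF_8);
                if (s.length() == 0) throw new AssertionError(); // keep the JIT honest
            }
            long elapsed = System.nanoTime() - start;
            double mbPerSec = (double) utf8.length * runs / (elapsed / 1e9) / (1024 * 1024);
            System.out.printf("~%.0f MB/s UTF-8 -> UTF-16 conversion rate%n", mbPerSec);
        }
    }

Multiply the measured rate into the character bytes per message and the
number of conversions per message, and you have a floor that no
implementation delivering Java strings to the application can get under.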
In summary, I think it is important to have a public debate about
quantitative performance issues, preferably based on carefully run and
reproducible benchmarks.
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
"Bullard, Claude L (Len)" <len.bullard@intergraph.com>
04/07/2005 09:13 AM
To: "'Don Box'" <dbox@microsoft.com>, "Rice, Ed (HP.com)" <ed.rice@hp.com>,
noah_mendelsohn@us.ibm.com, www-tag@w3.org
cc: Andrew Layman <andrewl@microsoft.com>, Paul Cotton <pcotton@microsoft.com>
Subject: RE: Andrew Layman and Don Box Analysis of XML Optimization Techni ques
HTTP needed no formal analysis nor test cases.
HTML needed no formal analysis nor test cases.
SOAP needed no formal analysis nor test cases.
The proof was the use and the rapid deployment
with the exception of the third item which is
so far, unproven but the market is patient.
The FastInfoset approach has been privately benchmarked and proven
to be workable in much the same way as the cases given above. Since
faster performance is a customer requirement and not a theoretical
issue, customers can go to the innovators who provide the necessary
technology.
That would be, in this case, Sun. They are of course, possibly willing
to license that technology to their partner in Redmond which has
slower and late to market technology to assist them in coming to market.
len
Received on Thursday, 7 April 2005 14:31:38 UTC