RE: Andrew Layman and Don Box Analysis of XML Optimization Techniques from David Orchard on 2005-04-07 (www-tag@w3.org from April 2005)

From: David Orchard <dorchard@bea.com>
Date: Thu, 7 Apr 2005 08:00:39 -0700
To: <noah_mendelsohn@us.ibm.com>, "Bullard, Claude L \(Len\)" <len.bullard@intergraph.com>
Cc: "Andrew Layman" <andrewl@microsoft.com>, "Don Box" <dbox@microsoft.com>, "Rice, Ed \(HP.com\)" <ed.rice@hp.com>, "Paul Cotton" <pcotton@microsoft.com>, <www-tag@w3.org>, <klawrenc@us.ibm.com>, <haggar@us.ibm.com>
Message-ID: <32D5845A745BFB429CBDBADA57CD41AF0EC561C3@ussjex01.amer.bea.com>
My position remains the same as articulated in BEA's position paper [1]
for the binary interchange workshop, particularly in the "How To Measure
Candidate Solutions" section, bullet 4.c ("Measurable
benefit(properties) from benchmarks") in the "Recommendations" section,
and expressed further in the workshop.

My position also remains the same on the importance of the architectural
properties of self-description and extensibility, also articulated in
our paper.  

I find it disappointing that 1 1/2 years after we made recommendations
that serious and normalized benchmarks be done to provide data for a
rigorous comparison of architectural properties of various solutions,
I'm back to making the same recommendations.

Cheers,
Dave

[1]
http://www.w3.org/2003/08/binary-interchange-workshop/26-bea-BinaryXMLWS.pdf




> -----Original Message-----
> From: www-tag-request@w3.org [mailto:www-tag-request@w3.org] On Behalf Of
> noah_mendelsohn@us.ibm.com
> Sent: Thursday, April 07, 2005 7:31 AM
> To: Bullard, Claude L (Len)
> Cc: Andrew Layman; 'Don Box'; Rice, Ed (HP.com); Paul Cotton;
> www-tag@w3.org; klawrenc@us.ibm.com; haggar@us.ibm.com
> Subject: RE: Andrew Layman and Don Box Analysis of XML Optimization
> Techniques
> 
> 
> Len Bullard writes:
> 
> > HTTP needed no formal analysis nor test cases.
> > HTML needed no formal analysis nor test cases.
> > SOAP needed no formal analysis nor test cases.
> > The proof was the use and the rapid deployment
> > with the exception of the third item which is
> > so far, unproven but the market is patient.
> 
> With respect, I don't think the measure of success for HTTP, HTML or SOAP
> was primarily performance.  If it were, I would have thought the
> community would have wanted to get quite a bit of shared experience with
> benchmarks and performance models before agreeing to standardization.
> 
> > The FastInfoset approach has been privately
> > benchmarked and proven to be workable in much the
> > same way as the cases given above.  Since faster
> > performance is a customer requirement and not a
> > theoretical issue, customers can go to the
> > innovators who provide the necessary technology.
> 
> > That would be, in this case, Sun.  They are of
> > course, possibly willing to license that
> > technology to their partner in Redmond which has
> > slower and late to market technology to assist
> > them in coming to market.
> 
> I am aware that Sun has done FastInfoset benchmarks.  Having spent nearly
> 4 years leading teams doing high performance XML implementations, I can
> tell you that any benchmarks have to be run with great care.  You need to
> do things like laying out your buffers in patterns that match your likely
> usage patterns, as it affects processor cache hit ratios.  And yes, those
> can make a very noticeable difference.  You also need to choose the
> appropriate text-based parsers against which to compare.  For example,
> Xerces has many wonderful characteristics that make it the right choice
> for many purposes, but it is nowhere near the fastest parser you can write
> for many important high-performance applications.  I'm not implying that
> Sun has or hasn't done a good job on these things, but as with many
> things, it's healthy to have publicly available tests that can be
> reproduced and studied.
> 
> In the particular case of FastXML, my understanding is that there were two
> flavors.  One was a schema-dependent implementation that relied on
> agreement between sender and receiver as to the format of the document.
> Tag information was sent only in cases like <choice> where sender and
> receiver could not presume what was to be inferred.  That's an interesting
> design point, but it loses many of XML's appealing characteristics of
> self-description.  I suspect that it will prove more problematic as we
> start to do more work on versioning and extensibility, and as we see more
> applications exchanging information for which there is only partial
> agreement on the layout.  I understand there was another embodiment of
> FastXML that sent a full infoset, though I'm still unclear on whether it
> depended on type information.  Whether, for example, it could distinguish
> the following two instances:
> 
>         <e xsi:type="xsd:integer">123</e>
>         <e xsi:type="xsd:integer">00123</e>
> 
> To be a true Infoset implementation usable in SOAP, for example, you must
> be able to distinguish the above.  Note that the usual digital signatures
> on these will be different.
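
A minimal sketch of the distinction Noah describes, in Python. It assumes
that hashing the serialized UTF-8 bytes is an acceptable stand-in for a
real XML Signature computation (which would canonicalize first); the
helper names are hypothetical and exist only for illustration. The two
instances carry the same integer value but different character content,
so any encoding that stores only the typed value cannot round-trip them.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace declarations are added so the fragments parse on their own.
NS = ('xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
      'xmlns:xsd="http://www.w3.org/2001/XMLSchema"')
DOC_A = '<e %s xsi:type="xsd:integer">123</e>' % NS
DOC_B = '<e %s xsi:type="xsd:integer">00123</e>' % NS


def lexical_text(doc):
    """Character content as it appears in the Infoset."""
    return ET.fromstring(doc).text


def typed_value(doc):
    """Value a type-aware (schema-dependent) encoder might keep instead."""
    return int(ET.fromstring(doc).text)


def digest(doc):
    """Stand-in for a signature: SHA-256 over the serialized bytes."""
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(typed_value(DOC_A) == typed_value(DOC_B))    # same integer value
    print(lexical_text(DOC_A) == lexical_text(DOC_B))  # different characters
    print(digest(DOC_A) == digest(DOC_B))              # different digests
```

An encoder that transmitted only typed_value() would make the two
documents indistinguishable, and a digest computed over the original
text would no longer verify for the second one.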
> 
> Are there published benchmarks of both of the above?  Running in which
> sorts of applications?  Throwing SAX events?  Deserializing to JAXRPC?
> All of these things make a difference.  That's why we need public
> discussion and debate, based on benchmarks that not only yield good
> numbers, but that can be evaluated by the community to ensure that they
> accurately reflect what are likely to be realistic usage patterns.  Are
> both of the FastXML approaches deemed to be of much higher performance
> than text, or only the schema-dependent one?
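
As a rough illustration of why the consumption mode matters, here is a
small reproducible harness in Python. It is only a sketch: the synthetic
document, the handler, and the function names are assumptions of mine,
and Python's standard-library parsers stand in for whatever text and
binary parsers one would actually compare. It times the same bytes
consumed as SAX events and as deserialized objects, and reports the best
of several repeats so that others can rerun it and compare.

```python
import timeit
import xml.etree.ElementTree as ET
import xml.sax


def make_document(n_items):
    """Build a synthetic document; real tests need realistic payloads."""
    items = "".join('<item id="%d">value %d</item>' % (i, i)
                    for i in range(n_items))
    return ("<root>" + items + "</root>").encode("utf-8")


class CountingHandler(xml.sax.ContentHandler):
    """Consumes SAX events and does the minimum work per element."""

    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


def consume_as_sax_events(data):
    handler = CountingHandler()
    xml.sax.parseString(data, handler)
    return handler.elements


def consume_as_objects(data):
    """Deserialize into application-level tuples via an in-memory tree."""
    root = ET.fromstring(data)
    return [(item.get("id"), item.text) for item in root]


def benchmark(data, repeat=5, number=3):
    """Return best-of-`repeat` seconds per call for each consumption mode."""
    results = {}
    for name, fn in (("sax_events", consume_as_sax_events),
                     ("objects", consume_as_objects)):
        timings = timeit.repeat(lambda: fn(data), repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    for mode, seconds in benchmark(make_document(10000)).items():
        print("%-12s %.6f s per parse" % (mode, seconds))
```

The numbers themselves mean little outside the machine they were run
on; the point is that the harness, the input, and the consumption mode
are all published together.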
> 
> Also, while I introduced the mention somewhat jokingly in my intro to
> Andrew's and Don's work, with enough expertise you can actually do some
> semi-formal performance models of these things.  It depends on knowing a
> lot about how your systems and languages run, but in my experience people
> who build high performance implementations over a number of years develop
> fairly good intuitions about where time is going.  For example, knowing
> the performance characteristics of your UTF-8 to UTF-16 conversion
> routines can be a really useful predictor of lower bounds on the
> performance of certain implementations.  It's usually quite easy to add up
> on a whiteboard how many such conversions, and of what length, will be
> done in various situations.  Likewise for hashtable lookups, string pool
> accesses, etc.  I'd feel better if I saw more such things discussed
> quantitatively in the community that's recommending a Binary XML
> standard.
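
The whiteboard arithmetic described above can be sketched in a few
lines of Python. Every number here is a made-up placeholder, not a
measurement, and the Operation type and lower_bound_ns function are
hypothetical names; one would substitute per-unit costs measured on the
actual system.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    count: int           # how many times the operation runs per message
    units_per_call: int  # e.g. characters converted, or 1 for a lookup
    ns_per_unit: float   # assumed cost per unit, in nanoseconds


def lower_bound_ns(operations):
    """Sum count * units * cost; ignores everything not listed."""
    return sum(op.count * op.units_per_call * op.ns_per_unit
               for op in operations)


# Placeholder workload for one message (all figures are assumptions).
WORKLOAD = [
    Operation("utf8_to_utf16", count=200, units_per_call=12, ns_per_unit=1.5),
    Operation("hashtable_lookup", count=200, units_per_call=1, ns_per_unit=40.0),
    Operation("string_pool_access", count=50, units_per_call=1, ns_per_unit=25.0),
]

if __name__ == "__main__":
    for op in WORKLOAD:
        cost = op.count * op.units_per_call * op.ns_per_unit
        print("%-20s %10.1f ns" % (op.name, cost))
    print("%-20s %10.1f ns" % ("lower bound", lower_bound_ns(WORKLOAD)))
```

Because the model only counts operations it knows about, the result is
a floor on processing time rather than a prediction of it.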
> 
> In summary, I think it is important to have a public debate about
> quantitative performance issues, preferably based on carefully run and
> reproducible benchmarks.
> 
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
> 
> 
> 
> 
> 
> 
> 
> 
> "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
> 04/07/2005 09:13 AM
> 
> 
>         To:     "'Don Box'" <dbox@microsoft.com>, "Rice, Ed (HP.com)"
> <ed.rice@hp.com>,
> noah_mendelsohn@us.ibm.com, www-tag@w3.org
>         cc:     Andrew Layman <andrewl@microsoft.com>, Paul Cotton
> <pcotton@microsoft.com>
>         Subject:        RE: Andrew Layman and Don Box Analysis of XML
> Optimization Techniques
> 
> 
> HTTP needed no formal analysis nor test cases.
> HTML needed no formal analysis nor test cases.
> SOAP needed no formal analysis nor test cases.
> The proof was the use and the rapid deployment
> with the exception of the third item which is
> so far, unproven but the market is patient.
> 
> The FastInfoset approach has been privately benchmarked and proven
> to be workable in much the same way as the cases given above.  Since
> faster performance is a customer requirement and not a theoretical
> issue, customers can go to the innovators who provide the necessary
> technology.
> 
> That would be, in this case, Sun.  They are of course, possibly willing
> to license that technology to their partner in Redmond which has
> slower and late to market technology to assist them in coming to market.
> 
> len
> 
> 
>
Received on Thursday, 7 April 2005 15:01:26 UTC