Re: Question about baseline textual parser performance (wrt binary ones) from Tatu Saloranta on 2006-10-13 (public-exi@w3.org from October 2006)

From: Tatu Saloranta <tsaloranta@gmail.com>
Date: Fri, 13 Oct 2006 11:02:37 -0700
To: public-exi@w3.org
Message-ID: <5f7770580610131102v62772617n692274afd507d803@mail.gmail.com>

On 10/13/06, Vogelheim, Daniel <daniel.vogelheim@siemens.com> wrote:
> Hello Tatu,
>
> Many thanks for taking an interest in the work of the EXI working group.

Thank you for working on the EXI publication! Measuring performance
takes a lot of work, but it is essential for making rational decisions
when choosing the right components.

...
> Our measurements include both the JAXP parser (i.e. the standard JDK
> parser, whatever that happens to be) and an optimized parser. The

Ok. So that would be Xerces (2.7.x for JDK 5). This is good, and what
I was looking for.
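For anyone wanting to check which implementation "the default JAXP
parser" resolves to on their own JDK, a minimal sketch follows. The
class name DefaultJaxpCheck and its helpers are hypothetical, and the
timing loop is only a rough illustration, not the EXI test harness. On
Sun/OpenJDK builds the default factory is typically a repackaged Xerces
class under com.sun.org.apache.xerces.internal, unless the
javax.xml.parsers.SAXParserFactory system property overrides it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: reports the default JAXP SAX implementation and times it. */
public class DefaultJaxpCheck {

    /** Class name of whatever SAXParserFactory the JDK resolves by default. */
    static String defaultFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Parses the same in-memory document `rounds` times; returns elapsed nanoseconds. */
    static long timeParse(byte[] doc, int rounds) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser(); // one parser reused for every document
        DefaultHandler noOp = new DefaultHandler(); // discards all SAX events
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            parser.parse(new ByteArrayInputStream(doc), noOp);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // Tiny placeholder document; real measurements need a representative corpus.
        byte[] doc = "<root><item id=\"1\">text</item></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println("default factory: " + defaultFactoryClass());
        timeParse(doc, 1_000); // warm-up so the JIT compiles the parsing path
        System.out.println("ns per parse: " + timeParse(doc, 10_000) / 10_000);
    }
}
```

Parsing with a no-op handler isolates tokenizing cost from application
work, which is the part a faster textual parser (or a binary format)
would be expected to improve.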

...
> respective best of breed. A slightly different way of putting it is: A
> performance oriented developer would almost certainly use an optimized
> implementation before considering a change in the underlying format. Our

Makes sense -- and I think it is a very important question to answer;
not having reliable data has already caused some confusion regarding
the specific value of proposed binary XML implementations.

...
> The actual performance differential varies greatly between test
> documents and use cases.

Definitely, I understand this. In general, I have paid the closest
attention to the median (for the particular set of documents I have
used, about 60 of them), as well as to correlation with document size.
But the latter is more relevant with respect to implementations: some
have significantly higher per-document overhead than others.
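The two summary statistics mentioned above (median over a document set,
and correlation of results with document size) can be sketched as
follows. This is only an illustration: the ParseStats class is
hypothetical, Pearson correlation is an assumed choice of correlation
measure, and the sample arrays are made-up placeholders, not
measurements.

```java
import java.util.Arrays;

/** Hypothetical helper: summarizes per-document parser benchmark results. */
public class ParseStats {

    /** Median of the values; averages the two middle values for an even count. */
    static double median(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Pearson correlation coefficient between two equal-length series. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length series of >= 2 values");
        }
        double meanX = Arrays.stream(x).average().orElse(0);
        double meanY = Arrays.stream(y).average().orElse(0);
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY); // NaN if either series is constant
    }

    public static void main(String[] args) {
        // Placeholder inputs: document sizes (KB) and speedup of some parser
        // over the default JAXP parser (baseline time / candidate time).
        double[] sizesKb = {2, 15, 40, 120, 800};
        double[] speedup = {1.1, 1.3, 1.5, 1.7, 1.9};
        System.out.println("median speedup: " + median(speedup));
        System.out.println("correlation with size: " + pearson(sizesKb, speedup));
    }
}
```

A strong positive correlation here would suggest the slower parser pays
a fixed per-document cost that matters most for small documents, which
is exactly the kind of effect a single median can hide.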

...
> When browsing over the individual test cases I regularly see a factor of
> 1.5x, and there are a number of test cases where we see a factor of 2x.
> (Over real world test data, representing various use cases.)

Is this between default JAXP parser and the fastest textual one (from
Fujitsu) you have access to?

This would make sense, given the performance characteristics I have
seen. My gut feeling is that the highest achievable
implementation-specific improvement (on the Java platform, for textual
XML and medium-to-large, simple-ish documents [not very attribute or
namespace heavy]) is somewhere close to 2x compared to the JAXP-default
baseline. I could be wrong there, of course; it just seems to go that
way. ;-)

...
> I suspect we won't be incorporating additional parsers into the test
> suite at this stage, but I would hope that by including JAXP parsing we
> already meet your requirement.

Actually, this does cover my needs well: I was specifically thinking of
Xerces, which happens to be the default JAXP implementation. The only
caveat is that the JDK tends to bundle somewhat outdated versions, but
since performance changes to Xerces are quite gradual, this probably
does not have a huge effect.

> Tatu, I hope this helps. Please let me (and the list) know if you have
> additional questions.

It does answer my question. Thank you again for taking the time to
explain the approach.

-+ Tatu +-

Received on Friday, 13 October 2006 18:02:49 UTC