RE: [xml-dev] Call for fast XML implementations

Hi Jeff,


The use cases the EXI WG is working from are those defined by the XML Binary Characterization (XBC) WG, available at http://www.w3.org/TR/xbc-use-cases/. I'm not sure whether this is precisely what you are referring to, but it may help set some context for the various environments in which we expect these parsers to be used.

The WG is aware of the issues involved in using an API such as SAX and is attempting to address them in our testing and analysis. We would work to fit any fast XML implementation into our framework so as to achieve the fairest possible comparisons.

To date, yes, our testing has focused primarily, but not exclusively, on UTF-8 and UTF-16. This directly reflects the encodings actually used by the test data we've collected.

regards,
Oliver Goldman, Co-chair, EXI Working Group

--
Oliver Goldman
Adobe Systems Incorporated
345 Park Avenue, MS E6-428
San Jose, CA 95110-2704 USA
408.536.2010 p
ogoldman@adobe.com 



> -----Original Message-----
> From: public-exi-request@w3.org 
> [mailto:public-exi-request@w3.org] On Behalf Of Jeff Rafter
> Sent: Saturday, March 25, 2006 10:35 PM
> To: Robin Berjon
> Cc: XML Developers List; W3C EXI Public
> Subject: Re: [xml-dev] Call for fast XML implementations
> 
> 
> I think we need a bit more in the area of use cases. XML
> parsers can clearly be tweaked for speed within the realm of
> conformance without being incredibly useful. For instance,
> if you are just looking for a well-formedness test, you can
> get a pretty fast parser. If you need it to report anything
> back to you, that will slow it down.
> Reporting methodology is important. Under SAX, a great deal
> of work must be done before the information is passed to the
> application; often this work is conjoined with the lexer. If
> there were no need to support various features, the reporting
> facilities could be removed.
> Additionally, *fast* is clearly relative. If you are only
> parsing XML Test Suite (XML-TS) files, then in general any
> parser that uses name caching or name hashing will perform
> more slowly than it would when parsing a 12 GB
> database-to-XML dump of repeating record and field names.
> Likewise, XML parsers that maintain useful error information
> (e.g., position) will be slower than less informative ones.
> 
> Of course, this synopsis is less than empirical; it comes
> more from parsers I have tested in the trenches.
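To illustrate the name-caching point above, here is a minimal Python sketch, assuming a parser that keeps a plain dictionary of element names it has already seen. `intern_names` and the sample inputs are hypothetical, not taken from any real parser or from the EXI test data. On a repetitive database-style dump nearly every lookup is a cache hit, while on input made up of mostly distinct names every lookup is a miss and the cache only adds overhead.

```python
# Hypothetical illustration only: a tiny name cache (interning table) of the
# kind a parser might keep so that repeated element names are resolved once.
from typing import Dict, Iterable, Tuple


def intern_names(names: Iterable[str]) -> Tuple[int, int]:
    """Intern each name and return (hits, misses) for the cache."""
    cache: Dict[str, str] = {}
    hits = misses = 0
    for name in names:
        if name in cache:
            hits += 1           # seen before: reuse the cached entry
        else:
            cache[name] = name  # first occurrence: pay the cost of storing it
            misses += 1
    return hits, misses


# A database-style dump repeats the same few record and field names.
dump = ["record", "id", "name", "price"] * 1000
# A varied, test-suite-style input uses mostly distinct names.
varied = [f"elem{i}" for i in range(4000)]

print(intern_names(dump))    # (3996, 4)
print(intern_names(varied))  # (0, 4000)
```

Real parsers use more elaborate structures (symbol tables keyed on raw bytes, for example), but the hit/miss asymmetry is the same.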
> 
> As an aside, are your tests focused only on UTF-8/16?
> 
> Cheers,
> Jeff Rafter
> 
> Robin Berjon wrote:
> > Dear all,
> > 
> > It's an interesting coincidence that, at the same time as the
> > EXI WG was busy drafting this very email, xml-dev was discussing
> > faster XML parsers. We're sorry to say that we don't have money
> > to offer, though we can offer warm fuzzies and the chance to get
> > your implementation before the eyes of XML-related
> > decision-makers (or at least influencers) from a fair number of
> > companies.
> > 
> > As you may know, the Efficient XML Interchange WG has been busy
> > building a framework for measuring various aspects of efficient
> > XML formats. The goal is not only to compare efficient formats
> > with one another but also to compare them to XML, in order to
> > demonstrate the need for such a format with the nitty-gritty
> > detail that has been requested of us. This is of particular
> > interest because these measurements will constitute a go/no-go
> > point: if the efficient formats are not efficient enough, the
> > EXI WG would be closed down without producing an efficient XML
> > format.
> > 
> > Needless to say, performing this comparison against sluggish
> > XML parsers, of which there is no shortage, would hardly prove
> > satisfying. We therefore plan to use the fastest XML parsers
> > that we can lay our hands on, and this is where you can help
> > us. While the WG has a fair bit of experience looking for
> > faster XML parsers, we do not claim to have perfect,
> > all-encompassing knowledge of the options that may be
> > available and that we might have overlooked in the past few
> > years. Furthermore, a non-negligible number of the XML parsers
> > pitched as faster than the rest achieve that performance by
> > cutting corners, which generally makes them non-conformant;
> > we cannot consider that a valid approach.
> > 
> > Therefore we solicit input from the community regarding fast
> > XML parsers. Which one(s) would you pick if you had to tear
> > through XML documents at warp speed? What is your experience
> > with them in terms of conformance? What would be your best bet
> > if you wanted to kill the EXI effort in its tracks?
> > 
> > We will naturally accept any and all information we can get,
> > but in order to make the best use of it and avoid being
> > swamped, there are some things we would like to see alongside
> > parser recommendations:
> > 
> >  * Some form of conformance statement: the parser needs to pass
> >    the XML Test Suite (http://www.w3.org/XML/Test/).
> >  * We need to be able to actually measure it. This means that if
> >    the code is not publicly available, we'll need to work out how
> >    to get a copy of it to run in our test system. It need not be
> >    made available to all members of the WG, but at least to the
> >    W3C staff so that they can run the tests.
> >  * If you wish to be extra helpful, you can also include the
> >    small amount of code and configuration required to run the
> >    parser within our framework, which is built on top of Japex
> >    (https://japex.dev.java.net/). If you're interested, we'd be
> >    delighted to help you get started.
> > 
> > Thank you very much in advance; we look forward to your input.
> > 
> > --Robin Berjon, on behalf of the EXI WG
> >    Senior Research Scientist
> >    Expway, http://expway.com/

Received on Wednesday, 5 April 2006 09:16:08 UTC