- From: olivier Thereaux <ot@zoy.org>
- Date: Wed, 2 Jan 2008 11:55:40 +0900
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: spo-devel@lists.sf.net, Tools dev list <public-qa-dev@w3.org>
Hi Bjoern,

You wrote:
> Investigating them, I had a brief look at the current `check` code. As
> I understand it, the current SGML::Parser::OpenSP handler always has a
> start_element handler (and others as well) declared. This makes the
> code extremely slow, if you make a handler
>
>   sub start_element {
>     require Data::Dumper;
>     print Data::Dumper::Dumper(\@_);
>   }
>
> it should immediately become clear why, it spends almost all its time
> creating huge data structures and converting strings from UTF-32 as
> they are provided by OpenSP into UTF-8 encoded Perl strings. Overall
> it should be even slower than calling `onsgmls` and going with regular
> expressions over the output as `check` did before.

I see, thanks a lot for pointing it out. I guess one could say that with all the preparsing and other features, the markup validator has long compromised performance for user-friendliness.

But more to the point - at the moment SPO's start_element handler is used for 1) the outline feature and 2) checking xmlns presence and value in a number of document types.

For the latter, I guess we could move that code over to a handler of the XML parser: so far we're using XML::LibXML for XML well-formedness, and I've looked into using the SAX version of that module instead, to plug the Appendix C checker into (without success yet, though - XML::LibXML::SAX::Parser remains elusively ill-documented...). But I suppose that would be tantamount to moving the performance issue to another module...

For the former, would you suggest using different SPO handlers, one without start_element() and one with, depending on the options and needs?

> If performance is still some sort of concern, I would recommend to
> pass a handler that has no start_element callback defined unless you
> really have to.
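To make the two-handler idea concrete, here is a minimal, untested sketch of how it could look. The package names (`My::Handler::Light`, `My::Handler::Full`), the `pick_handler` helper and the option names are all made up for illustration and are not taken from the `check` code; the assumption, following the note above, is that a handler object which simply has no `start_element` method avoids the per-element cost.

```perl
use strict;
use warnings;

# Hypothetical lightweight handler: it defines no start_element method,
# so (per the note above) no per-element event data should be needed for it.
package My::Handler::Light;
sub new { my ($class) = @_; return bless { errors => [] }, $class }
sub error {
    my ($self, $event) = @_;
    push @{ $self->{errors} }, $event;    # just collect reported errors
}

# Hypothetical full handler, for runs that need the outline or xmlns check.
package My::Handler::Full;
our @ISA = ('My::Handler::Light');
sub start_element {
    my ($self, $element) = @_;
    # Assumes the event is a hash reference carrying the element name.
    push @{ $self->{elements} }, $element->{Name};
}

package main;

# Choose the handler class from the requested options (names are assumptions).
sub pick_handler {
    my (%opt) = @_;
    my $class = ($opt{outline} || $opt{check_xmlns})
        ? 'My::Handler::Full'
        : 'My::Handler::Light';
    return $class->new;
}

my $handler = pick_handler(outline => 0, check_xmlns => 0);
print ref($handler), "\n";
print $handler->can('start_element')
    ? "has start_element\n"
    : "no start_element\n";

# With SGML::Parser::OpenSP installed, the handler would then be attached
# roughly like this (not executed here):
#   my $p = SGML::Parser::OpenSP->new;
#   $p->handler($handler);
#   $p->parse($file);
```

The point of selecting a class, rather than keeping a `start_element` that returns early, is that the method's mere presence appears to be what triggers the expensive work.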
Performance is always an issue, as we're getting tons of traffic, but our recent server upgrades and indeed the move to SPO (even with the start_element handler, even with three parsing rounds for some documents - preparse, xml-wf and validation proper) have made the situation very bearable for now...

Thanks,
-- 
olivier
Received on Wednesday, 2 January 2008 02:55:53 UTC