- From: olivier Thereaux <ot@w3.org>
- Date: Fri, 4 Apr 2008 16:47:04 -0400
- To: public-qa-dev hacking list <public-qa-dev@w3.org>
Hello,

I spent a bit of time today running the Devel::NYTProf profiler on the markup validator code, trying to find bottlenecks. Devel::NYTProf really is a nice piece of software: simple, efficient, and clear in its results. I am rather smitten.

http://open.blogs.nytimes.com/2008/03/05/the-new-york-times-perl-profiler/

Running it on the Markup Validator:

  % perl -T -d:NYTProf check uri=http://www.w3.org/TR/html5
  % nytprofhtml
  % open profiler/index.html
  % open profiler/check.html

... showed that the most time-consuming parts were:

* for small documents (e.g. http://qa-dev.w3.org/ - Content-Length: 3345) the bottleneck seems to be HTML::Template. We do cache the templates, but this is still the slowest part of the process (albeit reasonably so).

* for much larger documents (e.g. the huge HTML5 spec - Content-Length: 2032139) the bottlenecks are more evenly distributed. For the HTML5 spec validated on my (old) computer:

  46.42s  HTML/Encoding.pm
  05.19s  HTTP/Message.pm
  06.09s  LWP/Protocol/http.pm
  16.27s  Encode.pm
  36.22s  Encode/Encoding.pm
  42.15s  check
  154.9s  total execution time (!)

It is interesting to see that the most time-consuming steps are not so much validation as encoding detection and decoding...

That aside, looking at check I found something very surprising. There is one line responsible for 25 seconds of processing, and that is (current) line 2549:

  if ($self->{am_in_heading}==1){

... in sub W3C::Validator::SAXHandler::data()

Of course the line itself is not time-consuming, but its being called 1.2 million times (once per character) is really heavy.

I'm wondering if it would be possible to make that one line faster. If not, I think we need to reconsider the benefit of the "show outline" feature. That feature is the only reason why we have sub W3C::Validator::SAXHandler::data() at this point.

Pros:
* when the feature was broken, some people complained
* it is used for ~2% of validations

Cons:
* 2% usage is not much
* the feature is not essential to validation
* the metadata extractor is much more useful and powerful (although not necessarily more efficient, being XSLT-based)

Any thoughts?

Devel::NYTProf is installed on qa-dev, BTW. Have fun with it. I'll look at checklink too, early next week.

--
olivier
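
One possible direction for the data() cost discussed above, as a rough sketch only: choose the handler class once per request, so that character data is handled only when the outline is wanted. Everything here except the names data() and am_in_heading is hypothetical (the Sketch:: packages, make_handler, feed_chars), and the toy event loop merely assumes that a parser can skip character events when the handler has no data() method. Whether the real parser behaves that way would need checking.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the validator's SAX handler.
# This base class deliberately has no data() method.
package Sketch::Handler;

sub new {
    my ($class) = @_;
    return bless { am_in_heading => 0, heading_text => '' }, $class;
}

# Hypothetical subclass used only when the outline is requested.
package Sketch::OutlineHandler;
our @ISA = ('Sketch::Handler');

# Collect character data while inside a heading; ignore it otherwise.
sub data {
    my ($self, $chunk) = @_;
    return unless $self->{am_in_heading};
    $self->{heading_text} .= $chunk->{Data};
    return;
}

package main;

# Decide once per request, instead of once per character.
sub make_handler {
    my ($want_outline) = @_;
    my $class = $want_outline ? 'Sketch::OutlineHandler' : 'Sketch::Handler';
    return $class->new;
}

# Toy event loop standing in for the parser (an assumption, not the real
# one): it resolves the callback once and delivers no character events
# when the handler does not define data().
sub feed_chars {
    my ($handler, $text) = @_;
    my $cb = $handler->can('data') or return 0;
    my $calls = 0;
    for my $char (split //, $text) {
        $handler->$cb({ Data => $char });
        $calls++;
    }
    return $calls;    # number of data() calls made
}
```

With make_handler(0), feed_chars() makes no data() calls at all. With make_handler(1), the per-character cost remains, but only for the requests that actually ask for the outline.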
Received on Friday, 4 April 2008 20:47:35 UTC