- From: olivier Thereaux <ot@w3.org>
- Date: Fri, 4 Apr 2008 16:47:04 -0400
- To: public-qa-dev hacking list <public-qa-dev@w3.org>
Hello,
I spent a bit of time today playing with the Devel::NYTProf profiler on
the markup validator code, trying to find bottlenecks.
The Devel::NYTProf profiler really is a nice piece of software. It is
simple, efficient, and clear in its results; I am rather smitten.
http://open.blogs.nytimes.com/2008/03/05/the-new-york-times-perl-profiler/
Running it on the Markup Validator:
% perl -T -d:NYTProf check uri=http://www.w3.org/TR/html5
% nytprofhtml
% open profiler/index.html
% open profiler/check.html
... showed that the most time-consuming parts were...
* for small documents (e.g. http://qa-dev.w3.org/ - Content-Length:
3345) the bottleneck seems to be HTML::Template. We do cache the
templates, but this is still the slowest part of the process, although
the time it takes is reasonable.
* for a much larger document (e.g. the huge HTML5 spec - Content-Length:
2032139) the bottlenecks are more evenly distributed. For the HTML5
spec, validated on my (old) computer:
46.42s HTML/Encoding.pm
05.19s HTTP/Message.pm
06.09s LWP/Protocol/http.pm
16.27s Encode.pm
36.22s Encode/Encoding.pm
42.15s check
154.9s total execution time (!)
Interesting to see that the very time-consuming processes are not so
much validation but encoding detection and decoding... That aside,
looking at check I found something very surprising. There is one line
responsible for 25 seconds of processing, and that is (current) line
2549:
if ($self->{am_in_heading}==1){
... in sub W3C::Validator::SAXHandler::data()
Of course the line itself is not expensive, but it is called
1.2 million times (once per character), which works out to roughly
21 microseconds per call, and that adds up.
I'm wondering if it would be possible to make that one line faster.
But if not, I think we need to reconsider the benefit of the "show
outline" feature. That feature is the only reason why we have sub
W3C::Validator::SAXHandler::data() at this point.
Pros:
* when the feature was broken, some people complained.
* it is used for ~2% of validations
Cons:
* 2% usage is not much
* the feature is not essential to validation
* the metadata extractor is much more useful and powerful
(although not necessarily more efficient, being xslt based)
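A middle ground between speeding up that line and dropping the feature
would be to pay for data() only when an outline is actually requested.
Below is a minimal sketch of that idea, not the validator's code: the
Sketch::* packages and feed_chunks() are hypothetical stand-ins, and it
assumes the parser can skip character data when the handler defines no
data() method, which I have not verified for our parser.

```perl
use strict;
use warnings;

# Hypothetical base handler: no data() method at all.
package Sketch::Handler;
sub new {
    my ($class) = @_;
    return bless { am_in_heading => 0, outline => '' }, $class;
}

# Hypothetical subclass, used only when "show outline" is requested.
package Sketch::OutlineHandler;
our @ISA = ('Sketch::Handler');
sub data {
    my ($self, $chunk) = @_;
    $self->{outline} .= $chunk if $self->{am_in_heading};
    return;
}

package main;

# Stand-in for the parser's event loop (an assumption): resolve the
# callback once, and skip character data entirely if there is none.
sub feed_chunks {
    my ($handler, @chunks) = @_;
    my $cb = $handler->can('data') or return 0;
    my $calls = 0;
    for my $chunk (@chunks) {
        $handler->$cb($chunk);
        $calls++;
    }
    return $calls;
}

my @chunks  = split //, 'Hello';
my $plain   = Sketch::Handler->new;
my $outline = Sketch::OutlineHandler->new;
$outline->{am_in_heading} = 1;    # pretend we are inside a heading

my $plain_calls   = feed_chunks($plain,   @chunks);
my $outline_calls = feed_chunks($outline, @chunks);
print "plain: $plain_calls calls, outline: $outline_calls calls, ",
      "text: $outline->{outline}\n";
```

With this split, the ~98% of requests that do not ask for an outline
would never enter data(), while outline requests behave as they do now.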
Any thoughts? Devel::NYTProf is installed on qa-dev, BTW, so have fun
with it. I'll look at checklink too, early next week.
--
olivier
Received on Friday, 4 April 2008 20:47:35 UTC