- From: Ville Skyttä <ville.skytta@iki.fi>
- Date: Thu, 26 Jul 2007 21:15:05 +0300
- To: "QA-dev" <public-qa-dev@w3.org>
On Thursday 26 July 2007, you wrote:

> I gather that most of the requests we get are for the
> check script (and checklink and the feed validator - maybe we should
> look at running the former under mod_perl2 too)

As long as the same processes run the validator and checklink, having them run checklink with mod_perl would possibly have some memory benefits indeed, and IIRC checklink should run fine with mod_perl. However, the "hits" it gets tend to last a long time, and I suppose the bits where mod_perl could provide some performance gain are only a tiny fraction of that. But it does reserve a precious mod_perl/httpd process for the whole duration of its run, and during that time the process is not available to serve validator requests (though that depends on Apache's process/thread model, I suppose). In that sense, a better setup could be to run checklink in a completely separate, thinner web server (possibly a non-mod_perl Apache, or even something smaller such as lighttpd).

> Uhm... MaxRequestsPerChild has always been a grey area to me. Right
> now it's set to 1000 on our servers, which should yield benefits, and
> be conservative if any of our stuff is leaking memory.

It is useful also when there's no traditional leak, but we're handling large in-memory data sets. For example, validating a document probably results in the process requiring N times the size of the document for the document data alone ($File->{Bytes}, $File->{Content}, the intermediate $output result inside transcode() [0]). Add one copy if "show source" is on, because we grab the complete output of $template->output() into a variable and only print it out afterwards, and probably one more for the Encode::encode()d version of the template output [1]. And one more when stringifying $File->{Content} for XML::LibXML. There may be even more; I haven't looked into what happens inside HTTP::Response->decoded_content() or HTML::Encoding.

Even though some of these copies go out of scope as we proceed, and thus I suppose their memory can be reused on the fly, for multi-megabyte documents this starts to hurt pretty quickly. The memory usage of that particular process will balloon to whatever it needs and will stay at least that large for the rest of its lifetime; MaxRequestsPerChild shortens that lifetime. 1000 sounds quite large to me offhand, but it's probably just fine at least as a starting point. Another thing that helps here is the Apache2::SizeLimit module, which can be used to terminate a process right after some aspect of its memory usage has grown uncomfortably large (a configuration sketch follows at the end of this message).

[0] I think $File->{Bytes} is no longer needed after transcode(), so maybe the intermediate $output copy inside could be eliminated by transcoding $File->{Bytes} in place using Encode::from_to() instead of encode(). A rough sketch follows below.

[1] HTML::Template can output directly into a filehandle. Decorating STDOUT with Encode's PerlIO layers and then outputting directly into it using $template->output(print_to => ...) could avoid some of those copies (sketch below). But there at least used to be various problems with dealing directly with STDOUT in mod_perl environments; I don't know if that's still the case.
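For concreteness, the Apache2::SizeLimit setup could look roughly like this. This is an untested sketch; the 128 MB figure is only illustrative, not something I've measured for the validator, and the older package-variable interface is used here rather than the newer accessor methods:

    # httpd.conf fragment (mod_perl 2), illustrative only
    <Perl>
        use Apache2::SizeLimit;
        # Sizes are in KB; kill the child after the current request if
        # its total process size exceeds roughly 128 MB.
        $Apache2::SizeLimit::MAX_PROCESS_SIZE = 128 * 1024;
    </Perl>
    PerlCleanupHandler Apache2::SizeLimit

The cleanup-handler placement means the check runs after the response has been sent, so the request that ballooned the process still completes normally.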
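Regarding [0], the in-place transcoding idea would be something along these lines. This is not a patch against the actual transcode() code; $File->{Bytes} is as in the validator, but $charset here is just a stand-in for whatever source charset has been detected:

    use Encode qw(from_to);

    # Convert the raw octets in place to UTF-8 instead of building a
    # separate $output copy.  from_to() modifies its first argument
    # directly and returns undef on failure.
    my $converted = from_to($File->{Bytes}, $charset, 'UTF-8');
    defined $converted
        or warn "transcoding from $charset failed";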
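And for [1], the streaming variant I have in mind is roughly the following, again untested; whether pushing a PerlIO layer onto STDOUT behaves sanely under mod_perl is exactly the open question mentioned above:

    # Decorate STDOUT with an Encode PerlIO layer and let HTML::Template
    # print straight into it, instead of collecting the output into a
    # scalar and encode()ing a second copy of it.
    binmode STDOUT, ':encoding(UTF-8)';
    $template->output(print_to => *STDOUT);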
Received on Thursday, 26 July 2007 18:15:21 UTC