Re: mod_perl memory usage

On Thursday 26 July 2007, you wrote:

> I gather that most of the requests we get are for the
> check script (and checklink and the feed validator - maybe we should
> look at running the former under mod_perl2 too)

As long as the same httpd processes serve both the validator and checklink, 
running checklink under mod_perl could indeed have some memory benefits.  And 
IIRC checklink should run fine under mod_perl.
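For reference, something along these lines in httpd.conf is roughly what that 
would look like (untested; the path and handler choice here are just an 
illustration, not our actual setup):

  # Hypothetical sketch: run the checklink CGI script under ModPerl::Registry
  # so it is compiled once per child instead of starting a new perl per request.
  Alias /checklink /usr/local/validator/httpd/cgi-bin/checklink
  <Location /checklink>
      SetHandler perl-script
      PerlResponseHandler ModPerl::Registry
      PerlOptions +ParseHeaders
      Options +ExecCGI
  </Location>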

However, the "hits" it gets tend to last a long time, of which I suppose the 
bits where mod_perl could provide some performance gains are a tiny fraction.  
But it does reserve a precious mod_perl/httpd process for the whole duration 
of its run, and during that time the process is not available to serve 
validator requests (OTOH depending on Apache's process/thread model I 
suppose).  In that sense, a more optimal setup could be to run checklink in a 
completely separate, thinner web server (possibly a non-mod_perl apache, or 
even something smaller such as lighttpd).
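Just to illustrate what I mean, something like this (port and path made up):

  # Hypothetical sketch: a second, lean Apache instance with no mod_perl
  # loaded, listening on its own port and serving checklink as plain CGI,
  # so long checklink runs don't tie up the fat validator processes.
  Listen 8001
  <VirtualHost *:8001>
      ScriptAlias /checklink /usr/local/validator/httpd/cgi-bin/checklink
  </VirtualHost>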

> Uhm... MaxRequestsPerChild has always been a grey area to me. Right
> now it's set to 1000 on our servers, which should yield benefits, and
> be conservative if any of our stuff is leaking memory.

It is useful not only for traditional leaks, but also when we're handling 
large in-memory data sets.

For example, validating a document probably results in the process needing N 
times the size of the document for the document data alone ($File->{Bytes}, 
$File->{Content}, and the intermediate $output result inside transcode() [0]).  
Add one more copy if "show source" is on, because we grab the complete output 
of $template->output() into a variable and only print it out afterwards.  
Probably one more for the Encode::encode()d version of the template output 
[1].  And one more when stringifying $File->{Content} for XML::LibXML.

And there may be even more; I haven't looked into what happens inside 
HTTP::Response->decoded_content() or HTML::Encoding.  Even though some of 
these copies go out of scope as we proceed, so I suppose their memory can be 
reused on the fly, for multi-megabyte documents this starts to hurt pretty 
quickly.
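Very roughly, and with made-up names rather than the validator's actual 
variables, the accumulation looks something like this for a large document:

  use Encode qw(decode encode);
  use HTML::Template;

  # Rough illustration only; the data and template here are hypothetical.
  my $bytes   = "\xe2\x82\xac" x 1_000_000;      # pretend: fetched document octets
  my $decoded = decode('UTF-8', $bytes);         # full decoded copy of the document
  my $tmpl    = HTML::Template->new(scalarref => \'<pre><TMPL_VAR source></pre>');
  $tmpl->param(source => $decoded);
  my $page    = $tmpl->output();                 # "show source": whole page in a scalar
  my $octets  = encode('UTF-8', $page);          # yet another full-size copy for output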

The memory usage for that particular process will balloon to whatever it 
needs, and will stay at least that large for the rest of the process's 
lifetime.  MaxRequestsPerChild shortens that lifetime.  1000 sounds quite 
large to me offhand, but it's probably just fine at least as a starting point.
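In httpd.conf terms, that's simply (the value being our current one):

  # Prefork MPM: recycle each child after it has served 1000 requests, so
  # memory ballooned by one huge document is eventually given back to the OS
  # instead of being held for as long as the child lives.
  MaxRequestsPerChild 1000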

Another thing that helps with this is the Apache2::SizeLimit module, which 
can be used to terminate a process right after some aspect of its memory 
usage has grown beyond a configured limit.
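For example, something like this in the mod_perl server's configuration 
(untested, and the 128 MB threshold is just a guess at a reasonable value):

  # Apache2::SizeLimit takes sizes in kilobytes; here a child is terminated
  # after finishing a request once its total process size exceeds ~128 MB.
  <Perl>
      use Apache2::SizeLimit;
      Apache2::SizeLimit->set_max_process_size(128_000);
  </Perl>
  PerlCleanupHandler Apache2::SizeLimit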


[0] I think $File->{Bytes} is no longer needed after transcode(), so maybe 
the intermediate $output copy inside it could be eliminated by transcoding 
$File->{Bytes} in place using Encode::from_to() instead of encode().
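Roughly like this (untested; $from_charset stands in for whatever charset was 
detected):

  use Encode qw(from_to);
  # from_to() transcodes the octets in $File->{Bytes} in place and returns
  # the new length, or undef on failure -- no separate $output copy needed.
  defined from_to($File->{Bytes}, $from_charset, 'UTF-8')
      or warn "transcoding from $from_charset failed";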

[1] HTML::Template can output directly into a filehandle.  Decorating STDOUT 
with Encode's PerlIO layer and then outputting directly into it using 
$template->output(print_to => ...) could eliminate some of those copies.  But 
there at least used to be various problems with dealing directly with STDOUT 
in mod_perl environments; I don't know if that's still the case.
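In code, roughly this (untested, and assuming UTF-8 is the right encoding to 
layer onto STDOUT):

  # Push an Encode-backed PerlIO layer onto STDOUT and let HTML::Template
  # print straight to it, instead of building the whole page in a scalar
  # and then encode()ing that into yet another full-size copy.
  binmode(STDOUT, ':encoding(UTF-8)');
  $template->output(print_to => *STDOUT);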

Received on Thursday, 26 July 2007 18:15:21 UTC