Re: mod_perl memory usage

* olivier Thereaux <ot@w3.org> [2007-08-01 18:35+0900]
>Hi all,
>
>Just FYI, we have switched mod_perl2 off on all validator servers again.
>Running under mod_perl seemed like a good idea, but there were some
>issues we're having trouble explaining, like how the load remained
>really, really high on the machines (apache2 processes using up a lot
>of CPU) even when very few requests were being received.
>
>Switching off mod_perl means more process forks and a slightly
>slower validation process, but ultimately, less load and less wait,
>it seems. Go figure.

mod_perl may have been a red herring; I was blaming it for the
massive apache2 process sizes (200 MB+ on jessica) because I
thought that was something you guys had changed recently (though
I don't even know if that's true) and I had never seen apache
processes that big.

Also, 'check' is so expensive that I didn't really expect
mod_perl to be a huge win; the extra work of firing up a Perl
interpreter for each request must be relatively cheap by comparison.

But even after we pruned that and other stuff last night the 
apache2 process sizes are 120 MB, and the 'check' processes are 
80-90 MB, so maybe we would be OK with mod_perl after all.

>I still want to try some of the performance tweaks you suggested,
>Ville [1] (avoiding copying content, undef-ing after use, etc.).
>Gerald also was suggesting looking at e.g. BSD::Resource or any
>ulimit-like system, to avoid having some "check" processes spin away
>and hog CPU. Worth a shot.

I think resource limits would help a lot. After our changes last
night, all the validator servers seem pretty happy:

    25 requests currently being processed, 103 idle workers -- jessica
    18 requests currently being processed, 12 idle workers -- fugu
    16 requests currently being processed, 10 idle workers -- lovejoy

The biggest problem now seems to be a few URIs that consistently
eat up many minutes of CPU time; currently on jessica there are
several processes that have consumed 20+ CPU minutes each:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    31664 www-data  25   0 88632  36m  10m R   29  0.9  22:14.48 check
    32073 www-data  25   0 97820  36m  10m R   29  0.9  21:09.62 check
    31862 www-data  25   0  131m  72m  10m R   27  1.8  21:26.18 check
    31732 www-data  25   0 97824  36m  10m R   24  0.9  22:10.81 check
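A minimal sketch of the ulimit-style cap discussed above, written as a hypothetical shell wrapper (`run_with_cpu_limit` is not part of the validator). `ulimit -t` limits CPU seconds, not wall-clock time, through `setrlimit(RLIMIT_CPU)`; a process that goes over it is killed by the kernel, so with a budget of a few minutes none of the processes above would have reached 20+ CPU minutes. From inside Perl, BSD::Resource's `setrlimit` does the same job, and Apache's `RLimitCPU` directive can apply a similar cap to the CGI processes it starts.

```shell
# Hypothetical wrapper: run a command under a CPU-time cap.
# Usage: run_with_cpu_limit SECONDS COMMAND [ARGS...]
run_with_cpu_limit() {
    limit=$1
    shift
    (
        # Set the limit in a subshell so the calling shell is unaffected.
        # ulimit -t counts CPU seconds, not wall-clock time; a process
        # that exceeds it is killed by the kernel (SIGXCPU at the soft
        # limit, SIGKILL at the hard limit).
        ulimit -t "$limit" || exit 1
        exec "$@"
    )
}

# Example (hypothetical command): allow one run at most 300 CPU seconds.
# run_with_cpu_limit 300 ./check
```

The 300-second budget is an arbitrary illustration, not a tuned value; a real limit would need to sit comfortably above the CPU time of a normal, large validation.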

It might be useful if 'check' told us what it was doing by adding 
lines like this throughout the code:

    $0 = "check: fetching $uri";
...
    $0 = "check: sgml::parsing $uri";
...
    $0 = "check: returning results for $uri";

so we can see what each process is up to in the output of 'ps'.

In the meantime you can see which URIs are responsible for these 
long-running check processes with:

    cat /proc/31664/environ | tr '\000' '\012' | grep QUERY_STRING

(I would paste a few samples here but I think that would violate 
our privacy policy)
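The one-off command above can be extended to cover every 'check' process at once. A minimal sketch, assuming Linux with `/proc/PID/comm`, that the validator processes still run under the command name `check` (as in the top output above), and that it is run as root or www-data so the `environ` files are readable; `check_queries` is a hypothetical helper name. `/proc/PID/environ` holds the environment the process was started with, which for a CGI script includes `QUERY_STRING`.

```shell
# Hypothetical helper: print "PID QUERY_STRING=..." for every process
# whose command name is "check".
check_queries() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # /proc/PID/comm holds the command name that top shows.
        [ "$(cat "$dir/comm" 2>/dev/null)" = check ] || continue
        # environ is NUL-separated; split it into lines, keep QUERY_STRING.
        qs=$(tr '\000' '\012' 2>/dev/null < "$dir/environ" |
            grep '^QUERY_STRING=') || true
        echo "$pid $qs"
    done
}
```

Pairing the PIDs with the TIME+ column from top then shows which URIs are behind the longest-running processes.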

>[1] http://lists.w3.org/Archives/Public/public-qa-dev/2007Jul/0022.html

-- 
Gerald Oskoboiny     http://www.w3.org/People/Gerald/
World Wide Web Consortium (W3C)    http://www.w3.org/
tel:+1-604-906-1232             mailto:gerald@w3.org

Received on Wednesday, 1 August 2007 23:46:08 UTC