Find which site's 'spamd child' has high CPU usage in Ensim

posted by sacah on software,

The CPU utilisation report for one of our servers had been higher than expected for months. When I had some time to delve into it, I wrote the following script from info I found on the net. I don't fully understand the code, so if there is a better way of doing it, let me know.
In /etc/crontab I entered the line below to run the script every minute
*       *       *       *       *       root /home/logCPU.sh 2> /dev/null

And in /home/logCPU.sh
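#!/bin/sh
# Log a timestamped snapshot of the running processes, sorted by CPU usage.
echo "------------------------------  $(date)  -----------------------------" >> /home/logCPU.txt
ps auxr --sort=-pcpu >> /home/logCPU.txt
# Collect the PIDs of processes using more than 1% CPU as a comma-separated list.
pids=$(ps auxr --sort=-pcpu | awk 'NR > 1 && $3 > 1 { printf "%s%s", sep, $2; sep = "," }')
# Only run lsof when something matched; its open files show which site the process belongs to.
if [ -n "$pids" ]; then lsof -p "$pids" >> /home/logCPU.txt; fi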
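
The awk step that picks out the busy PIDs is the least obvious part of the script, so here is a minimal sketch of it in isolation. It assumes the default ps aux column layout (PID in column 2, %CPU in column 3). The function name busy_pids and the canned sample input are made up for illustration, and this version joins the PIDs without a trailing comma.

```shell
#!/bin/sh
# Canned ps-style input (hypothetical sample in the same format as ps aux),
# used here so the filter can be tried without a live ps call.
sample_ps='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
22309 26314 1.8 0.9 39332 34916 ? R 10:56 0:00 spamd child
apache 17029 0.2 0.6 47196 24604 ? D 10:21 0:04 /usr/sbin/httpd'

# busy_pids (illustrative name): read ps-style output on stdin and print the
# PIDs whose %CPU (column 3) is above 1, joined with commas.
busy_pids() {
    # NR > 1 skips the header line; sep is empty for the first PID only.
    awk 'NR > 1 && $3 > 1 { printf "%s%s", sep, $2; sep = "," }'
}

pids=$(printf '%s\n' "$sample_ps" | busy_pids)
echo "$pids"
```

With the sample above, only the spamd child line passes the threshold, so that is the only PID that would be handed to lsof -p.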

In any minute where no process is using more than 1% CPU, you'll see only the ps output
------------------------------  Tue May 13 10:56:01 EST 2008  -----------------------------
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 462 0.0 0.0 0 0 ? D< May07 0:44 [kjournald]
root 26177 0.0 0.0 2164 824 ? R 10:56 0:00 ps auxr --sort=-pcpu
apache 26178 0.0 0.1 19596 4300 ? R 10:56 0:00 php -f a_script.php

When a process is using more than 1% CPU, the script also runs lsof on it, as in the spamd child example below
------------------------------  Tue May 13 10:57:01 EST 2008  -----------------------------
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
22309 26314 1.8 0.9 39332 34916 ? R 10:56 0:00 spamd child
apache 17029 0.2 0.6 47196 24604 ? D 10:21 0:04 /usr/sbin/httpd
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
spamd 26314 22309 cwd DIR 9,0 4096 79103114 /home/virtual/site27/fst
spamd 26314 22309 rtd DIR 9,0 4096 79103114 /home/virtual/site27/fst
spamd 26314 22309 txt REG 9,1 14788 648405 /usr/bin/perl

This shows that site27 is running the spamd process. You can let this run for a few days to see which sites are using too much CPU because of spam, then see if you can optimise each site's email and spam filtering, for example by removing heavily spammed accounts or by turning off server-side spam filtering where the local mail client can easily pick the spam up itself.
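
After a few days the log gets long, so rather than reading it by eye, something like the rough sketch below could tally how often each site turns up. It assumes the lsof column layout shown above (FD in column 4, NAME in column 9) and counts only the cwd line of each spamd process, so one lsof run counts once per process. The function name count_sites is made up, and the site12 line in the sample is hypothetical data added for illustration.

```shell
#!/bin/sh
# count_sites (illustrative name): read the log on stdin and print
# "count site" pairs, busiest site first.
count_sites() {
    # Keep only spamd cwd lines and pull the siteNN part out of the path.
    awk '$1 == "spamd" && $4 == "cwd" && match($9, /site[0-9]+/) {
             print substr($9, RSTART, RLENGTH)
         }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Sample lsof lines: the site27 lines follow the log excerpt in this post,
# the site12 line is invented for the example.
sample_log='spamd 26314 22309 cwd DIR 9,0 4096 79103114 /home/virtual/site27/fst
spamd 26314 22309 rtd DIR 9,0 4096 79103114 /home/virtual/site27/fst
spamd 26400 22309 cwd DIR 9,0 4096 79103114 /home/virtual/site27/fst
spamd 26455 22411 cwd DIR 9,0 4096 79200001 /home/virtual/site12/fst'

summary=$(printf '%s\n' "$sample_log" | count_sites)
echo "$summary"
```

Against the real log this would be run as count_sites < /home/logCPU.txt. The counts are only a rough guide, since a long-running spamd child is counted once for every minute it stays above the threshold.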

By removing 4 old email accounts from one particular site, we reduced the server's 24-hour load average from 2 to about 0.6. The 5-minute average was hitting 8 at times; now it hits about 3 during the backup.

This script can also help you track down what is using CPU resources at a particular time.
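
To look at a particular time, one possible approach is to pull out only the snapshots whose separator line matches a timestamp. The sketch below assumes the separator lines start with a run of dashes and contain the date output, as in the log excerpts above. The function name snapshot_at is made up for illustration.

```shell
#!/bin/sh
# snapshot_at (illustrative name): print every snapshot whose separator line
# contains the given text, e.g. "May 13 10:57", reading the log on stdin.
snapshot_at() {
    awk -v when="$1" '
        /^-----/ { show = (index($0, when) > 0) }  # new snapshot: print it or not
        show                                       # print lines of a matching snapshot
    '
}

# Two shortened snapshots in the same format as the log excerpts in this post.
sample_log='------------------------------  Tue May 13 10:56:01 EST 2008  -----------------------------
root 462 0.0 0.0 0 0 ? D< May07 0:44 [kjournald]
------------------------------  Tue May 13 10:57:01 EST 2008  -----------------------------
22309 26314 1.8 0.9 39332 34916 ? R 10:56 0:00 spamd child'

matching=$(printf '%s\n' "$sample_log" | snapshot_at '10:57:01')
echo "$matching"
```

On the server this would be run as snapshot_at 'May 13 10:57' < /home/logCPU.txt. The match is only tested against the separator lines, so a time that appears in a process's START column does not pull in the wrong snapshot.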