Munin's CPU usage
As seen before, Munin is run by a cron job every five minutes. Each time, it connects to all the servers it has to monitor, fetches all the data, writes it to hundreds of RRD files, and regenerates all the HTML files and hundreds of PNG files. The more servers it monitors, the more CPU Munin uses.
Some other tests are also rather interesting:
yann@yann-laptop:/usr/share/munin$ time sudo -u munin ./munin-update
real 0m27.453s
user 0m0.152s
sys 0m0.036s
yann@yann-laptop:/usr/share/munin$ time sudo -u munin ./munin-limits
real 0m0.179s
user 0m0.132s
sys 0m0.016s
yann@yann-laptop:/usr/share/munin$ time sudo -u munin ./munin-html
real 0m0.270s
user 0m0.176s
sys 0m0.020s
yann@yann-laptop:/usr/share/munin$ time sudo -u munin ./munin-graph
real 0m11.376s
user 0m10.465s
sys 0m0.500s
This test (run on my laptop, monitoring a single node) shows two interesting things. First, generating the PNGs is by far the heaviest part of the process: 10.965 seconds of CPU time (user + sys) for munin-graph, versus 0.532 seconds for the three other processes combined. Second, munin-update takes nearly 30 seconds to complete but barely uses the CPU, probably because it spends most of that time waiting for the node to run all its plugins. That's why Munin forks when it starts and runs one process per node, and why you should not prevent it from forking (there is an option to disable forking; don't use it).
If I were monitoring 10 nodes, graph generation alone would need roughly 110 seconds of CPU time on my laptop (10 × ~11 seconds, assuming nothing else is running), every five minutes. In other words, Munin becomes quite heavy as you add nodes.
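To redo this back-of-the-envelope calculation for your own setup, here is a small sketch that adds up user + sys time from the measurements above and scales it by the number of nodes. The function names are hypothetical, and it assumes, as the estimate above does, that the cost grows linearly with the number of nodes.

```shell
#!/bin/sh
# Rough estimate of Munin's CPU cost per cron run, from the timings above.
# Assumption: CPU cost grows linearly with the number of nodes.

# cpu_seconds USER SYS -> prints USER + SYS with three decimals
cpu_seconds() {
    awk -v u="$1" -v s="$2" 'BEGIN { printf "%.3f\n", u + s }'
}

# estimate_run NODES PER_NODE_SECONDS -> prints the total CPU seconds per run
# and the share of the 300-second (five-minute) cron interval it represents
estimate_run() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        total = n * t
        printf "%.0f s of CPU per run (%.0f%% of a 300 s interval)\n", total, 100 * total / 300
    }'
}

graph=$(cpu_seconds 10.465 0.500)    # munin-graph, one node
# munin-update + munin-limits + munin-html, one node
others=$(awk 'BEGIN { printf "%.3f\n", 0.152+0.036 + 0.132+0.016 + 0.176+0.020 }')
echo "munin-graph: $graph s, other stages: $others s"
estimate_run 10 "$graph"
```

With 10 nodes this prints about 110 seconds of CPU per run, i.e. more than a third of every five-minute interval spent drawing graphs.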
Run Munin as a CGI
One way to improve performance is to change the way Munin creates the graphs: instead of regenerating every graph every five minutes, we can create them only when a user requests them by opening one of the web pages. CGI makes this possible.
So, how does it work? Munin installs a script called munin-cgi-graph in /usr/lib/cgi-bin/. When Munin is configured for CGI, it changes the image links in the HTML files so that they point to munin-cgi-graph instead of static PNG files.
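As an illustration of how such a link maps to a graph, here is a sketch of a hypothetical helper that builds one. It assumes that the part of the URL after the script name has the form /domain/host/service-period.png; the domain, host and service names are examples only, not taken from a real setup.

```shell
#!/bin/sh
# Hypothetical helper: builds the URL of one graph served by munin-cgi-graph.
# Assumed path layout after the script name: /<domain>/<host>/<service>-<period>.png
munin_graph_url() {
    base=$1 domain=$2 host=$3 service=$4 period=$5
    printf '%s/%s/%s/%s-%s.png\n' "$base" "$domain" "$host" "$service" "$period"
}

url=$(munin_graph_url http://localhost/cgi-bin/munin-cgi-graph \
      localdomain localhost.localdomain cpu day)
echo "$url"
# To check that the CGI answers, print only the HTTP status code
# (needs curl and a running web server):
# curl -s -o /dev/null -w '%{http_code}\n' "$url"
```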
munin-cgi-graph uses the path in the request to decide which graph to draw, and then returns it. It also has a caching system: if you reload the page within five minutes, the graphs are not regenerated. Because these graphs are written to disk, the directory /var/www/munin must be writable by the Apache process. One solution is to make the files belong to the user munin and the group www-data, and to give the group write access:
yann@yann-laptop:/var/www$ sudo chown -R munin:www-data /var/www/munin
yann@yann-laptop:/var/www$ sudo chmod -R g+w /var/www/munin
The performance gain is huge. One drawback of this method, however, is that pages containing several graphs, such as the node view, take much longer to display, since each graph is now rendered when the page is requested.
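Once the permissions are fixed, it can be worth checking them. The sketch below (hypothetical function name) lists every entry under a directory that either doesn't belong to the given group or isn't group-writable; it only looks at ownership and mode bits, not at ACLs or the web server's actual user.

```shell
#!/bin/sh
# check_web_writable DIR GROUP
# Prints every entry under DIR that does not belong to GROUP or is not
# group-writable. Returns 1 if there is at least one, 0 otherwise.
check_web_writable() {
    dir=$1
    grp=$2
    bad=$(find "$dir" \( ! -group "$grp" -o ! -perm -g+w \) -print)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}

# Example, once the permissions have been fixed:
# check_web_writable /var/www/munin www-data && echo "OK: Apache can write there"
```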
To configure Munin as CGI, add the following lines to your /etc/munin/munin.conf:
cgiurl /cgi-bin
cgiurl_graph /cgi-bin/munin-cgi-graph
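If you manage several servers, you may want to script this step. Here is a sketch of a hypothetical helper that appends the two directives only when they are not already present, so running it twice does not duplicate them. It does not check or fix existing values; the config path defaults to /etc/munin/munin.conf.

```shell
#!/bin/sh
# add_cgi_directives [CONF]
# Appends "cgiurl" and "cgiurl_graph" to CONF unless a line already starts
# with that directive name.
add_cgi_directives() {
    conf=${1:-/etc/munin/munin.conf}
    for line in "cgiurl /cgi-bin" "cgiurl_graph /cgi-bin/munin-cgi-graph"; do
        key=${line%% *}    # directive name, e.g. "cgiurl"
        # "cgiurl" followed by whitespace does not match "cgiurl_graph"
        if ! grep -q "^[[:space:]]*$key[[:space:]]" "$conf" 2>/dev/null; then
            printf '%s\n' "$line" >> "$conf"
        fi
    done
}
```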
These lines let Munin build the correct links to the graphs. Next, assuming you are using Apache, edit your main Apache configuration file to allow /usr/lib/cgi-bin to run CGI scripts:
<Directory /usr/lib/cgi-bin>
    AllowOverride None
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all
</Directory>
Finally, you need to tell Apache that your website is going to use CGI. If you have a dedicated virtual host for Munin, add the following line there; otherwise add it somewhere in the main Apache configuration file:
munin-cgi-graph also uses the Perl module Date::Manip, which you need to install (on Debian and Ubuntu it is packaged as libdate-manip-perl). Your Munin is now running as CGI!
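A quick way to check whether the module is available is to ask Perl to load it. This sketch wraps that in a small hypothetical function; the apt-get hint only applies to Debian-based systems.

```shell
#!/bin/sh
# perl_module_installed MODULE -> returns 0 if Perl can load MODULE
perl_module_installed() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if perl_module_installed Date::Manip; then
    echo "Date::Manip is installed"
else
    echo "Date::Manip is missing; on Debian/Ubuntu: sudo apt-get install libdate-manip-perl"
fi
```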
Move Munin's RRD databases to a TMPFS
On an installation monitoring approximately 30 servers, I have over 2000 RRD databases in /var/lib/munin. This number varies with the number of services you monitor per server, but the point is: every time Munin runs (every 5 minutes), hundreds if not thousands of databases are read and written. If your disks aren't very fast, this can become quite costly as the number of monitored servers grows.
This can be improved by moving the files in /var/lib/munin to a tmpfs, a memory-backed filesystem. In my example, with 30 monitored servers and 2250 RRD files, only 115MB are used on disk; given the amount of RAM in servers nowadays, it may be worth trading some RAM for less disk I/O. Since all the data in a tmpfs is lost when the server restarts, we will back it up every hour, day or week, depending on how much data you are willing to lose.
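Before creating the tmpfs, you need to pick a size for it. The sketch below (hypothetical function) measures the current disk usage of the directory and multiplies it by a headroom factor; the default factor of 2 is an arbitrary assumption meant to leave room for new RRD files as you add nodes.

```shell
#!/bin/sh
# suggest_tmpfs_mib DIR [FACTOR]
# Prints the disk usage of DIR multiplied by FACTOR (integer, default 2),
# in MiB, rounded up.
suggest_tmpfs_mib() {
    dir=$1 factor=${2:-2}
    kib=$(du -sk "$dir" | cut -f1)
    echo $(( (kib * factor + 1023) / 1024 ))
}

# Example: suggest_tmpfs_mib /var/lib/munin
```

With the 115MB from my example, a factor of 2 suggests a tmpfs of roughly 230MB.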
Make a backup of the folder:
sudo cp -ra /var/lib/munin /var/lib/munin-cache
Add this to your /etc/fstab:
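A tmpfs entry for /var/lib/munin could look like the line printed by the hypothetical helper below. size, mode, uid and gid are standard tmpfs mount options, but the values here are assumptions to adapt: the helper resolves the munin user's numeric uid and gid because fstab needs literal numbers there.

```shell
#!/bin/sh
# tmpfs_fstab_line MOUNTPOINT USER SIZE_MIB
# Prints an /etc/fstab line that mounts a tmpfs on MOUNTPOINT owned by USER.
tmpfs_fstab_line() {
    mnt=$1 user=$2 size_mib=$3
    uid=$(id -u "$user") || return 1
    gid=$(id -g "$user") || return 1
    printf 'tmpfs %s tmpfs rw,size=%sm,mode=0755,uid=%s,gid=%s 0 0\n' \
        "$mnt" "$size_mib" "$uid" "$gid"
}

# Review the output, then append it to /etc/fstab yourself:
# tmpfs_fstab_line /var/lib/munin munin 256
```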
Mount it:
sudo mount /var/lib/munin
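Copy the data back from the backup:
sudo cp -ra /var/lib/munin-cache/* /var/lib/munin/
Create an hourly (or daily) cron job that copies the files from munin to munin-cache, for instance an executable script in /etc/cron.hourly/ (or /etc/cron.daily/):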
ServerX $ ls -l
total 4
-rwxr-xr-x 1 root root 57 2009-03-10 18:55 munin-cache
ServerX $ cat munin-cache
#!/bin/sh
cp -ra /var/lib/munin/* /var/lib/munin-cache/
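One weakness of this one-liner is that a crash or reboot in the middle of the copy leaves a half-updated munin-cache. Here is a sketch of a variant (hypothetical function, not the script above) that copies into a fresh directory first and only then swaps it into place, keeping the previous backup until the new one is complete. Unlike the cp above, it also drops files that no longer exist in /var/lib/munin. It does not prevent copying an RRD file while munin-update is writing it.

```shell
#!/bin/sh
# sync_munin_cache [SRC] [DST]
# Copies SRC to DST.new, then replaces DST with it. On a copy error the
# previous DST is left untouched.
sync_munin_cache() {
    src=${1:-/var/lib/munin}
    dst=${2:-/var/lib/munin-cache}
    tmp="$dst.new"
    rm -rf "$tmp"
    # Copy into a fresh directory; give up if anything goes wrong.
    cp -a "$src" "$tmp" || { rm -rf "$tmp"; return 1; }
    # Swap the new copy into place, keeping the old one until the end.
    if [ -d "$dst" ]; then
        rm -rf "$dst.old"
        mv "$dst" "$dst.old" || return 1
    fi
    mv "$tmp" "$dst" || return 1
    rm -rf "$dst.old"
}

# In the cron script, define the function and call: sync_munin_cache
```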
Then, to restore the files from the backup automatically after a reboot, add this to /etc/rc.local, just before the final exit 0 line:
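A minimal sketch of such a restore step, using a hypothetical function name: it copies the backup into /var/lib/munin, assumed to be the freshly mounted and therefore empty tmpfs, and refuses to do anything if the target already contains files, so it never overwrites newer data with the older backup.

```shell
#!/bin/sh
# restore_munin_cache [CACHE] [TARGET]
# Copies the contents of CACHE into TARGET, only if TARGET exists and is empty.
restore_munin_cache() {
    cache=${1:-/var/lib/munin-cache}
    target=${2:-/var/lib/munin}
    [ -d "$cache" ] && [ -d "$target" ] || return 1
    if [ -n "$(ls -A "$target")" ]; then
        echo "$target is not empty; not restoring" >&2
        return 1
    fi
    # "$cache/." copies the directory's contents, including dotfiles,
    # and cp -a keeps the munin ownership and timestamps.
    cp -a "$cache/." "$target/"
}

# In /etc/rc.local, define the function and call it before exit 0:
# restore_munin_cache
```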