collectd: the system statistics collection daemon
Since school ended, I have had lots of free time on my hands. Consequently, I have embarked on a systems administration extravaganza. There are several projects that I have been neglecting, some of which I have already written about.
Another of my spontaneous projects is improving the system monitoring and security aspects of my server. It has been a while since I did any sort of systems administration stuff, so I used del.icio.us to do some research. I found a bunch of cool applications, one of which is called collectd.
As the name indicates, collectd is a lightweight daemon that collects myriad system statistics and stores them in RRD files. By default, it polls every 10 seconds. I chose collectd over other monitoring solutions mainly because my needs are simple: I do not need a fully-integrated monitoring suite with everything but the kitchen sink built in. Furthermore, more complexity means more attack surface and, therefore, weaker system security.
There are two other selling points of collectd, both of which are design-related. The first is its modular, plugin-based design. Out of the box, collectd does essentially nothing. The idea is that after installing the main daemon, you add plugins one at a time, getting each one working properly before moving on to the next. In my experience, this differs from other system monitoring suites (ahem, Cacti, ahem), which are just as useless by default but have everything enabled.
I was surprised at how easy collectd was to configure. Every plugin I tried worked, with the exception of the hddtemp plugin. Some plugins, like cpu and swap, need no configuration at all; for those, a simple LoadPlugin cpu will suffice. Others need a small configuration section to function. All collectd configuration lives in the collectd.conf file, and the plugins are documented in the collectd.conf(5) man page.
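To make that concrete, here is a minimal sketch of a collectd.conf that uses only configuration-free plugins (cpu and swap) plus the rrdtool plugin for storage. The helper names write_minimal_conf and check_conf are hypothetical, and so is the DataDir path you pass in. The check relies on collectd's -t flag, which parses the configuration and exits; check collectd(1) to confirm your version supports it.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal collectd.conf that loads only
# configuration-free plugins, plus rrdtool to store the values.
write_minimal_conf() {
  local conf=$1 datadir=$2
  cat > "$conf" <<EOF
# Poll every 10 seconds (the default, spelled out for clarity)
Interval 10

# Plugins that need no configuration of their own
LoadPlugin cpu
LoadPlugin swap

# Store the collected values as RRD files under DataDir
LoadPlugin rrdtool
<Plugin rrdtool>
  DataDir "$datadir"
</Plugin>
EOF
}

# Hypothetical helper: ask collectd to parse the file and exit (-t),
# but only if collectd is actually installed on this machine.
check_conf() {
  if command -v collectd > /dev/null; then
    collectd -t -C "$1"
  else
    echo "collectd not found; skipping config test" >&2
  fi
}

# Example:
#   write_minimal_conf /tmp/collectd.conf /var/lib/collectd/rrd
#   check_conf /tmp/collectd.conf
```

Adding a plugin is then one more LoadPlugin line, which is what makes the one-plugin-at-a-time approach so painless.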
The apache plugin, for example, needs a short configuration section:
LoadPlugin apache
<Plugin apache>
  URL "http://localhost/server-status?auto"
  User "myun"
  Password "mypw"
  #CACert "/etc/ssl/ca.crt"
</Plugin>
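The URL in that entry points at Apache's mod_status page in its machine-readable ?auto form, which is a list of "Key: value" lines such as "BusyWorkers: 3". Before pointing collectd at it, it can help to confirm the page actually answers. Below is a small sketch of that check; the functions status_field and fetch_status are hypothetical, and the credentials are the same placeholders used in the entry above.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one "Key: value" field from
# mod_status's ?auto output (read on stdin). Returns 1 if the key is absent.
status_field() {
  local key=$1
  awk -F': ' -v k="$key" '
    $1 == k { print $2; found = 1 }
    END     { exit (found ? 0 : 1) }'
}

# Hypothetical helper: fetch the same URL the apache plugin uses.
# -f fails on HTTP errors (e.g. 401 or 404), -sS hides progress but shows errors.
# "myun:mypw" are the placeholder credentials from the config above.
fetch_status() {
  curl -fsS -u "myun:mypw" "http://localhost/server-status?auto"
}

# Example against a live server:
#   fetch_status | status_field BusyWorkers
```

If fetch_status fails, the apache plugin will fail the same way, so this narrows the problem down to Apache's side (mod_status not loaded, wrong credentials) rather than collectd's.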
The second strong selling point of collectd is that it does one thing and does it well. All it really does is run its plugins at a regular interval, using a small pool of threads, and write the results to RRD files. This is good because it leaves graphing up to the user, which I think is a more flexible approach. That said, collectd is not entirely without graphing capabilities. The image on the left was produced by collection.cgi, a basic yet functional CGI script that is distributed with collectd. That should give some idea of collectd’s power.
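If you would rather build graphs yourself, the RRD files can be fed straight to rrdtool. Here is a rough sketch that graphs one day of CPU idle time. It assumes the rrdtool plugin's usual layout of DataDir/host/cpu-N/cpu-idle.rrd with a data source named "value"; verify both against your own DataDir before relying on it. The function name graph_cpu_idle is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: graph the last day of CPU idle time for one host.
# Assumed layout (check your DataDir): <datadir>/<host>/cpu-<N>/cpu-idle.rrd,
# containing a single data source named "value".
graph_cpu_idle() {
  local datadir=$1 host=$2 cpu=$3 out=$4
  local rrd="$datadir/$host/cpu-$cpu/cpu-idle.rrd"

  # Bail out early if collectd has not written this file (yet).
  if [ ! -f "$rrd" ]; then
    echo "no such RRD: $rrd" >&2
    return 1
  fi

  # DEF reads the averaged "value" data source; LINE1 draws it in green.
  rrdtool graph "$out" \
    --start -1d \
    --title "CPU $cpu idle on $host" \
    "DEF:idle=$rrd:value:AVERAGE" \
    "LINE1:idle#00AA00:idle"
}

# Example:
#   graph_cpu_idle /var/lib/collectd/rrd myhost 0 /tmp/cpu0-idle.png
```

Because collectd only writes the data, the same files can just as well be served by collection.cgi, a cron job like this one, or any other RRD front end.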
As if the above weren’t enough, collectd's network plugin can also send and receive values over unicast or multicast. This means that one central collectd daemon can easily collect statistics from multiple computers with very little overhead. My next goal is to write a custom collectd plugin; how to do that is documented here.
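The multi-machine setup comes down to a few lines in collectd.conf: each monitored machine sends with a Server line, and the collecting machine receives with a Listen line. The sketch below prints such a stanza. The function name network_stanza is hypothetical; 239.192.74.66 and 25826 are collectd's default IPv4 multicast group and port. For unicast, you would pass the central server's own address to both roles instead of the multicast group.

```shell
#!/bin/bash
# Hypothetical helper: print a <Plugin network> stanza for collectd.conf.
#   client: send values to ADDR (a unicast host or a multicast group)
#   server: listen for values on ADDR
# Defaults: collectd's standard IPv4 multicast group and port.
network_stanza() {
  local role=$1 addr=${2:-239.192.74.66} port=${3:-25826}
  local directive
  case "$role" in
    client) directive=Server ;;
    server) directive=Listen ;;
    *) echo "usage: network_stanza client|server [addr] [port]" >&2; return 1 ;;
  esac
  printf 'LoadPlugin network\n<Plugin network>\n  %s "%s" "%s"\n</Plugin>\n' \
    "$directive" "$addr" "$port"
}

# Example: paste the output of
#   network_stanza client    on each monitored machine, and
#   network_stanza server    on the machine that stores the RRD files.
```

Only the collecting machine needs the rrdtool plugin; the clients just ship their values over the network.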
I also wrote an init script for Arch Linux:
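#!/bin/bash

# Arch Linux rc.d helpers (stat_busy, stat_done, add_daemon, ...)
. /etc/rc.conf
. /etc/rc.d/functions

COLLECTD=/sbin/collectd
CONF=/etc/collectd.conf
PIDFILE=/var/run/collectd.pid

# PID of a running collectd, if any (%PPID omits the shell running this script)
PID=$(pidof -o %PPID $COLLECTD)

case "$1" in
  start)
    stat_busy "Starting collectd Daemon"
    # Only start if not already running; collectd daemonizes itself
    [ -z "$PID" ] && $COLLECTD -C "$CONF"
    if [ $? -gt 0 ]; then
      stat_fail
    else
      # $PID was empty before the start, so look up the new daemon's PID
      PID=$(pidof -o %PPID $COLLECTD)
      echo $PID > $PIDFILE
      add_daemon collectd
      stat_done
    fi
    ;;
  stop)
    stat_busy "Stopping collectd Daemon"
    [ -n "$PID" ] && kill $PID &> /dev/null
    if [ $? -gt 0 ]; then
      stat_fail
    else
      # -f: collectd may already have removed its own PID file
      rm -f $PIDFILE
      rm_daemon collectd
      stat_done
    fi
    ;;
  restart)
    $0 stop
    # Give collectd time to flush its data and exit
    sleep 10
    $0 start
    ;;
  *)
    echo "usage: $0 {start|stop|restart}"
    ;;
esac
exit 0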
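On Arch's rc.d system, the script goes in /etc/rc.d/collectd (made executable), and daemons started at boot are listed in the DAEMONS array in /etc/rc.conf. Here is a small sketch for adding collectd to that array. The function enable_in_rc_conf is hypothetical, it assumes the array is written on a single line as in the stock rc.conf, and it uses GNU sed's -i.

```shell
#!/bin/bash
# Hypothetical helper: append a daemon to the single-line DAEMONS=(...) array
# in an Arch-style rc.conf. If the daemon is already listed in any form
# (plain, @backgrounded, or !disabled), the file is left alone.
enable_in_rc_conf() {
  local daemon=$1 conf=${2:-/etc/rc.conf}
  if grep -Eq "^DAEMONS=\((.*[ ])?[@!]?${daemon}[ )]" "$conf"; then
    return 0
  fi
  # Insert the daemon just before the closing parenthesis (GNU sed -i).
  sed -i "s/^DAEMONS=(\(.*\))/DAEMONS=(\1 ${daemon})/" "$conf"
}

# Example (as root):
#   install -m 755 collectd /etc/rc.d/collectd
#   enable_in_rc_conf collectd
#   /etc/rc.d/collectd start
```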
