openmind ☃ January 04, 2008 ☃ collectd: the system statistics collection daemon

collection.cgi, Version 2_1199457689599

Since school ended, I have had lots of free time on my hands. Consequently, I have embarked on a systems administration extravaganza. There are several projects that I have been neglecting, some of which I have already written about.

Another of my spontaneous projects is improving the system monitoring and security aspects of my server. It has been a while since I did any sort of systems administration stuff, so I used del.icio.us to do some research. I found a bunch of cool applications, one of which is called collectd.

As the name indicates, collectd is a lightweight daemon that collects a wide range of system statistics and stores them in RRD files. It polls every 10 seconds by default. I chose collectd over other monitoring solutions mainly because my needs are simple, so I do not need a fully integrated monitoring solution with everything and the kitchen sink built in. Furthermore, more complexity means more attack surface and, therefore, diminished system security.

There are two other selling points of collectd, both of which are design-related. The first is its modular, plugin-based design. By default, collectd is essentially useless: it collects nothing until you load a plugin. The idea is that after installing the main daemon, you add plugins incrementally, getting each one working before moving on to the next. In my experience, this differs from other system monitoring suites (ahem, Cacti, ahem), which are also essentially useless by default but ship with everything enabled.

I was surprised at how easy collectd was to configure. Every plugin I tried worked, with the exception of the hddtemp plugin. Some plugins, like cpu and swap, do not need any configuration. For those, a simple LoadPlugin cpu will suffice. Others need a small configuration section in order to function. All collectd configuration lives in the collectd.conf file, and the plugins are documented in collectd.conf(5).

For example, take a sample apache entry:

LoadPlugin apache
<Plugin apache>
	URL "http://localhost/server-status?auto"
	User "myun"
	Password "mypw"
	#CACert "/etc/ssl/ca.crt"

</Plugin>
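
Since every plugin is switched on by its own LoadPlugin line, a quick way to see what is currently enabled is to pull those lines out of the config file. The following is only a rough sketch: loaded_plugins is a hypothetical helper name, and it assumes the plain one-line "LoadPlugin name" form, with or without quotes around the name.

```bash
#!/bin/bash
# Hypothetical helper: print the name of every plugin enabled by an
# uncommented "LoadPlugin name" line in the given collectd.conf.
loaded_plugins() {
  local conf=$1
  # A commented-out line starts with "#", so its first field is never
  # exactly "LoadPlugin" and it is skipped.
  awk '$1 == "LoadPlugin" { gsub(/"/, "", $2); print $2 }' "$conf"
}
```

Calling loaded_plugins /etc/collectd.conf prints one plugin name per line, which makes it easy to enable plugins one at a time and check each before adding the next.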

The second strong selling point of collectd is that it does one thing and does it well. All it really does is run a handful of lightweight threads that update the RRD files at regular intervals. This is good because it leaves the graphing implementation up to the user, which I think is a more flexible approach. It is not quite true that collectd has no graphing capabilities, however. The image on the left was produced by collection.cgi, a basic yet functional CGI script that is distributed with collectd. That should give some idea of collectd’s power.
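
Because the data ends up in ordinary RRD files, any RRD front end can graph it, including rrdtool itself. As a minimal sketch, the hypothetical helper below only prints an rrdtool graph command for one data source over the past hour instead of running it. The file name and the data source name in the usage note are assumptions, so check what your plugins actually write with rrdtool info.

```bash
#!/bin/bash
# Hypothetical helper: print an rrdtool command that graphs one data
# source from an RRD file over the past hour.
# Arguments: RRD file, data source name, output PNG.
rrd_graph_cmd() {
  local rrd=$1 ds=$2 out=$3
  # DEF reads the AVERAGE series of data source $ds from the file;
  # LINE1 draws it as a thin blue line labelled with the same name.
  printf 'rrdtool graph %s --start -1h DEF:v=%s:%s:AVERAGE LINE1:v#0000FF:%s\n' \
    "$out" "$rrd" "$ds" "$ds"
}
```

For example, rrd_graph_cmd load.rrd shortterm load.png prints a command you can inspect and then paste into a shell. A file path that contains a colon would need escaping in the DEF argument, which this sketch does not attempt.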

As if the above weren’t enough, collectd can also run in unicast or multicast mode. This means that one collectd daemon can easily collect statistics from multiple computers with very little overhead. My next goal is to write a custom collectd plugin; how to do that is documented here.
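
The network setup is handled by the network plugin: clients name the machine they send to with a Server line, and the collecting machine names the address it listens on with a Listen line. The sketch below just prints the matching stanza for either role. network_stanza is a hypothetical helper, and the address in the usage note is made up.

```bash
#!/bin/bash
# Hypothetical helper: print a network plugin stanza for collectd.conf.
# role "client" sends values to addr; role "server" listens on addr.
network_stanza() {
  local role=$1 addr=$2 keyword
  case "$role" in
    client) keyword=Server ;;  # a client names the server it sends to
    server) keyword=Listen ;;  # a server names the address it listens on
    *) echo "usage: network_stanza {client|server} address" >&2; return 1 ;;
  esac
  cat <<EOF
LoadPlugin network
<Plugin network>
  $keyword "$addr"
</Plugin>
EOF
}
```

On a client, network_stanza client 192.168.0.10 prints a stanza that can be appended to collectd.conf after review; the collecting machine would use the server role with its own address.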

I also wrote an init script for Arch Linux:

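#!/bin/bash
# Arch Linux rc.d init script for collectd.

. /etc/rc.conf
. /etc/rc.d/functions

COLLECTD=/sbin/collectd
CONF=/etc/collectd.conf
PIDFILE=/var/run/collectd.pid
# PID of an already-running collectd, if any (empty otherwise).
PID=$(pidof -o %PPID "$COLLECTD")

case "$1" in
  start)
    stat_busy "Starting collectd Daemon"
    # Start the daemon only if it is not already running.
    [ -z "$PID" ] && "$COLLECTD" -C "$CONF"
    if [ $? -gt 0 ]; then
      stat_fail
    else
      # Look the PID up again: $PID was read before the daemon started,
      # so it is empty at this point.
      pidof -o %PPID "$COLLECTD" > "$PIDFILE"
      add_daemon collectd
      stat_done
    fi
    ;;
  stop)
    stat_busy "Stopping collectd Daemon"
    # $PID is left unquoted on purpose: pidof may report several PIDs.
    [ -n "$PID" ] && kill $PID &> /dev/null
    if [ $? -gt 0 ]; then
      stat_fail
    else
      rm -f "$PIDFILE"
      rm_daemon collectd
      stat_done
    fi
    ;;
  restart)
    "$0" stop
    # Give the daemon time to shut down before starting it again.
    sleep 10
    "$0" start
    ;;
  *)
    echo "usage: $0 {start|stop|restart}"
    ;;
esac
exit 0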
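
To check that the PID file such a script leaves behind still points at a live process, something along these lines should do. It is only a sketch: pidfile_alive is a hypothetical helper, and kill -0 will also report failure if you lack permission to signal the process, so run it as root for a daemon started by root.

```bash
#!/bin/bash
# Hypothetical helper: return 0 if the PID file exists and names a
# process that is still running, non-zero otherwise.
pidfile_alive() {
  local pidfile=$1 pid
  [ -r "$pidfile" ] || return 1
  # Take only the first PID on the first line; ignore anything after it.
  read -r pid _ < "$pidfile"
  # kill -0 sends no signal; it only tests whether the process exists.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

For example, pidfile_alive /var/run/collectd.pid && echo running prints "running" only while the recorded process exists.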
