Scaling-out WordPress – Performance Measures – Using Graphite

graphite overview - short image

Results overview

The dashboard above shows what you can do with graphite

At a glance, you have all the indicators you need in real time (I set an auto refresh every 10 seconds).

In this post, I will show you how to get it by installing and configuring collectd and Graphite.
Then we will write a custom performance collector in Python to check the response time.

Graphite is a wonderful tool to store and display all kinds of time series measures.
By default, Graphite can graph measures stored in rrdtool files (round-robin database) or in its own whisper database (see the Graphite architecture).
With Graphite, it is very easy to develop your own collector. However, compatible tools such as collectd already exist (they can store values in an rrdtool database or send them to Graphite via its carbon component), so I decided to use collectd.

Installing and configuring collectd

I just followed the instructions at http://collectd.org/wiki/index.php/First_steps
I created a fresh Ubuntu 12.04 x64 server VM and named it tu-mon-01.

Collectd Basic Configuration

Then I used the following instructions:

sudo apt-get install collectd
#the following commands are needed to install the web app (collection3) that can display the time series stored in the rrdtool files
sudo apt-get install apache2

sudo apt-get install librrds-perl libconfig-general-perl libhtml-parser-perl  libregexp-common-perl
cd /usr/share/doc/collectd/examples
sudo cp -r collection3 /var/www
sudo chown -R www-data:www-data /var/www/collection3/

sudo vi /etc/apache2/sites-available/default
	#AllowOverride None
        AllowOverride All
sudo /etc/init.d/apache2 restart

collectd is made up of a set of plugins (to collect or store performance counters) that you can activate as you want. The default installation comes with a ready-to-use set of plugins.

As I installed the collection3 webapp, I could see right away some graphs by accessing the webpage http://tu-mon-01/collection3/bin/index.cgi on my tu-mon-01 server. This tool is basic but does the job.
Remark:
I tested other tools such as drraw and visage, but as I planned to use Graphite, I didn’t spend much time evaluating them.
I liked visage: it is simple and has excellent graphic capabilities.

Collectd Client Server Configuration

Then I set up the client-server collectd configuration, because I wanted to collect values on the 4 fronts, the mysql servers and the load balancers, and to store them on the monitoring server tu-mon-01 (IP 192.168.100.211).

on tu-web-01, tu-web-02, tu-web-03, tu-web-04, tu-sql-01, tu-lb-01 :

$ sudo apt-get install collectd
$ sudo vi /etc/collectd/collectd.conf
LoadPlugin network
#LoadPlugin rrdtool
<Plugin network>
    # client setup:
    Server "192.168.100.211"
</Plugin>
$ sudo /etc/init.d/collectd restart

on tu-mon-01 :

$ sudo vi /etc/collectd/collectd.conf
LoadPlugin network
<Plugin network>
         Listen "192.168.100.211"
</Plugin>
$ sudo /etc/init.d/collectd restart

Cpu Usage Aggregation

One thing that I find very annoying with collectd is that the standard cpu plugin does not sum the % usage of each core.
Even if you can sum with Graphite (I use it to sum the system and the user percentage usage), I prefer to have the total percentage in the stored values.
There is an aggregation plugin here, but it is still in development and I could not find its source.
After some Google searches, I found a patch here that does the job, so I followed the instructions to build the Debian package and installed the new package on all the servers.
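As an aside on the Graphite-side alternative mentioned above, here is a minimal sketch of how a summed CPU series could be requested from Graphite's render API with the sumSeries function. It is written for Python 3. The metric paths (`<host>.cpu-*.cpu-user.value` and `<host>.cpu-*.cpu-system.value`) and the helper name `sum_cpu_url` are assumptions for illustration: the real paths depend on how your collectd data shows up in the Graphite tree.

```python
from urllib.parse import urlencode


def sum_cpu_url(base_url, host, hours=1):
    """Build a Graphite render URL that sums user and system CPU for one host.

    The metric paths below are hypothetical; browse the Graphite tree to
    find the names that collectd actually produces on your installation.
    """
    target = (
        "sumSeries({0}.cpu-*.cpu-user.value,"
        "{0}.cpu-*.cpu-system.value)".format(host)
    )
    # urlencode takes care of escaping the parentheses, commas and wildcards.
    query = urlencode({"target": target, "from": "-{0}hours".format(hours)})
    return "{0}/render?{1}".format(base_url.rstrip("/"), query)
```

For example, `sum_cpu_url("http://tu-mon-01", "tu-web-01")` gives a URL that can be pasted into a browser or used as a graph in a dashboard. This sums at display time only; the stored values stay per core, which is why I preferred the patch.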

To build the package on tu-mon-01:

mkdir collectd
cd collectd
sudo apt-get install devscripts build-essential fakeroot
sudo apt-get source collectd
sudo apt-get build-dep collectd
wget http://www.varnernet.com/~bryan/wp-content/uploads/cpuagg.patch_.tar.gz
tar -xzf cpuagg.patch_.tar.gz
cd collectd-4.10.1
patch -p0 < ../cpuagg.patch
debuild -us -uc
cd ..
sudo dpkg -i collectd-core_4.10.1-2.1ubuntu7_amd64.deb

To install the package on the other servers (here tu-web-01):

scp tu-mon-01:~/collectd/collectd-core_4.10.1-2.1ubuntu7_amd64.deb .
sudo dpkg -i collectd-core_4.10.1-2.1ubuntu7_amd64.deb

Mysql plugin

On the mysql server, I activated the plugin as follows:

$ sudo vi /etc/collectd/collectd.conf
LoadPlugin mysql
<Plugin mysql>
  <Database mysite>
    Host "localhost"
    Port 3306
    User "root"
    Password "[PASSWORD]"
    Database "mysite"
    MasterStats true
  </Database>
</Plugin>

Installing and configuring Graphite

I found a very good presentation here http://www.aosabook.org/en/graphite.html and nice installation instructions here http://geek.michaelgrace.org/2011/09/how-to-install-graphite-on-ubuntu/

Installing Graphite

I just adapted the instructions a little to download the latest Graphite version.

mkdir graphite
cd graphite/
wget https://launchpad.net/graphite/0.9/0.9.10/+download/graphite-web-0.9.10.tar.gz
wget https://launchpad.net/graphite/0.9/0.9.10/+download/carbon-0.9.10.tar.gz
wget https://launchpad.net/graphite/0.9/0.9.10/+download/whisper-0.9.10.tar.gz
tar -xvf graphite-web-0.9.10.tar.gz
tar -xvf carbon-0.9.10.tar.gz
tar -xvf whisper-0.9.10.tar.gz
mv graphite-web-0.9.10 graphite
mv carbon-0.9.10 carbon
mv whisper-0.9.10 whisper
rm graphite-web-0.9.10.tar.gz
rm carbon-0.9.10.tar.gz
rm whisper-0.9.10.tar.gz
sudo apt-get install --assume-yes apache2 apache2-mpm-worker apache2-utils apache2.2-bin apache2.2-common libapr1 libaprutil1 libaprutil1-dbd-sqlite3 python3.2 libpython3.2 python3.2-minimal libapache2-mod-wsgi libaprutil1-ldap memcached python-cairo-dev python-django python-ldap python-memcache python-pysqlite2 sqlite3 erlang-os-mon erlang-snmp rabbitmq-server bzr expect ssh libapache2-mod-python python-setuptools
sudo easy_install django-tagging
cd whisper/
sudo python setup.py install
cd ../carbon/
sudo python setup.py install
cd /opt/graphite/conf/
sudo cp carbon.conf.example carbon.conf
sudo cp storage-schemas.conf.example storage-schemas.conf
sudo vi storage-schemas.conf
cd
cd graphite/graphite/
sudo python check-dependencies.py
sudo python setup.py install
sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/default.orig
cd examples/
sudo cp example-graphite-vhost.conf /etc/apache2/sites-available/default
sudo cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi
sudo vim /etc/apache2/sites-available/default
  WSGISocketPrefix run/wsgi
  <VirtualHost *:80>
    ServerName graphite
    DocumentRoot "/opt/graphite/webapp"
    ErrorLog /opt/graphite/storage/log/webapp/error.log
    CustomLog /opt/graphite/storage/log/webapp/access.log common
    # I've found that an equal number of processes & threads tends
    # to show the best performance for Graphite (ymmv).
    WSGIDaemonProcess graphite processes=5 threads=5 display-name='%{GROUP}' inactivity-timeout=120
    WSGIProcessGroup graphite
    WSGIApplicationGroup %{GLOBAL}
    WSGIImportScript /opt/graphite/conf/graphite.wsgi process-group=graphite application-group=%{GLOBAL}
    # XXX You will need to create this file! There is a graphite.wsgi.example
    # file in this directory that you can safely use, just copy it to graphite.wgsi
    WSGIScriptAlias / /opt/graphite/conf/graphite.wsgi
    Alias /content/ /opt/graphite/webapp/content/
    <Location "/content/">
      SetHandler None
    </Location>
    # XXX In order for the django admin site media to work you
    # must change @DJANGO_ROOT@ to be the path to your django
    # installation, which is probably something like:
    # /usr/lib/python2.6/site-packages/django
    Alias /media/ "@DJANGO_ROOT@/contrib/admin/media/"
    <Location "/media/">
      SetHandler None
    </Location>
    # The graphite.wsgi file has to be accessible by apache. It won't
    # be visible to clients because of the DocumentRoot though.
    <Directory /opt/graphite/conf/>
      Order deny,allow
      Allow from all
    </Directory>
  </VirtualHost>
sudo mkdir /etc/httpd
sudo mkdir /etc/httpd/wsgi
sudo /etc/init.d/apache2 reload
cd /opt/graphite/webapp/graphite/
sudo python manage.py syncdb
sudo chown -R www-data:www-data /opt/graphite/storage/
sudo /etc/init.d/apache2 restart
cd /opt/graphite/webapp/graphite
sudo cp local_settings.py.example local_settings.py
cd /opt/graphite/
sudo ./bin/carbon-cache.py start
#to collect some values
cd ~/graphite/graphite/examples
sudo chmod +x example-client.py
sudo ./example-client.py

Set the location of the collectd rrdtool files in Graphite

Then I followed the instructions here http://graphite.readthedocs.org/en/latest/tools.html to set the location of the collectd rrdtool files in Graphite

$ sudo ln -s /var/lib/collectd/rrd/ADE-W08-02 /opt/graphite/storage/rrd/ADE-W08-02
$ sudo ln -s /var/lib/collectd/rrd/tu-mon-01 /opt/graphite/storage/rrd/tu-mon-01
$ sudo ln -s /var/lib/collectd/rrd/tu-sql-01 /opt/graphite/storage/rrd/tu-sql-01
$ sudo ln -s /var/lib/collectd/rrd/tu-web-01 /opt/graphite/storage/rrd/tu-web-01
$ sudo ln -s /var/lib/collectd/rrd/tu-web-02 /opt/graphite/storage/rrd/tu-web-02
$ sudo ln -s /var/lib/collectd/rrd/tu-web-03 /opt/graphite/storage/rrd/tu-web-03
$ sudo ln -s /var/lib/collectd/rrd/tu-web-04 /opt/graphite/storage/rrd/tu-web-04
$ sudo ln -s /var/lib/collectd/rrd/tu-lb-01 /opt/graphite/storage/rrd/tu-lb-01
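With more hosts, typing one ln -s per server gets tedious. Here is a minimal Python 3 sketch that creates the same links in a loop. The function name `link_rrd_dirs` is hypothetical, the default directories are the ones used in the commands above, and the script needs the same rights as the sudo commands to write into the Graphite storage directory.

```python
import os


def link_rrd_dirs(hosts,
                  src_root="/var/lib/collectd/rrd",
                  dst_root="/opt/graphite/storage/rrd"):
    """Symlink each host's collectd RRD directory into Graphite's RRD_DIR.

    Hosts without an RRD directory, and links that already exist, are skipped.
    Returns the list of links that were created.
    """
    created = []
    for host in hosts:
        src = os.path.join(src_root, host)
        dst = os.path.join(dst_root, host)
        if os.path.isdir(src) and not os.path.lexists(dst):
            os.symlink(src, dst)
            created.append(dst)
    return created
```

For example: `link_rrd_dirs(["tu-mon-01", "tu-sql-01", "tu-web-01", "tu-lb-01"])`. Running it twice is harmless, because existing links are left alone.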

$ cd /opt/graphite/webapp/graphite
$ sudo vi local_settings.py
TIME_ZONE = 'Europe/Paris'
#for error with log
LOG_RENDERING_PERFORMANCE = True
LOG_CACHE_PERFORMANCE = True
LOG_METRIC_ACCESS = True

WHISPER_DIR = '/opt/graphite/storage/whisper'
RRD_DIR = '/opt/graphite/storage/rrd'
DATA_DIRS = [WHISPER_DIR, RRD_DIR] # Default: set from the above variables
LOG_DIR = '/opt/graphite/storage/log/webapp'
INDEX_FILE = '/opt/graphite/storage/index'  # Search index file
$ sudo apt-get install python-rrdtool

Changing the default refresh interval to 10 sec

Instructions from http://blog.stuartherbert.com/php/2011/09/21/real-time-graphing-with-graphite/

$ sudo vi local_settings.py
DEFAULT_CACHE_DURATION = 10 # Cache images and data for 10s

#below is not necessary, it is just for the composer
$ sudo vi  /opt/graphite/webapp/content/js/composer_widgets.js
//var interval = 60;
var interval = 10;

I also had to change the interval in collectd (the write interval of the rrdtool files on the collectd server tu-mon-01):

$ sudo vi /etc/collectd/collectd.conf
<Plugin rrdtool>
  DataDir "/var/lib/collectd/rrd"
  # CacheTimeout 120
  CacheFlush 2
  WritesPerSecond 60
</Plugin>

Building the first dashboard

To build a dashboard, you first need to use the Graphite composer to create graphs such as:

Graphite Composer example

Then you can combine graphs in the Dashboard composer such as:

Dashboard Composer example

Then you save your dashboard and share it to get its URL.

Graphite Dashboard Example

In the example above, you can see that I have all the indicators I need to oversee the cluster:

  • CPU on the fronts, the mysql server and the load balancer
  • IO on the mysql server
  • Number of SELECT, INSERT, DELETE and UPDATE commands
  • Bandwidth on each server (to check that I don’t reach any network limit)
  • Bandwidth on the mysql server (in this way, I can estimate the HTTP bandwidth usage on the fronts, because I use the same virtual NIC for the HTTP and the MySQL transport layer)
  • Number of mysql threads
  • Free memory on each server

Writing a custom Graphite collector to get the response time of the Web Application

With graphite it is very easy to write a custom collector in shell script or python. I wrote this custom collector to check the response time of the application

$ cd graphite/graphite/examples
$ cp example-client.py time-url.py
$ vi time-url.py
$ python time-url.py #see the attached file

see attached file: time-url.py
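The attached file is not reproduced here. As a rough idea of what such a collector can look like, here is a minimal Python 3 sketch (not the original time-url.py). It times an HTTP request and sends the result to carbon using the plaintext protocol, one `path value timestamp` line per measure, on carbon's default plaintext port 2003, as example-client.py does. The URL, the metric name and the carbon host are assumptions to adapt.

```python
import socket
import time
from urllib.request import urlopen

CARBON_HOST = "127.0.0.1"   # assumption: carbon-cache runs on this machine
CARBON_PORT = 2003          # carbon's default plaintext listener port
URL = "http://tu-lb-01/"    # hypothetical URL of the web application
METRIC = "custom.wordpress.response_time"  # hypothetical metric path
DELAY = 10                  # seconds between two measures


def time_url(url, opener=urlopen, clock=time.time):
    """Return the number of seconds needed to fetch the whole page."""
    start = clock()
    response = opener(url)
    try:
        response.read()
    finally:
        response.close()
    return clock() - start


def format_metric(path, value, timestamp):
    """Format one measure as a carbon plaintext line."""
    return "%s %f %d\n" % (path, value, timestamp)


def send_line(line, host=CARBON_HOST, port=CARBON_PORT):
    """Open a TCP connection to carbon, send one line and close."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()


def run():
    """Measure and send forever, skipping a measure when something fails."""
    while True:
        now = int(time.time())
        try:
            send_line(format_metric(METRIC, time_url(URL), now))
        except OSError as exc:  # covers URL errors and socket errors
            print("measure skipped: %s" % exc)
        time.sleep(DELAY)
```

To start collecting, call `run()` at the end of the script. With this metric name, the values should appear in the Graphite tree under custom > wordpress > response_time. A failed request is skipped instead of being recorded as a response time, so an outage shows up as a gap in the graph.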

View the performance counters dashboard while running a stress test

With Graphite, I was able to see all the indicators I needed while running a stress test:

Graphite dashboard while running a stress test

As you can see, the response time graph shows that the response time increases from 0.25 to 1.75 sec during the test.

Zoom on the response time graph
