AWS EBS IO performance evaluation

Hammerora TPM - System Comparison overview


AWS is a fantastic offering for those seeking to host their applications.

It is easy and fast:

  • to set up (lots of templates, web interface, Elastic Load Balancing, API)
  • to change (changing the EC2 instance type is easy and fast)
  • to secure (backups in S3, integrated firewall, Virtual Private Cloud)
  • to monitor (CloudWatch)
  • to manage (web interface, API, Auto Scaling, ready-to-use databases, …)
And you pay as you use.

But what about performance?

As for compute performance, I wasn’t worried. After all, AWS EC2 is based on Xen virtualization, so compute performance is mainly determined by the share of host CPU and the amount of memory allocated to the instance. In other words, it is simply defined by the instance type you choose (micro, large, xlarge, …).

But in June 2012 (before the launch of the new High I/O EC2 instance type), I was more concerned about IO performance. EBS (Elastic Block Store) is amazing; it is so easy to create, change and back up. But can it handle any workload, even a heavy database workload?

So I decided to run some tests with SQL Server, SQLIO (Disk Subsystem Benchmark Tool) and Hammerora (an open-source database test tool).

I also used PassMark Software’s PerformanceTest to get some standard benchmark results.
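When reading IO benchmark results, IOPS and throughput are two views of the same measurement, linked by the IO size. The relation below is a general back-of-envelope formula (assuming 1 MB = 1024 KB); the 1,000 IOPS figure is purely illustrative, not a measured EBS result, and 8 KB is used because it is SQL Server’s page size.

```latex
\text{throughput (MB/s)} = \text{IOPS} \times \frac{\text{IO size (KB)}}{1024}
\qquad\text{e.g.}\qquad
1000 \times \frac{8}{1024} \approx 7.8\ \text{MB/s}
```

The same device can therefore look slow in MB/s with small random IOs and fast with large sequential ones, which is why both figures are worth reporting.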


Scaling heavy database usage with a MySQL Master/Slave configuration

Number of Read Replica influence on Heavy database usage overview

Results Overview

Here I’m experimenting with a way to scale a web site that generates a heavy workload on the database.
The idea is to separate the reads from the INSERT/UPDATE SQL statements so that the reads (SELECT statements) can be dispatched across several servers.
I’m going to use a MySQL Master/Slave configuration (read replicas).

To fully understand how it works, to minimize the impact of PHP execution and to simplify testing, I decided to use PHP micro-benchmarks made up of simple SELECT and INSERT statements.
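The read/write split described above can be sketched as a small router: statements starting with SELECT go to the replicas in round-robin order, everything else goes to the master, which replicates the changes to the slaves. This is a minimal sketch, not the code used in the tests; the server names are hypothetical placeholders, and a real router would hold connections rather than strings.

```python
import itertools
import re


class ReadWriteRouter:
    """Send SELECTs to read replicas (round-robin) and all writes to the master."""

    # A statement counts as a read if its first keyword is SELECT.
    _READ_RE = re.compile(r"^\s*select\b", re.IGNORECASE)

    def __init__(self, master, replicas):
        self.master = master
        # cycle() yields the replicas in turn forever: simple round-robin.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        if self._READ_RE.match(sql) and self._replicas is not None:
            return next(self._replicas)
        # INSERT/UPDATE/DELETE (and reads when no replica exists) go to the master.
        return self.master


# Hypothetical server names, for illustration only.
router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
for stmt in ["SELECT * FROM posts", "INSERT INTO hits VALUES (1)", "select id FROM users"]:
    print(router.route(stmt), "<-", stmt)
```

With asynchronous MySQL replication a replica can lag slightly behind the master, so a read that must see a row just written (or a SELECT ... FOR UPDATE) would, in practice, also need to go to the master.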


HAProxy – Experimental evaluation of the performance

HAProxy max throughput

HAProxy max throughput overview

When I was evaluating the influence of the number of fronts on WordPress performance (see the post Scaling-out WordPress – Performance Measures – Number of fronts influence), I wondered how many requests per second HAProxy could sustain.

Some benchmarks on the HAProxy web site show HAProxy handling up to 25K req/sec with 8 KB objects on a 10 Gbit NIC.
I decided to find the limit in my own virtualized test environment.
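A quick back-of-envelope check shows how much bandwidth that benchmark figure implies. It assumes 8 kB = 8,000 bytes and ignores HTTP and TCP/IP header overhead, so the real figure is somewhat higher.

```latex
25\,000\ \text{req/s} \times 8\,000\ \text{B}
= 2 \times 10^{8}\ \text{B/s}
= 200\ \text{MB/s}
= 1.6\ \text{Gbit/s} \ll 10\ \text{Gbit/s}
```

This suggests the network card was not the limiting factor in that benchmark; per-request processing was. With a 177-byte page the bandwidth per request is even smaller, so a req/s ceiling is expected to come from CPU rather than from the link.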

In this test, I use the standard Apache “It works” page (177 bytes).


Scaling-out WordPress – Performance Measures – Using Graphite

graphite overview - short image

Results overview

The dashboard above shows what you can do with Graphite.

At a glance, you have all the indicators you need in real time (mine auto-refreshes every 10 s).

In this post, I will show you how to get it by installing and configuring collectd and Graphite.
Then we will write a custom performance collector in Python to check the response time.
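A response-time collector boils down to two steps: time a full HTTP request, then push the value to Graphite. The sketch below sends it straight to carbon using Graphite’s plaintext protocol (`metric.path value timestamp` followed by a newline, default port 2003) instead of going through collectd as the post may do. The URL and the metric path `wordpress.front1.response_time_ms` are hypothetical examples.

```python
import socket
import time
import urllib.request


def measure_response_time(url, timeout=10.0):
    """Return the wall-clock time in seconds needed to fetch `url` completely."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # include the body transfer in the measurement
    return time.perf_counter() - start


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(line, host="localhost", port=2003):
    """Send one plaintext metric line to carbon (2003 is the default plaintext port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    url = "http://localhost/"  # hypothetical target page
    while True:
        ms = round(measure_response_time(url) * 1000, 1)
        send_to_graphite(graphite_line("wordpress.front1.response_time_ms", ms))
        time.sleep(10)  # one sample per 10 s, in line with the dashboard refresh
```

Keeping the formatting in its own function makes it easy to check without a running Graphite server.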


Scaling-out WordPress – Performance Measures – Number of fronts influence

Number of fronts influence overview

Results overview

Now that we know how many requests we can expect from a single WordPress front server, it’s time to try scaling out.

Load balancer configuration

I will use HAProxy; as the post HAProxy – Experimental evaluation of the performance shows, a small VM with 1 vCPU and 512 MB of RAM is enough.
As WordPress is a stateless product, I don’t need to manage session persistence, so a very basic configuration can be used (see Software Installation).
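Why does statelessness allow such a basic configuration? The sketch below contrasts the idea behind HAProxy’s `balance roundrobin` (any front can serve any request) with the idea behind `balance source` (a client is pinned to one front, which matters only when sessions live on that front). It is a conceptual illustration, not HAProxy’s actual hashing algorithm, and the front names are hypothetical.

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Stateless balancing: each request goes to the next front in turn."""

    def __init__(self, fronts):
        self._cycle = itertools.cycle(fronts)

    def pick(self, client_ip):
        return next(self._cycle)  # client_ip ignored: any front can serve anyone


class SourceHashBalancer:
    """Sticky balancing: the same client IP always lands on the same front."""

    def __init__(self, fronts):
        self.fronts = list(fronts)

    def pick(self, client_ip):
        digest = hashlib.md5(client_ip.encode("ascii")).digest()
        return self.fronts[int.from_bytes(digest[:4], "big") % len(self.fronts)]


rr = RoundRobinBalancer(["front1", "front2"])
print([rr.pick("192.0.2.10") for _ in range(4)])  # same client, alternating fronts
```

Round-robin spreads a single client’s requests over all fronts, which is only safe when no front keeps per-user state.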

In a following post, I will show how to share sessions on an NFS server. To have sessions to share, I will install an e-commerce plugin: WooCommerce.

I also deactivated the cache but kept APC.


Scaling-out WordPress – Performance Measures – Cache influence

Cache Influence on req/sec

Results overview

A classic way to increase the performance of a web site is to cache the HTML pages that are rendered to the client.
This saves the server from executing the code that queries the database and renders the page. The pre-rendered HTML page is generally stored in the file system.
I would like to try it with this workload, which is perfect for caching because no information on the page is specific to the visitor.
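The file-based page caching just described can be sketched in a few lines: the first request for a URL runs the expensive render (PHP code plus database queries) and writes the HTML to disk; later requests read the file directly. This is a minimal illustration of the principle, not how WP Super Cache is implemented; the `render` callback stands in for WordPress and is hypothetical.

```python
import hashlib
import os
import tempfile


class FilePageCache:
    """Minimal full-page cache: rendered HTML stored as files, keyed by URL."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, url):
        # Hash the URL so any URL maps to a safe file name.
        name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
        return os.path.join(self.directory, name)

    def get_page(self, url, render):
        path = self._path(url)
        if os.path.exists(path):  # cache hit: no code execution, no database query
            with open(path, encoding="utf-8") as f:
                return f.read()
        html = render(url)  # cache miss: run the expensive rendering once
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        return html


# Usage: the second request for "/" is served from the file, not re-rendered.
cache = FilePageCache(tempfile.mkdtemp())
print(cache.get_page("/", lambda url: "<html>home</html>"))
print(cache.get_page("/", lambda url: "<html>never rendered</html>"))
```

This only works because the pages carry no visitor-specific content; a real cache also has to invalidate files when the content changes, which this sketch does not do.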

I could use many methods:

  • caching with Apache
  • caching with Varnish
  • caching with dedicated PHP code

I decided to use the third method as there are a lot of existing WordPress plugins that do the job.
I installed the WP Super Cache plugin and ran a new series of tests.


Scaling-out WordPress – Performance Measures – CPU Generation influence

Hardware generation CPU result overview

Results overview

As I use different hardware generations to host the VMs, I wondered how much the CPU generation matters. A VM with 2 vCPUs on an i7 860 @ 2800 MHz (2009 generation, server ade-esxi-03) should run faster than one on a Q6600 @ 2400 MHz (2007 generation, server ade-esxi-02).

Not only does the CPU differ between the two servers, the RAM generation also differs: DDR2@800 and DDR3@1066. Apart from the CPU and the RAM, I assume that the other components (mainboard, disks, …) should have minimal impact on the tests, as the mainboard generally accounts for only a few percent of the performance difference and the workload is not IO-bound.
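The raw clock and memory ratios give a naive baseline for the expected gap, using only the figures quoted above (DDR speeds read as MT/s):

```latex
\frac{2800\ \text{MHz}}{2400\ \text{MHz}} \approx 1.17
\qquad
\frac{1066\ \text{MT/s}}{800\ \text{MT/s}} \approx 1.33
```

Clock speed alone would thus predict roughly 17% more CPU throughput. A newer CPU generation typically also does more work per clock cycle, so a measured difference larger than this ratio would not be surprising.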


Scaling-out WordPress – Performance Measures – APC influence on the fronts

APC Influence overview

Results overview

In this post, I’m going to evaluate what performance increase we could expect from using APC (Alternative PHP Cache).
APC is an opcode cache. This means that with APC, the PHP interpreter embedded in each Apache process does not have to compile each PHP script into opcode on every request.
The opcode is compiled during the first execution and kept in a shared memory area, so subsequent PHP executions can fetch the opcode directly from memory for each new request. In theory, we save the PHP-to-opcode compilation time on every subsequent request.
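The compile-once, reuse-many-times principle can be shown by analogy in Python, whose built-in `compile()` turns source into a code object much as PHP turns a script into opcode. This sketch is only an analogy: the dict lives in one process, whereas APC shares its cache between all Apache processes, and unlike a real opcode cache it never notices when a script changes. The file name and script below are hypothetical.

```python
class OpcodeCache:
    """Analogy for APC: compile each script once, then reuse the compiled code."""

    def __init__(self):
        self._cache = {}  # filename -> compiled code object (APC: shared memory)
        self.compilations = 0

    def run(self, filename, source, namespace=None):
        code = self._cache.get(filename)
        if code is None:  # first execution: compile and store
            code = compile(source, filename, "exec")
            self._cache[filename] = code
            self.compilations += 1
        # Later executions skip compile() and run the stored code directly.
        exec(code, namespace if namespace is not None else {})


cache = OpcodeCache()
for _ in range(3):
    cache.run("index.php", "greeting = 'hello'")
print(cache.compilations)  # 1
```

The saving per request is exactly the cost of the skipped `compile()` call, which is why the gain depends on how large the scripts are relative to their execution time.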


Scaling-out WordPress – Performance Measures – CPU and RAM influence on the fronts

CPU & RAM influence

Results overview

Raw throughput of a WordPress web site and influence of CPU and memory on the front

To evaluate the expected performance of a WordPress web site, I first decided to focus on a single front and to vary its CPU and memory resources.

To be sure that no other bottleneck appeared (such as the database server resources, CPU, memory and IO, or the network resources), I continually checked the performance counters on the database server and on the front server during the workload, with commands such as iostat, vmstat and top, and iftop for the network. After a while, I found the perfect command, dstat, which shows all the CPU, IO and memory counters from a single command line.

To quickly evaluate the speed (throughput) of the web site, I decided to use the Apache tool ab (ApacheBench), which gives a very useful indicator in req/s.
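The req/s indicator is simply the number of completed requests divided by the total test duration. The sketch below computes it and runs a tiny ab-like test with a fixed concurrency, loosely comparable to `ab -n 100 -c 10 http://localhost/`. It is an illustration under assumptions (hypothetical local URL), not a replacement for ab: a Python client limited by the GIL can become the bottleneck itself, which is a good reason to use a native tool like ab for the real measurements.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def requests_per_second(completed, elapsed_seconds):
    """Throughput: completed requests divided by total elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return completed / elapsed_seconds


def mini_load_test(url, total=100, concurrency=10):
    """Issue `total` GET requests, `concurrency` at a time, and return req/s."""
    def fetch(_):
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(fetch, range(total)))  # list() waits and re-raises any error
    return requests_per_second(total, time.perf_counter() - start)


if __name__ == "__main__":
    print(f"{mini_load_test('http://localhost/'):.1f} req/s")  # hypothetical target
```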