Scaling-out WordPress – Performance Measures – Number of fronts influence

Now that we know how many requests we can expect from a single WordPress front server, it’s time to try to scale out.

Load balancer configuration

I will use haproxy and, as the post HAProxy – Experimental evaluation of the performance showed, I can set it up on a small VM with 1vcpu and 512MB.
As WordPress is a stateless product, I don’t need to manage session persistence, so a very basic configuration can be used (see Software Installation).
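
For reference, here is a minimal sketch of what such a configuration can look like. The server lines are the ones used in the tests below; the surrounding structure and the roundrobin balance are my assumptions, see the Software Installation post for the exact file:

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend http-in
    bind *:80
    default_backend wordpress-fronts

backend wordpress-fronts
    # round-robin with no stickiness: WordPress is treated as stateless here
    balance roundrobin
    server tu-web-01 192.168.100.210:80 check
    server tu-web-02 192.168.100.218:80 check
    server tu-web-03 192.168.100.172:80 check
    server tu-web-04 192.168.100.167:80 check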

Rem:
In a following post, I will show how to share sessions on an NFS server. To create a need for sessions, I will install an e-commerce plugin: WooCommerce.

I also deactivated the cache but kept APC.
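
A quick way to check that APC is indeed loaded on each front (a simple sanity check, not specific to my setup):

$ php -m | grep -i apc
apc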

Test #19, APC activated, 2 Fronts, 4Vcpu, 2GB on servers ade-esxi-02 and ade-esxi-01 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800):

I commented out tu-web-03 and tu-web-04 in the haproxy configuration as follows:

server tu-web-01 192.168.100.210:80 check
server tu-web-02 192.168.100.218:80 check
#server tu-web-03 192.168.100.172:80 check
#server tu-web-04 192.168.100.167:80 check
$ ab -kc 10 -n 100 http://tu-lb-01/mysite/
Requests per second:    18.89 [#/sec] (mean)
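
For reference, the ab options used throughout these tests: -k enables HTTP keep-alive, -c sets the number of concurrent clients and -n the total number of requests to send.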

The CPU usage (sum of user and system usage) on the 3 servers is shown here:

server          Max CPU
load balancer   4%
front1          86%
front2          80%

Rem:
I guess that I didn’t hit 100% on the fronts because I used only 10 concurrent connections. Testing 3 fronts with 30 simultaneous connections gives almost 100% on each front.
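
The dstat outputs below were captured on each server. I don’t reproduce the exact invocation here, but something like the following (the precise flags are an assumption on my part) gives the columns visible in the screenshots:

$ dstat -tcn 1     # -t time, -c cpu (usr/sys/idl/wai), -n network, sampled every second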

dstat result on load balancer


dstat result on front1

dstat result on front2

Test #20, APC activated, 3 Fronts

  • 2 fronts 4Vcpu, 2GB on servers ade-esxi-02 and ade-esxi-01 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800)
  • 1 front 2Vcpu, 2GB on server ade-esxi-03 (Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz; Hyperthreading active; 8 GB DDR3 1066)

I commented out tu-web-04 in the haproxy configuration as follows:

server tu-web-01 192.168.100.210:80 check
server tu-web-02 192.168.100.218:80 check
server tu-web-03 192.168.100.172:80 check
#server tu-web-04 192.168.100.167:80 check
$ ab -kc 30 -n 200 http://tu-lb-01/mysite/
Requests per second:    28.51 [#/sec] (mean)

The CPU usage (sum of user and system usage) on the 4 servers is shown here:

server          Max CPU
load balancer   16%
front1          98%
front2          99%
front3          100%

dstat result on load balancer


dstat result on front1

dstat result on front2

dstat result on front3

Test #21, APC activated, 4 Fronts

  • 2 fronts 4Vcpu, 2GB on servers ade-esxi-02 and ade-esxi-01 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800)
  • 2 fronts 2Vcpu, 2GB on server ade-esxi-03 (Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz; Hyperthreading active; 8 GB DDR3 1066)

haproxy configuration:

server tu-web-01 192.168.100.210:80 check
server tu-web-02 192.168.100.218:80 check
server tu-web-03 192.168.100.172:80 check
server tu-web-04 192.168.100.167:80 check
$ ab -kc 40 -n 200 http://tu-lb-01/mysite/
Requests per second:    36.28 [#/sec] (mean)

Rem:
I increased the load to 40 simultaneous connections for this test.

The CPU usage (sum of user and system usage) on the 5 servers is shown here:

server          Max CPU
load balancer   5%
front1          96%
front2          100%
front3          100%
front4          100%

dstat result on load balancer


dstat result on front1

dstat result on front2

dstat result on front3

dstat result on front4

Conclusion on influence of the number of fronts

Is it linear?
The following graph shows that, for this particular workload, throughput grows pretty much linearly with the number of fronts.

Number of fronts influence on req/sec

In the graph above, the red line is the theoretical throughput (the sum of the req/sec of the individual fronts).
We can see that the measured line is roughly linear (the regression line has a slope of 9.33, close to the average of 10.8 req/s per front) and stays close to the theoretical line.
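
As a quick sanity check of that slope (a sketch: I assume the measured points are (1, 10.8), (2, 18.89), (3, 28.51) and (4, 36.28), and I force the regression through the origin, since zero fronts serve zero requests):

$ printf '1 10.8\n2 18.89\n3 28.51\n4 36.28\n' | \
  awk '{ sxy += $1*$2; sxx += $1*$1 } END { printf "slope = %.2f req/s per front\n", sxy/sxx }'
slope = 9.31 req/s per front

which lands close to the 9.33 reported above.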

I suppose that we don’t get exactly the theoretical throughput for several reasons:

  1. The load-balancer algorithm may not dispatch requests exactly evenly across the fronts
  2. front1 runs on the same host as the mysql server, so they share some resources even if the mysql server load is low during this test
  3. The virtualization layer can cost some performance when several servers run on the same host (context switches, shared memory bus, etc.)
  4. front3 and front4 are not really equivalent to front1 and front2 (different CPU generation, hyperthreading active)

But anyway, the result is not bad at all considering the conditions of the tests.

What is cool with this configuration is that the service can still be delivered to the clients even if some fronts die. But that is another story: for a true fail-over configuration, you also need a redundant load balancer and a redundant database system.

There is something more annoying when you stress an application spread over many servers: you have to juggle a lot of windows to run the commands that give you the performance counters, and it is not easy to watch all the counters on all the servers at a glance in real time.

I need a graphical dashboard for that. Here comes Graphite: it will be the subject of the next post.

