Scaling-out WordPress – Performance Measures – CPU and RAM influence on the fronts

Raw throughput of a WordPress web site and influence of CPU and memory on the front

In order to evaluate the expected performance of a WordPress web site, I first decided to focus on a single front (front-end web server) and to vary its CPU and memory resources.

To be sure that no other bottleneck appeared (such as the database server resources (CPU, memory, I/O) or the network), I continually checked the performance counters on the database server and on the front server during the workload, with commands such as iostat, vmstat and top, plus iftop for the network. After a while, I found the ideal command, dstat, which shows the CPU, I/O and memory counters from a single command line.

To quickly evaluate the speed (throughput) of the web site, I decided to use the Apache tool ab, which reports a very useful indicator in req/s.

Evaluate the influence of the CPU and the memory on the front

For this series of tests, I used a MySQL server VM with 2 GB of RAM and 2 vCPUs. This is enough to avoid any contention on the database, since this kind of test uses very few resources on the MySQL server (it simulates simultaneous access to the home page of a WordPress web site with very few posts).
The important parameters for the MySQL server are:
innodb_buffer_pool_size=1G
query_cache_limit = 4M
query_cache_size = 512M

The MySQL server VM runs on ade-esxi-01: Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; no Hyper-Threading; 8 GB DDR2 800; 3ware 9650SE RAID card + BBU; 4x1TB in RAID10; WD Caviar Black 7200 rpm (WD1001FALS)

I ran the ab command from a separate server (AMD dual-core). Each result is the best of a series of 3 tests.
I used the -k parameter to enable the HTTP KeepAlive feature.
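
The req/s figure quoted for each test can be extracted from the ab report automatically. Here is a minimal Python sketch of that step, not the script used for these measurements. The helper names parse_rps and best_of are hypothetical, and the sketch assumes ab prints its summary line in the form "Requests per second:    2.01 [#/sec] (mean)".

```python
import re
from typing import Iterable

# Matches the summary line printed by ab, e.g.
# "Requests per second:    2.01 [#/sec] (mean)"
RPS_LINE = re.compile(
    r"^Requests per second:\s+([0-9.]+)\s+\[#/sec\]", re.MULTILINE
)


def parse_rps(ab_output: str) -> float:
    """Return the mean requests per second reported in one ab run."""
    match = RPS_LINE.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found in ab output")
    return float(match.group(1))


def best_of(ab_outputs: Iterable[str]) -> float:
    """Return the best (highest) req/s over a series of ab runs."""
    return max(parse_rps(output) for output in ab_outputs)


if __name__ == "__main__":
    # A single summary line in the format ab prints.
    sample = "Requests per second:    2.01 [#/sec] (mean)\n"
    print(parse_rps(sample))
```

In practice, best_of would receive the captured output of the three ab runs of a series and return the value kept for that test.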

Influence of the memory increase on the front.

For the first three tests, I kept the CPU resource constant and changed only the RAM.

Test #1, 1 Front, 1Vcpu, 512 MB on server ade-esxi-02 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800) :

$ ab -kc 10 -n 20 http://tu-web-01/mysite/
Requests per second:    0.72 [#/sec] (mean)

The CPU on the front was at 100% (0% idle), as the iostat output captured during the run showed.


Test #2, 1 Front, 1Vcpu, 1 GB on server ade-esxi-02 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800) :

$ ab -kc 10 -n 20 http://tu-web-01/mysite/
Requests per second:    1.90 [#/sec] (mean)

The CPU on the front was at 100% (0% idle), as the iostat output captured during the run showed.


No swap was used:

$ free -m
             total       used       free     shared    buffers     cached
Mem:           995        412        582          0         22        112
-/+ buffers/cache:        277        717
Swap:          507          0        507

Test #3, 1 Front, 1Vcpu, 2 GB on server ade-esxi-02 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800) :

$ ab -kc 10 -n 20 http://tu-web-01/mysite/
Requests per second:    2.01 [#/sec] (mean)

The CPU on the front was at 100% (0% idle), as the iostat output captured during the run showed.


No swap was used, and approx. 1.6 GB of memory was free:

tprojadmin@tu-web-01:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          2003        431       1572          0         22        112
-/+ buffers/cache:        296       1707
Swap:          507          0        507
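
The swap check can also be scripted instead of read by eye. Here is a minimal Python sketch, assuming the older free -m layout shown above (a "-/+ buffers/cache:" line and a "Swap:" line with total, used and free columns). The function names are hypothetical, and newer versions of free print a different layout.

```python
def swap_used_mb(free_output: str) -> int:
    """Return the 'used' column of the Swap: line of `free -m` output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            # Columns on the Swap: line are: total, used, free
            return int(fields[2])
    raise ValueError("no 'Swap:' line found")


def free_without_cache_mb(free_output: str) -> int:
    """Return the free memory once buffers and cache are counted as free."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields[:2] == ["-/+", "buffers/cache:"]:
            # Columns after the label are: used, free
            return int(fields[3])
    raise ValueError("no '-/+ buffers/cache:' line found")


if __name__ == "__main__":
    # Output of `free -m` recorded during test #3.
    sample = """\
             total       used       free     shared    buffers     cached
Mem:          2003        431       1572          0         22        112
-/+ buffers/cache:        296       1707
Swap:          507          0        507
"""
    print("swap used (MB):", swap_used_mb(sample))
    print("free incl. buffers/cache (MB):", free_without_cache_mb(sample))
```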

During the 3 tests, the CPU usage on the MySQL server was very low (approx. 1%) and no disk I/O occurred, as the test only generated read accesses and the database fit in memory.
The iostat output captured on the MySQL server during the workload confirmed this.

Influence of the CPU increase on the front.

For the following three tests, I kept the RAM constant and changed only the CPU.
Note: to increase the CPU, I changed the number of cores per socket in VMware 5 (I kept the number of virtual sockets at 1).

Test #4, 1 Front, 1Vcpu, 2GB on server ade-esxi-02 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800) :

$ ab -kc 10 -n 20 http://tu-web-01/mysite/
Requests per second:    2.02 [#/sec] (mean)

dstat shows 100% CPU usage and 1170M of free memory (no swap).


Test #5, 1 Front, 2Vcpu, 2GB on server ade-esxi-02 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800) :

$ ab -kc 10 -n 20 http://tu-web-01/mysite/
Requests per second:    3.39 [#/sec] (mean)

dstat shows 100% CPU usage and 1232M of free memory (no swap).


Test #6, 1 Front, 4Vcpu, 2GB on server ade-esxi-02 (Intel(R) Core(TM) Quad CPU Q6600 @ 2.40GHz; No Hyperthreading; 8 GB DDR2 800) :

$ ab -kc 10 -n 20 http://tu-web-01/mysite/
Requests per second:    6.78 [#/sec] (mean)

dstat shows 100% CPU usage and 1030M of free memory (no swap).


On the MySQL server, dstat shows very limited CPU usage (approx. 1%).

Conclusion on CPU increase and RAM increase

For this simple workload (accessing the home page of a WordPress blog that contains 5 medium-sized posts), the 2 following graphs show how the performance is related to the CPU and RAM increase.

[Figure: RAM influence on req/sec]

[Figure: CPU influence on req/sec]

We clearly see that, as long as the front server has enough memory to avoid swapping, the dominant parameter is the CPU.

So we can expect a big performance increase from adding more fronts (linear? We will see soon).
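
To put numbers on this comparison, the results of tests #1 to #6 can be turned into speedup ratios. Here is a minimal Python sketch using only the req/s values reported above. The names speedup and scaling_efficiency are hypothetical helpers, and "efficiency" is simply defined here as the speedup divided by the number of vCPUs.

```python
# Best-of-3 results reported in this article (req/s).
RAM_RESULTS = {512: 0.72, 1024: 1.90, 2048: 2.01}  # front RAM in MB, 1 vCPU
CPU_RESULTS = {1: 2.02, 2: 3.39, 4: 6.78}          # front vCPUs, 2 GB RAM


def speedup(baseline_rps: float, rps: float) -> float:
    """Throughput ratio relative to a baseline configuration."""
    return rps / baseline_rps


def scaling_efficiency(vcpus: int, rps: float, rps_one_vcpu: float) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    return speedup(rps_one_vcpu, rps) / vcpus


if __name__ == "__main__":
    for vcpus, rps in CPU_RESULTS.items():
        ratio = speedup(CPU_RESULTS[1], rps)
        eff = scaling_efficiency(vcpus, rps, CPU_RESULTS[1])
        print(f"{vcpus} vCPU: x{ratio:.2f} ({eff:.0%} of linear)")
    ram_gain = speedup(RAM_RESULTS[1024], RAM_RESULTS[2048])
    print(f"1 GB -> 2 GB RAM: x{ram_gain:.2f}")
```

With these figures, going from 1 to 4 vCPUs multiplies the throughput by about 3.4 (roughly 84% of linear scaling), whereas doubling the RAM from 1 GB to 2 GB adds only about 6%.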

As an Apache process in the prefork model with mod_php (which embeds the PHP interpreter) consumes approx. 30M-50M, and the free RAM is approx. 1600M (roughly 400M is consumed by the system and its various processes), we can presume that we will hit the memory limit with 30-50 concurrent requests.
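
The estimate is a simple division of the free RAM by the per-process footprint. Here is a minimal Python sketch of that arithmetic. The function name is hypothetical, and the inputs are the approximate values quoted above, not precise measurements.

```python
def max_concurrent_processes(free_ram_mb: int, per_process_mb: int) -> int:
    """Number of whole Apache processes that fit in the free RAM."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    return free_ram_mb // per_process_mb


if __name__ == "__main__":
    free_ram_mb = 1600  # approx. free RAM on the 2 GB front
    for per_process_mb in (30, 50):  # estimated footprint range per process
        limit = max_concurrent_processes(free_ram_mb, per_process_mb)
        print(f"{per_process_mb} MB per process -> {limit} processes")
```

With 1600M free, this gives 53 processes at 30M each and 32 processes at 50M each, which is where the rough range of 30-50 concurrent requests comes from.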

Let’s have a try:

$ ab -kc 45 -n 200 http://tu-web-01/mysite/

gives the following dstat result:

usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq:usr sys idl wai hiq siq| recv  send| read  writ| int   csw | used  buff  cach  free| 1m   5m  15m |run blk new| used  free>
 63  27   0   6   0   4: 70  22   0   7   0   1: 61  29   0   8   0   2: 73  19   0   6   0   2: 66  24   0   7   0   3|  64k   51k|3984k   27M|  12k 1976 |1935M  872k 13.0M 54.4M|26.2 22.3 10.9| 24  26 4.0|  69M  439M>
 51  11   0  32   0   6: 42  13   4  41   0   1: 31  20   0  45   0   4: 36  20   0  42   0   2: 40  16   1  40   0   3| 343k  101k|  11M   57M|7581  3041 |1937M  872k 11.5M 53.8M|26.2 22.3 10.9| 12  38 8.0| 123M  385M>
 62   9   0  21   0   8: 49  12   0  35   0   4: 55   7   0  34   0   3: 46   8   0  42   0   3: 53   9   0  33   0   5| 426k  310k|  15M   48M|5869  2947 |1937M  872k 11.8M 53.8M|26.2 22.3 10.9| 47  41   0| 165M  343M>
 57  15   0  22   0   6: 31  11   0  57   0   1: 53   7   0  38   0   2: 22  11   0  65   0   2: 40  11   0  46   0   3| 469k  275k|  13M 7452k|3800  2095 |1901M  872k 14.2M 87.1M|28.0 22.7 11.1|5.0  44   0| 178M  330M>
 63  15   0  19   0   3: 63  11   0  26   0   0: 57  13   0  29   0   1: 62   8   0  28   0   2: 61  12   0  25   0   2| 434k  421k|  19M   20M|4878  2672 |1867M  872k 15.0M  121M|28.0 22.7 11.1| 13  37   0| 191M  317M>

So we are just above the memory limit: approx. 53M of free RAM and approx. 200M of swap consumed.
