Performance Azure > GCP?

Deyan_Petrov · October 23, 2021, 5:36pm

Hi,

I performed a number of load tests with the following setups:

Azure Kubernetes Cluster (AKS) → Mongo DB Atlas on Azure (M20, M30, same region, network peering)
GCP Kubernetes Cluster (GKE) → Mongo DB Atlas on GCP (M20, M30, same region, network peering)

and expected to see much better results from GCP due to much higher IOPS, however to my surprise Azure results were better (???).

The load tests were sending 50 or 100 business operations per second (1 business operation involves 1-2 find and 7-8 insert/replace db operations, involving 3 microservices and 3 corresponding MongoDB dbs, but all on the same MongoDB Atlas server).

Measurements are all done serverside based on traces sent to Azure Application Insights.

I am not sure how I can paste a picture or a xls file here with the results (no attach file possibility)?

I have the feeling I am doing something wrong, so does anyone have ideas what I might have done wrong, or is Azure really more performant than GCP (was always left with the opposite impression, as Azure has some serious disk IOPS issues about which I wrote here: MongoDB Atlas & Azure - a forced marriage? - DEV Community 👩‍💻👨‍💻) …

Best regards,
Deyan

steevej · October 23, 2021, 6:23pm

May be the traces from GCP to Azure Application Insights is more costly than traces from Azure to Azure A.I.?

Deyan_Petrov · October 23, 2021, 7:19pm

Sending the traces from GKE → Azure App Insights is for sure more costly - saw 5-10% more CPU usage of the pods in GKE than in AKS, could be because of this.

However the timestamps in the traces (for the roundtrips to MongoDB) are calculated locally (in the pod running in GKE), and it does not matter afterwards how long it takes to send the trace to Azure App Insights.

E.g. a trace measures roundtrip incl. db processing time for a find operation in MongoDB. It is calculated inside the running pod (e.g. 2 ms), and then this trace with 2ms inside is sent to Azure App Insights (which can take 20-30ms) but the 2ms inside the trace are still 2ms …

Deyan_Petrov · November 30, 2021, 5:34pm

Ran again a bunch of tests without Azure App Insights tracing, and the results are similar - Azure M20 (2 vcpus, 4 gb ram) with 128Gb storage (1100 IOPS) performs better than than GCP M20 (1 vcpu, 4 gb ram) with 128 gb storage (7k+ IOPS) …

Here an extract of the results from today:

provider	test run	ping avg	Request Avg (ms)	Request 99th (ms)	Request Max (ms)	Kubectl CPU% Avg	MongoDB CPU Avg	MongoDB CPU Max	MongoDB Disk Queue Max	MongoDB Disk IOPS Max	MongoDB Disk Latency Max	MongoDB Disk Util % Max	OK Requests	Duration Min
GCP	50PerSec_M20_128_11	9	379	505	623		30	40-80	2	400	8	25	15000	5
GCP	50PerSec_M20_128_12	3	411	573	646		54	134	20	800	40	50	15000	5
GCP	50PerSec_M20_128_13	3	271	372	596		28	56	17	558	43	32	15000	5
GCP	50PerSec_M20_128_14	3	282	373	543		41	90	16	644	52	36	15000	5
Azure	50PerSec_M20_128_10	39	211	278	329	22.5	30	49	2	325	7	40	15000	5
Azure	50PerSec_M20_128_11	10	211	282	315	24	50	50-150	9	450	45	40	15000	5
Azure	50PerSec_M20_128_12	24	209	272	299			30	60	550	40	40	15000	5
Azure	50PerSec_M20_128_13	5	219	287	392		54	134	23	820	42	50	15000	5
Azure	50PerSec_M20_128_14	7	198	271	429		34-38	60-150	21	647	71	48	15000	5
Azure	50PerSec_M20_128_15	5	193	237	357		34	72	26	705	60	47	15000	5

Any ideas? Or this is the reality?

Deyan_Petrov · November 30, 2021, 5:44pm

Even funnier is the fact that M30 with 128Gb storage does not perform better/faster for my workload … strange …

Deyan_Petrov · December 2, 2021, 7:38am

Sorry, I have a mistake above, 128Gb give you 500 IOPS on Azure, not 1100

Deyan_Petrov · January 20, 2022, 9:22am

I performed also tests against AWS, and what seems to be a striking difference is the Max Write Disk Latency … e.g. for M20 with 128Gb storage:

Azure: 40-60ms, 1 spike within 1h up to 80 ms …
GCP: 30-40ms, 2-3 spikes within 1h up to 50 ms …
AWS: 1-2ms stable, only 1-2 spikes within 1h up to 6ms …

Can someone explain the gigantic difference in max write disk latency? How come AWS is 10x better??

Deyan_Petrov · January 20, 2022, 9:45am

Can it be that MongoDB Atlas is using directly attached storage in case of AWS, and network-attached storage in case of Azure and GCP??

Deyan_Petrov · January 23, 2022, 9:27am

Here also the full results (note, absolute numbers are specific to my workload): Real-life performance comparison of MongoDB Atlas on Azure/GCP/AWS - DEV Community 👩‍💻👨‍💻