Performance Azure > GCP?

Hi,

I performed a number of load tests with the following setups:

  1. Azure Kubernetes Cluster (AKS) → MongoDB Atlas on Azure (M20, M30, same region, network peering)
  2. GCP Kubernetes Cluster (GKE) → MongoDB Atlas on GCP (M20, M30, same region, network peering)

and expected to see much better results from GCP due to its much higher IOPS. However, to my surprise, the Azure results were better (???).

The load tests were sending 50 or 100 business operations per second (1 business operation involves 1-2 find and 7-8 insert/replace db operations, involving 3 microservices and 3 corresponding MongoDB dbs, but all on the same MongoDB Atlas server).
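For a rough sense of the database load this implies, here is a back-of-the-envelope sketch. The function name and the (min, max) tuples are only an illustrative reading of "1-2 find and 7-8 insert/replace" per business operation, and the 5-minute duration is the run length reported in the results further down.

```python
def db_ops_per_second(business_ops_per_sec, finds=(1, 2), writes=(7, 8)):
    """Return (min, max) MongoDB operations per second for a business-op rate."""
    low = business_ops_per_sec * (finds[0] + writes[0])
    high = business_ops_per_sec * (finds[1] + writes[1])
    return low, high


for rate in (50, 100):
    low, high = db_ops_per_second(rate)
    print(f"{rate} business ops/s -> {low}-{high} db ops/s")

# Total business operations in one 5-minute run at 50 per second
total_requests = 50 * 5 * 60
print(f"requests per 5-minute run: {total_requests}")
```

So the cluster sees roughly 400-500 db operations per second at 50 business operations per second, and about twice that at 100.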

Measurements are all done serverside based on traces sent to Azure Application Insights.

I am not sure how I can paste a picture or an xls file with the results here (there is no option to attach a file).

I have the feeling I am doing something wrong, so does anyone have ideas what I might have done wrong, or is Azure really more performant than GCP (was always left with the opposite impression, as Azure has some serious disk IOPS issues about which I wrote here: MongoDB Atlas & Azure - a forced marriage? - DEV Community) …

Best regards,
Deyan

Maybe sending the traces from GCP to Azure Application Insights is more costly than sending traces from Azure to Azure A.I.?

Sending the traces from GKE → Azure App Insights is for sure more costly - saw 5-10% more CPU usage of the pods in GKE than in AKS, could be because of this.

However the timestamps in the traces (for the roundtrips to MongoDB) are calculated locally (in the pod running in GKE), and it does not matter afterwards how long it takes to send the trace to Azure App Insights.

E.g. a trace measures roundtrip incl. db processing time for a find operation in MongoDB. It is calculated inside the running pod (e.g. 2 ms), and then this trace with 2ms inside is sent to Azure App Insights (which can take 20-30ms) but the 2ms inside the trace are still 2ms …
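A minimal sketch of that argument in Python: the duration is captured around the db call inside the pod, so a slow export afterwards cannot change it. `fake_find` and `fake_export` are hypothetical stand-ins (plain sleeps), not the real MongoDB driver or the Application Insights SDK.

```python
import time


def timed_call(operation, *args, **kwargs):
    """Run operation and return (result, elapsed_ms), measured locally."""
    start = time.perf_counter()
    result = operation(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_find(delay_s=0.002):
    # Hypothetical stand-in for a MongoDB find round trip (about 2 ms)
    time.sleep(delay_s)
    return {"_id": 1}


def fake_export(trace, delay_s=0.02):
    # Hypothetical stand-in for shipping the trace to a remote backend (about 20 ms)
    time.sleep(delay_s)
    return dict(trace)


result, elapsed_ms = timed_call(fake_find)
trace = {"name": "find", "duration_ms": elapsed_ms}
recorded = trace["duration_ms"]

# The export is slow, but the duration inside the trace is already fixed
exported = fake_export(trace)
print(f"find took {recorded:.1f} ms, exported value {exported['duration_ms']:.1f} ms")
```

The export cost can still show up as extra CPU usage in the pod, which is a separate effect from the recorded round-trip time.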

I ran a bunch of tests again without Azure App Insights tracing, and the results are similar - Azure M20 (2 vCPUs, 4 GB RAM) with 128 GB storage (1100 IOPS) performs better than GCP M20 (1 vCPU, 4 GB RAM) with 128 GB storage (7k+ IOPS) …

Here is an extract of the results from today:

provider | test run | ping avg | Request Avg (ms) | Request 99th (ms) | Request Max (ms) | Kubectl CPU% Avg | MongoDB CPU Avg | MongoDB CPU Max | MongoDB Disk Queue Max | MongoDB Disk IOPS Max | MongoDB Disk Latency Max | MongoDB Disk Util % Max | OK Requests | NOK Requests | Duration Min
GCP 50PerSec_M20_128_11 9 379 505 623 30 40-80 2 400 8 25 15000 0 5
GCP 50PerSec_M20_128_12 3 411 573 646 54 134 20 800 40 50 15000 0 5
GCP 50PerSec_M20_128_13 3 271 372 596 28 56 17 558 43 32 15000 0 5
GCP 50PerSec_M20_128_14 3 282 373 543 41 90 16 644 52 36 15000 0 5
Azure 50PerSec_M20_128_10 39 211 278 329 22.5 30 49 2 325 7 40 15000 0 5
Azure 50PerSec_M20_128_11 10 211 282 315 24 50 50-150 9 450 45 40 15000 0 5
Azure 50PerSec_M20_128_12 24 209 272 299 30 60 550 40 40 15000 0 5
Azure 50PerSec_M20_128_13 5 219 287 392 54 134 23 820 42 50 15000 0 5
Azure 50PerSec_M20_128_14 7 198 271 429 34-38 60-150 21 647 71 48 15000 0 5
Azure 50PerSec_M20_128_15 5 193 237 357 34 72 26 705 60 47 15000 0 5
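To quantify the gap, here is a small sketch that averages the Request Avg (ms) column per provider. The values are copied by hand from the rows above; the dictionary layout is just for illustration.

```python
from statistics import mean

# Request Avg (ms) per run, copied from the table above
request_avg_ms = {
    "GCP": [379, 411, 271, 282],
    "Azure": [211, 211, 209, 219, 198, 193],
}

means = {provider: mean(values) for provider, values in request_avg_ms.items()}
for provider, value in means.items():
    print(f"{provider}: {value:.1f} ms")

ratio = means["GCP"] / means["Azure"]
print(f"GCP / Azure: {ratio:.2f}x")
```

Over these runs the mean request time is roughly 336 ms on GCP versus 207 ms on Azure, so GCP is about 1.6x slower for this workload.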

Any ideas? Or is this the reality?

Even funnier is the fact that M30 with 128 GB storage does not perform better/faster for my workload … strange …

Sorry, I made a mistake above: 128 GB gives you 500 IOPS on Azure, not 1100.

I also ran tests against AWS, and the striking difference seems to be the Max Write Disk Latency … e.g. for M20 with 128 GB storage:

Azure: 40-60ms, 1 spike within 1h up to 80 ms …
GCP: 30-40ms, 2-3 spikes within 1h up to 50 ms …
AWS: 1-2ms stable, only 1-2 spikes within 1h up to 6ms …

Can someone explain the gigantic difference in max write disk latency? How come AWS is 10x better??

Can it be that MongoDB Atlas is using directly attached storage in case of AWS, and network-attached storage in case of Azure and GCP??

Here are the full results as well (note: absolute numbers are specific to my workload): Real-life performance comparison of MongoDB Atlas on Azure/GCP/AWS - DEV Community