Mongo Atlas performance issues

And_Ga · January 8, 2023, 2:25pm

Hi guys,

I joined a MERN stack project where MongoDB is deployed on AWS through MongoDB Atlas. The Mongo collections have fewer than 150k records. I tested a user flow that generates ~110 Mongo queries; that estimate is based on the Mongo spans tracked by DataDog.

All of these Mongo spans take between 50 ms and 500 ms in the production and development environments. I created a JMeter test suite that runs this flow with 500 virtual users and a 60 s ramp-up period. When I run it, the Mongo spans become extremely long (>30 s) and cause request timeout errors on the server.

I tried upgrading the Mongo Atlas cluster to M200 (both the General and the Local NVMe SSD options) and to M300 as well. It didn’t help; the Mongo spans are still far too long. While the test was running, I didn’t notice any spikes in Mongo Atlas → Real Time monitor: CPU and Disk Util were under 5%. When I see the test failing, I stop it and check the traces in DataDog, and there are no more than 1000 Mongo spans (queries) there.

When I open the Mongo Atlas Profiling view, I can see that query execution time is a bit slower while the test is running, but most of the queries are missing. Do you know why the Profiling view is missing some queries and doesn’t show the slow (>30 s) queries that I can see in DataDog?

How is it possible that such a strong environment (M200/M300) is not able to process <50k queries against collections of <150k records within one minute?
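As a rough check on the load figures quoted above, the sketch below multiplies out the numbers from the post. It assumes every virtual user completes the flow exactly once and that all of the load lands inside the 60 s ramp-up window; neither is stated in the thread, and the function names are illustrative only.

```python
def total_queries(virtual_users: int, queries_per_flow: int) -> int:
    """Queries issued if every virtual user completes the flow once."""
    return virtual_users * queries_per_flow


def average_rate_per_second(total: int, window_seconds: float) -> float:
    """Average query rate if the whole load lands inside the window."""
    return total / window_seconds


# Figures from the post: 500 virtual users, ~110 queries per flow, 60 s ramp-up.
# Treating the ramp-up as the full test window is an assumption.
expected = total_queries(500, 110)
rate = average_rate_per_second(expected, 60)

observed_spans = 1000  # upper bound on the spans reported in DataDog
share_observed = observed_spans / expected

print(expected)                 # 55000
print(round(rate))              # 917
print(f"{share_observed:.1%}")  # 1.8%
```

Under these assumptions the expected total is slightly above the 50k quoted, and the roughly 1000 spans actually recorded are under 2% of it, which suggests most queries never got as far as being traced.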

Do you have any idea how I can identify what the issue with the Mongo server is? I attached screenshots from the Metrics view where you can see some spikes from when the tests were running on the M200 configuration.

There are 3 recommendations in Performance Advisor to add indexes to 3 collections. Do you think this could be why the Mongo server is so slow?

steevej · January 8, 2023, 5:23pm

If

then the following conclusion is wrong

A slow response time on a client does not imply slow queries as indicated by

The corollary is that the bottleneck is elsewhere.

Do you download the documents returned by the queries? Your client simulating 500 virtual users might not have the bandwidth to download that much data.

What is the load on the machine running the 500 virtual users? Your client might not be powerful enough to issue the queries fast enough.

If you increase the capacity on one side and it is not faster, then you increased the capacity on the wrong side.
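The reasoning above can be sketched as a toy model in which the end-to-end throughput of a chain of stages is capped by its slowest stage. The stage names and the requests-per-second figures are hypothetical placeholders, not measurements from this thread.

```python
def find_bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and the throughput it imposes on the chain."""
    stage = min(capacities, key=lambda name: capacities[name])
    return stage, capacities[stage]


# Hypothetical capacities in requests per second (illustrative only).
before = {"load_generator": 2000.0, "app_server": 150.0, "mongodb": 5000.0}

# Scaling up the database alone leaves the limiting stage untouched.
after_upgrade = {**before, "mongodb": 20000.0}

print(find_bottleneck(before))         # ('app_server', 150.0)
print(find_bottleneck(after_upgrade))  # ('app_server', 150.0)
```

In this model the upgrade changes nothing because the database was never the limiting stage, which is the situation described in the post above.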

My conclusion is that you are too quick to conclude that

Adding an index will usually help, but if CPU and disk utilization are under 5%, I do not think it will in this case.

And_Ga · January 8, 2023, 6:04pm

Thanks! I am running the test on a Mac M1 machine that is supposed to simulate the “browser”. All queries are run by a Heroku server running on a Performance-L dyno with autoscaling enabled. Do you still think this can be the issue?

steevej · January 9, 2023, 2:07pm

I know nothing about

so I cannot comment.

The only thing I can add is that if you increase the capacity on one side and it is not faster, then the bottleneck is on the other side. Starting from this observation, I would try to decrease the capacity on the MongoDB side until performance degrades. That would give you a baseline of how much load your current test setup is actually able to put on the server.
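The scale-down procedure described above could be recorded and evaluated along these lines. This is a minimal sketch: the latency figures are invented placeholders, the 20% tolerance is an arbitrary choice, and the function name is hypothetical. Only the idea of stepping down through cluster tiers until results degrade comes from the post.

```python
def smallest_adequate_tier(
    p95_by_tier: list[tuple[str, float]], tolerance: float = 1.2
) -> str:
    """Walk from the largest tier downwards and return the last tier whose
    p95 latency stays within `tolerance` times the largest tier's p95."""
    baseline = p95_by_tier[0][1]
    adequate = p95_by_tier[0][0]
    for tier, p95_ms in p95_by_tier[1:]:
        if p95_ms > baseline * tolerance:
            break  # performance degraded: the database is now the limit
        adequate = tier
    return adequate


# Placeholder p95 latencies in ms from repeated runs of the same load test,
# ordered from largest to smallest tier. These are NOT real measurements.
measurements = [
    ("M300", 31000.0),
    ("M200", 30500.0),
    ("M80", 30800.0),
    ("M40", 31200.0),
    ("M20", 47000.0),
]

print(smallest_adequate_tier(measurements))  # M40
```

With placeholder data like this, every tier down to M40 performs the same, so anything above that is capacity the test setup cannot use.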

And_Ga · January 12, 2023, 5:42pm

I just found out that the issue is with the Heroku server. Even with autoscaling on performance dynos, the Heroku server is not able to process that many requests; you simply need to use more dynos. The Heroku server was blocked during the load tests. I just don’t understand why DataDog shows long Mongo spans if the Mongo queries weren’t executed. When I checked the Mongo logs, the queries were missing.
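One possible explanation for the long spans, which is not confirmed anywhere in this thread, is that the spans are timed on the application side and therefore include time spent waiting inside the blocked app server before a query is actually sent. The sketch below models that: all queries are issued at once, only a fixed number can be in flight, and each takes the same time on the database. The figures are hypothetical.

```python
def observed_durations_ms(
    num_queries: int, concurrency: int, service_ms: float
) -> list[float]:
    """Client-observed duration of each query when all are issued at t=0 but
    only `concurrency` of them can be in flight at any moment.

    Query i waits for i // concurrency earlier batches, then runs for
    service_ms, so its observed duration is queue wait plus service time.
    """
    return [(i // concurrency + 1) * service_ms for i in range(num_queries)]


# Hypothetical figures: 1000 queued queries, 10 in flight at a time, and
# 300 ms of actual work per query on the database side.
durations = observed_durations_ms(num_queries=1000, concurrency=10, service_ms=300.0)

print(durations[0])   # 300.0
print(durations[-1])  # 30000.0
```

In this model the database never sees a query slower than 300 ms, yet the last query looks like a 30 s span from the application's point of view. Queries that time out while still queued would never reach the server at all, which would be consistent with them being absent from the Mongo logs.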