I joined a MERN stack project where MongoDB is hosted on AWS via MongoDB Atlas. The collections have fewer than 150k records each. I tested a user flow that generates ~110 Mongo queries; I estimate ~110 based on the Mongo spans tracked by DataDog.
In both the production and development environments, these spans each take between 50 ms and 500 ms. I created a JMeter test suite that runs this flow with 500 virtual users and a 60 s ramp-up period. When I run this test, the Mongo spans take extremely long (>30 s) and cause request timeout errors on the server.
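For context, the relevant part of the Thread Group in the `.jmx` test plan looks roughly like this (a trimmed, illustrative fragment; the test name and surrounding elements are placeholders):

```xml
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="User Flow" enabled="true">
  <!-- Each virtual user runs the flow once -->
  <elementProp name="ThreadGroup.main_controller" elementType="LoopController">
    <stringProp name="LoopController.loops">1</stringProp>
  </elementProp>
  <!-- 500 virtual users, ramped up over 60 seconds -->
  <stringProp name="ThreadGroup.num_threads">500</stringProp>
  <stringProp name="ThreadGroup.ramp_time">60</stringProp>
</ThreadGroup>
```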
I tried upgrading the Mongo Atlas cluster to M200 (both the General and the Local NVMe SSD storage options) and also to M300. Neither helped; the Mongo span durations stayed just as long. While the test was running, I didn't notice any spikes in Mongo Atlas → Real Time monitor: CPU and Disk Util stayed under 5%. When I see the test failing, I stop it and check the traces in DataDog; there are no more than 1,000 Mongo spans (queries) recorded.
When I open the Mongo Atlas Profiler view, I can see that query execution times are somewhat slower while the test is running, but most of the queries are missing entirely. Do you know why the Profiler view is missing some queries and doesn't show the >30 s slow queries I can see in DataDog?
How is it possible that such a strong environment (M200/M300) cannot process roughly 110 × 500 ≈ 55k queries against collections of <150k records within one minute?
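To make the expected load concrete, here is a quick back-of-the-envelope estimate (the 110-queries-per-flow figure is my DataDog estimate, and the per-second rate assumes each virtual user runs the flow once, spread evenly over the ramp-up):

```javascript
// Back-of-the-envelope load estimate for the JMeter scenario above.
const queriesPerFlow = 110;  // Mongo spans per user flow (DataDog estimate)
const virtualUsers = 500;    // JMeter thread count
const rampUpSeconds = 60;    // JMeter ramp-up period

// Total queries the test generates if each user runs the flow once.
const totalQueries = queriesPerFlow * virtualUsers;

// Rough steady-state rate if the flows are spread over the ramp-up window.
const queriesPerSecond = totalQueries / rampUpSeconds;

console.log(totalQueries);                 // 55000
console.log(Math.round(queriesPerSecond)); // 917
```

So the cluster only needs to sustain on the order of ~900 queries/second, which should be well within an M200's capability.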
Do you have any idea how I can identify what the issue with the Mongo server is? I attached screenshots from the Metrics view, where you can see some spikes while the tests were running on the M200 configuration.
Thanks! Update: I am running the test from a Mac M1 machine that is supposed to simulate the "browser". All queries are issued by a Heroku server running on a Performance-L dyno with autoscaling enabled. Do you still think this could be the issue?
The only thing I can add is this: if you increase capacity on one side and things don't get faster, the bottleneck is on the other side. Starting from that observation, I would decrease capacity on the MongoDB side until performance degrades. That gives you a baseline for how much load your current test setup is actually able to put on the server.
Just found out the issue is with the Heroku server. Even with autoscaling and Performance dynos, the Heroku server cannot process this many requests; you simply need more dynos. The server was saturated during the load tests. What I still don't understand is why DataDog shows long Mongo spans when the queries were never executed — when I check the Mongo logs, those queries are missing.
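One plausible explanation (my assumption, not something DataDog documents for your setup): the span clock starts when the application hands the query to the driver, so it includes time the query spends queued on the saturated dyno, e.g. waiting for a free connection in the driver's pool, before anything reaches MongoDB at all. A minimal Node sketch of that effect, using a hypothetical `Pool` class as a stand-in for the driver's connection pool:

```javascript
// Simulate queries that each take 10 ms of "server" time but must share
// a pool of 2 "connections". Measured durations inflate with queue depth,
// even though the server-side work per query stays constant.
const QUERY_MS = 10;
const POOL_SIZE = 2;

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// A tiny semaphore standing in for the driver's connection pool.
class Pool {
  constructor(size) {
    this.free = size;
    this.waiters = [];
  }
  async acquire() {
    if (this.free > 0) { this.free--; return; }
    await new Promise((resolve) => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next(); // hand the slot directly to the next waiter
    else this.free++;
  }
}

async function timedQuery(pool) {
  const start = Date.now(); // the span starts when the app issues the query...
  await pool.acquire();     // ...but the query may sit in the wait queue
  await sleep(QUERY_MS);    // actual "query execution"
  pool.release();
  return Date.now() - start; // span duration = queue wait + execution
}

async function main() {
  const pool = new Pool(POOL_SIZE);
  // Fire 20 concurrent queries at a pool of 2.
  return Promise.all(Array.from({ length: 20 }, () => timedQuery(pool)));
}

main().then((durations) => {
  // First queries measure ~QUERY_MS; the last ones ~10x that,
  // despite identical "server-side" work.
  console.log(Math.min(...durations), Math.max(...durations));
});
```

If a request times out while its query is still queued, the query never reaches MongoDB — which would also explain why those queries are missing from the Mongo logs and the Atlas Profiler.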