Lots of lag in last two weeks on Atlas

Kyle_J_Kemp · March 29, 2020, 7:43pm

Hi! I moved here after mLab was bought out and they began moving users to Atlas. Everything was going fine until a week or two ago, when something strange started happening: some DB-related operations have become very slow. Previously my login process was nearly instant, but now one operation takes 35.5 seconds to complete, and it gets worse as time goes on.

Note that I had made no changes to my codebase when this issue began; previously it was fine. I figured this was caused by a slow DB query, so I checked the profiler, but it seems like there are no slow queries, which I'm not so sure about. I've gone and added a few additional indexes, but that hasn't really helped. I'm pretty much out of ideas: could anything on the Atlas side have caused this recently? FWIW, my local DB/codebase runs fine; it's just prod that's breaking.

I've tried upgrading my Atlas cluster, but that didn't seem to change anything. I've also tried upgrading my VPS, with no change either. I'm genuinely at a loss as to how to troubleshoot this further, so any ideas would help. Thanks.

Stennie_X · March 29, 2020, 8:10pm

Welcome to the community @Kyle_J_Kemp!

Have you discussed your issue with Atlas support?

It looks like the timing you are measuring is from your application's point of view, so it likely includes network round trips and other application processing time. To better understand and troubleshoot application performance issues, I suggest separately profiling the time spent processing in the database versus in application code, using an Application Performance Management (APM) tool that integrates with Atlas.
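As a rough illustration of that split, here is a minimal Python sketch that compares the wall-clock time the application sees with the execution time the database reports for the same operation. `time_operation` and `TimingBreakdown` are hypothetical names, and the sketch assumes the wrapped callable returns its result together with a server-side duration in milliseconds (in practice you might take that from the `millis` field of a profiler entry). If the profiler shows fast queries but the gap is large, the delay is most likely in the network, driver, or application rather than in the queries themselves.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class TimingBreakdown:
    total_ms: float   # wall-clock time measured by the application
    server_ms: float  # execution time reported by the database (assumed input)

    @property
    def outside_db_ms(self) -> float:
        # Time the database did not account for: network round trips,
        # connection setup, driver overhead and application processing.
        return max(self.total_ms - self.server_ms, 0.0)


def time_operation(op: Callable[[], Tuple[T, float]]) -> Tuple[T, TimingBreakdown]:
    """Run op() and return its result plus a timing breakdown.

    Hypothetical helper: op must return (result, server_ms).
    """
    start = time.perf_counter()
    result, server_ms = op()
    total_ms = (time.perf_counter() - start) * 1000.0
    return result, TimingBreakdown(total_ms=total_ms, server_ms=server_ms)


if __name__ == "__main__":
    def simulated_login_query() -> Tuple[str, float]:
        # Stand-in for a real DB call: sleeps to mimic a slow network
        # path, while the "server" claims it only spent 2 ms on the query.
        time.sleep(0.05)
        return "user-document", 2.0

    _, timing = time_operation(simulated_login_query)
    print(f"total {timing.total_ms:.1f} ms, "
          f"database {timing.server_ms:.1f} ms, "
          f"outside database {timing.outside_db_ms:.1f} ms")
```

An APM tool does this kind of correlation automatically and across many requests; the sketch only shows the arithmetic behind the idea for a single call.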

If you have an M10+ Atlas cluster, New Relic and Datadog are APM solutions currently available as third-party monitoring services that can correlate database activity with application metrics. Both have trial periods, so you could test whether they provide any additional insights.

Regards,
Stennie

Kyle_J_Kemp · March 29, 2020, 8:30pm

Hi, and thanks. I don’t currently have a support plan because my project is just a hobby one.

Yes, I am measuring it in my application, and it turns out my hunch was partly correct: the fault lay with Azure. I moved my DB to AWS an hour or two ago, and that seems to have resolved the issue. (I had to re-create the cluster, because I found you can't downgrade back to the free tier after upgrading.)

I'm not really sure what happened in the last two weeks to bog down Azure so much, but I guess this issue is resolved.

Stennie_X · March 29, 2020, 8:39pm

Hi Kyle,

There has been a significant increase in cloud services activity given recent world events, and Azure in particular has had some challenges around availability of new instances. I expect those are temporary challenges, but if you are using lower cost tiers for your hobby project there may have been more notable impact.

Recent story:

Regards,
Stennie