We had some issues with our connection to MongoDB. Running a profiler showed that some of our services, for a very small share of requests (0.1% or less), take about 20 seconds to create a connection (specifically in pymongo's _create_connection function).
That led us to check the configuration, where we saw that the connect timeout defaults to 20 seconds, which seems like a lot.
Lowering it to 1 second reduced the long requests on our services that use Mongo to 1-1.5 seconds.
I guess my question is: are we doing something wrong, or is this the right approach?
The 20-second default must be there for a reason, but it doesn't make sense to me. Why is it so high?
Can we lower it to 50 milliseconds? What would the consequences be? Higher load on our MongoDB?
There's no special meaning behind the 20-second default. I suspect it was chosen so conservatively to give the driver a better chance of connecting in a wide variety of circumstances. Lowering it is fine, but I would suggest going no lower than around 500ms-1000ms. In my experience, 50ms is too low: it can cause connection timeouts when the system is under load, which in turn can lead to connection storms.
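To make the suggestion above concrete, here is a minimal sketch of passing a lowered connect timeout to PyMongo while refusing to go below a floor. The helper names (connect_timeout_options, make_client) and the 500ms floor constant are illustrative assumptions, not part of PyMongo; only the connectTimeoutMS option itself is the driver's. The right floor depends on your network and load, so treat the number as a starting point.

```python
# Hypothetical helpers for illustration; only `connectTimeoutMS` is a PyMongo option.
MIN_CONNECT_TIMEOUT_MS = 500  # assumed floor, taken from the 500ms-1000ms suggestion


def connect_timeout_options(requested_ms: int) -> dict:
    """Return MongoClient keyword options with the connect timeout clamped to the floor."""
    if requested_ms <= 0:
        raise ValueError("requested_ms must be a positive number of milliseconds")
    return {"connectTimeoutMS": max(requested_ms, MIN_CONNECT_TIMEOUT_MS)}


def make_client(uri: str, requested_ms: int = 1000):
    """Build a MongoClient with the clamped timeout.

    pymongo is imported lazily so the option helper above can be used
    (and tested) on machines where the driver is not installed.
    """
    from pymongo import MongoClient

    return MongoClient(uri, **connect_timeout_options(requested_ms))
```

With this, make_client("mongodb://localhost:27017", requested_ms=50) would end up using 500ms rather than 50ms (the URI here is a placeholder).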
Thanks for your reply.
So one path we'll take is lowering it to 500ms, as you suggested. What I'm also trying to figure out is the root cause of the high connection time,
since 500ms can still be a lot for some of our requests.
Where do you suggest we start? Connection pool and management? Any other ideas?
Do the 20-second timeouts correspond to MongoDB maintenance events? Usually a server restart or network-down event should be noticed quickly, but some types of failures or network partitions can lead to the 20-second timeout. For example, in a black hole scenario, TCP/TLS connection creation will block until the timeout is hit.
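One rough way to tell a black hole from a fast failure is to time a plain TCP connect from the affected hosts, outside the driver. The sketch below uses only the Python standard library; probe_connect and its result keys are made-up names for illustration, and it does not perform a TLS handshake or speak the MongoDB protocol, so it only covers the DNS and TCP part of connection setup.

```python
import socket
import time


def probe_connect(host: str, port: int, timeout_s: float = 1.0) -> dict:
    """Time DNS resolution and a plain TCP connect separately."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return {"ok": False, "stage": "dns",
                "dns_s": time.monotonic() - start, "error": repr(exc)}
    dns_s = time.monotonic() - start

    # Use the first resolved address only; a fuller probe would try them all.
    family, socktype, proto, _, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout_s)
    start = time.monotonic()
    try:
        sock.connect(sockaddr)
        return {"ok": True, "stage": "connected",
                "dns_s": dns_s, "tcp_s": time.monotonic() - start}
    except socket.timeout:
        # No reply at all within the timeout: consistent with a black hole.
        return {"ok": False, "stage": "timeout",
                "dns_s": dns_s, "tcp_s": time.monotonic() - start}
    except OSError as exc:
        # Refused or unreachable: the network answered, so the failure is fast.
        return {"ok": False, "stage": "error", "dns_s": dns_s,
                "tcp_s": time.monotonic() - start, "error": repr(exc)}
    finally:
        sock.close()
```

Running this periodically against each MongoDB host and logging the results should show whether the slow 0.1% lines up with the "timeout" stage (packets silently dropped) or with slow DNS. If neither shows up, the delay is more likely in TLS or on the server side.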
This happens relatively frequently throughout the day (~0.1% of the requests from our services),
so my guess is that it's not related to maintenance events.
I'll consult with our DevOps team about network-related failures.