We’ve recently migrated our architecture to use Sharded MongoDB. We are running a single Amazon machine that hosts six containers: one router, one config server, and four shard containers (including the primary shard).
Only part of our data is actually sharded: specifically, the most frequently accessed collections. As a result, the primary shard holds significantly more data, including some large, rarely accessed collections, and has more memory allocated to accommodate them.
However, we're encountering a recurring issue: roughly every two hours (the interval varies by about 10 minutes), we receive the following error:
{'index': 0, 'code': 202, 'errmsg': "Write results unavailable from mongo-shard-0:27017 :: caused by :: Couldn't get a connection within the time limit"}
This error consistently targets mongo-shard-0, which is our primary shard.
We’re trying to understand what might be causing this. Could it be related to resource constraints, connection pool saturation, or some other issue?
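To check the connection pool theory, one thing we could do is poll the router's `connPoolStats` output around the time the error occurs and watch the counters for mongo-shard-0. Below is a minimal sketch of that idea, not something we have running yet. The helper names `summarize_pool` and `fetch_pool_stats` are hypothetical, and it assumes the per-host `inUse`, `available`, and `refreshing` fields appear under `hosts` in the command output, which may differ by MongoDB version.

```python
def summarize_pool(stats, host):
    """Pick the connection counters for one host out of a connPoolStats result.

    Assumes the result has a "hosts" mapping keyed by "host:port"; missing
    hosts or fields are reported as 0 rather than raising.
    """
    entry = stats.get("hosts", {}).get(host, {})
    return {
        "inUse": entry.get("inUse", 0),
        "available": entry.get("available", 0),
        "refreshing": entry.get("refreshing", 0),
    }


def fetch_pool_stats(uri):
    """Run connPoolStats against the mongos at `uri` (requires pymongo)."""
    # Imported lazily so summarize_pool can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return client.admin.command("connPoolStats")
    finally:
        client.close()
```

Calling `summarize_pool(fetch_pool_stats(router_uri), "mongo-shard-0:27017")` every minute or so, with `router_uri` being the router's connection string, and logging the result should show whether `available` drops to zero shortly before each error.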