I do some fairly long processing from Node and I regularly get an error message that says:
message: Connection to <…> interrupted due to server monitor timeout
I’ve seen these after I upgraded to an M10 instance. I am running Node on a local machine connecting to Atlas.
I’ve seen some suggestions online, but they were for different error messages on GCP.
A PoolClearedError can occur due to intermittent network outages that cause the driver to lose connectivity.
Essentially, this error happens when the driver believes the server associated with a connection pool is no longer available. The pool gets cleared and closed so that operations waiting for a connection can retry on a different server.
To address this:
- Enable retryable reads and writes (retryReads and retryWrites, both on by default in recent drivers) if you haven’t already. This allows operations to retry seamlessly on another server; see the sketch after this list.
- Enable SDAM (Server Discovery and Monitoring) event monitoring to understand why servers are being marked Unknown.
- Similarly, enable connection pool (CMAP) monitoring to see network errors and connection events.
- Check the cluster’s logs for elections, failovers, and step-downs that could be disrupting connectivity.
- Ensure the cluster itself is healthy and members are communicating properly.
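For reference, here is a minimal sketch of that setup with the official Node.js driver. The `MONGODB_URI` environment variable is an assumption, and the particular events logged are just a useful subset:

```js
// Minimal sketch: retryable operations plus SDAM and connection pool
// (CMAP) event logging with the MongoDB Node.js driver.
// MONGODB_URI is an assumed environment variable.
// (Top-level await requires running this as an ES module.)
import { MongoClient } from 'mongodb';

const client = new MongoClient(process.env.MONGODB_URI, {
  retryWrites: true, // on by default in recent drivers
  retryReads: true,  // on by default in recent drivers
});

// SDAM events: show why a server is being marked Unknown.
client.on('serverHeartbeatFailed', (ev) =>
  console.warn('heartbeat failed:', ev.connectionId, ev.failure.message));
client.on('serverDescriptionChanged', (ev) =>
  console.info('server', ev.address, ':',
    ev.previousDescription.type, '->', ev.newDescription.type));

// CMAP events: show network errors clearing the pool.
client.on('connectionPoolCleared', (ev) =>
  console.warn('pool cleared for', ev.address));
client.on('connectionCheckOutFailed', (ev) =>
  console.warn('connection checkout failed:', ev.reason));

await client.connect();
```

With these listeners in place, the sequence of heartbeat, server description, and pool events around a failure usually narrows down whether the problem is network-level or a cluster-side election.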
Let me know if you have any other questions!
Thanks. I have put in the SDAM and connection-level monitoring and will see what I find. Retry is already on.
I reduced the size of my bulk writes, but that doesn’t seem to have had any impact on this. It happens when I’m doing very long runs (> 30 minutes), although I do a lot of reads and writes throughout that period. I’ll see if the new logs show anything.
I did a bunch of refactoring of the connection pool and also added handling for some of these monitoring events; I just did a 90-minute run without trouble, so I’m hoping the problem is behind me.
It still occurs. This is happening during reads of very long files. Somewhere in the run I get a serverHeartbeatFailed for two connectionIds, then a serverDescriptionChanged, a mix of pool events, and then finally the PoolClearedOnNetworkError that halts everything.
Does this mean I need to check the connection status before every read?
Do I need to switch to something like mongoose?
This happens when I am around 2M records or so into iterating through a collection using `for await (const row of cursor)`. I’m not sure how to recover from it, and the behavior is inconsistent. It starts with a serverHeartbeatFailed and then a connectionPoolCleared. Since I’m in the midst of iterating through a collection, how do I recover? Do I need to artificially introduce a monotonically increasing page value to go through in chunks (something like the sketch below)?
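For concreteness, this is the kind of chunked resume I have in mind. It is a rough sketch, assuming `_id` is indexed and monotonically increasing (default ObjectIds roughly are); the db/collection names and `processRow()` are placeholders:

```js
// Rough sketch of chunked, resumable iteration: sort on the indexed
// _id, checkpoint the last _id processed, and re-open the cursor from
// that point if a network error kills it mid-scan. The db/collection
// names and processRow() are placeholders.
import { MongoClient, MongoNetworkError } from 'mongodb';

const client = new MongoClient(process.env.MONGODB_URI);
await client.connect();
const coll = client.db('mydb').collection('rows');

let lastId = null; // our own resume checkpoint
let done = false;

while (!done) {
  const filter = lastId === null ? {} : { _id: { $gt: lastId } };
  const cursor = coll.find(filter).sort({ _id: 1 });
  try {
    for await (const row of cursor) {
      await processRow(row); // placeholder for the real per-row work
      lastId = row._id;      // checkpoint after each processed row
    }
    done = true; // cursor exhausted without error
  } catch (err) {
    // PoolClearedError extends MongoNetworkError in the Node driver;
    // treat those as resumable and let the loop re-open the cursor.
    if (err instanceof MongoNetworkError) {
      console.warn('cursor died, resuming after', lastId, '-', err.message);
    } else {
      throw err; // anything else is a real failure
    }
  } finally {
    await cursor.close().catch(() => {});
  }
}
```

Since the filter is a range on the indexed `_id`, re-opening the cursor after a failure is cheap and the scan never restarts from zero.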