I do some fairly long processing from Node and I regularly get an error message that says:
Name: PoolClearedOnNetworkError
message: Connection to <…> interrupted due to server monitor timeout
I’ve seen these since I upgraded to an M10 instance. I am running Node on a local machine, connecting to Atlas.
I’ve seen some suggestions for different error messages for GCP, but nothing for this one.
Any advice?
Hey @michael_hyman1,
The PoolClearedError can occur due to intermittent network outages that cause the driver to lose connectivity.
Essentially, this error happens when the driver believes the server associated with a connection pool is no longer available. The pool gets cleared and closed so that operations waiting for a connection can retry on a different server.
To address this:
- Enable retryability in your application if you haven’t already. This allows operations to retry seamlessly on another server.
- Enable SDAM (Server Discovery and Monitoring) monitoring to understand why servers are being marked unknown.
- Similarly, enable connection-level monitoring to see network errors and events.
- Check the cluster’s logs for elections, failovers, and step-downs that could be disrupting connectivity.
- Ensure the cluster itself is healthy and members are communicating properly.
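For the two monitoring suggestions above, here is a minimal sketch of what that could look like with the Node driver. It assumes the SDAM and connection pool events are emitted on the MongoClient instance under the names listed; `attachMonitoring` and the `log` callback are hypothetical names for illustration, not part of the driver.

```javascript
// Sketch: subscribe to SDAM and connection pool events on a MongoClient.
// `attachMonitoring` and `log` are hypothetical names, not driver APIs.
const SDAM_EVENTS = [
  'serverHeartbeatFailed',
  'serverDescriptionChanged',
  'topologyDescriptionChanged',
];
const POOL_EVENTS = [
  'connectionPoolCleared',
  'connectionPoolReady',
  'connectionCheckOutFailed',
  'connectionClosed',
];

function attachMonitoring(client, log = console.log) {
  for (const name of [...SDAM_EVENTS, ...POOL_EVENTS]) {
    client.on(name, (event) => {
      // Timestamp each event so it can be lined up against the cluster logs.
      log({ at: new Date().toISOString(), name, event });
    });
  }
  return client;
}

// Possible usage (MONGODB_URI is an assumed environment variable):
// const { MongoClient } = require('mongodb');
// const client = attachMonitoring(new MongoClient(process.env.MONGODB_URI));
```

Logging a timestamp with every event makes it easier to check whether a heartbeat failure coincides with an election or step-down in the cluster logs.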
Let me know if you have any other questions!
Regards,
Kushagra
Thanks. I have put in the SDAM and connection-level monitoring and will see what I find. Retry is already on.
I reduced the size of my bulk writes, but that doesn’t seem to have had any impact on this. It happens when I’m doing very long runs (> 30 minutes), although I do a lot of reads and writes throughout that period. Will see if the new logs show anything.
Did a bunch of refactoring of the connection pool and also added processing for some of these messages. Just did a 90-minute run without trouble, so I’m hoping the problem is behind me.
Still occurs. This is happening during reads of very long files. Somewhere in the run I get a serverHeartbeatFailed for two connectionIds, then a serverDescriptionChanged, a mix of pool changes, and finally the PoolClearedOnNetworkError that halts everything.
Does this mean I need to check the connection status before every read?
Do I need to switch to something like Mongoose?
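One alternative to checking the connection before every read is to wrap each read in an application-level retry with backoff. The sketch below is only an illustration under assumptions: `withRetry` is a hypothetical helper (not a driver API), and the list of error names treated as transient is a guess that should be adjusted to whatever the driver version in use actually throws.

```javascript
// Error names treated as transient here. This list is an assumption;
// adjust it to match the errors your driver version reports.
const TRANSIENT_ERROR_NAMES = new Set([
  'PoolClearedOnNetworkError',
  'MongoPoolClearedError',
  'MongoNetworkError',
  'MongoNetworkTimeoutError',
]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `operation`, retrying with exponential backoff on transient errors.
// `withRetry` is a hypothetical helper, not part of the MongoDB driver.
async function withRetry(operation, { attempts = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const transient = TRANSIENT_ERROR_NAMES.has(err && err.name);
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!transient || attempt >= attempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Possible usage (collection and someId are placeholders):
// const doc = await withRetry(() => collection.findOne({ _id: someId }));
```

This is safest for reads. Wrapping a non-idempotent write this way could apply it twice if the first attempt reached the server before the connection dropped.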
This happens when I am around 2M records or so into iterating through a table using for await (const row of cursor). I’m not sure how to recover from it, the behavior is inconsistent. It starts with a serverHeartbeatFailed and then a connectionPoolCleared. Since I’m in the midst of iterating through a collection, how do I recover? Do I need to artificially introduce a monotonically increasing page value to go through in chunks?
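On the question of a monotonically increasing page value: the `_id` field can often play that role, since it is unique and indexed. Below is a rough sketch that iterates in `_id` order, remembers the last `_id` processed, and reopens the cursor after it when a transient error interrupts the scan. `scanResumable`, `isTransient`, and the option names are hypothetical, and the error-name pattern is an assumption. It also assumes that `_id` values are comparable with `$gt` and that a row may be handed to the callback again if the callback itself fails partway through.

```javascript
// Treat pool-cleared and network errors as transient. The name pattern is an
// assumption; adjust it to the errors your driver version actually throws.
function isTransient(err) {
  return Boolean(err) && /PoolCleared|MongoNetwork/.test(String(err.name));
}

// Iterate `collection` in _id order, resuming after the last processed _id
// whenever a transient error interrupts the cursor.
// `scanResumable` is a hypothetical helper, not a driver API.
async function scanResumable(collection, onRow, options = {}) {
  const { filter = {}, maxRestarts = 10, delayMs = 1000 } = options;
  let lastId;       // _id of the last row that onRow finished processing
  let restarts = 0; // consecutive restarts without any progress
  for (;;) {
    const query = lastId === undefined
      ? filter
      : { $and: [filter, { _id: { $gt: lastId } }] };
    const cursor = collection.find(query).sort({ _id: 1 });
    try {
      for await (const row of cursor) {
        await onRow(row);
        lastId = row._id;
        restarts = 0; // progress was made, so reset the restart budget
      }
      return; // reached the end of the collection
    } catch (err) {
      if (!isTransient(err) || ++restarts > maxRestarts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    } finally {
      // Best-effort cleanup; the old cursor may already be dead.
      try { await cursor.close(); } catch { /* ignore */ }
    }
  }
}

// Possible usage (handleRow is a placeholder):
// await scanResumable(db.collection('rows'), async (row) => handleRow(row));
```

Because the scan restarts from the last `_id` rather than from the beginning, an interruption 2M records in costs only a short pause instead of the whole run. Documents inserted with a lower `_id` after the scan has passed that point would be missed.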