Gracefully reconnecting with the Go driver

Divjot_Arora · March 2, 2020, 3:08am

Hi John,

The driver monitors the deployment using one goroutine per node in the cluster, so in general nothing should be required on your end to reconnect after a transient issue. For example, when connected to a replica set, the driver will detect the new primary after a failover or node restart. In addition, the driver retries certain read and write operations by default when it hits a transient error.

The only thing I can think of checking for that would indicate an issue is a server selection timeout. By default, the driver will try for 30 seconds to find a suitable server for an operation. This is generally enough time for transient network issues to resolve themselves. If your application receives a server selection error, that could indicate a more serious issue, or it could be a signal that you need to raise the server selection timeout (e.g. if primary elections are consistently taking 45 seconds, try setting the timeout to 1 minute).

The current driver version (1.3.0 at the time of writing) does not return a custom error type for server selection errors, but you can check whether the error message contains the “server selection error” substring. We have a project planned for this quarter to improve the error types returned by the driver, and I plan on adding a concrete type that users can check for as part of that.

– Divjot