We have a Go server in production that writes logs to a database on MongoDB Atlas. Our Atlas configuration automatically increases the server’s disk size when it is about to fill up. While the upgrade happens the server is down, apparently for a few minutes, but when it comes back up the driver is unable to reconnect, and every subsequent log insert fails.
We tried to reproduce the problem locally by stopping and restarting a Docker container running MongoDB, but the driver reconnects without trouble in that situation. It seems to be something related to DNS resolution.
We have seen several time-related driver options that we could tune, such as:
ServerSelectionTimeout
HeartbeatInterval
ConnectTimeout
MaxConnIdleTime
SocketTimeout
But since we don’t know the exact error (and we can’t reproduce the problem locally), it’s not clear which one to adjust (we’ve tried them all locally).
Our guess is that the driver resolves the hostname in the connection string once and caches the server IP so it can skip DNS on later requests. If the IP changes after Atlas upgrades the server, the driver would keep trying the stale address, which would explain why it cannot reconnect remotely but can locally (where the server IP doesn’t change and there’s no DNS lookup).
Has anyone gone through something similar? How do you handle reconnecting to a remote MongoDB deployment on Atlas?
We found this thread but didn’t find the solution to our problem.
Thanks in advance!