Go MongoDB driver is not reconnecting after MongoDB Atlas server goes down due to a disk size increase

We have a Go server in production that writes logs to a database in MongoDB Atlas. Our Atlas configuration automatically increases the server's disk size when it is about to fill up. During the upgrade the server goes down, apparently for a few minutes, but when it comes back up the driver is not able to reconnect, and all subsequent log writes to the database fail.

We tried to reproduce this problem locally by stopping and restarting a Docker container running a MongoDB database, but the driver reconnects without any issue in that scenario. It seems to be something related to DNS resolution.

We have seen several time-related parameters that we can tune, such as:

  • ServerSelectionTimeout
  • HeartbeatInterval
  • ConnectionTimeout
  • MaxConnIdleTime
  • SocketTimeout

But since we don’t know the exact error (and we can’t reproduce the problem locally), it’s not clear which one to use (we’ve tried tuning them all locally).
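For reference, most of these timeouts can also be set directly as connection string options instead of through the driver's options builder; a sketch, where the host and credentials are placeholders and the millisecond values are just examples:

```
mongodb+srv://user:pass@cluster0.example.mongodb.net/?serverSelectionTimeoutMS=30000&connectTimeoutMS=10000&socketTimeoutMS=60000&maxIdleTimeMS=120000&heartbeatFrequencyMS=10000
```

(`heartbeatFrequencyMS` corresponds to `HeartbeatInterval`, and `maxIdleTimeMS` to `MaxConnIdleTime`.)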

We think the driver could be caching the server IP resolved from the connection string and reusing it to skip DNS resolution on future requests. Then, after Atlas upgrades the server, the IP may no longer be the same, which would explain why the driver cannot reconnect remotely but can do so locally (where the server IP doesn’t change and no DNS resolution is involved).
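One way to check this hypothesis from the host running the server is to resolve the SRV record that a `mongodb+srv://` URI points at, and compare the results before and after an Atlas upgrade. A minimal sketch using only the standard library; the cluster hostname is a placeholder:

```go
package main

import (
	"fmt"
	"net"
)

// srvName builds the DNS name queried for a mongodb+srv:// host.
func srvName(host string) string {
	return "_mongodb._tcp." + host
}

func main() {
	host := "cluster0.example.mongodb.net" // placeholder: your Atlas cluster host
	name := srvName(host)
	fmt.Println("querying", name)

	// Passing empty service/proto makes LookupSRV query the name directly.
	_, addrs, err := net.LookupSRV("", "", name)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Printf("%s:%d\n", a.Target, a.Port)
	}
}
```

If the targets (or the IPs they resolve to) differ across an upgrade, that would support the stale-DNS theory.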

Has anyone gone through something similar? How do you handle reconnecting to a remote MongoDB deployment on Atlas?

We found this thread but didn’t find the solution to our problem.

Thanks in advance!

Hi @Jairo_Lozano,

We’re looking at a bug related to this right now, and are planning to get a fix out for the next patch release.


Thanks @Isabella_Siu! Please let me know when it’s released!

Hi again @Jairo_Lozano ! It’ll be released in v1.5.4, which is scheduled for July 6th.


cool! thanks @Isabella_Siu :smiley:

@Jairo_Lozano we just released Go driver v1.5.4, which includes a fix for SRV polling that should resolve the problem you encountered with having to restart your application after scaling an Atlas cluster.

Check out the v1.5.4 release on GitHub.


Thanks @Matt_Dale!! I’ll upgrade the driver and let you know if that solves the problem!
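For anyone landing here later: under Go modules, picking up the fixed version should be a one-line upgrade (assuming the standard module path for the driver):

```shell
go get go.mongodb.org/mongo-driver@v1.5.4
go mod tidy
```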