We are having intermittent issues connecting our Cloud Run service to MongoDB Atlas.
Stack: Node.js, Express.js, MongoDB Node.js driver (4.10.0), MongoDB Atlas (version 6.0.3), Cloud Run using a VPC network with a Serverless VPC Access connector and Cloud NAT to get a fixed IP
One of the first things the Cloud Run container process has to do is connect to the database. Afterwards, we create some indices. Only after that do we start the server with
In our development environment, one out of every ten newly deployed revisions fails to start.
In most cases, the connection with the database just times out on startup:
MongoServerSelectionError: Server selection timed out after 30000 (with
In other cases, the connection is established, but creating the indices immediately afterwards fails:
MongoNetworkTimeoutError: connection timed out at connectionFailureError
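For reference, the 30000 in the first error matches the driver's default `serverSelectionTimeoutMS`, and the second error looks like it comes from the initial socket connect, which as far as I know is governed by `connectTimeoutMS`. Below is a minimal sketch of setting these explicitly, plus `maxIdleTimeMS` so that idle pooled sockets are closed by the driver before a NAT in the path might silently drop them. `buildMongoClientOptions` is a hypothetical helper and the numbers are assumptions, not tuned recommendations. Shorter timeouts do not fix an unreachable network path; they only make failures surface sooner.

```javascript
// Sketch: explicit timeout/pool options for MongoClient (mongodb Node.js driver 4.x).
// buildMongoClientOptions is a hypothetical helper; all values are illustrative assumptions.
function buildMongoClientOptions(overrides = {}) {
  return {
    // How long the driver waits to find a suitable server before throwing
    // MongoServerSelectionError (the driver default is 30000 ms).
    serverSelectionTimeoutMS: 10000,
    // How long a single connection attempt may take before it times out.
    connectTimeoutMS: 10000,
    // Close pooled connections that have been idle longer than this, so they are
    // not reused after a middlebox (e.g. NAT) may have dropped them.
    // Assumption: this value is below the NAT's idle timeout for established TCP connections.
    maxIdleTimeMS: 60000,
    ...overrides,
  };
}

// Usage (requires the mongodb package; `uri` is a placeholder):
// const { MongoClient } = require("mongodb");
// const client = new MongoClient(uri, buildMongoClientOptions());
```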
In yet other cases, the revision deploys and starts, but the logs make it clear that the database connection is unstable: there are many reconnects, which is not normal.
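To put a number on "many reconnects" instead of eyeballing logs, one option is to count the driver's connection monitoring events. This is a sketch assuming the `connectionPoolCleared`, `connectionClosed` and `serverHeartbeatFailed` events that the driver emits on `MongoClient` (please verify the names against the monitoring docs for your driver version); `attachConnectionLogging` is a hypothetical helper.

```javascript
// Sketch: count and log driver monitoring events to quantify reconnects.
// Event names assumed from the driver's CMAP/SDAM monitoring events (mongodb driver 4.x).
const WATCHED_EVENTS = [
  "connectionPoolCleared", // a server's connection pool was cleared
  "connectionClosed",      // an individual pooled connection was closed
  "serverHeartbeatFailed", // a monitoring heartbeat to a server failed
];

// `client` is any EventEmitter-like object with .on(), e.g. a MongoClient.
function attachConnectionLogging(client, log = console.log) {
  const counts = Object.fromEntries(WATCHED_EVENTS.map((name) => [name, 0]));
  for (const name of WATCHED_EVENTS) {
    client.on(name, (event) => {
      counts[name] += 1;
      // Log the running count together with the raw event payload.
      log(name, counts[name], event);
    });
  }
  // The returned object is updated live as events arrive.
  return counts;
}
```

Note that `connectionClosed` also fires for normal closes (for example idle connections), so the event's `reason` field is worth logging to tell those apart from network errors.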
In our production environment, it’s much worse. It almost never connects to the database successfully. When it does, the connection is unusable because of all the reconnects, which also often fail.
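Since the startup failures look intermittent, a retry with exponential backoff around each startup step (connect, create indices, then start the server) can at least show whether a second or third attempt succeeds. This is a minimal sketch, not a fix for the underlying network problem: `connectToDb`, `ensureIndexes` and `startServer` are hypothetical placeholders for the real application functions (for instance, `connectToDb` could create a fresh `MongoClient` and call `connect()` on it).

```javascript
// Sketch: startup sequence (connect -> create indices -> listen) with each
// step retried using exponential backoff. Step functions are hypothetical.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(step, { attempts = 5, baseDelayMs = 1000, label = "step" } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
      console.warn(`${label} failed (attempt ${attempt}/${attempts}): ${err.message}`);
      if (attempt < attempts) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // All attempts failed: surface the last error so the container exits.
  throw lastError;
}

async function startApp({ connectToDb, ensureIndexes, startServer }, retryOptions) {
  const db = await withRetry(connectToDb, { ...retryOptions, label: "connect" });
  await withRetry(() => ensureIndexes(db), { ...retryOptions, label: "createIndexes" });
  // Only start accepting traffic once the database is ready.
  return startServer(db);
}
```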
IP whitelisting is not the issue: on development we allow all IPs, and on production we also tested this to see if it made any difference.
We use a Cloud NAT to get a fixed IP for all traffic from our Cloud Run instances.
Our MongoDB Atlas was hosted on AWS. I moved it to GCP to see if this makes a difference. It doesn’t.
Next, I wanted to see whether VPC peering makes a difference. However, I ran into a problem setting up the peering network: the Atlas CIDR block overlaps with the subnet used for NAT. I will test this later on, but I don’t know if there is any chance it will fix the problem.
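To check candidate ranges for the peering setup before trying again, a small IPv4 CIDR overlap check helps: two blocks overlap exactly when each one starts before the other ends. This is a sketch; the ranges in the usage note are hypothetical, not our actual Atlas or NAT subnets.

```javascript
// Sketch: check whether two IPv4 CIDR blocks overlap.
function ipv4ToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  // Multiplication instead of bit shifts keeps the result unsigned.
  return ((parts[0] * 256 + parts[1]) * 256 + parts[2]) * 256 + parts[3];
}

function cidrRange(cidr) {
  const [ip, prefixText] = cidr.split("/");
  const prefix = Number(prefixText);
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid prefix length in ${cidr}`);
  }
  const size = 2 ** (32 - prefix);
  // Align the address down to the start of its block.
  const start = Math.floor(ipv4ToInt(ip) / size) * size;
  return { start, end: start + size - 1 };
}

function cidrsOverlap(a, b) {
  const ra = cidrRange(a);
  const rb = cidrRange(b);
  return ra.start <= rb.end && rb.start <= ra.end;
}
```

For example, with hypothetical ranges, `cidrsOverlap("192.168.0.0/16", "192.168.248.0/21")` is `true`, while `cidrsOverlap("192.168.0.0/21", "10.8.0.0/28")` is `false`.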
Connecting to our databases (dev and production) from our local machines was never a problem. The same goes for our live app that is currently running on AWS: the connection works perfectly. (We are migrating to GCP Cloud Run; our app is already live on AWS.)
I saw someone online say that setting the useUnifiedTopology flag fixed a similar issue for them, but IIRC this option no longer has any effect since version 4 of the Node.js driver, which is what we use.
Someone else suggested enabling “CPU is always allocated” in the Cloud Run settings; this did not help.
Deploying a “hello world” Express app that only pings the DB seemed to work at first, but after deploying several more revisions, it also failed once in a while. So the problem really seems to lie in the connection itself.
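To measure how often the ping actually fails, rather than judging from a handful of deploys, the hello world app could run a repeated probe and report a failure rate. A minimal sketch, assuming `client` is an already-connected `MongoClient`, where `client.db("admin").command({ ping: 1 })` sends the ping command; `probe` and its parameters are hypothetical.

```javascript
// Sketch: ping the database repeatedly and summarize failures and latency.
// `client` is assumed to be a connected MongoClient (mongodb driver 4.x).
async function probe(client, { rounds = 20, intervalMs = 1000 } = {}) {
  const results = [];
  for (let i = 0; i < rounds; i++) {
    const started = Date.now();
    try {
      await client.db("admin").command({ ping: 1 });
      results.push({ ok: true, ms: Date.now() - started });
    } catch (err) {
      // Record the error class name, e.g. MongoServerSelectionError.
      results.push({ ok: false, ms: Date.now() - started, error: err.name });
    }
    if (i < rounds - 1) await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  const failures = results.filter((r) => !r.ok).length;
  return { rounds, failures, failureRate: failures / rounds, results };
}
```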
It doesn’t seem likely that our codebase plays a large role: the very first thing we do is connect to the DB, and that step is what fails (and the hello world app failed too). So I think it must be something in either GCP or MongoDB.
Does anybody have any idea what might cause this? Did anybody encounter a similar problem?