I have several microservices connecting to a MongoDB Atlas database, using different dbNames and collections. I recently moved from the free tier of MongoDB Atlas to the paid Serverless tier. Ever since the move, occasionally (I would say twice per hour), all the services get blocked from the database and cannot perform any operation on MongoDB. They receive this specific error:
Failed with error: connection(“SERVER_URI”) unable to write wire message to network: write tcp IP:PORT->IP:PORT: write: broken pipe
There are some operations that I need to repeat immediately so as not to lose information, so I have two questions:
Does anyone know why this error occurs and how I could avoid it? (I could not find any info about it.)
Is there a way to resubmit a db command, so that I can re-run that specific command and not lose the data?
To understand your use case better, I would like to know a few details such as:
Which Official MongoDB Driver are you using?
Have you enabled retryable read/write?
Are there any specific operations being performed when you get this error?
You can take a look at the thread below, as a similar error has been discussed there.
Lastly, I would recommend you go through the documentation below to understand more about retryable read/write operations and how MongoDB can help you build resilient applications that can withstand network outages and failover events.
@Tarun_Gaur, we have the same problem. We are using MongoDB Atlas serverless and a Go service running on GCP Cloud Run using the latest Mongo driver (v1.11.2).
“Error while executing aggregation pipeline: connection(dev-xxx-lb.xxx.mongodb.net:27017[-3]) unable to write wire message to network: write tcp xxxxx:51945->xxxx:27017: write: broken pipe”
We haven't enabled retries explicitly, but the latest Go driver has them enabled by default. I think this behavior can happen with any type of query, but most of the time we see it with aggregation pipeline queries, which make up the bulk of the queries we run.
The thread you referenced does seem to discuss a similar problem, but it does not offer a solution.
I would advise you to bring this up with the Atlas chat support team. They may be able to check whether anything on the Atlas side could have caused this broken pipe message. If you do open a chat support case, please provide them with the following:
Cluster link / name which experienced the issue
Time & date including timezone for when it occurred