I’ve recently received several email warnings: “You are receiving this alert email because connections to your cluster(s) have exceeded 500”. I’m on the M0 free tier, using Mongoose and Next.js deployed to Vercel. After a lot of googling I refactored my connection code to cache the connection; see here: birdinghotspots/mongo.ts at main · rawcomposition/birdinghotspots · GitHub
Connection caching seems to work. I console.log every time a new connection is created and I can see this happens once or twice after deployment and then never again.
However, after making that change I received another warning this morning. My connections had spiked to 425 for a 5-10 min period. I set the maxPoolSize option in Mongoose to 10 (see here: birdinghotspots/mongo.ts at main · rawcomposition/birdinghotspots · GitHub), which seemed to have no effect. My connections hover around 25-60 at any given time and seem to increase with higher traffic, such as when several bots are indexing my site at the same time. They even spiked to 97 while I was monitoring this morning after adjusting the maxPoolSize.
I should add that this is not a crazy high traffic site. We get maybe 10,000 users a month. Though we have tens of thousands of pages, and with all the bots hitting it I notice Vercel logs server requests every couple seconds, sometimes with bursts up to a few a second. I’m not sure what the traffic was like when it hit 380 connections since I wasn’t monitoring it.
Why is maxPoolSize having no effect, and how can I get these connections under control?
This happened again just now. I noticed the spike happened right when a bunch of my Vercel functions timed out at their 10s limit while trying to connect to MongoDB. I had set serverSelectionTimeoutMS to 9s, attempting to prevent reaching the Vercel timeout, but apparently that didn’t fix the problem. What could cause the MongoDB connection to time out? And why would this cause my MongoDB connections to spike exponentially?
I just tried changing bufferCommands to false. I’ll see if that makes any difference.
I’m considering switching to a regular server environment to avoid these serverless headaches.
I’m still encountering this issue about every 2 days, usually early in the morning (Pacific time). It lasts for a minute or two. There’s a spike in connections and most of my queries are exceeding the 10s timeout on Vercel. It does seem like MongoDB successfully connects (see screenshot where I log the successful connection). The queries being run in these functions take < 200ms normally. Note that the Mongo connection is successfully cached when running normally. For some reason during these weird spikes, it tries to create a connection on each function request.
I’m the Product Manager for the Node.js driver here at MongoDB. First off, our apologies, as this request seems to have slipped through during the lead-up to the Christmas break.
Why is maxPoolSize having no effect, and how can I get these connections under control?
If you’re using Vercel Serverless Functions, it’s possible that although you’re specifying a maxPoolSize, the connection isn’t being reused, because the infrastructure behind the functions spawns additional workers to handle an influx of requests to your site.
Unfortunately I cannot confirm or deny this directly. However, given a single MongoClient instance, the maxPoolSize option would cap the total number of connections within that client’s connection pool at 2 (per your code sample), as opposed to the default of 100.
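To make the arithmetic behind this concrete, here is a rough sketch. It assumes that every concurrent request lands on its own function instance and that each instance holds one client with one pool; the function name and the instance counts are hypothetical. It also ignores any monitoring connections the driver opens and any per-cluster-member pooling, so real numbers may be higher.

```typescript
// Hypothetical helper (not part of Mongoose or the MongoDB driver).
// Rough estimate of pooled connections when each serverless instance
// keeps its own client, and therefore its own connection pool.
function estimatePooledConnections(instances: number, maxPoolSize: number): number {
  if (!Number.isInteger(instances) || instances < 0) {
    throw new RangeError("instances must be a non-negative integer");
  }
  if (!Number.isInteger(maxPoolSize) || maxPoolSize < 1) {
    throw new RangeError("maxPoolSize must be a positive integer");
  }
  // maxPoolSize limits each pool separately; it does not limit the
  // number of instances the platform decides to start.
  return instances * maxPoolSize;
}

// Assumed burst of 34 simultaneous instances.
console.log(estimatePooledConnections(34, 10)); // 340
console.log(estimatePooledConnections(34, 2)); // 68
```

Under these assumptions, lowering maxPoolSize shrinks each instance's share but cannot stop the total from growing with the number of instances.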
This happened again just now. I noticed the spike happened right when a bunch of my Vercel functions timed out at their 10s limit while trying to connect to MongoDB. I had set serverSelectionTimeoutMS to 9s, attempting to prevent reaching the Vercel timeout, but apparently that didn’t fix the problem. What could cause the MongoDB connection to time out? And why would this cause my MongoDB connections to spike exponentially?
I see you have the source for your solution publicly accessible at GitHub - rawcomposition/birdinghotspots, so the first thing we’d want to do is verify the behavior you’ve described.
Can you briefly outline the configuration/deployment requirements for Vercel so that we can spin up a similar deployment? Once we have this application deployed to Vercel and configured like your production instance, we’ll need to emulate your traffic to reproduce the connections you’ve described.
This exercise will help us better understand how the application running within Vercel is connecting to your cluster and potentially exceeding the expected connection profile.
Feel free to send me a DM if you’d like to discuss further.
I appreciate your help with this! After fiddling and watching closely over the last week, here are some things I’ve observed:
After adjusting the maxPoolSize to 2, I stopped getting the max connection warning. My surges now peak around 200 connections. However, I’m still having queries time out.
When I see a cluster of timeout errors, they’re usually a bunch of Googlebot requests all happening milliseconds apart. See attached screenshot. During that exact second there were 34 requests from Googlebot.
In the attached screenshot, you’ll notice the console.log output relating to the MongoDB connection code. When everything is functioning normally (outside of these error/connection surges), I notice the console.log messages appearing once and all further requests seem to use the existing, cached connection.
I’m wondering if the existing connection disconnects and Google attempts to load 20 or 30 pages all at once. Since there’s no existing connection, I’m assuming it initiates a new connection for all 30 new requests. And there wouldn’t be time to cache the connection from the 1st request and share it with the remaining 29, because the requests all came in essentially at the same time. That’s just my speculation. Though that may explain the spike in connections, I don’t think it would explain why those 20-30 requests time out.
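One common way to close the gap described here, at least for requests that share a single process, is to cache the in-flight connection promise rather than the finished connection. The sketch below is a minimal illustration under that assumption: createConnectionCache, fakeConnect and simulateBurst are hypothetical names, and fakeConnect merely stands in for a real call such as mongoose.connect. It is not taken from the linked mongo.ts.

```typescript
// Hypothetical generic cache: it stores the pending promise, so callers
// that arrive while the first connect is still running share that attempt.
function createConnectionCache<T>(connect: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = connect().catch((err) => {
        // Drop the failed attempt so a later request can retry
        // instead of reusing a rejected promise forever.
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}

// Stand-in for a real connect call; counts how often it is invoked.
let connectCalls = 0;
async function fakeConnect(): Promise<string> {
  connectCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return "connection";
}

const getConnection = createConnectionCache(fakeConnect);

// Simulates a burst of simultaneous requests inside one process and
// reports how many connect attempts were actually made.
async function simulateBurst(requests: number): Promise<number> {
  await Promise.all(Array.from({ length: requests }, () => getConnection()));
  return connectCalls;
}
```

Calling simulateBurst(30) in this sketch results in a single fakeConnect call. Note the limit of the technique: it only deduplicates attempts within one process, so it would not help if each of the 30 requests were served by a separate function instance.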
There’s a number of .env variables required to deploy to Vercel. I’ll get dev versions of all the necessary keys and DM you the details so you can deploy it.
@Adam_Jackson as part of troubleshooting this issue, can we just validate that those routes are actually supposed to return a response? Per Vercel’s Serverless Functions Timeouts Conditions Checklist: “The function must return an HTTP response, even if that response is an error. If no response is returned, the function will time out.”
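As an illustration of that checklist item, the sketch below shows one way a handler could give up on slow work and still return an error response before the platform's limit. It is only a sketch under stated assumptions: withTimeout and respondWithin are hypothetical helpers, HandlerResult is a simplified stand-in for a real HTTP response, and the 503 status is an arbitrary choice.

```typescript
// Simplified stand-in for an HTTP response (assumption for illustration).
type HandlerResult = { status: number; body: string };

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Runs `query` and always resolves to a response: the result on success,
// or an error response if the query fails or exceeds `budgetMs`.
async function respondWithin(
  query: () => Promise<string>,
  budgetMs: number,
): Promise<HandlerResult> {
  try {
    return { status: 200, body: await withTimeout(query(), budgetMs) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: 503, body: message };
  }
}
```

With a 10s platform limit, a budget somewhat below that (for example 8000 ms) would leave time to send the error response. The timed-out query is not cancelled by this wrapper; it keeps running in the background until it settles on its own.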
@alexbevi yes, any of the routes that error, including the ones in the screenshot, can be viewed under normal circumstances; they load fine and return a valid HTTP response. E.g. Coshocton County, Ohio, US - Birding Hotspots