I attended an AWS user group a few weeks ago, and many of the questions from the audience concerned caching and performance. In this post, I review the performance implications of using Lambda functions with any database-as-a-service (DBaaS) platform (such as MongoDB Atlas). Based on internal investigations, I offer a specific workaround available for Node.js Lambda functions. Note that other supported languages (such as Python) may only require implementing some parts of the workaround, as the underlying AWS containers may differ in their resource disposal requirements. I will specifically call out below which parts are required for any language and which ones are Node.js-specific.
AWS Lambda is serverless, which means that it is essentially stateless. Well, almost. As stated in its developer documentation, AWS Lambda relies on a container technology to execute its functions. This has several implications:
The first time your application invokes a Lambda function it will incur a penalty hit in latency – time that is necessary to bootstrap a new container that will run your Lambda code. The definition of "first time" is fuzzy, but word on the street is that you should expect a new container (i.e. a “first time” event) each time your Lambda function hasn’t been invoked for more than 5 minutes.
If your application makes subsequent calls to your Lambda function within 5 minutes, you can expect that the same container will be reused, thus saving some precious initialization time. Note that AWS makes no guarantee it will reuse the container (i.e. you might just get a new one), but experience shows that in many cases, it does manage to reuse existing containers.
As mentioned in the How It Works page, any Node.js variable that is declared outside the handler method remains initialized across calls, as long as the same container is reused.
Understanding Container Reuse in AWS Lambda, written in 2014, dives a bit deeper into the whole lifecycle of a Lambda function and is an interesting read, though may not reflect more recent architectural changes to the service. Note that AWS makes no guarantee that containers are maintained alive (though in a "frozen" mode) for 5 minutes, so don’t rely on that specific duration in your code.
In our very first attempt to build Lambda functions that would run queries against MongoDB Atlas, our database as a service offering, we noticed the performance impact of repeatedly calling the same Lambda function without trying to reuse the MongoDB database connection. The wait time for the Lambda function to complete was around 4-5 seconds, even with the simplest query, which is unacceptable for any real-world operational application.
In our subsequent attempts to declare the database connection outside the handler code, we ran into another issue: we had to call db.close() to effectively release the database handle, lest the Lambda function time out without returning to the caller. The AWS Lambda documentation doesn’t explicitly mention this caveat which seems to be language dependent, since we couldn’t reproduce it with a Lambda function written in Python.
Fortunately, we found out that Lambda’s context object exposes a
callbackWaitsForEmptyEventLoop property, that effectively allows a Lambda function to return its result to the caller without requiring that the MongoDB database connection be closed (you can find more information about
callbackWaitsForEmptyEventLoop in the Lambda developer documentation). This allows the Lambda function to reuse a MongoDB Atlas connection across calls, and reduce the execution time to a few milliseconds (instead of a few seconds).
In summary, here are the specific steps you should take to optimize the performance of your Lambda function:
- Declare the MongoDB database connection object outside the handler method, as shown below in Node.js syntax (this step is required for any language, not just Node.js):
- In the handler method, set
false before attempting to use the MongoDB database connection object (this step is only required for Node.js Lambda functions):
- Try to re-use the database connection object using the
MongoDB.connect(Uri) method only if it is not null and
db.serverConfig.isConnected() returns true (this step is required for any language, not just Node.js):
The Serverless development with Node.js, AWS Lambda and MongoDB Atlas tutorial post makes use of all these best practices so I recommend that you take the time to read it. The more experienced developers can also find optimized Lambda Node.js functions (with relevant comments) in:
I’d love to hear from you, so if you have any question or feedback, don’t hesitate to leave them below.
Additionally, if you’d like to learn more about building serverless applications with MongoDB Atlas, I highly recommend our webinar below where we have an interactive tutorial on serverless architectures with AWS Lambda.
Watch Serverless Architectures with AWS Lambda and MongoDB Atlas