Build a Resilient Application with MongoDB Atlas
You can configure features of your MongoDB deployments and the driver libraries to create a resilient application that can withstand network outages and failover events. To write application code that takes full advantage of the always-on capabilities of MongoDB Atlas, you should perform the following tasks:
Install the latest drivers.
Use the connection string provided by Atlas.
Use retryable writes and retryable reads.
Use a majority write concern and a read concern that makes sense for your application.
Handle errors in your application.
Install the Latest Drivers
First, install the latest drivers for your language from MongoDB Drivers. Drivers connect and relay queries from your application to your database. Using the latest drivers enables the latest MongoDB features.
Then, import the driver dependency in your application.
Atlas provides a pre-configured connection string. For steps to copy the pre-configured string, see Atlas-Provided Connection Strings.
Use a connection string that specifies all the nodes in your Atlas cluster to connect your application to your database. If your cluster holds a replica set election and a new primary is elected, the driver uses a connection string that specifies all the nodes in your cluster to discover the new primary without any application logic.
You can specify all the nodes in your cluster using either:
the DNS Seedlist Connection Format (recommended with Atlas), or
the Standard Connection String Format, which lists each host explicitly.
The connection string can also specify options, notably retryWrites and w (the write concern).
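As a rough illustration of the two formats, the following Python sketch assembles each kind of connection string using only the standard library. The hostnames, replica set name, and credentials are hypothetical placeholders; in practice, copy the real string from Atlas. Credentials are percent-encoded with `urllib.parse.quote_plus`, which the PyMongo documentation recommends for usernames and passwords that contain characters such as `@` or `:`.

```python
from urllib.parse import quote_plus, urlencode

# Options recommended for resiliency: retryable writes and majority write concern.
RESILIENT_OPTIONS = {"retryWrites": "true", "w": "majority"}


def credentials(user: str, password: str) -> str:
    """Percent-encode credentials so characters such as '@' or ':' stay unambiguous."""
    return f"{quote_plus(user)}:{quote_plus(password)}"


def seedlist_uri(user: str, password: str, srv_host: str) -> str:
    """DNS Seedlist format: one SRV hostname that the driver resolves to every node."""
    return f"mongodb+srv://{credentials(user, password)}@{srv_host}/?{urlencode(RESILIENT_OPTIONS)}"


def standard_uri(user: str, password: str, hosts: list[str], replica_set: str) -> str:
    """Standard format: every node listed explicitly.

    Also adds options that the SRV format normally supplies for Atlas
    (TLS, the replica set name, and the authentication database).
    """
    options = {**RESILIENT_OPTIONS, "replicaSet": replica_set, "tls": "true", "authSource": "admin"}
    return f"mongodb://{credentials(user, password)}@{','.join(hosts)}/?{urlencode(options)}"


# Hypothetical cluster details, for illustration only.
print(seedlist_uri("appUser", "p@ss:word", "cluster0.example.mongodb.net"))
print(standard_uri(
    "appUser", "p@ss:word",
    ["node0.example.net:27017", "node1.example.net:27017", "node2.example.net:27017"],
    "atlas-example-shard-0",
))
```

The SRV form is shorter and keeps working when Atlas changes the set of nodes, which is why Atlas recommends it.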
Atlas can generate an optimized SRV connection string for sharded clusters using the load balancers from your private endpoint service. When you use an optimized connection string, Atlas limits the number of connections per mongos between your application and your sharded cluster. The limited connections per mongos improve performance during spikes in connection counts.
Atlas doesn't support optimized connection strings for clusters that run on Google Cloud or Azure.
To learn more about optimized connection strings for sharded clusters behind a private endpoint, see Improve Connection Performance for Sharded Clusters Behind a Private Endpoint.
Atlas-Provided Connection Strings
If you copy your connection string from your Atlas cluster interface, the connection string is pre-configured for your cluster, uses the DNS Seedlist format, and includes the recommended retryWrites and w (write concern) options for resiliency.
To copy your connection string URI from Atlas:
In MongoDB Atlas, under your project, select Database from the navigation panel.
Click Connect on the cluster you wish to connect your application to.
Select Connect Your Application as your connection method.
Select your Driver and Version.
Copy the connection string or full driver example into your application code. You must provide database user credentials.
This guide uses SCRAM authentication through a connection string. If you want to learn about using X.509 certificates to authenticate, see X.509.
Use your connection string to instantiate a MongoDB client in your application.
Retryable Writes and Reads
Starting in MongoDB version 4.0 and with 4.2-compatible drivers, MongoDB retries both writes and reads once by default.
Use retryable writes to retry certain write operations a single time if they fail. If you copied your connection string from Atlas, it includes "retryWrites=true". If you are providing your own connection string, add "retryWrites=true" as a query parameter.
Retrying writes exactly once is the best strategy for handling transient network errors and replica set elections in which the application temporarily cannot find a healthy primary node. If the retry succeeds, the operation as a whole succeeds and no error is returned. If the retry also fails, the cause is likely:
A lasting network error
An invalid command
When an operation fails, your application needs to handle the error itself.
Starting in MongoDB version 4.0 and with 4.2-compatible drivers, read operations are automatically retried a single time if they fail. No additional configuration is required to retry reads.
Write and Read Concern
You can tune the consistency and availability of your application using write concerns and read concerns. Stricter concerns imply that database operations wait for stronger data consistency guarantees, whereas loosening consistency requirements provides higher availability.
If your application handles monetary balances, consistency is extremely important. You might use majority write and read concerns to ensure you never read stale data or data that may be rolled back.
Alternatively, if your application records temperature data from hundreds of sensors every second, you may not be concerned if you read data that does not include the most recent readouts. You can loosen consistency requirements and provide faster access to that data.
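One way to express the two scenarios is as connection string options. This Python sketch is illustrative only. The profile names are hypothetical, while `w` and `readConcernLevel` are standard connection string options. Whether `w=1` and the `local` read concern are acceptable for sensor data is an assumption you must check against your own durability needs: writes acknowledged only by the primary can be rolled back if that primary fails before replicating them.

```python
from urllib.parse import urlencode

# Hypothetical workload profiles mapping to standard connection string options.
PROFILES = {
    # Monetary balances: wait for majority acknowledgement and read only
    # majority-committed data, which cannot be rolled back.
    "balances": {"w": "majority", "readConcernLevel": "majority"},
    # High-volume sensor readings: accept acknowledgement from the primary
    # alone and read the node's most recent local data for faster access.
    "sensors": {"w": "1", "readConcernLevel": "local"},
}


def options_for(profile: str) -> str:
    """Return the query string to append after '/?' in a connection string."""
    return urlencode({"retryWrites": "true", **PROFILES[profile]})


print(options_for("balances"))  # retryWrites=true&w=majority&readConcernLevel=majority
print(options_for("sensors"))   # retryWrites=true&w=1&readConcernLevel=local
```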
You can set the write concern level of your Atlas replica set through the connection string URI. Use a majority write concern to ensure your data is successfully written to your database and persisted. This is the recommended default and sufficient for most use cases. If you copied your connection string from Atlas, it includes "w=majority".
When you use a write concern that requires acknowledgement, such as majority, you may also specify a maximum time limit for writes to achieve that level of acknowledgement:
The wtimeoutMS connection string parameter for all writes, or
The wtimeout option for a single write operation.
Whether or not you use a time limit and the value you use depend on your application context.
If you do not specify a time limit for writes and the level of write concern is unachievable, the write operation will hang indefinitely.
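Both ways of bounding the wait can be sketched in Python with only the standard library. The first is a `wtimeoutMS` connection string option for every write. The second is a per-write `writeConcern` document with `wtimeout`, shown here inside the `insert` command document that drivers send. The 2500 ms and 5000 ms values and the `users` collection are hypothetical. With PyMongo, the per-operation form is usually written as `WriteConcern(w="majority", wtimeout=5000)` passed to `Collection.with_options()`.

```python
from urllib.parse import urlencode

# Client-wide: every write through this client waits at most 2.5 s
# (hypothetical value) for majority acknowledgement before returning an error.
CLIENT_OPTIONS = urlencode({"retryWrites": "true", "w": "majority", "wtimeoutMS": 2500})


def insert_command(collection: str, documents: list[dict], wtimeout_ms: int) -> dict:
    """Build an insert command whose write concern sets the limit for this write only."""
    return {
        "insert": collection,
        "documents": documents,
        "writeConcern": {"w": "majority", "wtimeout": wtimeout_ms},
    }


print(CLIENT_OPTIONS)  # retryWrites=true&w=majority&wtimeoutMS=2500
print(insert_command("users", [{"_id": 1, "name": "Ada"}], 5000))
```

A write concern timeout does not undo the write. The data may still replicate after the error is returned, so treat the error as "not yet confirmed" rather than "failed".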
You can set the read concern level of your Atlas replica set through the connection string URI. The ideal read concern depends on your application requirements, but the default is sufficient for most use cases. No connection string parameter is required to use default read concerns.
Specifying a read concern can improve guarantees around the data your application receives from Atlas.
The specific combination of write and read concern your application uses affects its order-of-operation guarantees, which is called causal consistency. For more information on causal consistency guarantees, see Causal Consistency and Read and Write Concerns.
Invalid commands, network outages, and network errors that are not handled by retryable writes return errors. Refer to your driver's API documentation for error details.
For example, if an application tries to insert a document that contains an _id value that is already used in the database's collection, your driver returns a duplicate key error with error code 11000, whose message begins with "E11000 duplicate key error".
Without proper error handling, an error may block your application from processing requests until it is restarted.
Your application should handle errors without crashing or side effects. In the previous example of an application inserting a duplicate _id into a collection, that application could handle errors by catching the exception the driver raises.
The insert operation in this example throws a "duplicate key" error the second time it's invoked because the _id field must be unique. The error is caught, the client is notified, and the app continues to run. The insert operation fails, however, and it is up to you to decide whether to show the user a message, retry the operation, or do something else.
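The duplicate-key behavior described above can be sketched in Python without a running cluster. `InMemoryUsers` is a hypothetical stand-in for a collection with a unique `_id` index. `DuplicateKeyError` is a local class named after PyMongo's `pymongo.errors.DuplicateKeyError`; with a real driver, you would catch the driver's own exception instead.

```python
import logging

log = logging.getLogger("users-api")


class DuplicateKeyError(Exception):
    """Local stand-in for the driver's duplicate key error (server error code 11000)."""


class InMemoryUsers:
    """Hypothetical stand-in for a users collection with a unique _id index."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def insert_one(self, doc: dict) -> None:
        if doc["_id"] in self._docs:
            raise DuplicateKeyError(f"E11000 duplicate key error, _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = dict(doc)


def create_user(users: InMemoryUsers, doc: dict) -> dict:
    """Insert a user; on a duplicate key, log it and tell the client instead of crashing."""
    try:
        users.insert_one(doc)
    except DuplicateKeyError as err:
        log.warning("insert failed: %s", err)
        return {"ok": False, "error": f"A user with _id {doc['_id']!r} already exists."}
    return {"ok": True, "insertedId": doc["_id"]}


users = InMemoryUsers()
print(create_user(users, {"_id": 1, "name": "Ada"}))  # {'ok': True, 'insertedId': 1}
print(create_user(users, {"_id": 1, "name": "Ada"}))  # error response; the app keeps running
```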
You should always log errors. Common strategies for further processing errors include:
Return the error to the client with an error message. This is a good strategy when you cannot resolve the error and need to inform a user that an action cannot be completed.
Write to a backup database. This is a good strategy when you cannot resolve the error but don't want to risk losing the request data.
Retry the operation beyond the single default retry. This is a good strategy when you can solve the cause of an error programmatically, then retry it.
You must select the best strategies for your application context.
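A minimal Python sketch of the third strategy follows. It assumes a hypothetical `TransientError` type for failures you expect to clear up on their own, such as an election that outlasts the driver's single retry. It retries those failures with exponential backoff and lets every other error propagate. The `sleep` function is a parameter so that the behavior can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures whose cause may clear up on its own."""


def retry(operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, retrying TransientError with exponential backoff.

    Any other exception propagates immediately because retrying it cannot help.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller log or report the error
            sleep(base_delay * 2 ** attempt)  # waits 0.1 s, then 0.2 s, ...
    raise AssertionError("unreachable")  # keeps type checkers satisfied


calls = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("no primary available yet")
    return "ok"


print(retry(flaky, sleep=lambda _: None))  # ok, after two failed attempts
```

Only retry operations that are safe to repeat. For example, re-running an update that uses `$inc` after an ambiguous failure could apply the increment twice.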
In the example of a duplicate key error, you should log the error but not retry the operation because it will never succeed. Instead, you could write to a fallback database and review the contents of that database at a later time to ensure that no information is lost. The user doesn't need to do anything else and the data is recorded, so you can choose not to send an error message to the client.
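The fallback strategy for duplicate keys might look like the following Python sketch. The `insert_one` callable and the `fallback` list are hypothetical stand-ins for your primary collection and backup database. `DuplicateKeyError` is a local class named after PyMongo's `pymongo.errors.DuplicateKeyError` so that the example runs without a server.

```python
import logging
from typing import Callable

log = logging.getLogger("users-api")


class DuplicateKeyError(Exception):
    """Local stand-in for the driver's duplicate key error (server error code 11000)."""


def save_user(insert_one: Callable[[dict], None], fallback: list, doc: dict) -> str:
    """Insert doc; on a duplicate key, log it and keep the data in a fallback store.

    Returns where the document ended up. In both cases the data is recorded,
    so the caller can choose not to report an error to the client.
    """
    try:
        insert_one(doc)
        return "primary"
    except DuplicateKeyError as err:
        # Never retry: the same _id collides every time.
        log.error("duplicate key, storing in fallback for review: %s", err)
        fallback.append(doc)
        return "fallback"


seen = set()


def insert_one(doc: dict) -> None:
    """Hypothetical primary insert that enforces a unique _id."""
    if doc["_id"] in seen:
        raise DuplicateKeyError(f"_id {doc['_id']!r} already exists")
    seen.add(doc["_id"])


backup: list = []
print(save_user(insert_one, backup, {"_id": 1}))  # primary
print(save_user(insert_one, backup, {"_id": 1}))  # fallback
```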
Planning for Network Errors
Returning an error can be desirable behavior when an operation would otherwise hang indefinitely and block your application from executing new operations. You can use the maxTimeMS method to place a time limit on individual operations, returning an error for your application to handle if that time limit is exceeded.
The time limit you place on each operation depends on the context of that operation.
If your application reads and displays simple product information from an inventory collection, you can be reasonably confident that those read operations only take a moment. An unusually long-running query is a good indicator that there is a lasting network problem. Setting maxTimeMS on that operation to 5000, or 5 seconds, means that your application receives feedback as soon as you are confident there is a network problem.
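To make the per-operation choice explicit, the following Python sketch keeps a table of time limits and builds the `find` command document that drivers send, which carries the limit in its `maxTimeMS` field. The operation names, the 60-second value, and the `sku` filter are hypothetical. In PyMongo you would normally set the limit with `Cursor.max_time_ms(5000)` rather than build the command yourself. If the limit is exceeded, the server returns error code 50 (`MaxTimeMSExpired`), which PyMongo raises as `pymongo.errors.ExecutionTimeout`.

```python
# Hypothetical per-operation time limits in milliseconds, chosen from how long
# each operation normally takes.
TIME_LIMITS_MS = {
    "product_details": 5000,       # simple lookups: 5 s already signals a problem
    "full_catalog_export": 60000,  # large scans legitimately take longer
}


def find_command(collection: str, query: dict, operation: str) -> dict:
    """Build a find command that the server aborts once the operation's limit passes."""
    return {"find": collection, "filter": query, "maxTimeMS": TIME_LIMITS_MS[operation]}


print(find_command("inventory", {"sku": "abc123"}, "product_details"))
# {'find': 'inventory', 'filter': {'sku': 'abc123'}, 'maxTimeMS': 5000}
```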
In the spirit of chaos testing, Atlas will perform replica set elections automatically for periodic maintenance and certain configuration changes.
To check if your application is resilient to replica set elections, test the failover process by simulating a failover event.
Resilient Example Application
The example application brings together the following recommendations to ensure resiliency against network outages and failover events:
Use the Atlas-provided connection string with retryable writes, majority write concern, and default read concern.
Specify an operation time limit with the maxTimeMS method. For instructions on how to set maxTimeMS, refer to your specific Driver Documentation.
Handle errors for duplicate keys and timeouts.
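For the duplicate key and timeout handling, an HTTP API has to decide what each database error means for the client. This Python sketch maps server error codes to responses. The codes 11000 (duplicate key) and 50 (`MaxTimeMSExpired`) are real server error codes, but the chosen HTTP statuses and messages are assumptions. In PyMongo, `DuplicateKeyError` and `ExecutionTimeout` both derive from `OperationFailure`, whose `code` attribute holds this number.

```python
from http import HTTPStatus

DUPLICATE_KEY = 11000     # unique index violation, such as a reused _id
MAX_TIME_MS_EXPIRED = 50  # the operation exceeded its maxTimeMS limit


def response_for_error(code: int) -> tuple[HTTPStatus, dict]:
    """Map a server error code to the HTTP status and JSON body the API returns."""
    if code == DUPLICATE_KEY:
        return HTTPStatus.CONFLICT, {"error": "A user with that _id already exists."}
    if code == MAX_TIME_MS_EXPIRED:
        return HTTPStatus.SERVICE_UNAVAILABLE, {
            "error": "The database did not respond in time. Try again later."
        }
    return HTTPStatus.INTERNAL_SERVER_ERROR, {"error": "Unexpected database error."}


status, body = response_for_error(11000)
print(int(status), body["error"])  # 409 A user with that _id already exists.
```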
The application is an HTTP API that allows clients to create or list user records. It exposes an endpoint that accepts GET and POST requests at http://localhost:3000:
Gets a list of user names from a