Charting a Course to MongoDB Atlas: Part 1 - Preparing for the Journey

Michael Lynn

Cloud

MongoDB Atlas is an automated cloud MongoDB service engineered and run by the same team that builds the database. It incorporates operational best practices we’ve learned from optimizing thousands of deployments across startups and the Fortune 100. You can build on MongoDB Atlas with confidence, knowing you no longer need to worry about database management, setup and configuration, software patching, monitoring, backups, or operating a reliable, distributed database cluster.

MongoDB introduced its Database as a Service offering in July of 2016, and it’s been a phenomenal success since its launch. Since then, thousands of customers have deployed highly secure, highly scalable, and performant MongoDB databases using this service. Among its most compelling features are the ability to deploy replica sets on any of the major cloud hosting providers (AWS, Azure, GCP) and the ability to deploy database clusters spanning multiple cloud regions. In this series, I’ll explain the steps you can follow to migrate data from your existing MongoDB database into MongoDB Atlas.

Preparing for the Journey

Before you embark on any journey, regardless of the destination, it’s always a good idea to take some time to prepare. As part of this preparation, we’ll review some options for the journey (methods to get your data migrated into MongoDB Atlas) along with some best practices and potential wrong turns to watch out for along the way.

Let’s get a bit more specific about the assumptions I’ve made in this article.

  • You have data that you want to host in MongoDB Atlas.
    • There’s probably no point in continuing from here if you don’t want to end up with your data in MongoDB Atlas.
  • Your data is currently in a MongoDB database.
    • If you have data in some other format, all is not lost; we can help. However, this series addresses a MongoDB to MongoDB migration. If you have other requirements (data in another database or another format, for example), let me know if you’d like an article covering migration from some other database to MongoDB, and I’ll make that the subject of a future series.
  • Your current MongoDB database is running MongoDB version 3.0 or greater. MongoDB Atlas supports versions 3.4 and 3.6. Therefore, we’ll need to get your database upgraded, either as part of the migration or ahead of it. We have articles and documentation designed to help you upgrade your MongoDB instances should you need them.
  • Your data is in a clustered deployment (Sharded or Replica Set). We’ll cover converting a standalone deployment to a replica set in part 3 of this series.

At a high level, there are 4 basic steps to migrating your data. Let’s take a closer look at the journey:

  1. Deploy a Destination Cluster in MongoDB Atlas
  2. Prepare for the Journey
  3. Migrate the databases
  4. Cutover and Modify Your Applications to use the new MongoDB Atlas-based Deployment

As we approach the journey, it's important to know the various routes from your starting point to your eventual destination. Each route has its own considerations and benefits, and the route you choose will ultimately be up to you. The following table lists the available data migration methods.

| Method | Description | Considerations | Benefits | Version Notes |
|---|---|---|---|---|
| Live Import | Fully automated via the Atlas administrative console. | Downtime: minimal (cutover only). | Fully automated. | From: 2.6, 3.0, 3.2, 3.4. To: 3.4, 3.6. |
| mongomirror | mongomirror is a utility for migrating data from an existing MongoDB replica set to a MongoDB Atlas replica set. mongomirror does not require you to shut down your existing replica set or applications. | Downtime: minimal (cutover only). | | From: 2.6 or greater. To: 3.4, 3.6. |
| mongorestore | mongorestore is a command-line utility program that loads data from either a binary database dump created by mongodump or the standard input. | Downtime required. | | From: 2.6 or greater. To: 3.4, 3.6. |
| mongoimport | mongoimport imports content from an Extended JSON, CSV, or TSV export created by mongoexport or, potentially, another third-party export tool. | Downtime required. | | |

For a majority of deployments, Live Import is the best, most efficient route to get your data into MongoDB Atlas. It lets you keep your existing cluster up and active during the migration (but not too active; see the considerations below). There are considerations, however. If your source deployment is not geographically close to the AWS us-east-1 region, for example, you may encounter unacceptable latency. The following section offers some helpful route guidance on this and other concerns to ensure that you’re going in the right direction and moving steadily toward your destination.

Route Guidance for the Migration Journey

If you've made it this far, you’re likely getting ready to embark on a journey that will bring your data into a robust, secure, and scalable environment within MongoDB Atlas. The potential to encounter challenges along the way is real and the likelihood of encountering difficulties depends primarily upon your starting point in that journey. In this section, I’ll discuss some potential issues you may encounter as you prepare for your migration journey. A summary of the potential detours and guidance for each is presented in the following table.

Follow the links in the table to read more about each potential detour and its relevant guidance:

| Potential Detour | Guidance | Reference |
|---|---|---|
| Insufficient RAM on Destination Cluster | Calculate the RAM required for your application and increase that to account for the migration process requirements. | How do I calculate how much RAM I need for my application? |
| Too Much Network Latency Between Source and Destination | Reduce latency, or leverage mongodump/mongorestore instead of Live Import. | |
| Insufficient Network Access due to Missing IP Whitelist or Firewall Rules | Ensure that the MongoDB Live Import application servers are whitelisted and that corporate firewalls permit access between source and destination. | |
| Insufficient User Access Permissions to Source Database Deployment | Ensure that authentication is enabled and that the user credentials granted for the source database have the required entitlements. | |
| Insufficient Oplog Size on Destination | Size the operations log appropriately based on the application workload. | Sizing the Operations Log |

Potential Detour: Insufficient RAM on Destination Cluster

Every deployment of MongoDB requires some form of resource to run efficiently. These resource requirements will include things like RAM, CPU, Disk and Network. To ensure acceptable response times and performance of the database, we typically look to the application’s read/write profile to inform the decisions we make about the amounts and sizes of each of these resources we’ll need for our deployment.

The amount of RAM a deployment will require is largely informed by the applications’ demand for data in the database. To approximate RAM requirements, we typically look at the frequently accessed documents in each collection, adding up the total data size and then we increase that by the total size of required indexes. Referred to as the working set, this is typically the approximate amount of RAM we’ll want our deployment to have. A more complete discussion of sizing will be found in the documentation pages on sizing for MongoDB.

Sizing is a tricky task especially for the cost constrained. We obviously don’t want to waste money by over-provisioning servers larger than those we’ll need to support the profile of our users and applications. However, it is important to consider that during our migration, we’ll not only need to account for the application requirements -- we also need to account for the resources required by the migration process itself. Therefore, you will want to ensure that you surpass the requirements for your production implementation when sizing your destination cluster.
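
To make the arithmetic concrete, here is a minimal sketch of the working set estimate described above. The collection names, the byte figures, and the 25% migration headroom are all assumptions chosen for illustration; they are not Atlas recommendations, and you would substitute figures measured on your own deployment.

```python
from dataclasses import dataclass


@dataclass
class CollectionProfile:
    """Hypothetical per-collection figures gathered from the source deployment."""
    name: str
    hot_data_bytes: int   # size of the frequently accessed documents
    index_bytes: int      # total size of the indexes the workload needs


def estimate_working_set_bytes(collections):
    """Sum frequently accessed data plus required indexes across collections."""
    return sum(c.hot_data_bytes + c.index_bytes for c in collections)


def ram_target_bytes(working_set_bytes, migration_headroom=0.25):
    """Add headroom for the migration itself; the 25% default is an assumption."""
    return int(working_set_bytes * (1 + migration_headroom))


GIB = 1024 ** 3
profiles = [
    CollectionProfile("orders", hot_data_bytes=6 * GIB, index_bytes=2 * GIB),
    CollectionProfile("customers", hot_data_bytes=1 * GIB, index_bytes=1 * GIB),
]
working_set = estimate_working_set_bytes(profiles)
print(f"working set: {working_set / GIB:.1f} GiB")
print(f"RAM target during migration: {ram_target_bytes(working_set) / GIB:.1f} GiB")
```

With these made-up figures the working set is 10 GiB and the migration-time target is 12.5 GiB, which would argue for temporarily choosing a cluster size above what the application alone needs.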

Route Guidance: Increase available RAM During Migration

The destination cluster should be sized to provide adequate resources across all dimensions (storage, CPU, and memory) with room to spare. The migration process requires additional CPU and memory while the destination database is being built from the source. It’s quite common for destination clusters to be undersized, and as a result the migration process fails. If this happens during a migration, you must empty the destination cluster and resize it to a larger cluster tier (M-value) to increase the amount of available RAM. A great feature of Atlas is that resizing, in both directions, is extremely easy to do. Whether you’re adding resources (increasing the amount of RAM, disk, CPU, shards, etc.) or removing them, the process is very simple. Increasing the resources available on your target environment is therefore painless, and once the migration completes, you can simply scale back down to a cluster size with less RAM and CPU.


Potential Detour: Network Latency

Latency is defined as the amount of time it takes for a packet of data to get from one designated point to another. Because the migration process is all about moving packets of data between servers, it is by its very nature latency sensitive.

Migrating data into MongoDB Atlas with the Live Import capability involves connecting your source MongoDB instance to a set of application servers running in the AWS us-east-1 region. These servers act as the conductors running the actual migration process between your source and destination MongoDB database servers. A potential detour can crop up when your source MongoDB database deployment exists in a datacenter located far from the AWS us-east-1 region.

Route Guidance: Reduce latency if possible or use mongomirror instead of Live Import

Should your source MongoDB database servers exist in regions far from these application servers, you may need to leverage mongomirror or mongodump/mongorestore rather than Live Import.
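
One rough way to gauge latency before committing to a route is to time a few TCP handshakes from a machine near your source deployment to an endpoint in or near us-east-1. The sketch below is an assumption-laden illustration: the host you would pass in, the port, and the number of attempts are placeholders, and the article does not define what counts as unacceptable, so no threshold is applied.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Time one TCP handshake to host:port and return it in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def summarize_ms(samples):
    """Return (median, worst) of a list of latency samples in milliseconds."""
    return statistics.median(samples), max(samples)


def sample_latency(host, port=27017, attempts=5):
    """Collect several handshake timings; host and port are placeholders."""
    return summarize_ms([tcp_connect_ms(host, port) for _ in range(attempts)])


# Example with made-up samples, so the block runs without network access.
median_ms, worst_ms = summarize_ms([82.0, 79.5, 91.2, 80.3, 85.1])
print(f"median {median_ms:.1f} ms, worst {worst_ms:.1f} ms")
```

A handshake time is only a proxy for what the migration will experience, but a consistently high median is a reasonable signal to consider one of the other routes.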


Potential Detour: Network Access

In order to accomplish a migration using Live Import, Atlas streams data through a set of MongoDB-controlled application servers. Atlas provides the IP address ranges of the MongoDB Live Import servers during the Live Import process. You must be certain to add these IP address ranges to the IP whitelist for your destination cluster.

The migration processes within Atlas run on a set of application servers; these are the traffic directors. The following is a list of the IP address ranges these application servers use. It is important to ensure that traffic can flow freely between these servers, your source cluster, and the destination cluster. These ranges are in CIDR notation.

  • 4.71.186.128/25
  • 4.35.16.128/25
  • 52.72.201.163/32
  • 34.196.196.255/32
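
If you want to check firewall logs or an existing allow list against these ranges, a small script can help. This is a sketch using Python's standard `ipaddress` module; the function names and the sample addresses are made up for illustration, and the ranges are simply copied from the list above (Atlas shows the current ones during the Live Import process).

```python
import ipaddress

# Ranges copied from the list above; Atlas shows the current ones during Live Import.
LIVE_IMPORT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("4.71.186.128/25", "4.35.16.128/25",
                 "52.72.201.163/32", "34.196.196.255/32")
]


def is_live_import_address(address):
    """Return True if the address falls inside one of the listed CIDR ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in LIVE_IMPORT_RANGES)


def missing_ranges(allowed_cidrs):
    """Return the Live Import ranges not covered by an existing allow list."""
    allowed = [ipaddress.ip_network(c) for c in allowed_cidrs]
    return [str(n) for n in LIVE_IMPORT_RANGES
            if not any(n.subnet_of(a) for a in allowed)]


print(is_live_import_address("4.71.186.200"))   # inside 4.71.186.128/25
print(is_live_import_address("4.71.186.10"))    # outside every listed range
print(missing_ranges(["4.71.186.128/25", "52.72.201.163/32"]))
```

A /25 range covers 128 addresses, so 4.71.186.128/25 spans 4.71.186.128 through 4.71.186.255, while a /32 is a single address.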

Corporate firewall policy is another place where a detour may be encountered.

To avoid these potential detours, ensure that you have the appropriate connectivity from the networks where your source deployment resides to the networks where MongoDB Atlas exists.

Route Guidance: Whitelist the IP Ranges of the MongoDB Live Import Process

These IP ranges will be provided at the start of the migration process. Ensure that you configure the whitelist to enable appropriate access during the migration.


Potential Detour: Insufficient User Rights on Source Deployment

Every deployment of MongoDB should enforce authentication. This will ensure that only appropriate individuals and applications may access your MongoDB data.

A potential detour may arise when you attempt to Live Migrate a database without creating or providing the appropriately privileged user credentials.

If the source cluster enforces authentication, create a user with the following privileges:

  • Read all databases and collections (i.e. readAnyDatabase on the admin database)
  • Read the oplog.

Route Guidance: Ensure Appropriate User Access Permissions on the Source Deployment

Create a SCRAM user and password on each server in the replica set and ensure that this user has roles that grant the following permissions:

  • Read and write to the config database.
  • Read all databases and collections.
  • Read the oplog.

For example:

  • For 3.4+ source clusters, a user with both clusterMonitor and backup roles would have the appropriate privileges.
  • For 3.2 source clusters, a user with clusterMonitor, clusterManager, and backup roles would have appropriate privileges.
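
The role combinations above can be expressed as a `createUser` command document. The following is a minimal sketch that only builds the document and sends nothing to a server; the helper names, the username, and the password are placeholders invented for this example.

```python
def roles_for_source(major, minor):
    """Pick the built-in roles the article lists for a given source version."""
    if (major, minor) >= (3, 4):
        names = ["clusterMonitor", "backup"]
    elif (major, minor) == (3, 2):
        names = ["clusterMonitor", "clusterManager", "backup"]
    else:
        raise ValueError("the article lists roles only for 3.2 and 3.4+")
    # Built-in cluster roles are granted on the admin database.
    return [{"role": name, "db": "admin"} for name in names]


def build_create_user_command(username, password, major, minor):
    """Build a createUser command document; nothing is sent to a server here."""
    return {
        "createUser": username,
        "pwd": password,
        "roles": roles_for_source(major, minor),
    }


# "migration-user" and the password are placeholders, not values from Atlas.
command = build_create_user_command("migration-user", "change-me", 3, 4)
print(command)
# With a driver such as PyMongo this could be run against the source as:
#   MongoClient(source_uri).admin.command(command)
```

Keeping a dedicated migration user, rather than reusing an application account, makes it easy to remove the access once the cutover is complete.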

Specify the username and password to Atlas when prompted by the Live Migration procedure.

Also, once you’ve migrated your data, if the source cluster enforced authentication you must consider that Atlas does not migrate any user or role data to the destination cluster. Therefore, you must re-create the credentials used by your applications on the destination Atlas cluster. Atlas uses SCRAM for user authentication. See Add MongoDB Users for a tutorial on creating MongoDB users in Atlas.


Potential Detour: Insufficient Oplog Size on Destination

The oplog, or operations log, is a capped collection that keeps a rolling record of all operations that modify the data stored in your databases. When you create an Atlas cluster to serve as the destination for your migration, Atlas by default sets the oplog size to 5% of the total disk you allocated for the cluster. If the activity profile of your application requires a larger oplog, you will need to submit a proactive support ticket to have the oplog size increased on your destination cluster.
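
As a back-of-the-envelope check, you can compare the default size with how quickly your workload fills the oplog. The sketch below assumes a hypothetical 400 GB disk and a hypothetical steady oplog growth of 2.5 GB per hour; the idea of dividing size by write rate to get a time window is a simplification, since real workloads are rarely steady.

```python
def default_oplog_gb(disk_gb, fraction=0.05):
    """Default oplog size described above: 5% of the cluster's allocated disk."""
    return disk_gb * fraction


def oplog_window_hours(oplog_gb, oplog_gb_per_hour):
    """Hours of history the oplog holds at a steady write rate."""
    if oplog_gb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_gb / oplog_gb_per_hour


disk_gb = 400                 # hypothetical destination disk size
write_rate = 2.5              # hypothetical oplog growth in GB per hour
size = default_oplog_gb(disk_gb)
print(f"default oplog: {size:.0f} GB")
print(f"window: {oplog_window_hours(size, write_rate):.0f} hours")
```

If the resulting window is shorter than the time the migration is expected to take, that is a hint the default size may not be enough for your workload.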

Route Guidance: Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.

As stated previously, the decisions regarding the resources we apply to a given MongoDB Deployment are informed by the profile of the applications that depend on the database. As such, there are certain application read/write profiles or workloads that require a larger than default operations log. These are listed in detail in the documentation pages on the subject of Replica Set Oplog. Here is a summary of the workloads that typically require a larger than normal Oplog:

Updates to Multiple Documents at Once

The oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or disk use.

Deletions Equal the Same Amount of Data as Inserts

If you delete roughly the same amount of data as you insert, the database will not grow significantly in disk use, but the size of the operation log can be quite large.

Significant Number of In-Place Updates

If a significant portion of the workload is updates that do not increase the size of the documents, the database records a large number of operations but does not change the quantity of data on disk.
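
The three workloads above share one property: the log grows while the data does not. The toy model below illustrates that with a plain dictionary standing in for a collection and a list standing in for the oplog. It is not how MongoDB stores either one; the function names and the entry format are invented for the example.

```python
def apply_insert(docs, oplog, doc_id, value):
    docs[doc_id] = value
    oplog.append(("i", doc_id))


def apply_delete(docs, oplog, doc_id):
    del docs[doc_id]
    oplog.append(("d", doc_id))


def apply_multi_update(docs, oplog, predicate, new_value):
    """One multi-update is logged as one entry per matched document."""
    for doc_id in [d for d, v in docs.items() if predicate(v)]:
        docs[doc_id] = new_value
        oplog.append(("u", doc_id))


docs, oplog = {}, []
for i in range(1000):
    apply_insert(docs, oplog, i, "new")
docs_before = len(docs)

# A single multi-update touching every document: data size is unchanged.
apply_multi_update(docs, oplog, lambda v: v == "new", "old")

# Delete as many documents as are inserted: data size is again unchanged.
for i in range(1000):
    apply_delete(docs, oplog, i)
    apply_insert(docs, oplog, 1000 + i, "new")

print(len(docs) == docs_before)   # document count did not grow
print(len(oplog))                 # but the log holds 4000 entries
```

The collection ends with the same 1,000 documents it started with, yet the log has recorded 4,000 operations, which is the pattern that calls for a larger than default oplog.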

In Conclusion

Regardless of your starting point, MongoDB Atlas provides a robust, secure, and scalable destination for your data. MongoDB Atlas Live Import automates and simplifies the process of migrating your data to MongoDB Atlas. The command-line version of this utility, called mongomirror, gives users additional control and flexibility around how the data gets migrated. Other options include exporting (mongoexport) and importing (mongoimport) your data manually, or even writing your own application to accomplish the migration. The decision to use one particular method over another depends upon the size of your database, its geographic location, and your tolerance for application downtime.

If you choose to leverage MongoDB Atlas Live Import, keep the following route guidance in mind along the journey.

  • Increase available RAM during migration so that it covers application plus migration requirements.
  • Reduce latency if possible or use mongomirror instead of Live Import.
  • Whitelist the IP Ranges of the MongoDB Live Import Process
  • Ensure Appropriate User Access Permissions on the Source Deployment
  • Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.

Now that you’re fully prepared, let’s embark on the journey and I’ll guide you through the process of deploying a cluster in MongoDB Atlas and walk you through migrating your data from an AWS Replica Set.