Online Archive: A New Paradigm for Data Tiering on MongoDB Atlas

Benjamin Flast


We’re thrilled to announce the beta of Online Archive for MongoDB Atlas at MongoDB.live. Online Archive takes a different approach to your data: it lets you tier data seamlessly across Atlas clusters and cloud object stores. This powerful new capability makes it possible to bring new and previously cost-prohibitive use cases onto MongoDB Atlas, our first-class managed offering, with little effort.

What is Atlas Online Archive?

With Online Archive, you can define a simple rule for archiving data off of a cluster, pick the fields you query most frequently, and then sit back. Atlas will automatically move data off of your cluster into a more cost-effective storage layer. That data remains queryable: a single connection string, powered by Atlas Data Lake, combines your cluster and archive data.

Online Archive is a good fit for many different use cases, including:

  • Insert-only workloads, where data is immutable and its performance requirements decrease as it ages
  • Historical log keeping
  • Time-series datasets
  • Storing data that would have been deleted using TTL indexes

Setting up an Online Archive in Atlas

To get started, navigate to the “Online Archive” tab on your Atlas cluster and click “Configure Online Archive”. You can also configure online archives using the Atlas API.
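If you prefer the API, the request body for creating an online archive might look roughly like the dictionary built below. Treat this as a hedged sketch: the field names (`dbName`, `collName`, `criteria`, `type`, `dateField`, `expireAfterDays`, `partitionFields`, `fieldName`, `order`) are our assumption based on the Atlas Admin API's online archive resource and may differ by API version, so check the Atlas API reference before sending anything. The helper `build_archive_request` and the `logs` database name are hypothetical, and the code only builds JSON; it makes no HTTP call.

```python
import json


def build_archive_request(db_name, coll_name, date_field, age_limit_days, query_fields):
    """Build a hypothetical request body for creating an online archive.

    The field names are assumptions; verify them against the Atlas API reference.
    """
    if age_limit_days <= 0:
        raise ValueError("age limit must be a positive number of days")
    return {
        "dbName": db_name,
        "collName": coll_name,
        # Archiving rule: archive documents whose date field is older than the limit.
        "criteria": {
            "type": "DATE",
            "dateField": date_field,
            "expireAfterDays": age_limit_days,
        },
        # Commonly queried fields in priority order (assumed 0-based "order").
        "partitionFields": [
            {"fieldName": name, "order": position}
            for position, name in enumerate(query_fields)
        ],
    }


body = build_archive_request("logs", "events", "event_date", 365, ["event_date", "user_id"])
print(json.dumps(body, indent=2))
```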

Write an Archiving Rule

The first step is to choose the namespace (database and collection) you would like to start archiving. You will also need to pick a date field in your documents to trigger archiving on, along with an age limit (in days) after which documents will be archived.

Once the value of the selected date field in a document is older than the age limit, Atlas will move the document to a MongoDB-managed cloud object store (currently Amazon S3) and delete it from your cluster. For example, to archive event data once it is more than a year old, we would select the ‘event_date’ field and set the age limit to ‘365’.
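To make the rule concrete, here is a minimal sketch of the eligibility check, assuming the date field holds a timezone-aware datetime and the age limit is counted in days. `is_archivable` is a hypothetical helper for illustration only, not how Atlas evaluates rules internally; documents missing the field are simply skipped in this sketch, and "now" is fixed so the example is reproducible.

```python
from datetime import datetime, timedelta, timezone


def is_archivable(doc, date_field, age_limit_days, now):
    """Return True if the document's date field is older than the age limit."""
    value = doc.get(date_field)
    if value is None:
        # In this sketch, documents without the date field never match the rule.
        return False
    cutoff = now - timedelta(days=age_limit_days)
    return value < cutoff


now = datetime(2020, 6, 9, tzinfo=timezone.utc)  # fixed reference time
docs = [
    {"_id": 1, "event_date": datetime(2019, 1, 15, tzinfo=timezone.utc)},
    {"_id": 2, "event_date": datetime(2020, 5, 1, tzinfo=timezone.utc)},
    {"_id": 3},  # no event_date field
]
to_archive = [d["_id"] for d in docs if is_archivable(d, "event_date", 365, now)]
print(to_archive)  # [1]
```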

Configure an Atlas Online Archive and write an archiving rule

Choose Commonly Queried Fields

Next, you will need to choose the fields in the selected namespace that you query most frequently. Atlas uses this information to write the data to the cloud object store in a format optimized for reads that filter on those fields.

At this stage, Atlas also creates a fully managed Atlas Data Lake, which lets you query a virtual collection representing the data at the specified namespace in both your Atlas cluster and its online archive (more on this later).

In the event data example from above, we almost always query based on date, so we move the date field to the top. Let’s also assume we often query for specific individuals, so we add ‘user_id’ as the second field. Nothing else comes to mind, so we leave the remaining field blank.
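Why does the order of these fields matter? One common way to lay out archived data for fast reads is to partition it by the queried fields, nesting later fields under earlier ones, so a query that filters on the leading field only has to scan matching partitions. The sketch below illustrates that general idea with Hive-style `field=value` paths keyed on `event_date` (bucketed by day) and then `user_id`. The path format and both helpers are hypothetical; this post does not describe how Atlas actually lays out archived data.

```python
from datetime import datetime


def partition_path(doc, fields):
    """Build a Hive-style path from the chosen fields, in order (hypothetical layout)."""
    parts = []
    for field in fields:
        value = doc[field]
        if isinstance(value, datetime):
            value = value.strftime("%Y-%m-%d")  # bucket dates by day
        parts.append(f"{field}={value}")
    return "/".join(parts)


def prune_by_leading_field(paths, field, wanted):
    """Keep only partitions whose first (leading) segment matches the filter."""
    target = f"{field}={wanted}"
    return [p for p in paths if p.split("/")[0] == target]


docs = [
    {"event_date": datetime(2019, 3, 1, 8, 30), "user_id": 42},
    {"event_date": datetime(2019, 3, 1, 17, 5), "user_id": 7},
    {"event_date": datetime(2019, 3, 2, 9, 0), "user_id": 42},
]
fields = ["event_date", "user_id"]
paths = sorted({partition_path(d, fields) for d in docs})
print(paths)
print(prune_by_leading_field(paths, "event_date", "2019-03-01"))
```

With the date field first, a date-range query touches only the partitions under the matching dates; had ‘user_id’ been first, the same query would have to look inside every user’s partition.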

Choose commonly queried fields to partition data in Atlas Online Archive

Begin Archiving

Finally, you will need to confirm that archiving can begin, acknowledging that documents cannot be updated once they have been archived. If you run into any problems, you can use mongodump or mongoexport to get data out of the online archive and move it back into your cluster, after which you can delete the archive. Also note that while data is being archived, a document may temporarily appear in both your online archive and your Atlas cluster.

Confirm Atlas Online Archive details to begin archiving documents

Managing Online Archives

Once your archiving rule has been set up, you can manage online archives or create new archiving rules from the “Online Archive” tab. Here you can view and connect to existing archives, update rule age limits, pause archiving rules, or delete archives.

View and manage Atlas Online Archives in the Atlas UI

Returning to our example, maybe we’ve realized we don’t even need a full year of data in our cluster. In this case, we can update the rule to archive data once it is six months old rather than a year by changing the age limit to 180.
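Lowering the age limit moves the cutoff date forward, so documents aged between 180 and 365 days become eligible for archiving. Here is a minimal sketch of that arithmetic, assuming a fixed reference date and day-level granularity; `cutoff` and `newly_eligible` are hypothetical helpers, not Atlas functions.

```python
from datetime import date, timedelta


def cutoff(today, age_limit_days):
    """Documents dated before this day are older than the age limit."""
    return today - timedelta(days=age_limit_days)


def newly_eligible(dates, today, old_limit, new_limit):
    """Dates kept on the cluster under old_limit that qualify under new_limit."""
    old_cut, new_cut = cutoff(today, old_limit), cutoff(today, new_limit)
    return [d for d in dates if old_cut <= d < new_cut]


today = date(2020, 6, 9)
print(cutoff(today, 365))  # 2019-06-10
print(cutoff(today, 180))  # 2019-12-12
events = [date(2019, 5, 1), date(2019, 9, 1), date(2020, 3, 1)]
print(newly_eligible(events, today, 365, 180))
```

The 2019-05-01 event was already past the one-year limit, and the 2020-03-01 event is still within six months, so only the 2019-09-01 event changes tiers.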

Querying Across an Atlas Cluster and Online Archive

After setting up your online archive, the last step is to connect to it and query your live data and archived data with a single connection string. In the new connection modal shown below, you will be given an option to “Connect to Cluster and Online Archive”, which gives you a read-only connection string to your live and archival data.

In our example, querying for all events in our events collection returns Atlas cluster data from the last 180 days and online archive data for everything older than 180 days.
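Because a document may temporarily exist in both tiers while it is being archived, any code that combines results from the cluster and the archive itself (for example, after pulling archived data out with mongoexport) should deduplicate on `_id`. The sketch below illustrates that merge with in-memory lists standing in for the two tiers and ISO date strings for `event_date`; `merge_tiers` is a hypothetical helper, not part of any MongoDB API, and it says nothing about how Atlas Data Lake federates queries internally.

```python
def merge_tiers(cluster_docs, archive_docs):
    """Union two result sets, keeping one copy per _id, sorted by event_date."""
    merged = {doc["_id"]: doc for doc in archive_docs}
    # Archived documents cannot be updated, so both copies should match;
    # preferring the cluster copy here is an arbitrary choice.
    merged.update({doc["_id"]: doc for doc in cluster_docs})
    return sorted(merged.values(), key=lambda d: d["event_date"])


cluster = [
    {"_id": 3, "event_date": "2020-05-01"},
    {"_id": 2, "event_date": "2019-11-20"},  # mid-archiving: also in the archive
]
archive = [
    {"_id": 1, "event_date": "2019-01-15"},
    {"_id": 2, "event_date": "2019-11-20"},
]
result = merge_tiers(cluster, archive)
print([d["_id"] for d in result])  # [1, 2, 3]
```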

Run federated queries across an Atlas cluster and online archive using a unified endpoint

Try Atlas Online Archive

Online Archive allows you to right-size your Atlas clusters by storing only “hot” data that is regularly accessed and moving “cold” data to a cheaper storage tier. Billing for this feature includes the cost of storing data in our fully managed cloud object storage, plus usage-based pricing for accessing the data in your online archives. For more information, visit our documentation.

We’re thrilled to see what new workloads you’ll be able to bring onto Atlas clusters with the new flexibility provided by Online Archive. To get started, sign up for an Atlas account and deploy any dedicated cluster (M10 or higher).

While in beta we’ll continue to make improvements to performance and flexibility. We look forward to hearing from you as you start to test out the feature. Let us know what use cases you would like to use Online Archive for and which improvements we should make first!
