GIANT Stories at MongoDB

New to MongoDB Atlas — Get Started with Free Fully Automated Databases on Microsoft Azure

Leo Zheng

Release Notes, Cloud

We’re excited to announce that teams can now use MongoDB Atlas — the global cloud database for MongoDB — for free on Microsoft Azure. The newly available free tier on Azure Cloud, known as the M0, grants users 512 MB of storage and is ideal for learning MongoDB, prototyping, and early development.

The Atlas free tier will run MongoDB 4.0 and grant users access to some of the latest database features, including multi-document transactions, which make it even easier to address a complete range of use cases with MongoDB; type conversions, which allow teams to perform sophisticated transformations natively in the database without costly and fragile ETL; and updated security defaults (SHA-256 and TLS 1.1+).

Like larger MongoDB Atlas cluster types, M0 clusters come with end-to-end encryption, high availability, and fully managed upgrades. M0 clusters also enable faster development by allowing teams to perform CRUD operations against their data right from their browsers via the built-in Data Explorer.

Finally, free tier clusters on Azure can be paired with MongoDB Stitch — a powerful suite of serverless platform services for apps using MongoDB — to simplify the handling of backend logic, database triggers, and integrations with the wider Azure ecosystem.

At launch, the MongoDB Atlas free tier will be available in 3 Azure regions:

  • East US (Virginia)
  • East Asia (Hong Kong)
  • West Europe (Netherlands)

Creating a free tier is easy. When building a new Atlas cluster, select Azure as your cloud of choice and one of the regions above.

Next, select M0 in the “Cluster Tier” dropdown.

Then, give the cluster a name and hit the “Create Cluster” button. Your free MongoDB Atlas cluster will be deployed in minutes.

New to MongoDB Atlas? Deploy a free cluster in minutes.

New to MongoDB Atlas — Global Clusters Enable Low-Latency Reads and Writes from Anywhere

The ability to replicate data across any number of cloud regions was introduced to MongoDB Atlas, the fully managed service for the database, last fall. This granted Atlas customers two key benefits. For those with geographically distributed applications, this functionality allowed them to leverage local replicas of their data to reduce read latency and provide a fast, responsive customer experience on a global scale. It also meant that an Atlas cluster could be easily configured to fail over to another region during cloud infrastructure outages, providing customers with the ability to provision multi-region fault tolerance in just a few clicks.

But what about improving write latency and addressing increasingly demanding regulations, many of which have data residency requirements? In the past, users could address these challenges in a couple of ways. If they wanted to continue using a fully managed MongoDB service, they could deploy separate databases in each region. Unfortunately, this often resulted in added operational and application complexity. They could also build and manage a geographically distributed database deployment themselves and satisfy these requirements using MongoDB’s zone sharding capabilities.

Today we’re excited to introduce Global Clusters to MongoDB Atlas. This new feature makes it possible for anyone to effortlessly deploy and manage a single database that addresses all the aforementioned requirements. Global Clusters allow organizations with distributed applications to geographically partition a fully managed deployment in a few clicks, and control the distribution and placement of their data with sophisticated policies that can be easily generated and changed.

Improving app performance by reducing read and write latency

With Global Clusters, geographically distributed applications can write to (and of course, read from) local partitions of an Atlas deployment called zones. This new Global Writes capability allows you to associate and direct data to a specific zone, keeping it in close proximity to nearby application instances and end users. In its simplest configuration, an Atlas zone contains a 3-node replica set distributed across the availability zones of its preferred cloud region. This configuration can be adjusted depending on your requirements. For example, you can turn the 3-node replica set into multiple shards to address increases in local write throughput. You can also distribute the secondaries within a zone into other cloud regions to enable fast, responsive read access to that data from anywhere.

The illustration above represents a simple Global Cluster in Atlas with two zones. For simplicity’s sake, we’ve labeled them blue and red. The blue zone uses a cloud region in Virginia as the preferred region, while the red zone uses one in London. Local application instances will write to and read from the MongoDB primaries located in the respective cloud regions, ensuring low latency read and write access. Each zone also features a read-only replica of its data located in the cloud region of the other one. This ensures that users in North America will have fast, responsive read access to data generated in Europe, and vice versa.

Satisfying data residency for regulatory requirements

By allowing developers to easily direct the movement of data at the document level, Global Clusters provide a foundational building block that helps organizations achieve compliance with regulations containing data residency requirements. Data is associated with a zone and pinned to that zone unless otherwise configured.

The illustration below represents an Atlas Global Cluster with 3 zones: blue, red, and orange. The configuration of the blue and red zones is very similar to what we already covered. Local application instances read from and write to nearby primaries located in the preferred regions (Virginia and London), and each zone includes a read-only replica in the preferred cloud region of every other zone for serving fast, global reads. What’s different is the orange zone, which serves Germany. Unlike data generated in North America and the UK, data generated in and around Germany is not replicated globally; instead, it remains pinned to the preferred cloud region located in Frankfurt.

Deploying your first Global Cluster

Now let’s walk through how easy it is to set up a Global Cluster with MongoDB Atlas.

In the Atlas UI, when you go to create a cluster, you’ll notice a new accordion labelled Global Cluster Configuration. If you click into this and enable “Global Writes”, you’ll find two easy-to-use and customizable templates. Global Performance provides reasonable read and write latency to the majority of the global population, and Excellent Global Performance provides excellent read and write latency to the majority of the global population. Both options are available across AWS, Google Cloud Platform, and Microsoft Azure.

You can also configure your own zones. Let’s walk through the setup of a Global Cluster using the Global Performance template on AWS. After selecting the Global Performance template, you’ll see that the Americas are mapped to the North Virginia region, EMEA is mapped to Frankfurt, and APAC is mapped to Singapore.

As your business requirements change over time, you are able to switch to the Excellent Global Performance template or fully customize your existing template.

Customizing your Global Cluster

Say you wanted to move your EMEA zone from Frankfurt to London. You can do so in just a few clicks. If you scroll down in the Create Cluster Dialog, you’ll see the Zone configuration component (pictured below). Select the zone you want to edit and simply update the preferred cloud region.

Once you’re happy with the configuration, you can verify your changes in the latency map and then proceed to deploy the cluster.

After your Global Cluster has been deployed, you’ll find that it looks just like any other Atlas cluster. If you click into the connect experience to find your connection string, you’ll find a simple and concise connection string that you can use in all of your geographically distributed application instances.

Configuring data for a Global Cluster

Now that your Global Cluster is deployed, let's have a look at the Atlas Data Explorer, where you can create a new database and collection. Atlas will walk you through this process, including the creation of an appropriate compound shard key — the mechanism used to determine how documents are mapped to different zones.

This shard key must contain the location field. The second field should be a well-distributed identifier, such as userId. Full details on key selection can be found in the MongoDB Atlas docs.

To help show what documents might look like in your database, we’ve added a few sample documents to a collection in the Data Explorer. As you can see above, we’ve included a field called location containing an ISO 3166-1 alpha-2 country code ("US", "DE", "IN") or a supported ISO 3166-2 subdivision code ("US-DC", "DE-BE", "IN-DL"), as well as a field called userId, which acts as our well-distributed identifier. This ensures that location affinity is baked into each document.

In the background, MongoDB Atlas will have automatically placed each of these documents in its respective zone. The document corresponding to Anna Bell will live in North Virginia and the document corresponding to John Doe will live in Singapore. Assuming we have application instances deployed in Singapore and North Virginia, both will use the same MongoDB connection string to connect to the cluster. When Anna Bell connects to our application from the US, she will automatically be working with data kept in close proximity to her. Similarly, when John Doe connects from Australia, he will be writing to the Singapore region.
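The placement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the country-to-zone table is a hypothetical three-entry excerpt chosen to match the Global Performance template in this walkthrough, the two documents are invented stand-ins for the samples in the Data Explorer, and the function name zone_for is made up for this example. Atlas performs the real mapping internally.

```python
# Hypothetical excerpt of a country-to-zone table; Atlas maintains the real one.
COUNTRY_TO_ZONE = {
    "US": "Americas",
    "DE": "EMEA",
    "AU": "APAC",
}

# Preferred regions from the Global Performance template described in this post.
ZONE_TO_REGION = {
    "Americas": "North Virginia",
    "EMEA": "Frankfurt",
    "APAC": "Singapore",
}

# Compound shard key: location first, then a well-distributed identifier.
SHARD_KEY = ("location", "userId")


def zone_for(document):
    """Return the (zone, preferred region) a document would be pinned to."""
    missing = [field for field in SHARD_KEY if field not in document]
    if missing:
        raise ValueError(f"document is missing shard key fields: {missing}")
    # An ISO 3166-2 subdivision code such as "US-DC" starts with its
    # ISO 3166-1 alpha-2 country code, so the part before "-" selects the zone.
    country = document["location"].split("-", 1)[0]
    zone = COUNTRY_TO_ZONE[country]
    return zone, ZONE_TO_REGION[zone]


# Invented sample documents; the userId values are placeholders.
anna = {"location": "US-DC", "userId": 1, "name": "Anna Bell"}
john = {"location": "AU", "userId": 2, "name": "John Doe"}

print(zone_for(anna))  # ('Americas', 'North Virginia')
print(zone_for(john))  # ('APAC', 'Singapore')
```

Moving the EMEA zone from Frankfurt to London, as in the customization step earlier, would amount to changing one entry in ZONE_TO_REGION in this toy model.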

Adding a zone to your Global Cluster

Now let’s say that you start to see massive adoption of your application in India and you want to improve the performance for local users. At any time, you can return to your cluster configuration, click “Add a Zone”, and select Mumbai as the preferred cloud region for the new zone.

The global latency map will update, showing us the new zone and an updated view of the countries that map to it. When we deploy the changes, the documents that are tagged with relevant ISO country codes will gracefully be transferred across to the new zone, without downtime.

Scaling write throughput in a single zone

As we mentioned earlier in this post, it’s possible to scale out a single zone to address increases in local write throughput. Simply scroll to “Zone Configuration”, click “Additional Options”, and increase the number of shards. Adding a second shard to a zone can roughly double its write throughput, assuming writes are distributed evenly across the shards.

Low-latency reads of data originating from other zones

We also referenced the ability to distribute read-only replicas of data from a zone into the preferred cloud regions of other zones, providing users with low-latency read access to data originating from other regions. This is easy to configure in MongoDB Atlas. In “Zone Configuration”, select “Add secondary and read-only regions”. Under “Deploy read-only replicas”, select “Add a node” and choose the region where you’d like your read-only replica to live.

For global clusters, Atlas provides a shortcut to creating read-only replicas of each zone in every other zone. Under “Zone configuration summary”, simply select the “Configure local reads in every zone” button.
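To make the effect of that shortcut concrete, here is a small Python sketch. It is a model of the resulting layout, not of any Atlas API: the function name local_reads_everywhere is hypothetical, and the zone names and regions are taken from the two-zone illustration described earlier in this post.

```python
def local_reads_everywhere(preferred_regions):
    """For each zone, list the other zones' preferred regions that would
    each host one of its read-only replicas."""
    return {
        zone: [region for other, region in preferred_regions.items() if other != zone]
        for zone in preferred_regions
    }


# Zone name -> preferred cloud region, as in the blue/red example.
zones = {"blue": "Virginia", "red": "London"}

print(local_reads_everywhere(zones))
# {'blue': ['London'], 'red': ['Virginia']}
```

With N zones, each zone ends up with N - 1 read-only replicas in this model, so the total grows quickly as zones are added.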

MongoDB Atlas Global Clusters are very powerful, making it possible for practically any developer or organization to easily deploy, manage, and scale a distributed database layer optimized for low-latency reads and writes anywhere in the world. We're very excited to see what you build with this new functionality.

Global clusters are available today on Amazon Web Services, Google Cloud Platform, and Microsoft Azure for clusters M30 and larger.

Introducing Free Cloud Monitoring for MongoDB

With the release of MongoDB 4.0, we’re excited to announce the availability of free cloud monitoring, the easiest way to monitor and visualize the status of your MongoDB deployments.

There are no agents to install, no forms to fill out, no sign-ups necessary.

Let’s walk through how it works.

After you’ve installed MongoDB 4.0, connect to your instance(s) using the mongo shell, the interactive JavaScript interface to MongoDB. You should see the following message.

MongoDB shell version v4.0.0
connecting to: mongodb://
MongoDB server version: 4.0.0
Enable MongoDB's free cloud-based monitoring service to collect and display
metrics about your deployment (disk utilization, CPU, operation statistics,
etc).

The monitoring data will be available on a MongoDB website with a unique
URL created for you. Anyone you share the URL with will also be able to
view this page. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command:
db.enableFreeMonitoring()

When you run the command, you should see something similar to what’s shown below.

{
    "state" : "enabled",
    "message" : "To see your monitoring data, navigate to the unique URL below. Anyone you share the URL with will also be able to view this page. You can disable monitoring at any time by running db.disableFreeMonitoring().",
    "url" : "",
    "userReminder" : "",
    "ok" : 1
}

Simply copy and paste your unique URL into a browser to access your monitoring dashboard. Free cloud monitoring tracks key performance indicators such as operation execution times, disk utilization, memory, network input/output, and more in interactive charts.
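If you script your setup, you may want to pull the dashboard URL out of that response rather than copy it by hand. The following Python sketch shows one way to do so, assuming the response has already been obtained as a dictionary with the fields shown above. The function name monitoring_url is hypothetical, and the URL in the sample is a made-up placeholder, not a real dashboard address.

```python
def monitoring_url(response):
    """Return the dashboard URL from a response shaped like the one above,
    or None if monitoring is not enabled or no URL is present."""
    if response.get("ok") != 1 or response.get("state") != "enabled":
        return None
    # An empty string is treated the same as a missing URL.
    return response.get("url") or None


# Placeholder response for illustration only.
response = {
    "state": "enabled",
    "message": "To see your monitoring data, navigate to the unique URL below.",
    "url": "https://example.com/freemonitoring/placeholder",
    "userReminder": "",
    "ok": 1,
}

print(monitoring_url(response))  # https://example.com/freemonitoring/placeholder
```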

Mousing over chart lines reveals precise metrics.

You can also zoom in to 1-minute granularity.

Free cloud monitoring supports standalone instances and replica sets of MongoDB 4.0+. Of course, only monitoring metadata is accessed, never the contents of your databases. You can disable monitoring and your unique URL at any time by running the db.disableFreeMonitoring() command.

For more information, visit our documentation.

Get started with free cloud monitoring. Download MongoDB 4.0.

New to MongoDB Atlas — Free Fully Managed Databases on Google Cloud Platform

Today we’re excited to announce that the MongoDB Atlas free tier — which provides access to a fully managed M0 cluster with 512 MB of storage at no cost — is now available on Google Cloud Platform (GCP).

We launched the MongoDB Atlas database as a service on GCP one year ago at MongoDB World 2017. Since then, we’ve made significant product enhancements, culminating in the most powerful MongoDB service for developers building their applications on Google’s expanding ecosystem of cloud services. For example, we launched Atlas into 13 Google cloud regions earlier this year, and added the ability for customers to replicate their data to any number of regions for fast, responsive read access and multi-region fault tolerance.

Companies like Longbow Advantage, a supply chain partner to Del Monte Foods and Subaru of America, are using MongoDB Atlas on GCP to accelerate innovation and stay agile and efficient in their development life cycles.

We wanted to ensure that our team could remain focused on the application and not have to worry about the underlying infrastructure. Atlas allowed us to do just that.

Alex Wakefield, Chief Commercial Officer, Longbow Advantage

The availability of the Atlas free tier on GCP will make it easier than ever for developers using the cloud platform to experiment in an optimized environment for MongoDB, with no barrier to entry. The M0 cluster is ideal for learning MongoDB, prototyping, or early development, and has built-in security, availability, and fully managed upgrades.

The Atlas free tier on GCP is available in 3 regions:

  • Iowa (us-central1)
  • Belgium (europe-west1)
  • Singapore (asia-southeast1)

MongoDB Atlas is available in 13 GCP regions

Getting started is simple. When building a new cluster in MongoDB Atlas, select GCP as the cloud provider and then select the region closest to your application server(s) with the "Free tier available" label.

Then, in the Cluster Tier tab, select the M0 cluster size.

Finally, name your cluster.

That's it. We're excited to see what you build with MongoDB Atlas and GCP!

New to MongoDB Atlas — Full CRUD Support in Data Explorer

As a fully managed database service, MongoDB Atlas makes life simpler for anyone interacting with MongoDB, whether you’re deploying a cluster on demand, restoring a snapshot, evaluating real-time performance metrics, or inspecting data.

Today, we’re taking it one step further by allowing developers to manipulate their data right from within the Atlas UI. The embedded Data Explorer, which has historically allowed you to run queries, view metadata regarding your deployments, and retrieve information such as index usage statistics, now supports full CRUD functionality.

To support these capabilities, new Project-level roles with different permission levels have been added.

You can assign users these new roles in the Users and Teams settings.

In addition, all Data Explorer operations are tracked and presented in the Atlas Activity Feed (found in the Alerts menu for each Project), allowing you to see who did what, and when.

When you click into the Data Explorer in Atlas, you should see new controls for interacting with your documents, collections, databases, and indexes. For example, modify existing documents using the intuitive visual editor, or insert new documents and clone or delete existing ones in just a few clicks. A comprehensive list of available Data Explorer operations can be found in the Atlas documentation.

The Data Explorer is currently available for M10 Atlas clusters and higher.

New to MongoDB Atlas on AWS — AWS Cloud Provider Snapshots, Free Tier Now Available in Singapore & Mumbai

AWS Cloud Provider Snapshots

MongoDB Atlas is an automated cloud database service designed for agile teams who’d rather spend their time building apps than managing databases, backups, and restores. Today, we’re happy to announce that Cloud Provider Snapshots are now available for MongoDB Atlas replica sets on AWS. As the name suggests, Cloud Provider Snapshots provide fully managed backup storage and recovery using the native snapshot capabilities of the underlying cloud service provider.

Choosing a backup method for a database cluster in MongoDB Atlas

When this feature is enabled, MongoDB Atlas will perform snapshots against the primary in the replica set; snapshots are stored in the same cloud region as the primary, granting you control over where all your data lives. Please visit our documentation for more information on snapshot behavior.

Cloud Provider Snapshots on AWS have built-in incremental backup functionality, meaning that a new snapshot only saves the data that has changed since the previous one. This minimizes the time it takes to create a snapshot and lowers costs by reducing the amount of duplicate data. For example, a cluster with 10 GB of data on disk and 3 snapshots may require less than 30 GB of total snapshot storage, depending on how much of the data changed between snapshots.
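The arithmetic behind that example can be sketched as follows. This is a deliberately simplified model, not a description of how AWS bills or stores snapshots: it assumes the first snapshot stores the full data set and each later snapshot stores only the data changed since the previous one. The change amounts (1 GB and 2 GB) are invented for illustration.

```python
def incremental_snapshot_storage_gb(data_gb, changed_gb_per_later_snapshot):
    """Estimate total snapshot storage: one full copy plus the changed
    data captured by each subsequent snapshot."""
    return data_gb + sum(changed_gb_per_later_snapshot)


# Three full copies of a 10 GB data set would need 30 GB.
full_copies = 3 * 10

# One full snapshot, then two incremental ones with hypothetical change sizes.
incremental = incremental_snapshot_storage_gb(10, [1, 2])

print(full_copies, incremental)  # 30 13
```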

Cloud Provider Snapshots are available for M10 clusters or higher in all of the 15 AWS regions where you can deploy MongoDB Atlas clusters.

Consider creating a separate Atlas project for database clusters where a different backup method is required. MongoDB Atlas only allows one backup method per project. Once you select a backup method — whether it’s Continuous Backup or Cloud Provider Snapshots — for a cluster in a project, Atlas locks the backup service to the chosen method for all subsequent clusters in that project. To change the backup method for the project, you must disable backups for all clusters in the project, then re-enable backups using your preferred backup methodology. Atlas deletes any stored snapshots when you disable backup for a cluster.
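The locking rule above is easier to reason about as a small state model. The Python sketch below is a toy under stated assumptions, not the Atlas API: the Project class and its method names are hypothetical, and it only captures the rule that the first backup method chosen is locked in until backups are disabled on every cluster in the project.

```python
class Project:
    """Toy model of the one-backup-method-per-project rule."""

    def __init__(self):
        self.method = None      # backup method the project is locked to
        self.backed_up = set()  # clusters with backup currently enabled

    def enable_backup(self, cluster, method):
        if self.method is not None and method != self.method:
            raise ValueError(f"project is locked to {self.method!r}")
        self.method = method
        self.backed_up.add(cluster)

    def disable_backup(self, cluster):
        # In Atlas, disabling backup also deletes that cluster's stored snapshots.
        self.backed_up.discard(cluster)
        if not self.backed_up:
            self.method = None  # lock is released once no cluster has backups


project = Project()
project.enable_backup("cluster-a", "Continuous Backup")
try:
    project.enable_backup("cluster-b", "Cloud Provider Snapshots")
except ValueError as err:
    print(err)  # project is locked to 'Continuous Backup'

project.disable_backup("cluster-a")
project.enable_backup("cluster-b", "Cloud Provider Snapshots")
print(project.method)  # Cloud Provider Snapshots
```

Because switching methods means disabling backups everywhere first, a separate project per backup method avoids losing snapshots during the change.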

Free, $9, and $25 MongoDB Atlas clusters now available in Singapore & Mumbai

We’re committed to lowering the barrier to entry to MongoDB Atlas and allowing developers to build without worrying about database deployment or management. Last week, we released a 14% price reduction on all MongoDB Atlas clusters deployed in AWS Mumbai. And today, we’re excited to announce the availability of free and affordable database cluster sizes in South and Southeast Asia on AWS.

Free M0 Atlas clusters, which provide 512 MB of storage for experimentation and early development, can now be deployed in AWS Singapore and AWS Mumbai. If more space is required, M2 and M5 Atlas clusters, which provide 2 GB and 5 GB of storage, respectively, are now also available in these regions for just $9 and $25 per month.

DarwinBox Evolves HR SaaS Platform and Prepares for 10x Growth with MongoDB Atlas

DarwinBox found a receptive market for its HR SaaS platform for medium to large businesses, but rapid success strained their infrastructure and challenged their resources. We talked to Chaitanya Peddi, Co-founder and Head of Product, to find out how they addressed those challenges with MongoDB Atlas.

Evolution favors those that find ways to thrive in changing environments. DarwinBox has done just that, providing a full spectrum of HR services online and going from a standing start to a top-four sector brand in the Indian market in just two years. From 40 enterprise clients in its first year to more than 80 in its second, it now supports over 200,000 employees, and is hungrily eyeing expansion in new territories.

“We’re expecting 10x growth in the next two years,” says Peddi. “That means aggressive scaling for our platform and MongoDB Atlas will play a big role.”

Starting from a blank sheet of paper

The company’s key business insight is that employees have grown accustomed to the user experience of online services they access in their personal lives. However, the same ease of use is simply not found at work, especially in HR solutions that address holiday booking, managing benefits, and appraisals. DarwinBox’s approach is to deliver a unified platform of user-friendly HR services to replace a jumble of disparate offerings, and to do so in a way that supports its own aggressive growth plans. The company aims to support nearly every employee interaction with corporate HR, such as recruitment, employee engagement, expense management, separation, and more.

“We started in 2015 from a blank sheet of paper,” Peddi says. “It became very clear very quickly that for most of our use cases, only a non-relational database would work. Not only did we want to provide an exceptionally broad set of integrated services, but we also had clients with a large number of customization requirements. This meant we needed a very flexible data model. We looked at a lot of options. We wanted an open source technology to avoid lock-in and our developers pushed for MongoDB, which fit all our requirements and was a pleasure to work with. Our databases are now 90 percent MongoDB. We expect that to be at 100 percent soon.”

Reducing costs and future-proofing database management

When DarwinBox launched, it ran its databases in-house, which wasn’t ideal. “We have a team of 40+ developers, QA and testers, and three running infrastructure, and suddenly we’re growing much faster than we expected. It’s a good problem to have, but we couldn’t afford to offer anything less than excellent service.” Peddi emphasized that of all the things they wanted to do to succeed, becoming database management experts wasn’t high on the list.

This wasn’t the only reason that MongoDB Atlas looked like the next logical step for the company when it became available, says Peddi. “We were rapidly developing our services and our customer base, but our strategies for backing up the databases, for scaling, for high availability, and for monitoring performance weren’t keeping up. In the end, we decided that we’d migrate to Atlas for a few major reasons.”

The first reason was the most obvious. “The costs of managing the databases, infrastructure, and backups were increasing. In addition, it became increasingly difficult to self-manage everything as requirements became more sophisticated and change requests became more frequent. Scaling up and down to match demand and launching new clusters consumed precious man hours. Monitoring performance and issue resolution was taking up more time than we wanted. We had built custom scripts, but they weren’t really up to the task.”

With MongoDB Atlas on AWS, Peddi says, all these issues are greatly reduced. “We’re able to do everything we need with our fully managed database very quickly – scale according to business need at the press of a button, for example. There are other benefits. With MongoDB technical engineers a phone call away, we’re able to fix issues far quicker than we could in the past. MongoDB Compass, the GUI for the database, is proving helpful in letting our teams visually explore our data and tune things accordingly.”

Migrating to Atlas has also helped DarwinBox dramatically reduce costs.

We’ve optimized our database infrastructure and how we manage backups. Not only did we bring down costs by 40%, but by leveraging the queryable snapshot feature, we’re able to restore the data we actually need 80% faster.

Chaitanya Peddi, Co-founder and Head of Product, DarwinBox

The increased availability and data resilience from the switch to MongoDB Atlas on AWS eases the responsibility in managing the details of 200,000 employees’ working lives. “Data is the most sensitive part of our business, the number one thing that we care about,” says Peddi. “We can’t lose even 0.00001 percent of our data. We used to take snapshots of the database, but that was costly and difficult to manage. Now, it’s more a live copy process. We can guarantee data retention for over a year, and it only takes a few moments to find what you need with MongoDB Atlas.”

For DarwinBox to achieve its target of 10x growth in two years, it has to – and plans to – go international.

“We had that in mind from the outset. We’ve designed our architecture to cope with a much larger scale, both in total employee numbers and client numbers, and to handle different regulatory regimes.” According to Peddi, that means moving to microservices, developing data analytics, maybe even looking at other cloud providers to host the DarwinBox HR Platform. He added: “If we were to do this on AWS and self-manage the database with our current resources, we would have to invest a significant amount of effort into orchestrating and maintaining a globally distributed database. MongoDB Atlas with its cross-region capabilities makes this all much easier.”

DarwinBox is confident that MongoDB Atlas will help the organization achieve its product plans.

“MongoDB Atlas will be able to support the business needs that we've planned out for the next two years,” says Peddi. “We’re happy to see how rapidly the Atlas product roadmap is evolving.”

Get started with MongoDB Atlas and deploy a free database in minutes.

Bienvenue à MongoDB Atlas: MongoDB as a Service Now Available in France

Leo Zheng



MongoDB Atlas, the fully automated cloud database, is now available in France on Amazon Web Services and Microsoft Azure. Located in the Paris area, these newly supported cloud regions will allow organizations using MongoDB Atlas to better serve their customers in and around France. For deployments in AWS EU (Paris), the following instance sizes are supported. MongoDB Atlas deployments in this cloud region will automatically be distributed across three AWS availability zones (AZ), ensuring that the failure of a single AZ will not impact the database’s automated election and failover process. Currently, customers deploying to AWS EU (Paris) can also replicate their data to regions of their choosing (to provide even greater fault tolerance or fast, responsive read access) if they’re using the M80 (low CPU), M200 (low CPU), or M400 (low CPU) instance sizes.

For MongoDB Atlas deployments in Azure France Central, the following instance sizes are supported. Deployments in this cloud region will automatically be distributed across 2 Azure fault domains. Assuming that a customer is deploying a 3-node replica set, 2 of those nodes will be located in 1 fault domain and the last node will live in its own fault domain. While this configuration does have a higher chance of loss of availability in the event that a fault domain goes down, cross-region replication can be configured to withstand fault domain and regional outages and is compatible with any Atlas instance size available in Azure France Central.
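The difference between the two layouts comes down to whether a majority of replica set members survives the loss of one failure domain, since a replica set needs a majority of voting members to elect a primary. The Python sketch below checks that condition. It assumes every node is a voting member and treats availability zones and fault domains alike as generic failure domains; the function name is hypothetical.

```python
def survives_any_single_domain_failure(nodes_per_domain):
    """Return True if a majority of nodes remains after losing any one
    failure domain (availability zone or fault domain)."""
    total = sum(nodes_per_domain)
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in nodes_per_domain)


# AWS EU (Paris): 3 nodes spread across three availability zones.
print(survives_any_single_domain_failure([1, 1, 1]))  # True

# Azure France Central: 3 nodes across two fault domains (2 + 1).
print(survives_any_single_domain_failure([2, 1]))     # False
```

The second case fails only when the fault domain holding two nodes goes down, which is the scenario cross-region replication is meant to cover.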

MongoDB is certified under the EU-US Privacy Shield, and the MongoDB Cloud Terms of Service now includes GDPR-required data processing terms to help MongoDB Atlas customers prepare for May 25, 2018 when the GDPR becomes enforceable.

MongoDB Atlas in France is open for business now and you can start using it today! Get started here.

MongoDB Atlas, the fully automated cloud database, is now available in France on Amazon Web Services and Microsoft Azure. Located in the Paris region, these newly supported data centers will allow your organization to use MongoDB Atlas to best meet the needs of your customers in and around France. For Atlas deployments on AWS EU (Paris), the following instance sizes are supported. Atlas deployments in this cloud region will automatically be distributed across three availability zones, so that an outage in one of those zones does not affect the automatic election system or the process of failing over to a new node. Currently, Atlas customers deploying to AWS EU (Paris) can also replicate their data to other regions of their choice (for even greater fault tolerance or for faster, more responsive read access) if they use the M80 (low CPU), M200 (low CPU), or M400 (low CPU) instance sizes.

For deployments in Azure France Central, the following instance sizes are supported. Atlas deployments in this cloud region will automatically be distributed across two Azure data centers. Assuming a customer deploys a three-node replica set, two of those nodes will be located in one data center and the last will sit in its own data center. Although this configuration carries a higher risk of lost availability if an entire data center fails, replication across multiple regions can be configured to withstand the failure of a whole data center or regional outages. This cross-region replication is compatible with any instance size available in Azure France Central.

MongoDB is certified under the EU-US Privacy Shield, and the MongoDB Cloud terms of service now include the data processing terms required by the GDPR to help MongoDB Atlas customers prepare for May 25, 2018.


Cloud Data Strategies: Preventing Data Black Holes in the Cloud

Leo Zheng


Black holes are regions in spacetime with such strong gravitational pull that nothing can escape. Far from being purely destructive, as you might have been led to believe, they help drive the formation and evolution of galaxies through their gravitational effects. In fact, our own Milky Way galaxy has a supermassive black hole at its center, with roughly 4.1 million times the mass of the Sun. Some theorize that none of us would be here were it not for a black hole.

On the flip side, black holes can also be found hurtling through the cosmos — often at millions of miles per hour — tearing apart everything in their path. It’s said that anything that makes it into their event horizons, the “point of no return”, will never be seen or heard from again, making black holes some of the most interesting and terrifying objects in space.

Why are we going on about black holes, gravitational effects, and points of no return? Because something analogous is happening right now in computing.

First coined in 2010 by Dave McCrory, the concept of “data gravity” treats data as if it were a planet or celestial object with mass. As data accumulates in an environment, applications and services that rely on that data will naturally be pulled into the same environment. The larger the “mass” of data there is, the stronger the “gravitational pull” and the faster this happens. Applications and services each have their own gravity but data gravity is by far the strongest, especially as:

  • The farther away data is, the more drastic the impacts on application performance and user experience. Keeping applications and services physically nearby reduces latency, maximizes throughput, and makes it easier for teams to build performant applications.
  • Moving data around has a cost. In most cases, it makes sense to centralize data to reduce that cost, which is why data tends to amass in one location or environment. Yes, distributed systems do allow organizations to partition data in different ways for specific purposes — for example, fencing sets of data by geographic borders to comply with regulations — but within those partitions, minimal data movement is still desirable.
  • And finally, efforts to digitize business and organizational activities, processes, and models (dubbed by many as “digital transformation” initiatives) succeed or fail based on how effectively data is utilized. If software is the engine by which digital transformation happens, then data is its fuel.
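To put a rough number on the distance point, the sketch below estimates the lowest possible round-trip time between an application and its data. It assumes signals travel through optical fiber at about 200,000 km/s (roughly two thirds of the speed of light in a vacuum) and ignores routing, queuing, and processing delays, so real latencies will be higher. The function names and distances are illustrative.

```python
# Assumption: approximate signal speed in optical fiber.
FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on one network round trip over the given distance."""
    return distance_km * 2 * 1000 / FIBER_SPEED_KM_PER_S


def min_total_ms(distance_km: float, sequential_queries: int) -> float:
    """Lower bound when each query waits for the previous one to finish."""
    return sequential_queries * min_round_trip_ms(distance_km)


for label, km in [("same metro area", 100), ("cross-continent", 6000)]:
    print(f"{label}: one round trip >= {min_round_trip_ms(km):.1f} ms, "
          f"50 sequential queries >= {min_total_ms(km, 50):.0f} ms")
```

Even with these optimistic assumptions, a page that issues dozens of sequential queries to a distant database pays a noticeable delay, which is one reason applications tend to follow their data.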

Just as a more massive object is harder to move in the physical world, a large enough mass of data becomes harder (and in some cases, nearly impossible) to move. The shift to cloud computing makes this more relevant than ever. As companies move to the cloud, they face a decision with massive implications down the line: where and how will they store their data? And how do they keep data gravity in the cloud from turning into a data black hole?

There are several options for organizations moving from building their own IT to consuming it as a service in the cloud.

Proprietary Tabular (Relational) Databases

The companies behind proprietary tabular databases often penalize their customers for running these technologies on any cloud platform other than their own. This should not surprise any of us. These are the same vendors that for decades have been relying on selling heavy proprietary software with multi-year contracts and annual maintenance fees. Vendor lock-in is nothing new to them.

Organizations choosing to use proprietary tabular databases in the cloud also carry over all the baggage of those technologies and realize few cloud benefits. These databases scale vertically and often cannot take advantage of cloud-native architectures for scale-out and elasticity without massive compromises. If horizontal scale-out of data across multiple instances is available, it isn’t native to the database and requires complex configurations, app-side changes, and additional software.

Lifting and shifting these databases to the cloud does not change the fact that they’re not designed to take advantage of cloud architectures.

Open Source Tabular Databases

Things are a little better with open source tabular databases insofar as there is no vendor enforcing punitive pricing to keep you on their cloud. However, similar to proprietary tabular databases, most of these technologies are designed to scale vertically; scaling out to fully realize cloud elasticity is often managed with fragile configurations or additional software.

Many companies running these databases in the cloud rely on a managed service to reduce their operational overhead. However, feature parity across cloud platforms is nonexistent, making migrations complicated and expensive. For example, databases running on Amazon Aurora leverage Aurora-specific features not found on other clouds.

Proprietary Cloud Databases

With proprietary cloud databases, it’s very easy to get into a situation where data goes in and nothing ever comes out. These database services run only in their parent cloud and often provide very limited database functionality, requiring customers to integrate additional cloud services for anything beyond very simple use cases.

For example, many of the proprietary cloud NoSQL services offer little more than key-value functionality; users often need to pipe data into a cloud data warehouse for more complex queries and analytics. They also tend to be operationally immature, requiring additional integrations and services to address data protection and provide adequate performance visibility. And it doesn’t stop there. New features are often introduced in the form of new services, and before users know it, instead of relying on a single cloud database, they’re dependent on an ever-growing network of cloud services. This makes it all the more difficult to ever get data out.

The major cloud providers know that if they can get your data into one of their proprietary database services, they’ve got you right where they want you. And while some may argue that organizations should actually embrace this new, ultimate form of vendor lock-in to get the most out of the cloud, doing so leaves customers with few options if their requirements or data regulations change. What if a cloud provider you’re not using releases a game-changing service you need to edge out your competition? What if it opens a data center in a new geographic region you’ve prioritized, and your provider doesn’t have that region on its roadmap? What if your main customer dictates that you should sever ties with your cloud provider? It’s happened before.

These are all scenarios where you could benefit from using a database that runs the same, everywhere.

The database that runs the same ... everywhere

As you move into the cloud, how you prevent data gravity from turning against you and limiting your flexibility is simple — use a database that runs the same in any environment.

One option to consider is MongoDB. As a database, it combines the flexibility of the document data model with sophisticated querying and indexing required by a wide range of use cases, from simple key-value to real-time aggregations powering analytics.

MongoDB is a distributed database designed for the cloud at its core. Redundancy for resilience, horizontal scaling, and geographic distribution are native to the database and easy to use.

And finally, MongoDB delivers a consistent experience regardless of where it is deployed:

  • Organizations not quite ready to migrate to the cloud can deploy MongoDB on premises behind their own firewalls and manage their databases using advanced operational tooling.
  • For those that are ready to migrate to the cloud, MongoDB Atlas delivers the database as a fully managed service across more than 50 regions on AWS, Azure, and Google Cloud Platform. Built-in automation of proven practices helps reduce the number of time-consuming database administration tasks that teams are responsible for, and prevents organizations from migrating their operational overhead into the cloud as well. Of course, if you want to self-manage MongoDB in the cloud, you can do so.
  • And finally, for teams that are well-versed in cloud services, MongoDB Atlas delivers a consistent experience across AWS, Azure, and Google, allowing the development of multi-cloud strategies on a single, unified data platform.

Data gravity will no doubt have a tremendous impact on how your IT resources coalesce and evolve in the cloud. But that doesn’t mean you have to get trapped. Choose a database that delivers a consistent experience across different environments and avoid going past the point of no return.


To learn more about MongoDB, check out our architecture guide.

You can also get started with a free 512 MB database managed by MongoDB Atlas here.

Header image via Paramount

New to MongoDB Atlas — Cloud Provider Snapshots on Azure, Expanded API for Snapshots and Restore Jobs

Leo Zheng

Release Notes, Cloud

One of the core components of MongoDB Atlas, the cloud database service for MongoDB, is the fully managed disaster recovery functionality. With continuous backups, you can take consistent, cluster-wide snapshots of sharded deployments and trigger point-in-time restores to satisfy demanding recovery point objectives (RPOs) from the business. Continuous backups also allow you to query backup snapshots to restore granular data in a fraction of the time it would take to restore an entire snapshot.

Today we’re making it even easier to manage your backups with an expanded Atlas API. Programmatically get metadata about your snapshots, delete them, or change their expiration. Trigger restore jobs and retrieve them. The MongoDB Atlas API allows you to incorporate the rich functionality of Atlas fully managed backups into workflows optimized for how you manage your IT resources.
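As a rough illustration of how these operations might fit into a script, the sketch below maps each action to an HTTP method and URL without sending anything. The base URL, the `groups/.../clusters/.../snapshots` and `restoreJobs` path layout, and the HTTP verbs are assumptions made for this example, and `plan_request` is a hypothetical helper. Check the Atlas API documentation for the actual endpoints, request bodies, and authentication before relying on any of it.

```python
from urllib.parse import quote

# Assumption: base URL and path layout; verify against the Atlas API docs.
ATLAS_API_BASE = "https://cloud.mongodb.com/api/atlas/v1.0"

# Assumption: action -> (HTTP method, collection name, needs a resource id).
ACTIONS = {
    "list_snapshots": ("GET", "snapshots", False),
    "get_snapshot": ("GET", "snapshots", True),
    "delete_snapshot": ("DELETE", "snapshots", True),
    "change_expiration": ("PATCH", "snapshots", True),
    "create_restore_job": ("POST", "restoreJobs", False),
    "get_restore_job": ("GET", "restoreJobs", True),
}


def plan_request(action, project_id, cluster_name, resource_id=None):
    """Return the (method, url) pair for an action; nothing is sent."""
    method, collection, needs_id = ACTIONS[action]
    if needs_id and resource_id is None:
        raise ValueError(f"{action} requires a resource id")
    url = "/".join([
        ATLAS_API_BASE,
        "groups", quote(project_id, safe=""),
        "clusters", quote(cluster_name, safe=""),
        collection,
    ])
    if needs_id:
        url += "/" + quote(resource_id, safe="")
    return method, url


# Placeholder identifiers, for illustration only.
print(plan_request("list_snapshots", "my-project-id", "Cluster0"))
print(plan_request("delete_snapshot", "my-project-id", "Cluster0", "snapshot-123"))
```

Keeping request planning separate from sending makes the workflow easy to review and test before it touches real backups, which matters for destructive actions such as deleting a snapshot.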

Visit our documentation for more information.

Cloud Provider Snapshots for Azure

We are also introducing a new type of managed backup service for MongoDB Atlas, using the native snapshot capabilities of your cloud provider. With cloud provider snapshots, your backups will be stored in the same cloud region as your managed databases, granting you better governance over where all of your data lives.

For deployments using cross-region replication, your backups will be stored in your preferred region.

Compared to continuous backups, cloud provider snapshots allow for fast restores of snapshot images. Pricing, which varies slightly from region to region, is also lower.

Cloud provider snapshots are available today for replica sets on Microsoft Azure. Support for Amazon Web Services and Google Cloud Platform will be rolled out later this year.

If you’re thinking about switching backup methods (from continuous backup to cloud provider snapshots), consider creating a separate project in MongoDB Atlas. For each Atlas project, the first cluster you enable backups for will dictate the backup method for all subsequent clusters in the project. To change the backup method within the same project, disable backups for all clusters in the project, then re-enable backups using your preferred backup method. Note that MongoDB Atlas automatically deletes any stored snapshots when you disable backups for a cluster.

Not yet a MongoDB Atlas user? Create an account and get a free 512 MB database.