DocumentDB, MongoDB and the Real-World Effects of Compatibility
July 16, 2021 | Updated: January 27, 2023
If there’s confusion in the market for document databases, it probably has to do with how the products are marketed. AWS claims that DocumentDB, its document model database, comes “with MongoDB compatibility.” But the question of how compatible DocumentDB actually is with MongoDB is worth considering.
DocumentDB merely emulates the MongoDB API while running on top of AWS’s cloud-based relational database, Amazon Aurora. And it’s an inconsistent imitator at best, because it fails 62% of MongoDB API correctness tests.
Even though AWS claims compatibility with MongoDB 4.0, our tests have concluded that its emulator is a mishmash of features going back to MongoDB 3.2, which we released in 2015. The result is that DocumentDB lacks many of the features that come standard in MongoDB.
We’ve already published a side-by-side comparison of the feature sets for each solution. Instead of covering the same ground here, we'll explain how some of those differences play out in real-world scenarios.
Scaling writes, partitioning data, and sharding
Native sharding enables you to scale out databases horizontally, across multiple nodes and regions. Atlas offers elastic vertical and horizontal scaling to smooth consumption. DocumentDB does not scale writes or partition data beyond a single node. In order to ensure consistency, MongoDB uses concurrency control measures to prevent multiple clients from modifying the same piece of data simultaneously.
Replicate and scale beyond a single region
A number of factors are driving the need to distribute workloads to different geographic regions. In some cases, it’s to reduce latency by putting data closer to where it’s being used. In other cases, it’s to store data in a specific geographic zone to help meet data localization requirements. Finally, there’s the need to ensure the availability of data when there’s an outage of an entire AWS region. The flexibility to replicate and move workloads as needed is increasingly seen as a business requirement. But by default DocumentDB limits you to just 15 replicas and constrains you to a single region. Newly introduced Global Clusters may look like an answer, but much like “MongoDB compatibility,” it’s potentially misleading. The Global Clusters feature more closely resembles multi-region replication since it only allows writes to single primaries instead of being able to write to multiple regions. It also requires manual reconfiguration to recover from failures, making it a partial solution, at best.
MongoDB Atlas allows true global cluster configurations so you can deliver capabilities to all your users around the world. At a click of a button, you can place the most relevant data near local application servers across more than 80 global regions to ensure low-latency reads and writes. By being able to define a geographic location for each document, your teams are able to more easily meet local privacy and compliance measures. It’s also an insurance policy against being locked into a single public cloud provider.
High resilience, rapid failover, retryable writes
For critical applications, every second of downtime represents a loss of revenue, trust, and reputation. Rapid failover to a different geographic area is necessary when recovery time objectives (RTO) are measured in seconds. DocumentDB failover SLAs can be as high as two minutes, and multi-region failover is not available. With MongoDB, failover time is typically five seconds, and failover to a different region or cloud provider are also options.
Write errors can be as costly as downtime. If a write to increment a field is duplicated because a dropped connection failed to notify the client that the write was executed, that extra increment can be very costly depending on what it represents. With retryable writes, a write can be sent multiple times but applied exactly once. MongoDB has retryable writes. DocumentDB doesn’t.
Integrated text search, geospatial processing, graph traversals
Integrated text search saves time and improves performance because you can run queries across multiple sources. With DocumentDB, data must be replicated to adjacent AWS services, which increases cost and complexity. MongoDB Atlas combines integrated text search, graph traversals, and geospatial processing features into a single API and platform. Integrated search with MongoDB Atlas helps drive end user behavior by serving up relevant results based on what users are looking for or what businesses want to direct them toward.
Geographically distributed replica sets can also be used to scale read operations and intelligently route queries to the replica set that’s closest to the user. Hedged reads is a function that automatically routes queries to the two closest nodes (measured by ping distance), returning results from the fastest replica. This helps minimize situations where queries are waiting on a node that’s already busy. DocumentDB doesn’t offer hedged reads, and it’s more restricted in terms of the number of replica sets it allows and the ability to place workloads in different regions. MongoDB gives you more flexibility when distributing data geographically for hedged reads since it leverages all of the major public cloud providers.
Putting data in cold storage can be a death knell if accessing it again is too cumbersome or slow. With online archiving, you can tier data across fully managed databases and cloud object storage and query it through a single endpoint. Online archiving automatically archives historical data while reducing operational and transactional data storage costs without compromising on query performance. MongoDB has it. DocumentDB doesn’t.
Integrated querying in the cloud
Running separate queries for separate data stores can drain resources and slow queries. The best solution is being able to query and analyze data across all the different databases and storage containers at once. You can do this with integrated querying, where you run a single query to analyze live cloud data and historical data together and in-place for faster insights. With DocumentDB, you have to replicate data to adjacent AWS services. With MongoDB, you can query and analyze data across cloud datastores and MongoDB Atlas in its native format. You can also run powerful, easy-to-understand aggregations through a unified API for a consistent experience across data types.
On-demand materialized views
When you create aggregations, the results are usually put into a new collection every time you create it. The entire collection is regenerated each time you create the aggregation. This process consumes CPU and I/O. With the $merge stage, you can just update the generated results collection rather than rebuild it completely. $merge lets you incrementally update the collection every time you run it. To update it, all you need to do is run the aggregation again and it will update all the values in place. $merge gives you the ability to create collections based on an aggregation and update those collections efficiently. This functionality allows users to create on-demand materialized views, where the content of the output collection is incrementally updated when the pipeline is run. MongoDB has this capability. DocumentDB does not.
Rich data types
The decimal data type is critical for storing very large or small numbers, like financial and tax computations, where it’s necessary to emulate decimal rounding exactly. DocumentDB does not support decimal data types or, in turn, lossless processing of complex numeric data, which is a problem for financial and scientific applications. MongoDB does support rich data types like Decimal128, giving you 128 bits of high precision decimal representation.
Client-side field-level encryption
Client-side field-level encryption (FLE) reduces the risk of unauthorized access or disclosure of sensitive data, like personally identifiable information (PII) and protected health information (PHI). Fields are encrypted before they leave the application, which protects data while in transit over the network, in database memory, at-rest in storage, in backup repositories, and in system logs. DocumentDB does not offer client-side FLE. MongoDB’s client-side FLE provides among the strongest levels of data privacy and security for regulated workloads.
In addition to the feature sets described here, one of the biggest differences between DocumentDB and MongoDB is the degree of freedom you have to move between different platforms. AWS offers seamless movement and minimal friction between services within its own ecosystem. MongoDB makes it easy to replicate data or move workloads to any cloud provider, giving you complete flexibility within the AWS platform as well as outside of it — whether it’s a self-managed MongoDB instance on cloud infrastructure, a full on-premises deployment, or just a local development instance on an engineer’s laptop.