Scale Out Without Fear or Friction: Live Resharding in MongoDB
Live resharding was one of the key enhancements delivered in our MongoDB 5.0 Major Release . With live resharding you can change the shard key for your collection on demand as your application evolves with no database downtime or complex data migrations . In this blog post, we will be covering: Product developments that have made sharding more flexible What you had to do before MongoDB 5.0 to reshard your collection, and how that changed with 5.0 live resharding Guidance on the performance and operational considerations of using live resharding Before that, we should discuss why you should shard at all, and the importance of selecting a good shard key – even though you have the flexibility with live resharding to change it at any time. Go ahead and skip the next couple of sections if you are already familiar with sharding! Why Shard your Database? Sharding enables you to distribute your data across multiple nodes. You do that to: Scale out horizontally — accommodate growing data or application load by sharding once your application starts to get close to the capacity limits of a single replica set. Enforce data locality — for example pinning data to shards that are provisioned in specific regions so that the database delivers low latency local access and maintains data sovereignty for regulatory compliance. Sharding is the best way of scaling databases and MongoDB was developed to support sharding natively. Sharding MongoDB is transparent to your applications and it’s elastic so you can add and remove shards at any time. The Importance of Selecting a Good Shard Key MongoDB’s native sharding has always been highly flexible — you can select any field or combination of fields in your documents to shard on. This means you can select a shard key that is best suited to your application’s requirements. The choice of shard key is important as it defines how data is distributed across the available shards. Ideally you want to select a shard key that: Gives you low latency and high throughput reads and writes by matching data distribution to your application’s data access patterns. Evenly distributes data across the cluster so you avoid any one shard taking most of the load (i.e., a “hot shard”). Provides linear scalability as you add more shards in the future. While you have the flexibility to select any field(s) of your documents as your shard key, it was previously difficult to change the shard key later on. This made some developers fearful of sharding. If you chose a shard key that doesn’t work well, or if application requirements change and the shard key doesn’t work well for its changed access patterns, the impact on performance could be significant. At this point in time, no other mainstream distributed database allows users to change shard keys, but we wanted to give users this ability. Making Shard Keys More Flexible Over the past few releases, MongoDB engineers have been working to provide more sharding flexibility to users: MongoDB 4.2 introduced the ability to modify a shard key’s value . Under the covers the modification process uses a distributed, multi-document ACID transaction to change the placement of a document in a sharded cluster. This is useful when you want to rehome a document to a different geographic region or age data out to a slower storage tier . MongoDB 4.4 went further with the ability to refine the shard key for a collection by adding a suffix to an existing key. Both of these enhancements made sharding more flexible, but they didn’t help if you needed to reshard your collection using an entirely different shard key. Manual Resharding: Before MongoDB 5.0 Resharding a collection was a manual and complex process that could only be achieved through one of two approaches: Dumping the entire collection and then reloading it into a new collection with the new shard key . This is an offline process, and so your application is down until data reloading is complete — for example, it could take several days to dump and reload a 10 TB+ collection on a three-shard cluster. Undergoing a custom migration that involved writing all the data from the old cluster to a new cluster with the resharded collection. You had to write the query routing and migration logic, and then constantly check the migration progress to ensure all data had been successfully migrated. Custom migrations entail less downtime, but they come with a lot of overhead. They are highly complex, labor-intensive, risky, and expensive (as you had to run two clusters side-by-side). It took one MongoDB user three months to complete the live migration of 10 billion documents. How this Changed with MongoDB 5.0: Live Resharding We made manual resharding a thing of the past with MongoDB 5.0. With 5.0 you just run the reshardCollection command from the shell, point at the database and collection you want to reshard, specify the new shard key, and let MongoDB take care of the rest. reshardCollection: "<database>.<collection>", key: <shardkey> When you invoke the reshardCollection command, MongoDB clones your existing collection into a new collection with the new shard key, then starts applying all new oplog updates from the existing collection to the new collection. This enables the database to keep pace with incoming application writes. When all oplog updates have been applied, MongoDB will automatically cut over to the new collection and remove the old collection in the background. Lets walk through an example where live resharding would really help a user: The user has an orders collection. In the past, they needed to scale out and chose the order_id field as the shard key. Now they realize that they have to regularly query each customer’s orders to quickly display order history. This query does not use the order_id field. To return the results for such a query, all shards need to provide data for the query. This is called a scatter-gather query. It would have been more performant and scalable to have orders for each customer localized to a shard, avoiding scatter-gather, cross-shard queries. They realize that the optimal shard key would be "customer_id: 1, order_id: 1" rather than just the order_id . With MongoDB 5.0’s live resharding, the user can just run the reshard command, and MongoDB will reshard the orders collection for them using the new shard key, without having to bring the database and the application down. Watch our short Live Resharding talk from MongoDB.Live 2021 to see a demo with this exact example. Not only can you change the field(s) for a shard key, you can also review your sharding strategy, changing between range, hash, and zones. Live Resharding: Performance and Operational Considerations Even with the flexibility that live resharding gives you, it is still important to properly evaluate the selection of your shard key. Our documentation provides guidance to help you make the best choice of shard key . Of course, live resharding makes it much easier to change that key should your original choice have not been optimal, or if your application changes in a way that you hadn’t previously anticipated. If you find yourself in this situation, it is essential to plan for live resharding. What do you need to be thinking about before resharding Make sure you have sufficient storage capacity available on each node of your cluster. Since MongoDB is temporarily cloning your existing collection, spare storage capacity needs to be at least 1.2x the size of the collection you are going to reshard. This is because we need 20% more storage in order to buffer writes that occur during the resharding process. For example, if the size of the collection you want to reshard is 2 TB compressed, you should have at least 2.4 TB of free storage in the cluster before starting the resharding operation. While the resharding process is efficient, it will still consume additional compute and I/O resources. You should therefore make sure you are not consistently running the database at or close to peak system utilization. If you see CPU usage in excess of 80% or I/O usage above 50%, you should scale up your cluster to larger instance sizes before resharding. Once resharding is done, it's fine to scale back down to regular instance sizes. Before you run resharding, you should update any queries that reference the existing shard key to include both the current shard key and the new shard key. When resharding is complete, you can remove the old shard key from your queries. Review the resharding requirements documentation for a full run down on the key factors to consider before resharding your collection. What should you expect during resharding? Total duration of the resharding process is dependent on the number of shards, the size of your collection, and the write load to your collection. For a constant data size, the more shards the shorter the resharding duration. From a simple POC on MongoDB Atlas, a 100 GB collection took just 2 hours 45 minutes to reshard on a 4-shard cluster and 5 hours 30 minutes on a 2-shard cluster. The process scales up and down linearly with data size and number of shards – so a 1 TB collection will take 10 times longer to reshard than a 100GB collection. Of course your mileage may vary based on the read/write ratio of your application along with the speed and quality of your underlying hardware infrastructure. While resharding is in flight, you should expect the following impacts to application performance: The latency and throughput of reads against the collection that is being resharded will be unaffected . Even though we are writing to the existing collection and then applying oplog entries to both its replicas and to the cloned collection, you should expect to see negligible impact to write latency given enough spare CPU. If your cluster is CPU-bound, expect a latency increase of 5 to 10% during the cloning phase and 20 to 50% during the applying phase (*) . As long as you meet the aforementioned capacity requirements, the latency and throughput of operations to other collections in the database won't be impacted . (*) Note: If you notice unacceptable write latencies to your collection, we recommend you stop resharding, increase your shard instance sizes, and then run resharding again. The abort and cleanup of the cloned collection are instantaneous. If your application has time periods with less traffic, reshard your collection during that time if possible. All of your existing isolation, consistency, and durability guarantees are honored while resharding is running. The process itself is resilient and crash-safe, so if any shard undergoes a replica set election, there is no impact to resharding – it will simply resume when the new primary has been elected. You can monitor the resharding progress with the $currentOp pipeline stage. It will report an estimate of the remaining time to complete the resharding operation. You can also abort the resharding process at any time. What happens after resharding is complete? When resharding is done and the two collections are in sync, MongoDB will automatically cut over to the new collection and remove the old collection for you, reclaiming your storage and returning latency back to normal. By default, cutover takes up to two seconds — during which time the collection will not accept writes, and so your application will see a short spike in write latency. Any writes that timeout are automatically retried by our drivers , so exceptions are not surfaced to your users. The cutover interval is tunable: Resharding will be quicker if you raise the interval above the two second default, with the trade-off that the period of write unavailability will be longer. By dialing it down below two seconds, the interval of write unavailability will be shorter. However, the resharding process will take longer to complete, and the odds of the window ever being short enough to cutover will be diminished. You can block writes early to force resharding to complete by issuing the commitReshardCollection command. This is useful if the current time estimate to complete the resharding operation is an acceptable duration for your collection to block writes. What you Get with Live Resharding Live sharding is available wherever you run MongoDB – whether that’s in our fully managed Atlas application data platform in the cloud , with Enterprise Advanced , or if using the Community Edition of MongoDB. To recap how you benefit from live resharding: Evolve with your apps with simplicity and resilience: As your applications evolve or as you need to improve on the original choice of shard key, a single command kicks off resharding. This process is automated, resilient, and non-disruptive to your application. Compress weeks/months to minutes/hours: Live resharding is fully automated, so you eliminate disruptive and lengthy manual data migrations. To make scaling out even easier, you can evaluate the effectiveness of different shard keys in dev/test environments before committing your choice to production. Even then, you can change your shard key when you want to. Extend flexibility and agility across every layer of your application stack: You have seen how MongoDB’s flexible document data model instantly adapts as you add new features to your app. With live resharding you get that same flexibility when you shard. New features or new requirements? Simply reshard as and when you need to. Summary Live Resharding is a huge step forward in the state of distributed systems, and is just the start of an exciting and fast-paced MongoDB roadmap that will make sharding even easier, more flexible, and automated. If you want to dig deeper, please take a look at the Live Resharding session recording from our developer conference and review the resharding documentation . To learn more about MongoDB 5.0 and our new Rapid Releases, download our guide to what’s new in MongoDB .
Default Majority Write Concern: Providing Stronger Durability Guarantees "Out of the Box"
MongoDB has always provided developers with the flexibility to control the desired levels of write durability, enabling them to balance latency and throughput against the application’s SLAs. The chosen write concern dictates how many nodes in the MongoDB database cluster need to acknowledge the write before success is passed back to the application. You can configure the write concern either in the driver or via a cluster-wide, global setting in the server. Prior to the 5.0 release, MongoDB defaulted to the w:1 write concern, which waits for the write to be applied to memory on the primary node before acknowledging success back to the application. This default was adequate to meet the durability requirements of the most common applications built on MongoDB. Over the past several years, this application profile has extended as the MongoDB database has evolved, especially in serving more mission-critical, transactional applications. At the same time, we have made replication faster – so enforcing data durability across multiple nodes in a distributed database cluster no longer imposes the performance trade-offs of the past. It is for these reasons that starting with MongoDB 5.0 , the default durability guarantee has been elevated to the majority (w:majority) write concern. This means that write success will now only be acknowledged to the application once it has been committed and persisted to disk on a majority of replicas . Choosing the new default vs. the former w:1 default allows for a stronger durability guarantee where acknowledged data can survive replica set elections and complete node failures. The new w:majority default setting is fully tunable, so you can maintain the earlier w:1 default or any custom write concern you had previously configured. In this blog post, I will dig into why we have made the decision to move to w:majority as the new default, and explain how it works as you upgrade your existing MongoDB cluster and deploy a new MongoDB environment. Figure 1: MongoDB w:majority providing multi-node durability by default w:1 to w:majority default concern: A decision rooted in key MongoDB milestones The decision of making w:majority the default “out of the box” write concern is deeply rooted in our technology evolution and the experience that comes from the increased sophistication of our user’s workloads. With the release of MongoDB 4.0 (June 2018), we added support for multi-document ACID transactions . This made it extremely easy for developers to address a complete new range of modern systems of record applications with MongoDB. Transactional guarantees were initially scoped to replica sets and then extended 12 months later to sharded clusters with the MongoDB 4.2 release. In May 2019, the MongoDB Atlas default connection string used by the drivers to connect to the database was changed to w:majority. This effectively eliminated any manual tuning by Atlas users who required a higher level of durability guarantees. It also enabled us to evaluate whether users maintained those stronger guarantees, or dialed them down to the w: 1 primary acknowledged concern. The vast majority of users stuck with the stronger write concern. In MongoDB 4.4 (June 2020), we radically improved replication performance with the introduction of streaming replication and “replicate-before-journaling”. Rather than replicas polling the primary and receiving batches of events to apply locally, streaming replication allows the primary to continuously stream messages to the replicas. Coupled with replicate-before-journaling, which allows secondary nodes to read a primary node's oplog entries prior to them being locally journaled, these changes see the latency of majority committed writes reduced by up to 50% over high load networks. Throughout the years our product and engineering teams have studied the benefits of all these innovations and specifically of w:majority to our user base. One fundamental principle of w:majority is that the time taken to perform writes server-side does not really change. The added time it takes for the driver to acknowledge the write operation on the client-side strengthens the durability the application can expect for each write and improves the overall health and performance of the MongoDB cluster. This new MongoDB 5.0 w:majority default is just a reflection of how users have extended their utilization of MongoDB to modernize mission-critical applications. Upgrade considerations For users who do not set any write concern and instead rely on the defaults that MongoDB provides, w:majority will become the default write concern starting in MongoDB 5.0. This new default will take effect without any user action. The new default does not override any explicit write concerns that developers have configured previously. Developers who want their application to continue to use the write concern w:1 should explicitly set their default write concern to w:1 prior to upgrade, whether in their driver or server-side with a global write concern to maintain the previous behavior. Special considerations for MongoDB clusters with arbiters First, we should remind users that as a general practice we recommend against using arbiters in production clusters. For MongoDB clusters that do use arbiters, if the loss of a single data holding node would leave majority writes unable to succeed, then MongoDB 5.0 defaults the write concern to w:1, otherwise, it defaults to w:majority. This translates to a default write concern of w:1 for primary-secondary-arbiter (PSA) configurations , but a default write concern of w:majority for cluster configurations like PSSSA. This prevents users with configurations like PSA from experiencing write unavailability in cases where one data-bearing node is unavailable. Latency and throughput When comparing latency and throughput of w:1 and w:majority there are many application and environment-specific factors that will determine impacts to performance. With the replication enhancements delivered in MongoDB 4.4, users upgrading from earlier MongoDB releases will see little to no latency impact of the higher durability guarantees in MongoDB 5.0. Ensuring you have sufficient client concurrency should yield little to no impact on overall throughput. We will go over those factors and share the results of our performance tests in a future blog post, however, the best way to evaluate latency and throughput of different write concerns is to test in your own environment. Getting started with MongoDB 5.0 We encourage all MongoDB users to evaluate our new 5.0 release to experience the benefits of the new w:majority default and everything MongoDB 5.0 offers.