Performance Best Practices: Sharding

Mat Keep and Henrik Ingo
February 18, 2020 | Updated: May 7, 2021

Welcome to the fourth in a series of blog posts covering performance best practices for MongoDB.

In this series, we are covering key considerations for achieving performance at scale across a number of important dimensions, including:

Data modeling and sizing memory (the working set)
Query patterns and profiling
Indexing
Sharding, which we’ll cover today
Transactions and read/write concerns
Hardware and OS configuration
Benchmarking

Sharding for Horizontal Scale Out

Through sharding, you can automatically scale your MongoDB database out across multiple nodes and regions to handle write-intensive workloads and growing data sizes, and for data residency.

Sharding with MongoDB allows you to seamlessly scale the database as your applications grow beyond the hardware limits of a single server, and it does so without adding complexity to the application.

To respond to changing workload demand, documents can be migrated between shards, and nodes can be added or removed from the cluster at any time – MongoDB will automatically rebalance the data as needed without manual intervention.

Sharding Strategies in MongoDB

By simply hashing a primary key value, most distributed databases randomly spray data across a cluster of nodes. This imposes performance penalties when data is queried across nodes and adds application complexity when data needs to be localized to a specific region.

By exposing multiple sharding policies, MongoDB offers you a better approach. Data can be distributed according to query patterns or data placement requirements, giving you much higher scalability across a diverse set of workloads:

Ranged Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range based queries, such as co-locating data for all customers in a specific region on a specific shard.
Hashed Sharding. Documents are distributed according to an MD5 hash of the shard key value. This approach guarantees a uniform distribution of writes across shards, which is often optimal for ingesting streams of time-series and event data.
Zoned Sharding. Provides the ability for developers to define specific rules governing data placement in a sharded cluster.

Global Clusters in MongoDB Atlas – the fully managed cloud database service – allows you to quickly implement zoned sharding using a visual UI or the Atlas API. You can easily create distributed databases to support geographically distributed apps, with policies enforcing data residence within specific regions.

Figure 1: Serving always-on, globally distributed, write-everywhere apps with MongoDB Atlas Global Clusters

To ensure you get the full benefit of sharding, there are a set of best practices you should observe.

Ensure a Uniform Distribution of Shard Keys

When shard keys are not uniformly distributed for reads and writes, operations may be limited by the capacity of a single shard. When shard keys are uniformly distributed, no single shard will limit the capacity of the system.

You can learn more about shard keys from the documentation

Avoid Scatter-Gather Queries for Operational Workloads

In sharded systems, queries that cannot be routed based on shard key must be broadcast to all shards for evaluation. Because these queries involve multiple shards for each request they do not scale linearly as more shards are added, and extra overhead is incurred to combine results from multiple shards. You should include the shard key in your query to avoid scatter-gather queries.

The exception to this rule is for large aggregation queries. In these cases, scatter-gather can be a useful approach as it allows the query to run in parallel on all shards.

Use Hashed-Based Sharding When Appropriate

For applications that issue range-based queries, ranged sharding is beneficial because operations can be routed to the fewest shards necessary, usually a single shard. However, ranged sharding requires a good understanding of your data and query patterns, which in some cases may not be practical.

Hashed sharding ensures a uniform distribution of reads and writes, but it does not provide efficient range-based operations.

Pre-Split and Distribute Chunks

When creating a new sharded collection to load data into, first create empty chunks and distribute them evenly on all shards, and then load the data. For hash-based sharding, you can use numInitialChunks to do this automatically.

What’s Next

That wraps up this installment of the performance best practices series. Next up: transactions.

Read more about the basics of MongoDB sharding.

← Previous

Five security principles developers must follow

Find out about the principles that developers should be following to secure their applications.

February 18, 2020

Next →

That’s a Wrap: MongoDB’s 2025 in Review & 2026 Predictions

It’s nearly the end of the year—again! That means it’s time for an end-of-year blog post that expresses disbelief at the passage of time. Which, as the saying goes, flies when you’re having fun. And definitely when you’re as busy as MongoDB was in 2025. It was a big year for the company—and more importantly, for the tens of thousands of customers and millions of developers who rely on MongoDB’s modern data platform for their most mission-critical workloads. At MongoDB, everything we do starts with our obsession with customers and their needs, and if there’s a theme to MongoDB’s 2025, it was (and will continue to be) enabling customer innovation and helping them succeed in the AI era. So here are a few highlights of how MongoDB acted on behalf of customers in 2025. From the acquisition of Voyage AI to customer success across industries, a lot happened in 2025. Let’s go!* *Read to the end for 2026 thoughts. 2025: The (MongoDB) year that was Voyage AI, modernization, and search In February, MongoDB announced the acquisition of Voyage AI, a pioneer in embedding and reranking models, to enhance the accuracy of AI applications. Integrating Voyage AI's advanced retrieval technology with MongoDB’s modern, AI-ready data platform addresses a critical challenge: LLM model hallucinations caused by a lack of context. By improving retrieval accuracy for specialized domains like finance and law, the integration enables businesses to deploy AI for mission-critical use cases. To learn more, see the MongoDB Voyage AI page. Then, in September, we launched MongoDB AMP, an AI-powered Application Modernization Platform. AMP is designed to accelerate the transformation of legacy applications through a combination of AI-powered tooling, a proven delivery framework, and expert guidance (tools, techniques, and talent) to help enterprises reduce technical debt and modernize 2-3 times faster. Want more? Sure you do! Check out this short video. MongoDB also announced the addition of search and vector search capabilities to MongoDB Community Edition and MongoDB Enterprise Server. This allows developers to build and test AI-native applications, including those using retrieval-augmented generation (RAG), in local or on-premises environments. Previously exclusive to MongoDB Atlas, these features enable secure, hybrid deployments where sensitive data can remain on-premises while still leveraging advanced search tools. Here’s a (slightly less short) video about search and vector search on Enterprise Server. Growing and scaling with MongoDB As noted, everything we do at MongoDB starts with our obsession with customers. 2025 was another banner year for customer success and innovation—we were inspired by what organizations of every shape and size, across industries and geographies, built with MongoDB in 2025. Here are just two of the many stories our customers shared in 2025; much more can be found in my colleague Katie Palmer’s blog series, Innovating with MongoDB. Factory By combining the Atlas modern data platform with Voyage AI’s high-performance embeddings, the AI-native startup Factory—which uses AI agents called Droids to accelerate software development lifecycles for organizations—consolidated its fragmented tech stack. This enabled superior code retrieval, simplified operations, and provided the scalability needed to process billions of tokens daily. McKesson McKesson, a global pharmaceutical distributor, replaced its monolithic legacy infrastructure with MongoDB Atlas to meet strict drug tracing mandates. By adopting our modern cloud data platform, McKesson scaled its operations 300x, managing tracking data for 1.2 billion containers annually without latency, and ensuring compliance and patient safety while reducing developer complexity. For more, check out the video of McKesson at MongoDB.local NYC from September. From niche NoSQL to enterprise powerhouse As senior MongoDB engineer and Technical Fellow Ashish Kumar put it earlier this year, “through a sustained and deliberate engineering effort,” MongoDB has gone from a (seemingly) niche NoSQL solution to a trusted enterprise standard, and now delivers “the high availability, tunable consistency, ACID transactions, and robust security that enterprises demand.” A new era of leadership The face of MongoDB has also changed—our CFO, Mike Berry, joined the company in April, and Dev Ittycheria stepped down as CEO in November, after more than 11 years leading the company (including its 2017 IPO). In a LinkedIn post about his role, new MongoDB CEO CJ Desai noted that the company is “at the forefront of a new data revolution, unlocking the next wave of productivity and intelligence.” “Having spent my career building and scaling technology platforms, I’ve always been drawn to companies defined by clarity of vision, relentless organic innovation, and a customer-first culture. MongoDB exemplifies all three,” said Desai. We couldn’t agree more. Onward! Reading the 2026 tea leaves So what might 2026 bring (for MongoDB and tech at large)? Here are a handful of our leaders’ predictions: “As much as people want to talk about Artificial General Intelligence (AGI), we’re still in the phase where most AI use cases automate redundant tasks but benefit from human-in-the-loop checks. Organizations that use AI to complete work that historically is a drain on human resources—but then uses people to carefully verify what AI builds, apply governance frameworks, and maintain accountability across the data lifecycle—will be more successful.” —Pete Johnson, Field CTO, AI, MongoDB “After years of inflated expectations and unsustainable spending, the AI industry is trapped in a bubble where companies reflexively attempt to deploy LLMs at every problem, driving up costs with minimal to no return. Businesses that break free from this spending cycle are the ones that understand the need to ground LLM responses in factual data and learn from prior mistakes. We believe the best way to do this will be with highly accurate embedding models and rerankers for reliable data retrieval.” —Frank Liu, Staff Product Manager, MongoDB "In 2026, cloud independence will evolve from strategic preference to existential imperative across enterprises of every scale. The outages and disruptions of recent years have exposed a fundamental truth: in an always-on digital economy—where commerce, mobility, governance, and even public safety depend on uninterrupted access to cloud services—single-provider reliance is no longer a calculated risk, but a systemic vulnerability. Compounding this is the inexorable rise of data sovereignty. Regulatory regimes worldwide now demand precise jurisdictional control over data residency, rendering rigid cloud commitments incompatible with compliance at global scale. The defining competitive advantage will belong to organizations that transcend fragile prevention theater and engineer true infrastructural resilience: architectures inherently portable, data frictionlessly mobile, and operations autonomously sustained across heterogeneous clouds through AI-orchestrated redundancy. In short, the winners will not merely mitigate downtime—they will design systems that render the concept obsolete." —Ben Cefalo, SVP, Head of Core Products, MongoDB Happy holidays and happy New Year, everyone!

December 22, 2025