Sharding increases count and size of collection, what to do?

Digvijay_Singh_Tomar · January 24, 2023, 10:09am

Hi, I have 3 shards with 2 replica sets (P+S+S) and one mongos. When I sharded one collection of size 5.6GB, we observed that its size and document count started increasing.
For example, if we had 1000 documents in shard 1 before sharding, then after sharding the mongo shell shows a count of 1500, and the collection also consumes more space than before.
Is there any solution for that?

Eamon_Scullion · January 25, 2023, 10:31pm

The increased document count that you observe after sharding a collection is related to chunk balancing and orphaned documents.

As your sharded cluster starts to balance chunks, documents will be moved between shards, for example, from shard 0 to shard 1.

The documents are copied from shard 0 to shard 1, and the originals left behind on shard 0 are marked for deletion. Until they are actually deleted, these leftover copies are known as orphaned documents, and they still contribute to the reported count and size.

Depending on your cluster tier, and how busy it is, these orphaned documents will be removed in the future. Sometimes this can happen around 24 hours after the chunk migration.

You can read more about orphaned documents here:

Note that the orphaned document clean up is automated, so there’s no need to intervene.
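
If you want a rough idea of how much of the inflated count comes from orphaned documents, one option is to compare a metadata-based count with a query-based count. Below is a minimal sketch, not an official tool: the helper name orphanEstimate is made up for illustration, and "mydb" and "orders" are placeholder names. It relies on estimatedDocumentCount() reading collection metadata (which does not filter out orphaned documents on a sharded cluster) and countDocuments({}) running an aggregation that does.

```javascript
// Hypothetical helper: estimate how many counted documents are orphans.
// `coll` is any object exposing estimatedDocumentCount() and countDocuments(),
// for example a collection handle in mongosh connected through mongos.
function orphanEstimate(coll) {
  const fromMetadata = coll.estimatedDocumentCount(); // may include orphans
  const fromQuery = coll.countDocuments({});          // filters orphans out
  return {
    fromMetadata,
    fromQuery,
    // Clamp at zero: concurrent writes can make the two counts drift.
    estimatedOrphans: Math.max(0, fromMetadata - fromQuery),
  };
}

// In mongosh (placeholder database and collection names):
//   orphanEstimate(db.getSiblingDB("mydb").orders)
```

Treat the result as an estimate only. Writes that happen between the two calls, or stale metadata after an unclean shutdown, can also make the two numbers differ.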

If orphaned documents are having a negative effect on your cluster, you can try pausing the balancer, or setting a balancing window to throttle how many orphaned documents are being produced.
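
As a rough sketch of what that could look like in the shell: the balancing window is stored in the settings collection of the config database, and the balancer can be paused and resumed with sh.stopBalancer() and sh.startBalancer(). The helper name balancerWindowUpdate and the example times below are assumptions for illustration, so pick a window that matches your own quiet hours.

```javascript
// Hypothetical helper: build the filter/update/options for a balancing window.
// `start` and `stop` are "HH:MM" strings in 24-hour time.
function balancerWindowUpdate(start, stop) {
  const hhmm = /^([01]\d|2[0-3]):[0-5]\d$/;
  if (!hhmm.test(start) || !hhmm.test(stop)) {
    throw new Error("start and stop must be HH:MM in 24-hour time");
  }
  return {
    filter: { _id: "balancer" },
    update: { $set: { activeWindow: { start, stop } } },
    options: { upsert: true },
  };
}

// In mongosh, connected to mongos (the times are example values):
//   const w = balancerWindowUpdate("23:00", "06:00");
//   db.getSiblingDB("config").settings.updateOne(w.filter, w.update, w.options);
//
// To pause or resume balancing entirely:
//   sh.stopBalancer();
//   sh.startBalancer();
```

The window only takes effect while the balancer is enabled, so check sh.getBalancerState() if chunks do not seem to move during the window.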

Sumanta_Mukhopadhyay · February 3, 2023, 5:50pm

Sharding in MongoDB is a process of distributing data across multiple servers, which can improve performance and scalability. However, it’s possible to experience a growth in size and document count after sharding, as well as increased memory usage, as the database must manage more metadata.

To address this issue, there are several steps you can take:

Monitor your database: Keep an eye on the size of your collections, the number of documents in each, and the amount of memory being used. Regular monitoring will help you identify any potential issues early on.
Check the balancer: The balancer redistributes chunks across your shards automatically whenever it is enabled. You can confirm that it is enabled with sh.getBalancerState() and review how the data is currently distributed with sh.status().
Optimize your indexes: Indexes are crucial for efficient querying, but they also consume memory. Make sure your indexes are optimized for the queries you’re running and remove any unused or redundant indexes.
Consider vertical partitioning: If you have collections with a large number of fields, consider splitting the collection into two or more smaller collections, each with a smaller subset of fields.
Monitor your memory usage: Make sure you have enough memory to support the number of shards and replicas you have. If you’re experiencing memory constraints, consider adding more memory or reducing the number of replicas.

It’s important to keep in mind that sharding can be a complex process, and it may require some trial and error to find the best configuration for your specific use case. If you’re still having issues, you may want to consider reaching out to the MongoDB community or professional services for additional guidance.

Here’s an example of how you can resolve the issue of increased size and document count after sharding in MongoDB:

Suppose you have a collection called “orders” that you want to shard. The collection is 5.6GB in size and contains 1000 documents. After sharding, you observe that the count of documents has increased to 1500 and the size of the collection has also grown.

To resolve this issue, you can follow these steps:

Monitor your database: Connect to the mongos and run the following command to check the size and document count of the “orders” collection:

> use mydb
> db.orders.stats()
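
The output of stats() is long, so it can help to pull out just the per-shard numbers. This is a minimal sketch with a made-up helper name (perShardSummary); it assumes the stats document of a sharded collection has a "shards" field keyed by shard name, each entry carrying "count" and "size" in bytes.

```javascript
// Hypothetical helper: reduce a collection stats document to per-shard
// document counts and data sizes.
function perShardSummary(stats) {
  const shards = stats.shards || {}; // absent for unsharded collections
  return Object.keys(shards).map((name) => ({
    shard: name,
    count: Number(shards[name].count),
    sizeMB: Math.round(Number(shards[name].size) / (1024 * 1024)),
  }));
}

// In mongosh, after `use mydb`:
//   printjson(perShardSummary(db.orders.stats()))
```

Keep in mind that these per-shard counts may include orphaned documents, so their sum can be higher than the real number of documents while migrations are still being cleaned up.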

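Check the balancer: The balancer moves chunks between shards on its own whenever it is enabled. Run the following commands to confirm that it is enabled and to see how the data is distributed across your shards:

> sh.getBalancerState()
> db.orders.getShardDistribution()

Balancing evens out the data across the shards, but the chunk migrations it performs are also what leave orphaned documents behind until they are cleaned up, so it does not by itself reduce the reported size or document count.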

Optimize your indexes: Check the indexes on the “orders” collection and make sure they’re optimized for your queries. You can run the following command to see the indexes on the collection:

> db.orders.getIndexes()

If you have any unused or redundant indexes, you can remove them by running the following command:

> db.orders.dropIndex("indexName")
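
To find candidates for removal, one approach is the $indexStats aggregation stage, which reports how often each index has been used. The sketch below uses a made-up helper name (unusedIndexes) and assumes each returned document has a "name" and an "accesses.ops" counter; on a sharded cluster there is one document per index per shard, so the counters are summed by index name.

```javascript
// Hypothetical helper: list index names with zero recorded accesses.
// `indexStats` is an array of $indexStats result documents.
function unusedIndexes(indexStats) {
  const opsByName = {};
  for (const s of indexStats) {
    // Number() also handles 64-bit integer wrappers returned by the shell.
    const ops = Number(s.accesses.ops);
    opsByName[s.name] = (opsByName[s.name] || 0) + ops;
  }
  // Never report the mandatory _id index.
  return Object.keys(opsByName).filter(
    (name) => name !== "_id_" && opsByName[name] === 0
  );
}

// In mongosh, after `use mydb`:
//   unusedIndexes(db.orders.aggregate([{ $indexStats: {} }]).toArray())
```

The counters reset when a mongod restarts, so a zero only means the index has not been used since then. Also do not drop the index that supports the shard key, even if it shows up in the list.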

Consider vertical partitioning: If you have a collection with a large number of fields, you can consider splitting it into smaller collections, each with a smaller subset of fields. For example, you could create two collections, “orders” and “orderDetails”, with the latter containing the details of each order.
Monitor your memory usage: Regularly monitor your memory usage to make sure you have enough memory to support your shards and replicas. You can use the db.serverStatus() method to get information about your server’s memory usage.
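
As an illustration of the memory check, here is a small sketch that condenses the serverStatus output. The helper name cacheUsage is made up; it assumes the document has a "mem.resident" value (in MiB) and a "wiredTiger.cache" section, which is only present when the command is run against a mongod, not against mongos.

```javascript
// Hypothetical helper: summarize resident memory and WiredTiger cache fill.
// `status` is the document returned by db.serverStatus() on a mongod.
function cacheUsage(status) {
  const cache = status.wiredTiger && status.wiredTiger.cache;
  if (!cache) return null; // no storage engine section, e.g. on mongos
  const used = Number(cache["bytes currently in the cache"]);
  const max = Number(cache["maximum bytes configured"]);
  return {
    residentMiB: status.mem ? status.mem.resident : null,
    cacheUsedPct: Math.round((used / max) * 100),
  };
}

// In mongosh, connected directly to a shard member:
//   cacheUsage(db.serverStatus())
```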

By following these steps, you can resolve the issue of increased size and document count after sharding in MongoDB and maintain the performance and scalability of your database.