GridFS Sharding unbalanced

Hello,

I have been struggling for days with the balancing of a GridFS collection.

As you can see in the capture, storage is not balanced correctly between the shards over time:

I recreated my database to reset the sharding because I suspected a problem had disabled the shard, but the same problem occurs.

Now I am checking again from the shell and I see that nothing is sharded:

mongos> db.getSiblingDB("config").chunks.find({ ns: "n_storage.fs.chunks" }).count()
0

mongos> use config
switched to db config
mongos> db.collections.find({"_id": "n_storage.fs.chunks"})
{ "_id" : "n_storage.fs.chunks", "lastmodEpoch" : ObjectId("67a6a46f6cdb3b52cc4282d4"), "lastmod" : ISODate("2025-02-08T00:25:19.765Z"), "timestamp" : Timestamp(1738974319, 12), "uuid" : UUID("b8166ae7-5a88-4024-8893-0ab8c3973218"), "key" : { "files_id" : 1, "n" : 1 }, "unique" : false, "chunksAlreadySplitForDowngrade" : false, "noBalance" : false }
mongos>

I am confused about this issue because the data seems to be split, but sharding does not seem to be performed correctly.

Thanks for your help

Hi Anon,

Could you share the output of:

db.aggregate([
   { $shardedDataDistribution: { } },
   { $match: {  "ns": "n_storage.fs.chunks" } }
])

Here is the output :

mongos> db.aggregate([ { $shardedDataDistribution: { } }, { $match: { "ns": "n_storage.fs.chunks" } } ]).pretty()
{
  "ns" : "n_storage.fs.chunks",
  "shards" : [
    {
      "shardName" : "cold_shard_7",
      "numOrphanedDocs" : 0,
      "numOwnedDocuments" : 53378,
      "ownedSizeBytes" : NumberLong("11952508516"),
      "orphanedSizeBytes" : 0
    },
    {
      "shardName" : "cold_shard_1",
      "numOrphanedDocs" : 0,
      "numOwnedDocuments" : 53410,
      "ownedSizeBytes" : NumberLong("11942422590"),
      "orphanedSizeBytes" : 0
    },
    {
      "shardName" : "cold_shard_3",
      "numOrphanedDocs" : 0,
      "numOwnedDocuments" : 53395,
      "ownedSizeBytes" : NumberLong("12000846620"),
      "orphanedSizeBytes" : 0
    },
    {
      "shardName" : "cold_shard_4",
      "numOrphanedDocs" : 0,
      "numOwnedDocuments" : 53352,
      "ownedSizeBytes" : NumberLong("12024046944"),
      "orphanedSizeBytes" : 0
    },
    {
      "shardName" : "cold_shard_6",
      "numOrphanedDocs" : 0,
      "numOwnedDocuments" : 52786,
      "ownedSizeBytes" : NumberLong("11947160952"),
      "orphanedSizeBytes" : 0
    },
    {
      "shardName" : "cold_shard_0",
      "numOrphanedDocs" : 0,
      "numOwnedDocuments" : 53390,
      "ownedSizeBytes" : NumberLong("12012803390"),
      "orphanedSizeBytes" : 0
    },
    {
      "shardName" : "cold_shard_2",
      "numOrphanedDocs" : 0,
      "numOwnedDocuments" : 53918,
      "ownedSizeBytes" : NumberLong("12055094276"),
      "orphanedSizeBytes" : 0
    },
    {
      "shardName" : "cold_shard_5",
      "numOrphanedDocs" : 30391,
      "numOwnedDocuments" : 719018,
      "ownedSizeBytes" : NumberLong("168458008202"),
      "orphanedSizeBytes" : NumberLong("7120276999")
    }
  ]
}
mongos>

The collection appears to still be balancing.

Starting with MongoDB 6.0, the balancer aims to distribute a collection evenly across the shards in terms of data size (read more here).

The shard cold_shard_5 holds more data than the other shards (~145 GB more) and also has orphaned documents that will be deleted.
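To make the gap concrete, here is a small sketch in plain JavaScript (runnable in Node) that takes the ownedSizeBytes values from the output above and reports the largest shard, the smallest shard, and the difference between them. The helper name summarizeSpread is hypothetical and only for illustration; the values are entered as plain numbers rather than NumberLong.

```javascript
// ownedSizeBytes per shard, copied from the $shardedDataDistribution output above
const ownedSizeBytes = {
  cold_shard_7: 11952508516,
  cold_shard_1: 11942422590,
  cold_shard_3: 12000846620,
  cold_shard_4: 12024046944,
  cold_shard_6: 11947160952,
  cold_shard_0: 12012803390,
  cold_shard_2: 12055094276,
  cold_shard_5: 168458008202,
};

// Hypothetical helper: find the smallest and largest shard and the gap between them
function summarizeSpread(sizes) {
  // Sort [shardName, bytes] pairs by size, ascending
  const entries = Object.entries(sizes).sort((a, b) => a[1] - b[1]);
  const [minShard, minBytes] = entries[0];
  const [maxShard, maxBytes] = entries[entries.length - 1];
  const gapBytes = maxBytes - minBytes;
  // Report the gap in bytes and in GiB (2^30 bytes)
  return { minShard, maxShard, gapBytes, gapGiB: gapBytes / 2 ** 30 };
}

const spread = summarizeSpread(ownedSizeBytes);
console.log(spread);
```

The gap between cold_shard_5 and the smallest shard comes out at a bit over 145 GiB, which matches the estimate above. As far as I know, with the default 128 MB range size the balancer keeps migrating until the difference between shards drops below roughly three times that size, so a gap this large means there is still a lot of migration work left.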

If you periodically check the output of $shardedDataDistribution, you should see numOwnedDocuments for cold_shard_5 decrease over time until all the shards hold approximately the same amount of data.
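On the zero count from config.chunks in the original question: as far as I know, since MongoDB 5.0 the documents in config.chunks are keyed by the collection's uuid rather than by an ns field, so filtering on ns matches nothing even when the collection is sharded. Below is a minimal sketch of a helper that looks up the uuid in config.collections first and then counts chunks per shard. The function name chunkCountsByShard is hypothetical, and I am assuming a 5.0+ cluster, which the uuid and timestamp fields in your config.collections output suggest.

```javascript
// Hypothetical helper: count chunks per shard for a sharded namespace.
// Assumes MongoDB 5.0+, where config.chunks documents carry the
// collection "uuid" instead of an "ns" field.
function chunkCountsByShard(configDb, ns) {
  // Look up the collection's metadata to get its uuid
  const coll = configDb.getCollection("collections").findOne({ _id: ns });
  if (!coll) return null; // no entry: the namespace is not sharded

  // Group the chunks of that uuid by owning shard
  return configDb.getCollection("chunks").aggregate([
    { $match: { uuid: coll.uuid } },
    { $group: { _id: "$shard", chunks: { $sum: 1 } } },
    { $sort: { _id: 1 } },
  ]).toArray();
}

// In the shell, something like:
// chunkCountsByShard(db.getSiblingDB("config"), "n_storage.fs.chunks")
```

Keep in mind that chunk counts only confirm that the collection is sharded and spread over the shards. Since the balancer works on data size, $shardedDataDistribution remains the better way to follow progress.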

Thanks a lot,

After your message I disabled my backend API that inserts into my GridFS, and the collection balanced faster after that. I guess migration works better when the collection is not in use.

Now everything is balanced correctly :