I have set up an experimental cluster with two shard nodes by following the “Deploy a Sharded Cluster” tutorial.
I intend to distribute a large GridFS chunks collection of over 700 GB across the nodes.
At first I tried sharding the collection while it was already populated, but MongoDB would not split it into chunks. I therefore exported the collection and re-imported it through mongos; this time it was split into 7271 chunks.
While this looked promising, the disk statistics showed that the full collection had been replicated entirely across the two nodes instead of being distributed between them.
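For reference, the sharding commands I ran were essentially the following (run against mongos; the database name and shard key match the status output further below, exact options may have differed):

```javascript
// Assumes the { files_id: 1, n: 1 } index on fs.chunks already exists
// (the GridFS driver creates it by default).
sh.enableSharding("dbtest")
sh.shardCollection("dbtest.fs.chunks", { files_id: 1, n: 1 })
```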
From there, I am stuck.
> db.fs.chunks.getShardDistribution()
Shard shtest at shtest/192.168.82.10:27019,192.168.82.20:27019
{
  data: '716.95GiB',
  docs: 4038821,
  chunks: 7271,
  'estimated data per chunk': '100.97MiB',
  'estimated docs per chunk': 555
}
---
Totals
{
  data: '716.95GiB',
  docs: 4038821,
  chunks: 7271,
  'Shard shtest': [
    '100 % data',
    '100 % docs in cluster',
    '186KiB avg obj size on shard'
  ]
}
shards
[
  {
    _id: 'shtest',
    host: 'shtest/192.168.82.10:27019,192.168.82.20:27019',
    state: 1,
    topologyTime: Timestamp({ t: 1695174662, i: 4 })
  }
]
collections: {
  'dbtest.fs.chunks': {
    shardKey: { files_id: 1, n: 1 },
    unique: false,
    balancing: true,
    chunkMetadata: [ { shard: 'shtest', nChunks: 7271 } ],
    chunks: [
      'too many chunks to print, use verbose if you want to force print'
    ],
    tags: []
  }
}
I am running MongoDB 6.
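In case it is relevant, this is how I list the cluster's shards from mongos; consistent with the status output above, I only ever see the single shtest entry, a replica set containing both hosts:

```javascript
// Run from mongos against the admin database.
db.adminCommand({ listShards: 1 })
```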