Collection replicated entirely instead of being distributed

Daniel_Jackson3 · September 20, 2023, 2:35pm

I have made an experimental setup with two sharding nodes by following the “Deploy a Sharded Cluster” tutorial.

I intend to distribute accross the nodes a large GridFS chunks collection of over 700 GB.

At first, I tried working with the collection already populated, but it appeared that MongoDB would not split it up into chunks. Therefore I exported the collection and imported it again to mongos, this time it turned into 7271 chunks.

While this looked great I looked at the disk statistics and figured out that the full collection had been replicated entirely accrross the two nodes instead of being load balanced.

From there, I am stuck.

> db.fs.chunks.getShardDistribution()
Shard shtest at shtest/192.168.82.10:27019,192.168.82.20:27019
{
  data: '716.95GiB',
  docs: 4038821,
  chunks: 7271,
  'estimated data per chunk': '100.97MiB',
  'estimated docs per chunk': 555
}
---
Totals
{
  data: '716.95GiB',
  docs: 4038821,
  chunks: 7271,
  'Shard shtest': [
    '100 % data',
    '100 % docs in cluster',
    '186KiB avg obj size on shard'
  ]
}

shards                                                                                         
[
  {
    _id: 'shtest',               
    host: 'shtest/192.168.82.10:27019,192.168.82.20:27019',
    state: 1,                                   
    topologyTime: Timestamp({ t: 1695174662, i: 4 })
  }                                             
] 

    collections: {
      'dbtest.fs.chunks': {
        shardKey: { files_id: 1, n: 1 },
        unique: false,
        balancing: true,
        chunkMetadata: [ { shard: 'shtest', nChunks: 7271 } ],
        chunks: [
          'too many chunks to print, use verbose if you want to force print'
        ],
        tags: []
      }
    }

I am running MongoDB 6.

Daniel_Jackson3 · September 20, 2023, 6:15pm

I have resolved my issue.

As I have limited knowledge of MongoDB I was a little confused reading documents saying that “shard” must be configured as “replicate” and actually got my two nodes as members of the same shard.

I have configured each node with a different shard name and load balancing is now operating as it should.

> sh.status()
          { shard: 'shtest', nChunks: 7231 },
          { shard: 'shtest1', nChunks: 40 }