refineCollectionShardKey, not so refined anymore (very unbalanced chunks)

As a start:

  • The shard key was refined to a compound key: group > user > document_name; it was previously group only (see the sketch after this list)
  • The cluster is composed of 2 shards
  • There are a few billion entries (about 4.4 billion documents in total)
  • MongoDB 6.0 is being used
  • The database itself seems to work fine
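
For context, the refine was done with the standard admin command, roughly like this (a sketch using my real field names; the createIndex step assumes the supporting index did not already exist):

// The new shard key needs a supporting index (only required if one did not already exist).
db.getSiblingDB("work").FILE.createIndex({ "user.group": 1, "user.name": 1, document_name: 1 })

// Refine the existing { "user.group": 1 } shard key by adding suffix fields.
db.adminCommand({
  refineCollectionShardKey: "work.FILE",
  key: { "user.group": 1, "user.name": 1, document_name: 1 }
})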

After running the refineCollectionShardKey admin command, the balancer kept increasing the chunk count on only one shard, which resulted in the following status:

'work.FILE': {
    shardKey: { 'user.group': 1, 'user.name': 1, document_name: 1 },
    unique: false,
    balancing: true,
    chunkMetadata: [
      { shard: 'RepSet1', nChunks: 26 },
      { shard: 'RepSet2', nChunks: 4248 }
    ],

With this balancer status:

{
  'Currently enabled': 'yes',
  'Currently running': 'no',
  'Failed balancer rounds in last 5 attempts': 0,
  'Migration Results for the last 24 hours': {
    '13': "Failed with error 'aborted', from RepSet1 to RepSet2",
    '4211': 'Success'
  }
}
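
A useful per-collection check (not shown above) is the balancer compliance status, available since MongoDB 4.4; it reports whether the balancer considers the collection balanced and, if not, the first violation it found:

// Reports balancerCompliant plus the first compliance violation for this collection.
sh.balancerCollectionStatus("work.FILE")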

Also, the per-shard data distribution:

Shard RepSet1
{
  data: '1179.56GiB',
  docs: Long("2369155472"),
  chunks: 26,
  'estimated data per chunk': '45.36GiB',
  'estimated docs per chunk': 91121364
}
---
Shard RepSet2
{
  data: '1179.78GiB',
  docs: 2063496305,
  chunks: 4248,
  'estimated data per chunk': '284.39MiB',
  'estimated docs per chunk': 485757
}
---
Totals
{
  data: '2359.35GiB',
  docs: 4432651777,
  chunks: 4274,
  'Shard RepSet1': [
    '49.99 % data',
    '53.44 % docs in cluster',
    '534B avg obj size on shard'
  ],
  'Shard RepSet2': [
    '50 % data',
    '46.55 % docs in cluster',
    '613B avg obj size on shard'
  ]
}
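
For reference, the per-shard numbers above are the output of the standard distribution helper, presumably from something like:

// Prints data size, document count, chunk count, and per-chunk estimates for each shard.
db.getSiblingDB("work").FILE.getShardDistribution()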

Also, my chunks seem to be oddly distributed:

...
{"user.group":"cmdd_public_5years","user.name":"yvc001","archivename":"cmdw_yvc001_prognos_obs_v3_20230110"} -->> {"user.group":"cmdd_public_5years","user.name":"yvc001","archivename":"cmdw_yvc001_prognos_predictors_stns_rdps_db_00Z"} on: hpcRepSet1 {"$timestamp":"8130373091856"} jumbo
                        {"user.group":"crd_cccma","user.name":"cpd101","archivename":"sc_rc3.1-rcp2609hv_205401_205412_20201200713"} -->> {"user.group":"crd_ccmr","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"18232136171521"}
                        {"user.group":"crd_ccmr","user.name":{},"archivename":{}} -->> {"user.group":"crd_cdas","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"111669149699"}
                        {"user.group":"crd_short_term","user.name":{},"archivename":{}} -->> {"user.group":"dfo_dpnm_perm","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"90194313221"}
                        {"user.group":"dfo_dpnm_perm","user.name":{},"archivename":{}} -->> {"user.group":"di_backups","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"90194313222"}
                        {"user.group":"di_backups","user.name":{},"archivename":{}} -->> {"user.group":"di_datamart","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"90194313223"}
                        {"user.group":"di_datamart","user.name":{},"archivename":{}} -->> {"user.group":"di_logs","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"158913789954"}
                        {"user.group":"di_logs","user.name":{},"archivename":{}} -->> {"user.group":"hpci_priv","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"163208757250"}
                        {"user.group":"hpci_pub","user.name":{},"archivename":{}} -->> {"user.group":"hpcs_backups","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"180388626434"}
                        {"user.group":"hpcs_logs","user.name":{},"archivename":{}} -->> {"user.group":"mrd_public_5years","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"184683593730"}
                        {"user.group":"reqa_reanalysis_permanent","user.name":{},"archivename":{}} -->> {"user.group":"reqa_scenarios_5years","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"188978561026"}
                        {"user.group":"reqa_scenarios_5years","user.name":{},"archivename":{}} -->> {"user.group":"reqa_scenarios_permanent","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"98784247810"}
                        {"user.group":"reqa_vaqum_5years","user.name":{},"archivename":{}} -->> {"user.group":"rpnatm_2ans","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"98784247812"}
                        {"user.group":"rpndat_5ans","user.name":{},"archivename":{}} -->> {"user.group":"rpnenv_2ans","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"171798691842"}
                        {"user.group":"rpnenv_permanent","user.name":{},"archivename":{}} -->> {"user.group":"ssc_hpci","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"167503724546"}
                        {"user.group":"stats","user.name":{},"archivename":{}} -->> {"user.group":"syslogs","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"176093659138"}
                        {"user.group":"syslogs","user.name":{},"archivename":{}} -->> {"user.group":"wherd_permanent","user.name":{},"archivename":{}} on: hpcRepSet1 {"$timestamp":"111669149700"}
                        {"_id":{"id":"80000000-0000-0000-0000-000000000000"}} -->> {"_id":{"id":"80400000-0000-0000-0000-000000000000"}} on: hpcRepSet1 {"$timestamp":"2203318222849"}
                        {"_id":{"id":"80400000-0000-0000-0000-000000000000"}} -->> {"_id":{"id":"80800000-0000-0000-0000-000000000000"}} on: hpcRepSet1 {"$timestamp":"4294967810"}
...

It looks as if these chunks were using the shard key but then somehow switched to _id (which could cause a monotonically increasing key).
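
To check which collection those _id-keyed ranges actually belong to, the chunk metadata can be queried directly. Since MongoDB 5.0, config.chunks is keyed by collection UUID rather than namespace, so the lookup goes through config.collections (a sketch, output not included here):

const cfg = db.getSiblingDB("config")

// Resolve collection UUIDs (config.chunks no longer stores the namespace).
const fileUuid = cfg.collections.findOne({ _id: "work.FILE" }).uuid
const sessUuid = cfg.collections.findOne({ _id: "config.system.sessions" }).uuid

// Chunks whose bounds start with _id should belong to config.system.sessions, not work.FILE.
cfg.chunks.countDocuments({ uuid: fileUuid, "min._id": { $exists: true } })  // expected: 0
cfg.chunks.countDocuments({ uuid: sessUuid })                                // expected: 1024 (512 per shard, see below)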

All documents look like this, and they should all have the same format (no missing keys or values; a quick check for this is sketched after the example):

{ 
  "_id" : ObjectId("5a68e11aedd1713655361323"), 
  "size" : 524288000, 
  "file" : {
    "archive_time" : "2018-01-24T19:40:10.717646",
    "checksum" : null,
    "creation_time" : "2017-12-22T14:44:12.643444",
    "part" : ".part000000",
    "tags" : null,
    "size" : 524288000,
    "group" : "root",
    "filename" : "data/docker_compose/prod_test/data_set/file500",
    "owner" : "root", 
    "inode" : 1310730 
  }, 
  "document_name" : "file500",
  "folder" : "priv/s002/_1",
  "user" : {
    "group" : "priv",
    "rgrp" : "global_n",
    "name" : "s002"
  }
}
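
The check mentioned above could look like this (a sketch; it cannot use an index for $exists: false, so it is slow at this size). Any documents missing a shard key component would all fall into the same key range:

// Count documents missing any component of the refined shard key.
db.getSiblingDB("work").FILE.countDocuments({
  $or: [
    { "user.group": { $exists: false } },
    { "user.name": { $exists: false } },
    { document_name: { $exists: false } }
  ]
})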

Looking into the logs, 12 of these 13 failures were from moveChunk:

"msg":"Error while doing moveChunk","attr":{"error":"ChunkTooBig: Cannot move chunk: the maximum number of documents for a chunk is 497102, the maximum chunk size is 134217728, average document size is 540. Found 501324 documents in chunk

The highest chunk document count is about 600k, which is not far above the threshold (the 497102 limit appears to be 2 × max chunk size / average document size, i.e. 2 × 134217728 / 540).

The other error was:

"msg":"Error while doing moveChunk","attr":{"error":"Interrupted: Failed to contact recipient shard to monitor data transfer :: caused by :: operation was interrupted"}}

I have plenty of storage space and enough cores too. Memory might be low (based on htop).

My questions:

  • Is there an issue? I believe the balancing is wrong given the big difference in nChunks. If so, why? This seems to be the major symptom: chunks: 26, 'estimated data per chunk': '45.36GiB', 'estimated docs per chunk': 91121364
  • If the balancer is blocked because the chunks are too big, what options do I have? (see the sketch after this list for what I am considering)
  • How can I confirm that the sharding has been done properly?
  • Suggestions?
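
For reference, the two options I am considering look roughly like this (illustrative values, not commands I have actually run):

// 1. Manually split one of the oversized chunks so the balancer can move the halves.
//    The document below is only an example split point inside one of the big ranges.
sh.splitFind("work.FILE", { "user.group": "cmdd_public_5years", "user.name": "yvc001", document_name: "some_document" })

// 2. Raise the per-collection chunk size (MongoDB 6.0, value in MB), hoping the existing
//    chunks are then no longer considered too big to migrate.
db.adminCommand({ configureCollectionBalancing: "work.FILE", chunkSize: 512 })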

About monotonicity, the only thing that could cause this would be the config replica set:

database: { _id: 'config', primary: 'config', partitioned: true },
collections: {
  'config.system.sessions': {
    shardKey: { _id: 1 },
    unique: false,
    balancing: true,
    chunkMetadata: [
      { shard: 'RepSet1', nChunks: 512 },
      { shard: 'RepSet2', nChunks: 512 }
    ],

Apart from the fact that your data is perfectly distributed, i.e. you have two shards and each of them stores 50% of the data, the chunks you show do not match.

You say the shard key is { 'user.group': 1, 'user.name': 1, document_name: 1 }, but in the chunk ranges you posted I see {"user.group", "user.name", "archivename"}, which is different. I don't know your data, but I guess the number of distinct group and user values is rather low (you can gauge it with the aggregation below).
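
A rough way to check that cardinality (an illustrative aggregation; it scans the whole collection, hence allowDiskUse):

// Count distinct (user.group, user.name) pairs; low cardinality limits how finely the data can be chunked.
db.getSiblingDB("work").FILE.aggregate(
  [
    { $group: { _id: { group: "$user.group", name: "$user.name" } } },
    { $count: "distinctGroupUserPairs" }
  ],
  { allowDiskUse: true }
)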

Yeah, sorry, I wanted to edit the post but that is not possible.

Note that the shard key is the same; I just changed the field name in the post (document_name vs. archivename), and the same goes for the shard names (RepSet1 vs. hpcRepSet1).

Thanks