Sharding cluster error

An error appears in the mongod system log:

{"t":{"$date":"2021-12-06T07:30:26.676+00:00"},"s":"W", "c":"SHARDING", "id":23777, "ctx":"MoveChunk","msg":"Error while doing moveChunk","attr":{"error":"OperationFailed: Data transfer error: ExceededTimeLimit: Failed to delete orphaned .colletion range [{ field: \"value1\" }, { field: \"value2\" }) :: caused by :: operation exceeded time limit"}}

I don't know what caused this problem. My uncertain guess is that there are too many documents corresponding to the same shard key.
The number of documents corresponding to value1 is 150,000, while value2 has only 6.

Can someone provide some explanation or guidance? Thanks.

Hi @jiang_xu1 and welcome to the MongoDB community :muscle: !

Do you have a jumbo chunk issue on this collection? What’s the shard key? How many chunks do you have in this collection and how big is it in total?

Cheers,
Maxime.

Hello, thanks for the reply.
I built a cluster with two shards (shard-A, shard-B), sharded "collectionA" using the non-unique shard key "field1", and observed the above error:

"Failed to delete orphaned databaseA.collectionA range [{ field1: "value1" }, { field1: "value2" })"

CollectionA status:

chunkMetadata: [{ shard: 'shard-A', nChunks: 129 }, { shard: 'shard-B', nChunks: 129 }],
count: 16891443,
size: 18.04GB,
storageSize: 8.74GB,
totalIndexSize: 3.77GB,
totalSize: 12.51GB

mongo version: 5.0.4

My sharding command: sh.shardCollection("database.collectionA", {"field1": 1});

In fact, the value stored in field1 is an address on the blockchain. Considering that it is already a hash value, I used a ranged shard key; later I realized that an active address generates a lot of records while an inactive address has fewer.

The addresses mentioned in the mongod error log are not all addresses with many records.
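
The per-address document counts can be checked with an aggregation along these lines (just a sketch, using the collection and field names above):

db.collectionA.aggregate([
  { $group: { _id: "$field1", docs: { $sum: 1 } } },   // count documents per shard key value
  { $sort: { docs: -1 } },                             // most frequent addresses first
  { $limit: 10 }
], { allowDiskUse: true });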

Hi @jiang_xu1,

An ideal chunk size is smaller than 64 MB.

Looks like you have 129 * 2 = 258 chunks, so the maximum amount of data in this collection should be about 258 * 64 MB = 16,512 MB. Usually chunks are actually way smaller than 64 MB, as they are split into smaller chunks regularly. So I suspect that your shard key isn't a good one and you have a jumbo chunk issue: the balancer cannot split the chunks any further because too many documents share the same shard key value, and they all end up in the same big chunk.

A sh.status(true) should confirm that you have jumbo chunks in your cluster.
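
If the sh.status(true) output is too verbose, the jumbo flag can also be checked directly in the config database (a sketch, run from mongosh connected to the mongos):

db.getSiblingDB("config").chunks.find({ jumbo: true })   // lists chunks the balancer has flagged as jumbo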

2 ways to fix this: refine the current shard key by adding a suffix field (refineCollectionShardKey), or reshard the collection onto a better key entirely (reshardCollection, new in MongoDB 5.0); see the command sketch after the list below.

An ideal shard key:

  • has a BIG (infinite is good!) cardinality
  • is evenly distributed
  • doesn’t grow monotonically
  • is used in most of your queries

See doc: Choose a shard key.
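
For example, something like this (a sketch only, run through mongos; someQueriedField is a hypothetical field used for illustration and would have to exist in your documents and queries):

// Option 1: refine the existing key with a high-cardinality suffix.
// A supporting index on the new key must exist first.
db.getSiblingDB("databaseA").collectionA.createIndex({ field1: 1, _id: 1 });
db.adminCommand({
  refineCollectionShardKey: "databaseA.collectionA",
  key: { field1: 1, _id: 1 }
});

// Option 2 (MongoDB 5.0+): reshard onto a different key that matches your queries.
db.adminCommand({
  reshardCollection: "databaseA.collectionA",
  key: { someQueriedField: 1, field1: 1 }   // someQueriedField is hypothetical
});

Note that refining the shard key only affects how future chunks are split (existing data is not redistributed immediately), while resharding rewrites and redistributes the whole collection.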

While I'm at it: this collection is relatively small and shouldn't require sharding. Usually we recommend sharding when the cluster contains more than 2 TB of data, and there is no need to shard every collection on the cluster.
Small collections like this one can live on a single shard (== the collection's primary shard) and shouldn't need the support of a sharded cluster.

Cheers,
Maxime.


It is true that the shard key I selected is not a reasonable one, but I did not find the jumbo flag in the output of the sh.status(true) command.

The reason I use sharding is that I am storing all of the blockchain data, and the current data is only part of it.