Performance problems due to queued 'moveChunk' commands

hi gurus,

(running MongoDB Community Edition, downgraded from 4.4.x to 4.2.12, on a cluster of 108 shards on Ubuntu 18.04, with plenty of RAM/CPU/disk resources)

We have been struggling with chunk movements since the beginning of our MongoDB experience. We load too much data to rely on the balancer to move chunks: the chunks simply fill up too fast for the balancer to catch up. We need to load some 15-20 GB of data every day.

So we create empty collections, pre-split them into 3600 chunks, and then start moving those chunks across our 108 shards, roughly as in the sketch below. This is no fun when data loading into other collections (NOT the ones whose chunks are being moved) has to continue at the same time.
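For reference, this is roughly what our pre-split and distribution script does in the “mongo” shell; the database/collection names and the integer shard key “k” are placeholders for illustration, not our real schema:

    // Sketch of how we pre-split and distribute a new, still empty collection.
    // "mydb", "mycoll" and the integer shard key "k" are placeholders only.
    sh.enableSharding("mydb");
    sh.shardCollection("mydb.mycoll", { k: 1 });

    // Pre-split into 3600 chunks (the real split points depend on our actual shard key).
    for (var i = 1; i < 3600; i++) {
        db.adminCommand({ split: "mydb.mycoll", middle: { k: i } });
    }

    // Move the chunks round-robin over all 108 shards; we expect each 'moveChunk'
    // to return only after the chunk has been fully transferred to the target shard.
    var shards = db.getSiblingDB("config").shards.find().toArray();
    var chunks = db.getSiblingDB("config").chunks.find({ ns: "mydb.mycoll" }).toArray();
    chunks.forEach(function (chunk, idx) {
        var res = db.adminCommand({
            moveChunk: "mydb.mycoll",
            find: chunk.min,
            to: shards[idx % shards.length]._id
        });
        printjson(res);
    });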

What we experience is the following behaviour:

  1. we launch the “mongo” client and run a loop of “moveChunk” commands, one for each chunk; we expect each command to return only once the given chunk has been completely moved to the target shard; this happens for only some 30-70% of the commands, depending on how much new data the cluster is absorbing into other collections at the same time

  2. while intensively loading data into other (older, already balanced) collections, we clearly see a yo-yo effect: chunk movements speed up while data loading stalls, and vice versa

  3. this often leads to a situation where nearly all of the ‘moveChunk’ commands return as if they had completed the movement, while this is not the case: we repeatedly run “sh.status()” in another client session to watch the chunks move between shards (see the snippet after this list), but at that point very few or none are being moved any more (the chunk counts per shard no longer change)

  4. finally the data-loading programs became entirely blocked, with no progress for hours; so I stopped the ‘moveChunk’ actions, hoping that this would release some deadlocks.
    But even after we stopped launching extra ‘moveChunk’ commands, the data loading remained stalled, and over the next hours the config server logs kept showing zillions of aborted “moveChunk” commands, such as these:

    2021-02-12T23:38:46.814+0100 I  COMMAND  [conn13293] command config.changelog appName: "MongoDB Shell" command: aggregate { aggregate: "changelog", pipeline: [ { $match: { time: { $gt: new Date(1613083126450) }, what: "moveChunk.from", details.errmsg: { $exists: false }, details.note: "success" } }, { $group: { _id: { msg: "$details.errmsg" }, count: { $sum: 1.0 } } }, { $project: { _id: { $ifNull: [ "$_id.msg", "Success" ] }, count: "$count" } } ], fromMongos: true, cursor: { batchSize: 101 }, useNewUpsert: true, lsid: { id: UUID("1cf4a475-77d0-43de-9c55-dc666bf0c413"), uid: BinData(0, 62D27C0951893E9A231A50619557E99A407E4D68860B03D4BE41848E9E6DB72D) }, $clusterTime: { clusterTime: Timestamp(1613169526, 34), signature: { hash: BinData(0, F5C11CEBB02935427C0CBFA6764CEE57087F7FA1), keyId: 6926813995315757087 } }, $audit: { $impersonatedUsers: [ { user: "mongo-admin", db: "admin" } ], $impersonatedRoles: [ { role: "root", db: "admin" } ] }, $client: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.2.12" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" }, mongos: { host: "ndecar01:30004", client: "10.100.22.4:35860", version: "4.2.12" } }, $configServerState: { opTime: { ts: Timestamp(1613169526, 34), t: 2 } }, $db: "config" } planSummary: COLLSCAN keysExamined:0 docsExamined:292086 cursorExhausted:1 numYields:2281 nreturned:1 queryHash:FF2F2B68 planCacheKey:ABB36346 reslen:359 locks:{ ReplicationStateTransition: { acquireCount: { w: 2284 } }, Global: { acquireCount: { r: 2284 } }, Database: { acquireCount: { r: 2284 } }, Collection: { acquireCount: { r: 2284 } }, Mutex: { acquireCount: { r: 3 } } } storage:{} protocol:op_msg 362ms
    2021-02-12T23:38:46.815+0100 I  COMMAND  [conn109521] command config.changelog appName: "MongoDB Shell" command: aggregate { aggregate: "changelog", pipeline: [ { $match: { time: { $gt: new Date(1613083126450) }, what: "moveChunk.from", details.errmsg: { $exists: false }, details.note: "success" } }, { $group: { _id: { msg: "$details.errmsg" }, count: { $sum: 1.0 } } }, { $project: { _id: { $ifNull: [ "$_id.msg", "Success" ] }, count: "$count" } } ], cursor: {}, lsid: { id: UUID("1cf4a475-77d0-43de-9c55-dc666bf0c413") }, $clusterTime: { clusterTime: Timestamp(1613169526, 34), signature: { hash: BinData(0, F5C11CEBB02935427C0CBFA6764CEE57087F7FA1), keyId: 6926813995315757087 } }, $db: "config" } nShards:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:270 protocol:op_msg 363ms
    2021-02-12T23:38:47.221+0100 I  COMMAND  [conn13293] command config.changelog appName: "MongoDB Shell" command: aggregate { aggregate: "changelog", pipeline: [ { $match: { time: { $gt: new Date(1613083126450) }, what: "moveChunk.from", $or: [ { details.errmsg: { $exists: true } }, { details.note: { $ne: "success" } } ] } }, { $group: { _id: { msg: "$details.errmsg", from: "$details.from", to: "$details.to" }, count: { $sum: 1.0 } } }, { $project: { _id: { $ifNull: [ "$_id.msg", "aborted" ] }, from: "$_id.from", to: "$_id.to", count: "$count" } } ], fromMongos: true, cursor: { batchSize: 101 }, useNewUpsert: true, lsid: { id: UUID("1cf4a475-77d0-43de-9c55-dc666bf0c413"), uid: BinData(0, 62D27C0951893E9A231A50619557E99A407E4D68860B03D4BE41848E9E6DB72D) }, $clusterTime: { clusterTime: Timestamp(1613169526, 34), signature: { hash: BinData(0, F5C11CEBB02935427C0CBFA6764CEE57087F7FA1), keyId: 6926813995315757087 } }, $audit: { $impersonatedUsers: [ { user: "mongo-admin", db: "admin" } ], $impersonatedRoles: [ { role: "root", db: "admin" } ] }, $client: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.2.12" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" }, mongos: { host: "ndecar01:30004", client: "10.100.22.4:35860", version: "4.2.12" } }, $configServerState: { opTime: { ts: Timestamp(1613169526, 34), t: 2 } }, $db: "config" } planSummary: COLLSCAN keysExamined:0 docsExamined:292086 cursorExhausted:1 numYields:2281 nreturned:22 queryHash:5C8044A8 planCacheKey:67431B61 reslen:2003 locks:{ ReplicationStateTransition: { acquireCount: { w: 2283 } }, Global: { acquireCount: { r: 2283 } }, Database: { acquireCount: { r: 2283 } }, Collection: { acquireCount: { r: 2283 } }, Mutex: { acquireCount: { r: 2 } } } storage:{} protocol:op_msg 404ms
    2021-02-12T23:38:47.221+0100 I  COMMAND  [conn109521] command config.changelog appName: "MongoDB Shell" command: aggregate { aggregate: "changelog", pipeline: [ { $match: { time: { $gt: new Date(1613083126450) }, what: "moveChunk.from", $or: [ { details.errmsg: { $exists: true } }, { details.note: { $ne: "success" } } ] } }, { $group: { _id: { msg: "$details.errmsg", from: "$details.from", to: "$details.to" }, count: { $sum: 1.0 } } }, { $project: { _id: { $ifNull: [ "$_id.msg", "aborted" ] }, from: "$_id.from", to: "$_id.to", count: "$count" } } ], cursor: {}, lsid: { id: UUID("1cf4a475-77d0-43de-9c55-dc666bf0c413") }, $clusterTime: { clusterTime: Timestamp(1613169526, 34), signature: { hash: BinData(0, F5C11CEBB02935427C0CBFA6764CEE57087F7FA1), keyId: 6926813995315757087 } }, $db: "config" } nShards:1 cursorExhausted:1 numYields:0 nreturned:22 reslen:1914 protocol:op_msg 405ms
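
The “chunk counts per shard” check mentioned in point 3 is essentially the aggregation below (the collection name is again a placeholder); while chunks are really moving, the counts shift between shards on every run, but in the stuck state they stop changing:

    // Chunk count per shard for the collection being distributed
    // ("mydb.mycoll" is a placeholder); run repeatedly from a mongos.
    db.getSiblingDB("config").chunks.aggregate([
        { $match: { ns: "mydb.mycoll" } },
        { $group: { _id: "$shard", chunks: { $sum: 1 } } },
        { $sort: { chunks: -1 } }
    ]).forEach(printjson);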
    

This makes me think that the ‘moveChunk’ commands do not terminate when the movement is actually done, failed, or aborted; instead they seem to be queued, competing for some locks with the data-loading actions, and ultimately deadlocking all activity on the cluster.

So all we can do now is wait several extra hours until this seemingly queued set of ‘moveChunk’ actions has all aborted before we can start loading data again. That is obviously not acceptable for a production cluster.

=> does anyone have a clue or hints about what is going on behind the scenes, and what we can do (or should not do) to prevent such deadlocks from happening? Even the slightest suggestion is highly appreciated! :slight_smile:

thx in advance !
Rob