MONGODB SYSTEM ALERT:

Sharded multi-document transactions may perform operations using inconsistent sharding metadata

We have identified issues that can cause sharded multi-document transactions to return incorrect data and possibly miss writes. These issues may manifest when using any of the following:

  • Queryable Encryption
  • Sharded multi-document transactions

while concurrently executing any of the following operations:

  • moveChunk, moveRange, movePrimary, renameCollection, drop, and reshardCollection
    • The moveChunk / moveRange commands are called automatically by the sharded cluster balancer.
    • The movePrimary command may be called automatically by Atlas or Ops Manager automation when removing a shard.

MongoDB 4.4.29, 5.0.25, 6.0.14, 7.0.8, and Rapid Release 7.3.1 contain fixes for these issues.

Issue Descriptions and Impact

A sharded multi-document transaction may miss reads and writes if the data it references is affected by a concurrently executing command that modifies sharding metadata. Queryable Encryption uses multi-document transactions implicitly, so it can also be affected in sharded environments.

The necessary preconditions for one or more of these issues to manifest are the following (all must be met):

  • A sharded cluster with 2 or more shards;
  • The use of either Queryable Encryption or sharded multi-document transactions;
  • The usage of operations which modify sharding metadata (see table below for specific commands); and
  • A specific ordering of operations concurrent with the command(s) above.

As a result of these issues:

  • Read operations inside a multi-document transaction will complete successfully but may return incomplete results if the documents being queried had their sharding metadata modified during the course of the transaction.
  • Write operations inside a multi-document transaction will complete successfully but may not update all documents that should have been updated, if these writes occurred on a document that had its sharding metadata modified during the course of the transaction.

Issues

See below for more information on the specific commands that can modify sharding metadata:

| Issue | Affected Versions | Commands | Read Concerns | Requires Transactions on multiple collections |
|---|---|---|---|---|
| SERVER-77506 | 4.4.0 - 4.4.27; 5.0.0 - 5.0.23; 6.0.0 - 6.0.12; 7.0.0 - 7.0.2; 7.1.0 - 7.1.1 | moveChunk, moveRange (Note: Issued automatically unless the balancer has been disabled.) | local, majority | Yes |
| SERVER-84723, SERVER-87061 | 7.0.0 - 7.0.6; 7.1.0 - 7.2.1 | drop, renameCollection, reshardCollection | local, majority, snapshot | Yes |
| SERVER-82353 | 4.4.0 - 4.4.28; 5.0.0 - 5.0.24; 6.0.0 - 6.0.13; 7.0.0 - 7.0.5; 7.1.0 - 7.2.0 | movePrimary | local, majority, snapshot | No |

MongoDB 4.4.29, 5.0.25, 6.0.14, 7.0.8, and Rapid Release 7.3.1 contain fixes for these issues.
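To check a given server version against the affected ranges listed in the table above, a small lookup like the following sketch may help. The function names (`parseVersion`, `compareVersions`, `affectedIssues`) are illustrative and not part of any MongoDB API; the ranges are transcribed from the table and should be re-checked against it. The sketch reports only what the table lists, so an empty result does not replace the recommendation to run one of the fixed releases.

```javascript
// Affected version ranges per issue, transcribed from the table above.
// Each range is inclusive: [first affected version, last affected version].
const AFFECTED_RANGES = {
  "SERVER-77506": [
    ["4.4.0", "4.4.27"], ["5.0.0", "5.0.23"], ["6.0.0", "6.0.12"],
    ["7.0.0", "7.0.2"], ["7.1.0", "7.1.1"],
  ],
  "SERVER-84723, SERVER-87061": [
    ["7.0.0", "7.0.6"], ["7.1.0", "7.2.1"],
  ],
  "SERVER-82353": [
    ["4.4.0", "4.4.28"], ["5.0.0", "5.0.24"], ["6.0.0", "6.0.13"],
    ["7.0.0", "7.0.5"], ["7.1.0", "7.2.0"],
  ],
};

// Parse a string such as "7.0.5" into [7, 0, 5]; any suffix is ignored.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) throw new Error(`Unrecognized version string: ${version}`);
  return match.slice(1).map(Number);
}

// Return -1, 0, or 1 when a is lower than, equal to, or higher than b.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

// List the issues whose affected ranges include the given version.
function affectedIssues(version) {
  return Object.entries(AFFECTED_RANGES)
    .filter(([, ranges]) => ranges.some(([low, high]) =>
      compareVersions(version, low) >= 0 &&
      compareVersions(version, high) <= 0))
    .map(([issue]) => issue);
}
```

In mongosh, the running server version can be obtained with `db.version()` and passed to `affectedIssues`.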

Mitigation

If your workload uses multi-document transactions on a sharded cluster meeting the criteria above, we recommend that you upgrade to MongoDB version 4.4.29, 5.0.25, 6.0.14, 7.0.8, or Rapid Release 7.3.1 (or later).

If you are on MongoDB Atlas, your cluster has already been upgraded.

If you are not able to upgrade immediately:

  1. Avoid issuing commands that modify sharding metadata concurrently with multi-document transactions. Specifically, avoid using the commands:

    • drop
    • renameCollection
    • reshardCollection
    • movePrimary
    • moveChunk/moveRange

  2. Disable the sharded cluster balancer using the sh.stopBalancer() command.

  3. As soon as possible, upgrade to MongoDB version 4.4.29, 5.0.25, 6.0.14, 7.0.8, or Rapid Release 7.3.1 (or later).

  4. Re-enable the balancer using sh.startBalancer().

  5. See the remediation section below.
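Steps 2 and 4 above can be sketched as two small helpers that call the balancer commands and then confirm the resulting state. This is a minimal sketch: `disableBalancer` and `enableBalancer` are illustrative names, and the `sh` helper object is passed in as a parameter only so the functions can be exercised with a stub outside mongosh. Stopping the balancer prevents only the automatic moveChunk / moveRange calls; the commands listed in step 1 still need to be avoided manually.

```javascript
// Disable the balancer and confirm it is reported as disabled.
// `sh` is the mongosh sharding helper (sh.stopBalancer, sh.getBalancerState).
function disableBalancer(sh) {
  sh.stopBalancer();
  if (sh.getBalancerState() !== false) {
    throw new Error("Balancer is still enabled; do not proceed");
  }
}

// Re-enable the balancer (after the upgrade) and confirm it is enabled.
function enableBalancer(sh) {
  sh.startBalancer();
  if (sh.getBalancerState() !== true) {
    throw new Error("Balancer is still disabled");
  }
}

// Usage in mongosh, connected to a mongos (not executed here):
//   disableBalancer(sh);   // before the upgrade
//   ... upgrade the cluster ...
//   enableBalancer(sh);    // after the upgrade
```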

Detailed impacts & diagnosis / remediation

The sections below offer guidance for specific feature usage. All implicated commands record entries in the changelog collection in the config database. The changelog collection can be inspected to determine when these commands may have been issued in the past. This collection is capped at 200 MB by default; users may consult backups for further history. Please review the sections relevant to your environment and examine the changelog collection as needed.
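One way to inspect the changelog is to build a query filter for the implicated commands, as in the sketch below. The helper name `buildChangelogFilter` and the prefix list are assumptions: the sketch assumes that each entry's `what` field starts with the command name (for example `moveChunk.commit`) and that drops are logged under a name starting with `drop`. Check a few entries in your own changelog before relying on the pattern. The `ns` value may be a database name rather than a collection namespace for some events, so the namespace filter is optional.

```javascript
// Prefixes to match against the changelog's `what` field.
// ASSUMPTION: event names begin with the command name; verify against
// entries in your own config.changelog collection.
const METADATA_EVENT_PREFIXES = [
  "moveChunk", "moveRange", "movePrimary",
  "renameCollection", "drop", "reshardCollection",
];

// Build a find() filter, optionally narrowed by namespace and time window.
function buildChangelogFilter({ namespace, since, until } = {}) {
  const filter = {
    what: { $regex: `^(${METADATA_EVENT_PREFIXES.join("|")})` },
  };
  if (namespace) filter.ns = namespace;
  if (since || until) {
    filter.time = {};
    if (since) filter.time.$gte = since;
    if (until) filter.time.$lte = until;
  }
  return filter;
}

// Usage in mongosh (not executed here; "app.orders" is a placeholder):
//   const filter = buildChangelogFilter({ namespace: "app.orders" });
//   db.getSiblingDB("config").changelog.find(filter).sort({ time: 1 });
```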

Queryable Encryption

Queryable Encryption can be impacted by SERVER-84723, SERVER-87061 and SERVER-82353.

An environment that leverages Queryable Encryption may be impacted if ALL of the following are met:

  • A sharded cluster:
    • Containing 2 or more shards, and
    • Running MongoDB 7.0.0 through 7.0.5;
  • An environment where:
    • Fields are encrypted using Queryable Encryption, and
    • An operation was issued which modified sharding metadata for one of these collections, or their side collections.
      • Specifically:
        • drop(),
        • renameCollection(),
        • reshardCollection(), or
        • movePrimary()
      • The changelog collection in the config database can be inspected to determine when these commands may have been issued in the past. This collection is capped at 200 MB by default; users may consult backups for further history.

If you are a user of Queryable Encryption and meet the criteria above, please reach out to MongoDB Support for further assistance.

Sharded Multi-document Transaction Usage

A client explicitly leveraging sharded multi-document transactions can be impacted by any of SERVER-77506, SERVER-84723, SERVER-87061 or SERVER-82353.

An environment with a workload that explicitly leverages multi-document transactions may be impacted only if ALL of the following are met:

  • A sharded cluster:
    • Containing 2 or more shards, and
    • Running versions:
      • 4.4.0 - 4.4.28,
      • 5.0.0 - 5.0.24,
      • 6.0.0 - 6.0.13, or
      • 7.0.0 - 7.0.6
  • A workload using multi-document transactions, where:
    • Transactions, using a local (default for reads) or majority read concern, accessed a sharded collection concurrently with a moveChunk() or moveRange() command (this includes balancing activity).
      • Transactions using a snapshot read concern are not affected.
      • See SERVER-77506 for more information.
    • Transactions, using any read concern, accessed a sharded or unsharded collection concurrently with a drop(), renameCollection() or reshardCollection().
    • Transactions, using any read concern, accessed an unsharded collection concurrently with a movePrimary() command.
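The workload criteria above can be written down as a small decision function, which may be useful when reviewing many transaction code paths. This is a sketch with an illustrative name (`isExposed`) and an assumed input shape. It encodes only the three workload bullets and does not check the cluster topology or version conditions.

```javascript
// Decide whether a transaction matches one of the workload criteria above.
//   command:     the concurrent sharding-metadata command
//   readConcern: "local", "majority", or "snapshot"
//   sharded:     true if the collection accessed by the transaction is sharded
function isExposed({ command, readConcern, sharded }) {
  switch (command) {
    case "moveChunk":
    case "moveRange":
      // Sharded collections only; snapshot read concern is not affected.
      return sharded &&
        (readConcern === "local" || readConcern === "majority");
    case "drop":
    case "renameCollection":
    case "reshardCollection":
      // Any read concern, sharded or unsharded collection.
      return true;
    case "movePrimary":
      // Any read concern, unsharded collections only.
      return !sharded;
    default:
      return false;
  }
}
```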

If your environment and workload meet these criteria, we recommend that you:

  1. Review your application with a focus on how missed reads or writes in a transaction could impact your application. Consult the specific tickets linked in the appendix to better understand how each of these issues can manifest.

  2. Determine when any of the metadata modifying operations referenced above were issued on collections referenced by these multi-document transactions. Consult the appendix below for additional detail on how each of the issues above may manifest.

    • The changelog collection in the config database can be inspected to determine when the commands referenced above may have been issued in the past. This collection is capped at 200 MB by default; users may consult backups for further history.
  3. Consult your application logs to assess which documents may have been affected.

  4. Review the contents of these documents to assess if they are logically consistent from the perspective of your application. You may also want to consider how other collections / documents may have been impacted if reads or writes may not have been applied correctly.
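Steps 2 and 3 above amount to cross-referencing metadata events with transaction activity. The sketch below shows one possible way to do that once both have been collected. The transaction record shape (`id`, `namespaces`, `start`, `end`) is hypothetical and would have to be reconstructed from your own application logs; the event shape assumes the `what`, `ns`, and `time` fields of changelog entries. Because a changelog entry marks a point in time while the underlying operation runs for longer, the optional `marginMs` parameter widens each transaction window.

```javascript
// Flag transactions whose time window overlaps a metadata event on a
// namespace they touched.
//   transactions: [{ id, namespaces: [string], start: Date, end: Date }]
//                 (HYPOTHETICAL shape, built from application logs)
//   events:       [{ what, ns, time: Date }] read from config.changelog
//   marginMs:     extra time added on both sides of each transaction window
function findSuspectTransactions(transactions, events, marginMs = 0) {
  return transactions
    .map((txn) => {
      const from = txn.start.getTime() - marginMs;
      const to = txn.end.getTime() + marginMs;
      const overlapping = events.filter((ev) =>
        txn.namespaces.includes(ev.ns) &&
        ev.time.getTime() >= from &&
        ev.time.getTime() <= to);
      return { id: txn.id, overlapping };
    })
    .filter((result) => result.overlapping.length > 0);
}
```

The documents touched by the flagged transactions are the ones to review for logical consistency in step 4.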

Appendix

For more information about the issues summarized here, please see the individual tickets below:

  • SERVER-77506 - multi-document transaction with concurrent moveChunk
  • SERVER-84723 / SERVER-87061 - multi-document transaction with concurrent drop, renameCollection, or reshardCollection
  • SERVER-82353 - multi-document transaction with concurrent movePrimary