Docs Menu

Docs HomeDevelop ApplicationsMongoDB Manual

Production Considerations

On this page

  • Availability
  • Feature Compatibility
  • Runtime Limit
  • Oplog Size Limit
  • WiredTiger Cache
  • Transactions and Security
  • Shard Configuration Restriction
  • Sharded Clusters and Arbiters
  • Acquiring Locks
  • Pending DDL Operations and Transactions
  • In-progress Transactions and Write Conflicts
  • In-progress Transactions and Stale Reads
  • In-progress Transactions and Chunk Migration
  • Outside Reads During Commit
  • Errors
  • Additional Information

The following page lists some production considerations for running transactions. These apply whether you run transactions on replica sets or sharded clusters. For running transactions on sharded clusters, see also the Production Considerations (Sharded Clusters) for additional considerations that are specific to sharded clusters.

  • In version 4.0, MongoDB supports multi-document transactions on replica sets.

  • In version 4.2, MongoDB introduces distributed transactions, which adds support for multi-document transactions on sharded clusters and incorporates the existing support for multi-document transactions on replica sets.

    To use transactions on MongoDB 4.2 deployments (replica sets and sharded clusters), clients must use MongoDB drivers updated for MongoDB 4.2.

Note

Distributed Transactions and Multi-Document Transactions

Starting in MongoDB 4.2, the two terms are synonymous. Distributed transactions refer to multi-document transactions on sharded clusters and replica sets. Multi-document transactions (whether on sharded clusters or replica sets) are also known as distributed transactions starting in MongoDB 4.2.

To use transactions, the featureCompatibilityVersion for all members of the deployment must be at least:

Deployment
Minimum featureCompatibilityVersion
Replica Set
4.0
Sharded Cluster
4.2

To check the fCV for a member, connect to the member and run the following command:

db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )

For more information, see the setFeatureCompatibilityVersion reference page.

By default, a transaction must have a runtime of less than one minute. You can modify this limit using transactionLifetimeLimitSeconds for the mongod instances. For sharded clusters, the parameter must be modified for all shard replica set members. Transactions that exceeds this limit are considered expired and will be aborted by a periodic cleanup process.

For sharded clusters, you can also specify a maxTimeMS limit on commitTransaction. For more information, see Sharded Clusters Transactions Time Limit.

Starting in version 4.2,
MongoDB creates as many oplog entries as necessary to the encapsulate all write operations in a transaction, instead of a single entry for all write operations in the transaction. This removes the 16MB total size limit for a transaction imposed by the single oplog entry for all its write operations. Although the total size limit is removed, each oplog entry still must be within the BSON document size limit of 16MB.
In version 4.0,
MongoDB creates a single oplog (operations log) entry at the time of commit if the transaction contains any write operations. That is, the individual operations in the transactions do not have a corresponding oplog entry. Instead, a single oplog entry contains all of the write operations within a transaction. The oplog entry for the transaction must be within the BSON document size limit of 16MB.

To prevent storage cache pressure from negatively impacting the performance:

  • When you abandon a transaction, abort the transaction.

  • When you encounter an error during individual operation in the transaction, abort and retry the transaction.

The transactionLifetimeLimitSeconds also ensures that expired transactions are aborted periodically to relieve storage cache pressure.

Note

If you have an uncommitted transaction that causes excessive pressure on the WiredTiger cache, the transaction aborts and returns a write conflict error.

If a transaction is too large to ever fit in the WiredTiger cache, the transaction aborts and returns a TransactionTooLargeForCache error.

You cannot run transactions on a sharded cluster that has a shard with writeConcernMajorityJournalDefault set to false (such as a shard with a voting member that uses the in-memory storage engine).

Transactions whose write operations span multiple shards will error and abort if any transaction operation reads from or writes to a shard that contains an arbiter.

By default, transactions wait up to 5 milliseconds to acquire locks required by the operations in the transaction. If the transaction cannot acquire its required locks within the 5 milliseconds, the transaction aborts.

Transactions release all locks upon abort or commit.

Tip

When creating or dropping a collection immediately before starting a transaction, if the collection is accessed within the transaction, issue the create or drop operation with write concern "majority" to ensure that the transaction can acquire the required locks.

You can use the maxTransactionLockRequestTimeoutMillis parameter to adjust how long transactions wait to acquire locks. Increasing maxTransactionLockRequestTimeoutMillis allows operations in the transactions to wait the specified time to acquire the required locks. This can help obviate transaction aborts on momentary concurrent lock acquisitions, like fast-running metadata operations. However, this could possibly delay the abort of deadlocked transaction operations.

You can also use operation-specific timeout by setting maxTransactionLockRequestTimeoutMillis to -1.

If a multi-document transaction is in progress, new DDL operations that affect the same database(s) or collection(s) wait behind the transaction. While these pending DDL operations exist, new transactions that access the same database(s) or collection(s) as the pending DDL operations cannot obtain the required locks and and will abort after waiting maxTransactionLockRequestTimeoutMillis. In addition, new non-transaction operations that access the same database(s) or collection(s) will block until they reach their maxTimeMS limit.

Consider the following scenarios:

DDL Operation That Requires a Collection Lock

While an in-progress transaction is performing various CRUD operations on the employees collection in the hr database, an administrator issues the db.collection.createIndex() DDL operation against the employees collection. createIndex() requires an exclusive collection lock on the collection.

Until the in-progress transaction completes, the createIndex() operation must wait to obtain the lock. Any new transaction that affects the employees collection and starts while the createIndex() is pending must wait until after createIndex() completes.

The pending createIndex() DDL operation does not affect transactions on other collections in the hr database. For example, a new transaction on the contractors collection in the hr database can start and complete as normal.

DDL Operation That Requires a Database Lock

While an in-progress transaction is performing various CRUD operations on the employees collection in the hr database, an administrator issues the collMod DDL operation against the contractors collection in the same database. collMod requires a database lock on the parent hr database.

Until the in-progress transaction completes, the collMod operation must wait to obtain the lock. Any new transaction that affects the hr database or any of its collections and starts while the collMod is pending must wait until after collMod completes.

In either scenario, if the DDL operation remains pending for more than maxTransactionLockRequestTimeoutMillis, pending transactions waiting behind that operation abort. That is, the value of maxTransactionLockRequestTimeoutMillis must at least cover the time required for the in-progress transaction and the pending DDL operation to complete.

Tip

See also:

If a transaction is in progress and a write outside the transaction modifies a document that an operation in the transaction later tries to modify, the transaction aborts because of a write conflict.

If a transaction is in progress and has taken a lock to modify a document, when a write outside the transaction tries to modify the same document, the write waits until the transaction ends.

Read operations inside a transaction can return old data, which is known as a stale read. Read operations inside a transaction are not guaranteed to see writes performed by other committed transactions or non-transactional writes. For example, consider the following sequence:

  1. A transaction is in-progress.

  2. A write outside the transaction deletes a document.

  3. A read operation inside the transaction can read the now-deleted document since the operation uses a snapshot from before the write operation.

To avoid stale reads inside transactions for a single document, you can use the db.collection.findOneAndUpdate() method. The following mongosh example demonstrates how you can use db.collection.findOneAndUpdate() to take a write lock and ensure that your reads are up to date:

1
db.getSiblingDB("hr").employees.insertOne(
{ _id: 1, status: "Active" }
)
2
session = db.getMongo().startSession( { readPreference: { mode: "primary" } } )
3
session.startTransaction( { readConcern: { level: "snapshot" }, writeConcern: { w: "majority" } } )
employeesCollection = session.getDatabase("hr").employees
4
employeeDoc = employeesCollection.findOneAndUpdate(
{ _id: 1, status: "Active" },
{ $set: { lockId: ObjectId() } },
{ returnNewDocument: true }
)

Note that inside the transaction, the findOneAndUpdate operation sets a new lockId field. You can set lockId field to any value, as long as it modifies the document. By updating the document, the transaction acquires a lock.

If an operation outside of the transaction attempts to modify the document before you commit the transaction, MongoDB returns a write conflict error to the external operation.

5
session.commitTransaction()

After you commit the transaction, MongoDB releases the lock.

Note

If any operation in the transaction fails, the transaction aborts and all data changes made in the transaction are discarded without ever becoming visible in the collection.

Chunk migration acquires exclusive collection locks during certain stages.

If an ongoing transaction has a lock on a collection and a chunk migration that involves that collection starts, these migration stages must wait for the transaction to release the locks on the collection, thereby impacting the performance of chunk migrations.

If a chunk migration interleaves with a transaction (for instance, if a transaction starts while a chunk migration is already in progress and the migration completes before the transaction takes a lock on the collection), the transaction errors during the commit and aborts.

Depending on how the two operations interleave, some sample errors include (the error messages have been abbreviated):

  • an error from cluster data placement change ... migration commit in progress for <namespace>

  • Cannot find shardId the chunk belonged to at cluster time ...

During the commit for a transaction, outside read operations may try to read the same documents that will be modified by the transaction. If the transaction writes to multiple shards, then during the commit attempt across the shards:

  • Outside reads that use read concern "snapshot" or "linearizable" wait until all writes of a transaction are visible.

  • Outside reads that are part of causally consistent sessions (those that include afterClusterTime) wait until all writes of a transaction are visible.

  • Outside reads using other read concerns do not wait until all writes of a transaction are visible, but instead read the before-transaction version of the documents.

To use transactions on MongoDB 4.2 deployments (replica sets and sharded clusters), clients must use MongoDB drivers updated for MongoDB 4.2.

On sharded clusters with multiple mongos instances, performing transactions with drivers updated for MongoDB 4.0 (instead of MongoDB 4.2) will fail and can result in errors, including:

Note

Your driver may return a different error. Refer to your driver's documentation for details.

Error Code
Error Message
251
cannot continue txnId -1 for session ... with txnId 1
50940
cannot commit with no participants
←  Drivers APIProduction Considerations (Sharded Clusters) →