Write Operations
All operations that create or modify data in the MongoDB instance are write operations. MongoDB represents data as BSON documents stored in collections. Write operations target one collection and are atomic on the level of a single document: no single write operation can atomically affect more than one document or more than one collection.
This document introduces the write operators available in MongoDB as well as presents strategies to increase the efficiency of writes in applications.
Write Operators
For information on write operators and how to write data to a MongoDB database, and for the specific methods used to perform write operations in the mongo shell, see the following:
- db.collection.insert()
- db.collection.update()
- db.collection.save()
- db.collection.findAndModify()
- db.collection.remove()
For information on how to perform write operations from within an application, see the MongoDB Drivers and Client Libraries documentation or the documentation for your client library.
Write Concern
Note
The driver write concern change created a new connection class in all of the MongoDB drivers, called MongoClient, with a different default write concern. See the release notes for this change, and the release notes for the driver you’re using, for more information about your driver’s release.
Operational Considerations and Write Concern
Clients issue write operations with some level of write concern, which describes the level of guarantee the server provides in its response to a write operation. Consider the following levels of conceptual write concern:
- errors ignored: Write operations are not acknowledged by MongoDB, and may not succeed in the case of connection errors that the client is not yet aware of, or if the mongod produces an exception (e.g. a duplicate key exception for unique indexes). While this operation is efficient because it does not require the database to respond to every write operation, it also incurs a significant risk with regard to the persistence and durability of the data.
  Warning
  Do not use this option in normal operation.
- unacknowledged: MongoDB does not acknowledge the receipt of the write operation, as with a write concern level of errors ignored; however, the driver will receive and handle network errors, as possible given the system networking configuration. Before the releases outlined in Default Write Concern Change, this was the default write concern.
- receipt acknowledged: The mongod will confirm the receipt of the write operation, allowing the client to catch network, duplicate key, and other exceptions. After the releases outlined in Default Write Concern Change, this is the default write concern. [1]
- journaled: The mongod will confirm the write operation only after it has written the operation to the journal. This confirms that the write operation can survive a mongod shutdown and ensures that the write operation is durable. While receipt acknowledged without journaled provides the fundamental basis for write concern, there is a window between journal commits where the write operation is not fully durable. See journalCommitInterval for more information on this window. Require journaled as part of the write concern to provide this durability guarantee.
Replica sets present an additional layer of consideration for write concern. Basic write concern levels affect the write operation on only one mongod instance. The w argument to getLastError provides a replica acknowledged level of write concern. With replica acknowledged, you can guarantee that the write operation has propagated to the members of a replica set. See the Write Concern for Replica Sets document for more information.
Note
Requiring journaled write concern in a replica set only requires a journal commit of the write operation to the primary of the set, regardless of the level of replica acknowledged write concern.
[1] The default write concern is to call getLastError with no arguments. For replica sets, you can define the default write concern settings in the getLastErrorDefaults setting. If getLastErrorDefaults does not define a default write concern setting, getLastError defaults to basic receipt acknowledgment.
Internal Operation of Write Concern
To provide write concern, drivers issue the getLastError command after a write operation and receive a document with information about the last operation. This document’s err field contains either:
- null, which indicates the write operations have completed successfully, or
- a description of the last error encountered.
The definition of a “successful write” depends on the arguments specified to getLastError, or, in replica sets, the configuration of getLastErrorDefaults.
When deciding the level of write
concern for your application, become familiar with the
Operational Considerations and Write Concern.
The getLastError
command has the following options to configure write
concern requirements:
- j or “journal” option: This option confirms that the mongod instance has written the data to the on-disk journal and ensures data is not lost if the mongod instance shuts down unexpectedly. Set j to true to enable. If you set journal to true, and the mongod does not have journaling enabled, as with nojournal, then getLastError will provide basic receipt acknowledgment and will include a jnote field in its return document.
- w option: This option provides the ability to disable write concern entirely, as well as to specify the write concern for replica sets. See Operational Considerations and Write Concern for an introduction to the fundamental concepts of write concern. By default, the w option is set to 1, which provides basic receipt acknowledgment on a single mongod instance or on the primary in a replica set.
  The w option takes the following values:
  - -1: Disables all acknowledgment of write operations, and suppresses all errors, including network and socket errors.
  - 0: Disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors to the application.
    Note
    If you disable basic write operation acknowledgment but require journal commit acknowledgment, the journal commit prevails, and the driver will require that mongod acknowledge the write operation.
  - 1: Provides acknowledgment of write operations on a standalone mongod or the primary in a replica set.
  - A number greater than 1: Guarantees that write operations have propagated successfully to the specified number of replica set members, including the primary. If you set w to a number greater than the number of set members that hold data, MongoDB waits for the non-existent members to become available, which means MongoDB blocks indefinitely.
  - majority: Confirms that write operations have propagated to a majority of the configured replica set: these nodes must acknowledge the write operation before it succeeds. This ensures that the write operation will never be subject to a rollback in the course of normal operation, and it allows you to avoid hard-coding assumptions about the size of your replica set into your application.
  - A tag set: By specifying a tag set, you can have fine-grained control over which replica set members must acknowledge a write operation to satisfy the required level of write concern.
getLastError also supports a wtimeout setting, which allows clients to specify a timeout for the write concern: if you don’t specify wtimeout and the mongod cannot fulfill the write concern, getLastError will block, potentially forever.
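For instance (a mongo shell sketch, assuming a replica set and a hypothetical collection name), requesting replica acknowledged write concern with a timeout might look like:

```javascript
db.records.insert({ name: "beta" })

// Require acknowledgment from a majority of the replica set members,
// but give up after 5000 milliseconds instead of blocking forever.
db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })
```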
For more information on write concern and replica sets, see Write Concern for Replica Sets.
In sharded clusters, mongos
instances will pass write
concern on to the shard mongod
instances.
Bulk Inserts
In some situations you may need to insert or ingest a large amount of data into a MongoDB database. These bulk inserts have some special considerations that are different from other write operations.
The insert()
method, when passed an
array of documents, will perform a bulk insert, and inserts each
document atomically. Drivers
provide their own interface for this kind of operation.
Bulk insert can significantly increase performance by amortizing write concern costs. In the drivers, you can configure write concern for batches rather than on a per-document level.
Drivers also have a ContinueOnError
option in their insert
operation, so that the bulk operation will continue to insert
remaining documents in a batch even if an insert fails.
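As an illustration (a mongo shell sketch; the collection name records is hypothetical), passing an array to insert() performs the bulk insert. The ContinueOnError behavior itself is configured through your driver’s insert options, which vary by driver:

```javascript
// Bulk insert: pass an array of documents to insert().
// Each document in the array is inserted atomically on its own,
// but the batch as a whole is not atomic.
db.records.insert([
    { _id: 1, name: "a" },
    { _id: 2, name: "b" },
    { _id: 3, name: "c" }
])
```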
Note
If the bulk insert process generates more than one error in a batch
job, the client will only receive the most recent error. All bulk
operations to a sharded collection run with ContinueOnError, which applications cannot disable. See the Strategies for Bulk Inserts in Sharded Clusters section for more information on considerations for bulk inserts in sharded clusters.
See your driver documentation for details on performing bulk inserts in your application. Also consider the following resources: Sharded Clusters, Strategies for Bulk Inserts in Sharded Clusters, and Importing and Exporting MongoDB Data.
Indexing
After every insert, update, or delete operation, MongoDB must update every index associated with the collection in addition to the data itself. Therefore, every index on a collection adds some amount of overhead for the performance of write operations. [2]
In general, the performance gains that indexes provide for read operations are worth the insertion penalty; however, when optimizing write performance, be careful when creating new indexes and always evaluate the indexes on the collection and ensure that your queries are actually using these indexes.
For more information on indexes in MongoDB, see Indexes and Indexing Strategies.
[2] The overhead of sparse indexes for inserts and updates to un-indexed fields is less than that of non-sparse indexes. Also, for non-sparse indexes, updates that do not change the record size have less indexing overhead.
Isolation
When a single write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. The modification of a single document, or record, is always atomic, even if the write operation modifies multiple sub-documents within the single record.
No other operations are atomic; however, you can attempt to isolate a write operation that affects multiple documents using the isolation operator.
To isolate a sequence of write operations from other read and write operations, see Perform Two Phase Commits.
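As a hedged sketch (collection and field names are illustrative): the isolation operator is spelled $isolated in recent releases (older releases used $atomic), and it is added to the query portion of a multi-document update:

```javascript
// $isolated prevents other operations from interleaving while this
// multi-document update runs. It applies to a single mongod instance
// and does not work with sharded clusters.
db.accounts.update(
    { status: "active", $isolated: 1 },
    { $inc: { score: 1 } },
    { multi: true }
)
```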
Updates
Each document in a MongoDB collection has allocated record space which includes the entire document and a small amount of padding. This padding makes it possible for update operations to increase the size of a document slightly without causing the document to outgrow the allocated record size.
Documents in MongoDB can grow up to the full maximum BSON
document size
. However, when documents outgrow
their allocated record size MongoDB must allocate a new record and
move the document to the new record. Update operations that do not cause a document to grow (i.e. in-place updates) are significantly more efficient than those updates that cause document growth. Use
data models that minimize the need for
document growth when possible.
For complete examples of update operations, see Update.
Padding Factor
If an update operation does not cause the document to increase in
size, MongoDB can apply the update in-place. Some updates
change the size of the document, for example using the
$push
operator to append a sub-document to an array can
cause the top level document to grow beyond its allocated space.
When documents grow, MongoDB relocates the document on disk, allocating enough contiguous space to hold the document. These relocations take longer than in-place updates, particularly if the collection has indexes, because MongoDB must update all index entries. If a collection has many indexes, the move will impact write throughput.
To minimize document movements, MongoDB employs padding. MongoDB
adaptively learns if documents in a collection tend to grow, and if
they do, adds a paddingFactor
so that the documents have room
to grow on subsequent writes. The paddingFactor
indicates the
padding for new inserts and moves.
New in version 2.2: You can use the collMod
command
with the usePowerOf2Sizes
flag so that MongoDB
allocates document space in sizes that are powers of 2. This helps
ensure that MongoDB can efficiently reuse the space freed as a
result of deletions or document relocations. As with all padding,
using document space allocations with power of 2 sizes minimizes,
but does not eliminate, document movements.
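For example (a sketch, assuming a hypothetical collection named records), enabling this allocation strategy from the mongo shell:

```javascript
// Allocate record space for this collection in power-of-2 sizes
// (new in 2.2). Affects documents inserted or moved afterward.
db.runCommand({ collMod: "records", usePowerOf2Sizes: true })
```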
To check the current paddingFactor on a collection, run the db.collection.stats() operation in the mongo shell.
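A sketch, assuming a collection named records (the exact fields returned vary by server version):

```javascript
// stats() returns collection statistics, including the paddingFactor.
var stats = db.records.stats()
print("paddingFactor: " + stats.paddingFactor)
```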
Since MongoDB writes each document at a different point in time, the padding for each document will not be the same. You can calculate the padding size by subtracting 1 from the paddingFactor. For example, a paddingFactor of 1.0 specifies no padding, whereas a paddingFactor of 1.5 specifies a padding size of 0.5, or 50 percent (50%), of the document size.
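The arithmetic above can be checked with a tiny, runnable calculation (plain JavaScript, independent of any server):

```javascript
// Padding size, as a fraction of document size, derived from the
// paddingFactor reported by db.collection.stats():
//   paddingSize = paddingFactor - 1
function paddingSize(paddingFactor) {
    return paddingFactor - 1;
}

paddingSize(1.0); // 0, i.e. no padding
paddingSize(1.5); // 0.5, i.e. 50% of the document size
```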
Because the paddingFactor
is relative to the size of each
document, you cannot calculate the exact amount of padding for a
collection based on the average document size and padding factor.
If an update operation causes the document to decrease in size, for
instance if you perform an $unset
or a $pop
update, the document remains in place and effectively has more
padding. If the document remains this size, the space is not reclaimed
until you perform a compact
or a
repairDatabase
operation.
Note
The following operations remove padding:
- compact,
- repairDatabase, and
- initial replica sync operations.
However, you can run the compact command with a paddingFactor or a paddingBytes parameter to specify padding for the collection.
Padding is also removed if you use mongoexport
from a
collection. If you use mongoimport
into a new
collection, mongoimport
will not add padding. If you
use mongoimport
with an existing collection with
padding, mongoimport
will not affect the existing
padding.
When a database operation removes padding, subsequent updates that require changes in record sizes will have reduced throughput until the collection’s padding factor grows. Padding does not affect in-place updates. After compact, repairDatabase, and replica set initial sync, the collection will require less storage.
Architecture
Replica Sets
In replica sets, all write operations go to the set’s primary, which applies the write operation and then records it on the primary’s operation log, or oplog. The oplog is a reproducible sequence of operations to the data set. Secondary members of the set continuously replicate the oplog and apply the operations to themselves in an asynchronous process.
Large volumes of write operations, particularly bulk operations, may create situations where the secondary members have difficulty applying the replicated operations from the primary at a sufficient rate: this can cause the secondary’s state to fall behind that of the primary. Secondaries that are significantly behind the primary present problems for normal operation of the replica set, particularly during failover, in the form of rollbacks, as well as for general read consistency.
To help avoid this issue, you can customize the write concern to require confirmation of the write operation from another member [3] of the replica set every 100 or 1,000 operations. This provides an opportunity for secondaries to catch up with the primary. Write concern can slow the overall progress of write operations but ensures that the secondaries can maintain a largely current state with respect to the primary.
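As a sketch in the mongo shell (a real application would configure this in its driver; the collection name is hypothetical), requesting replica acknowledgment every 1,000 operations might look like:

```javascript
for (var i = 0; i < 100000; i++) {
    db.records.insert({ n: i })
    if (i % 1000 === 0) {
        // Block until at least one secondary has also applied the
        // writes, giving secondaries a chance to catch up.
        db.runCommand({ getLastError: 1, w: 2 })
    }
}
```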
For more information on replica sets and write operations, see Write Concern, Oplog, Oplog Internals, and Changing Oplog Size.
[3] Calling getLastError intermittently with a w value of 2 or majority will slow the throughput of write traffic; however, this practice will allow the secondaries to remain current with the state of the primary.
Sharded Clusters
In a sharded cluster, MongoDB directs a given write operation to a shard and then performs the write on a particular chunk on that shard. Shards and chunks are range-based. Shard keys affect how MongoDB distributes documents among shards. Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster.
For more information, see Sharded Cluster Administration and Bulk Inserts.