FAQ: MongoDB Storage¶
On this page
- What are memory mapped files?
- How do memory mapped files work?
- How does MongoDB work with memory mapped files?
- What are page faults?
- What is the difference between soft and hard page faults?
- What tools can I use to investigate storage use in MongoDB?
- What is the working set?
- Why are the files in my data directory larger than the data in my database?
- How do I reclaim disk space?
- How can I check the size of a collection?
- How can I check the size of indexes?
- How do I know when the server runs out of disk space?
This document addresses common questions regarding MongoDB’s storage system.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.
What are memory mapped files?¶
A memory-mapped file is a file with data that the operating system
places in memory by way of the mmap() system call. mmap() thus maps the
file to a region of virtual memory. Memory-mapped files are the critical
piece of the storage engine in MongoDB. By using memory-mapped files,
MongoDB can treat the contents of its data files as if they were in
memory. This provides MongoDB with an extremely fast and simple method
for accessing and manipulating data.
How do memory mapped files work?¶
Memory mapping assigns files to a block of virtual memory with a direct byte-for-byte correlation. Once mapped, the relationship between file and memory allows MongoDB to interact with the data in the file as if it were memory.
How does MongoDB work with memory mapped files?¶
MongoDB uses memory mapped files for managing and interacting with all data. MongoDB memory maps data files to memory as it accesses documents. Data that isn’t accessed is not mapped to memory.
What are page faults?¶
Page faults can occur as MongoDB reads from or writes data to parts of its data files that are not currently located in physical memory. In contrast, operating system page faults happen when physical memory is exhausted and pages of physical memory are swapped to disk.
If there is free memory, then the operating system can find the page on disk and load it to memory directly. However, if there is no free memory, the operating system must:
- find a page in memory that is stale or no longer needed, and write the page to disk.
- read the requested page from disk and load it into memory.
This process can take a long time, especially on an active system, compared to reading a page that is already in memory.
See Page Faults for more information.
What is the difference between soft and hard page faults?¶
Page faults occur when MongoDB needs access to data that isn’t currently in active memory. A “hard” page fault refers to situations when MongoDB must access a disk to access the data. A “soft” page fault, by contrast, merely moves memory pages from one list to another, such as from an operating system file cache. In production, MongoDB will rarely encounter soft page faults.
See Page Faults for more information.
What tools can I use to investigate storage use in MongoDB?¶
The db.stats() method in the mongo shell returns the current state of
the “active” database. The dbStats command document describes the
fields in the db.stats() output.
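As a minimal sketch of how the db.stats() output can be read, the helper below converts the size-related fields to megabytes. The helper name summarizeDbStats is hypothetical, and the sketch assumes the document contains the dataSize, storageSize, indexSize, and fileSize fields in bytes; a missing field is reported as 0.

```javascript
// Hypothetical helper: summarize the size fields of a dbStats document.
// `stats` is assumed to be the document returned by db.stats().
function summarizeDbStats(stats) {
  var MB = 1024 * 1024;

  // Convert bytes to megabytes, rounded to two decimal places.
  // A missing field is treated as 0.
  function toMB(bytes) {
    return Math.round(((bytes || 0) / MB) * 100) / 100;
  }

  return {
    db: stats.db,
    dataSizeMB: toMB(stats.dataSize),       // size of the documents
    storageSizeMB: toMB(stats.storageSize), // space allocated to collections
    indexSizeMB: toMB(stats.indexSize),     // size of all indexes
    fileSizeMB: toMB(stats.fileSize)        // size of the data files on disk
  };
}

// In the mongo shell:
//   printjson(summarizeDbStats(db.stats()));
```

Comparing fileSizeMB with dataSizeMB gives a rough view of how much of the space on disk is allocated but not holding documents.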
What is the working set?¶
The working set represents the total body of data that the application uses in the course of normal operation. Often this is a subset of the total data size, but the specific size of the working set depends on actual moment-to-moment use of the database.
If you run a query that requires MongoDB to scan every document in a collection, the working set will expand to include every document. Depending on physical memory size, this may cause documents in the working set to “page out,” or to be removed from physical memory by the operating system. The next time MongoDB needs to access these documents, MongoDB may incur a hard page fault.
For best performance, the majority of your working set should fit in RAM.
Why are the files in my data directory larger than the data in my database?¶
The data files in your data directory, which is the /data/db directory
in default configurations, might be larger than the data set inserted
into the database. Consider the following possible causes:
Preallocated data files¶
In the data directory, MongoDB preallocates data files to a particular
size, in part to prevent file system fragmentation. MongoDB names the
first data file <databasename>.0, the next <databasename>.1, etc. The
first file mongod allocates is 64 megabytes, the next 128 megabytes,
and so on, up to 2 gigabytes, at which point all subsequent files are
2 gigabytes. Because the files are preallocated, some of them contain
allocated space that holds no data. For example, mongod may allocate a
1 gigabyte data file that is 90% empty. For most larger databases,
unused allocated space is small compared to the database.
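The allocation sequence described above can be sketched as simple arithmetic. This sketch assumes that “and so on” means each file is twice the size of the previous one until the 2 gigabyte cap; the function names are hypothetical.

```javascript
// Hypothetical helper: sizes (in megabytes) of the first `fileCount`
// preallocated data files, assuming each file doubles the previous
// size, starting at 64 MB and capped at 2048 MB (2 gigabytes).
function preallocatedFileSizesMB(fileCount) {
  var sizes = [];
  var size = 64;
  for (var i = 0; i < fileCount; i++) {
    sizes.push(size);
    size = Math.min(size * 2, 2048);
  }
  return sizes;
}

// Total space (in megabytes) allocated once `fileCount` files exist.
function totalAllocatedMB(fileCount) {
  return preallocatedFileSizesMB(fileCount).reduce(function (sum, mb) {
    return sum + mb;
  }, 0);
}

// preallocatedFileSizesMB(7) → [64, 128, 256, 512, 1024, 2048, 2048]
// totalAllocatedMB(7)        → 6080
```

Under this assumption, a database that has just grown into its seventh file occupies about 6 gigabytes on disk even if the newest 2 gigabyte file is almost empty.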
The oplog¶
If this mongod is a member of a replica set, the data directory
includes the oplog.rs file, which is a preallocated capped collection
in the local database.
The default allocation is approximately 5% of disk space on 64-bit installations. In most cases, you should not need to resize the oplog. See Oplog Sizing for more information.
The journal¶
The data directory contains the journal files, which store write operations on disk before MongoDB applies them to databases. See Journaling Mechanics.
Empty records¶
MongoDB maintains lists of empty records in data files as it deletes documents and collections. MongoDB can reuse this space, but will not, by default, return this space to the operating system.
To allow MongoDB to reuse the space more effectively, you can
de-fragment your data. To de-fragment, use the compact command. The
compact command requires up to 2 gigabytes of extra disk space to run.
Do not use compact if you are critically low on disk space. For more
information on its behavior and other considerations, see compact.
compact only removes fragmentation from MongoDB data files within a
collection and does not return any disk space to the operating system.
To return disk space to the operating system, see
How do I reclaim disk space?.
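As a minimal sketch, compact can be issued from the mongo shell through db.runCommand(). The wrapper name compactCollection and the orders collection in the usage comment are illustrative assumptions; the wrapper only adds a check on the ok field of the reply.

```javascript
// Hypothetical wrapper: run the compact command on one collection
// and throw if the server does not report success.
// `database` is a mongo shell database object such as `db`.
function compactCollection(database, collectionName) {
  var result = database.runCommand({ compact: collectionName });
  if (!result || !result.ok) {
    var reason = (result && result.errmsg) || "unknown error";
    throw new Error("compact failed: " + reason);
  }
  return result;
}

// In the mongo shell, after confirming that enough free disk space
// is available:
//   compactCollection(db, "orders");
```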
How do I reclaim disk space?¶
The following provides some options to consider when reclaiming disk space.
Note
You do not need to reclaim disk space for MongoDB to reuse freed space. See Empty records for information on reuse of freed space.
repairDatabase¶
You can use repairDatabase on a database to rebuild the database,
de-fragmenting the associated storage in the process. repairDatabase
requires free disk space equal to the size of your current data set
plus 2 gigabytes. If the volume that holds dbpath lacks sufficient
space, you can mount a separate volume and use that for the repair. For
additional information and considerations, see repairDatabase.
Warning
Do not use repairDatabase if you are critically low on disk space.
repairDatabase will block all other operations and may take a long time
to complete.
You can only run repairDatabase on a standalone mongod instance.
You can also run the repairDatabase operation for all databases on the
server by restarting your mongod standalone instance with the --repair
and --repairpath options. All databases on the server will be
unavailable during this operation.
Resync the Member of the Replica Set¶
For a secondary member of a replica set, you can perform a resync of the member by: stopping the secondary member to resync, deleting all data and subdirectories from the member’s data directory, and restarting.
For details, see Resync a Member of a Replica Set.
How can I check the size of a collection?¶
To view the size of a collection and other information, use the
db.collection.stats() method from the mongo shell. For example,
db.orders.stats() issues db.collection.stats() for the orders
collection.
To view specific measures of size, use these methods:
- db.collection.dataSize(): data size in bytes for the collection.
- db.collection.storageSize(): allocation size in bytes, including unused space.
- db.collection.totalSize(): the data size plus the index size in bytes.
- db.collection.totalIndexSize(): the index size in bytes.
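The size methods listed above can be gathered into one document for a side-by-side view. This is a minimal sketch: the helper name collectionSizes is hypothetical, and it only calls the four shell methods named in the list.

```javascript
// Hypothetical helper: collect the four size measures for one
// collection. `coll` is a mongo shell collection object such as
// db.orders. All values are in bytes.
function collectionSizes(coll) {
  return {
    dataSize: coll.dataSize(),             // data in the collection
    storageSize: coll.storageSize(),       // allocation, including unused space
    totalIndexSize: coll.totalIndexSize(), // all indexes on the collection
    totalSize: coll.totalSize()            // combined data and index measure
  };
}

// In the mongo shell:
//   printjson(collectionSizes(db.orders));
```

A storageSize much larger than dataSize suggests that the collection holds a noticeable amount of allocated but unused space.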
Also, the following scripts print the statistics for each database and collection:
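A minimal sketch of such scripts follows. It assumes the listDatabases admin command is available to the connected user; the function names allDatabaseStats and allCollectionStats are hypothetical, and the functions return the statistics documents rather than printing them so that the caller decides how to display them.

```javascript
// Hypothetical helper: return the db.stats() document of every
// database on the server. `database` is a mongo shell database
// object such as `db`.
function allDatabaseStats(database) {
  var listing = database.adminCommand({ listDatabases: 1 });
  return listing.databases.map(function (d) {
    return database.getSiblingDB(d.name).stats();
  });
}

// Hypothetical helper: return the stats() document of every
// collection in every database on the server.
function allCollectionStats(database) {
  var listing = database.adminCommand({ listDatabases: 1 });
  var results = [];
  listing.databases.forEach(function (d) {
    var sibling = database.getSiblingDB(d.name);
    sibling.getCollectionNames().forEach(function (name) {
      results.push(sibling.getCollection(name).stats());
    });
  });
  return results;
}

// In the mongo shell:
//   allDatabaseStats(db).forEach(printjson);
//   allCollectionStats(db).forEach(printjson);
```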
How can I check the size of indexes?¶
To view the size of the data allocated for an index, use one of the
following procedures in the mongo shell:
- Use the db.collection.stats() method with the index namespace, after retrieving the list of namespaces.
- Check the value of indexSizes in the output of the db.collection.stats() command.
Example
Issue the following command to retrieve index namespaces:
The command returns a list similar to the following:
View the size of the data allocated for the orders.$_id_ index with
the following sequence of operations:
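One way to read the size of the _id_ index on orders, sketched here through the indexSizes field rather than the index namespace, is to take indexSizes from the db.collection.stats() output. The helper names are hypothetical, and the sketch assumes indexSizes maps each index name to its size in bytes.

```javascript
// Hypothetical helper: return the indexSizes document for a
// collection, or an empty document if the field is absent.
// `coll` is a mongo shell collection object such as db.orders.
function indexSizesFor(coll) {
  var stats = coll.stats();
  return stats.indexSizes || {};
}

// Hypothetical helper: size in bytes of one named index, or null
// if the collection has no index with that name.
function indexSize(coll, indexName) {
  var sizes = indexSizesFor(coll);
  return sizes.hasOwnProperty(indexName) ? sizes[indexName] : null;
}

// In the mongo shell:
//   printjson(indexSizesFor(db.orders));
//   indexSize(db.orders, "_id_");
```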
How do I know when the server runs out of disk space?¶
If your server runs out of disk space for data files, you will see something like this in the log:
The server remains in this state forever, blocking all writes including
deletes. However, reads still work. To delete some data and compact,
using the compact command, you must restart the server first.
If your server runs out of disk space for journal files, the server
process will exit. By default, mongod creates journal files in a
sub-directory of dbPath named journal. You may elect to put the journal
files on another storage device using a filesystem mount or a symlink.
Note
If you place the journal files on a separate storage device you will not be able to use a file system snapshot tool to capture a valid snapshot of your data files and journal files.