Glossary

$cmd

A virtual collection that exposes MongoDB’s database commands.

_id

A field containing a unique ID, typically a BSON ObjectId. If not specified, this value is automatically assigned upon the creation of a new document. You can think of the _id as the document’s primary key.

accumulator

An expression in the aggregation framework that maintains state between documents in the aggregation pipeline. See: $group for a list of accumulator operations.

admin database

A privileged database named admin. Users must have access to this database to run certain administrative commands. See administrative commands for more information and Administration Commands for a list of these commands.

aggregation

Any of a variety of operations that reduce and summarize large sets of data. SQL’s GROUP BY and MongoDB’s map-reduce are two examples of aggregation operations.

aggregation framework

The MongoDB aggregation framework provides a means to calculate aggregate values without having to use map-reduce.

See also

Aggregation Framework.

arbiter

A member of a replica set that exists solely to vote in elections. Arbiters do not replicate data.

See also

Delayed Members

balancer

An internal MongoDB process that runs in the context of a sharded cluster and manages the migration of chunks. Administrators must disable the balancer for all maintenance operations on a sharded cluster.

box

MongoDB’s geospatial indexes and querying system allow you to build queries around rectangles on two-dimensional coordinate systems. These queries use the $box operator to define a shape using the lower-left and the upper-right coordinates.
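
As a rough illustration of the geometry involved, the plain JavaScript sketch below tests whether a point falls inside a rectangle described by its lower-left and upper-right corners. The helper name `pointInBox` is hypothetical, the handling of points that sit exactly on the boundary is an assumption, and none of this reflects how MongoDB evaluates $box internally.

```javascript
// Hypothetical helper for illustration only; not part of MongoDB or the mongo shell.
// Points and corners are [x, y] pairs on a flat two-dimensional grid.
function pointInBox(point, lowerLeft, upperRight) {
  const [x, y] = point;
  // Assumption: points on the edge of the rectangle count as inside.
  return (
    x >= lowerLeft[0] && x <= upperRight[0] &&
    y >= lowerLeft[1] && y <= upperRight[1]
  );
}

const boxInside = pointInBox([5, 5], [0, 0], [10, 10]);
const boxOutside = pointInBox([11, 5], [0, 0], [10, 10]);
```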

BSON

A serialization format used to store documents and make remote procedure calls in MongoDB. “BSON” is a portmanteau of the words “binary” and “JSON”. Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents. For a detailed spec, see bsonspec.org.

See also

The Data Type Fidelity section.

BSON types

The set of types supported by the BSON serialization format. The following types are available:

Type	Number
Double	1
String	2
Object	3
Array	4
Binary data	5
Object id	7
Boolean	8
Date	9
Null	10
Regular Expression	11
JavaScript	13
Symbol	14
JavaScript (with scope)	15
32-bit integer	16
Timestamp	17
64-bit integer	18
Min key	255
Max key	127

btree

A data structure used by most database management systems to store indexes. MongoDB uses b-trees for its indexes.

CAP Theorem

Given three properties of computing systems, consistency, availability, and partition tolerance, a distributed computing system can provide any two of these features, but never all three.

capped collection

A fixed-sized collection. Once they reach their fixed size, capped collections automatically overwrite their oldest entries. MongoDB’s oplog replication mechanism depends on capped collections. Developers may also use capped collections in their applications.

See also

The Capped Collections page.
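
The overwrite behavior can be pictured with a small in-memory sketch. The `CappedList` class is hypothetical and limits the number of documents for simplicity, whereas a real capped collection is bounded by its size in bytes; this is not MongoDB's storage implementation.

```javascript
// Hypothetical in-memory model of "overwrite the oldest entry when full".
// Assumption: the limit is a document count rather than a byte size.
class CappedList {
  constructor(maxDocuments) {
    this.maxDocuments = maxDocuments;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once the limit is exceeded, discard the oldest document.
    if (this.docs.length > this.maxDocuments) {
      this.docs.shift();
    }
  }
}

const recentEvents = new CappedList(3);
for (const n of [1, 2, 3, 4, 5]) {
  recentEvents.insert({ n });
}
const remaining = recentEvents.docs.map((doc) => doc.n);
```

After five inserts into a list capped at three documents, only the three most recent remain, still in insertion order.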

checksum

A calculated value used to ensure data integrity. The md5 algorithm is sometimes used as a checksum.

chunk

In the context of a sharded cluster, a chunk is a contiguous range of shard key values assigned to a particular shard. Chunk ranges are inclusive of the lower boundary and exclusive of the upper boundary. By default, chunks are 64 megabytes or less. When they grow beyond the configured chunk size, a mongos splits the chunk into two chunks.
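
The boundary rule (lower bound inclusive, upper bound exclusive) can be sketched as follows. The chunk object and the `chunkContains` helper are hypothetical and assume a single numeric shard key; real chunk metadata is stored by the cluster, not in application code.

```javascript
// Hypothetical chunk description with a numeric shard key range.
const exampleChunk = { min: 100, max: 200, shard: "shard0000" };

// A value belongs to the chunk when min <= value < max.
function chunkContains(chunk, shardKeyValue) {
  return shardKeyValue >= chunk.min && shardKeyValue < chunk.max;
}

const atLowerBound = chunkContains(exampleChunk, 100); // lower bound is included
const atUpperBound = chunkContains(exampleChunk, 200); // upper bound is excluded
```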

circle

MongoDB’s geospatial indexes and querying system allow you to build queries around circles on two-dimensional coordinate systems. These queries use the $within operator and the $center operator to define a circle using the center and the radius of the circle.

client

The application layer that uses a database for data persistence and storage. Drivers provide the interface level between the application layer and the database server.

cluster

A set of mongod instances running in conjunction to increase database availability and performance. See sharding and replication for more information on the two different approaches to clustering with MongoDB.

collection

Collections are groupings of BSON documents. Collections do not enforce a schema, but they are otherwise mostly analogous to RDBMS tables.

The documents within a collection may not need the exact same set of fields, but typically all documents in a collection have a similar or related purpose for an application.

All collections exist within a single database. The namespace within a database for collections is flat.

See What is a namespace in MongoDB? and BSON Documents for more information.

compound index

An index consisting of two or more keys. See Indexing Overview for more information.

config database

One of three mongod instances that store all of the metadata associated with a sharded cluster.

control script

A simple shell script used by the initialization process of a UNIX-like operating system to start, stop, or restart a daemon process. On most systems, you can find these scripts in the /etc/init.d/ or /etc/rc.d/ directories.

CRUD

Create, read, update, and delete. The fundamental operations of any database.

CSV

A text-based data format consisting of comma-separated values. This format is commonly used to exchange data between relational databases, since the format is well-suited to tabular data. You can import CSV files using mongoimport.

cursor

In MongoDB, a cursor is a pointer to the result set of a query that clients can iterate through to retrieve results. By default, cursors will time out after 10 minutes of inactivity.

daemon

The conventional name for a background, non-interactive process.

data-center awareness

A property that allows clients to address members in a system based upon their location.

Replica sets implement data-center awareness using tagging. See Data Center Awareness for more information.

database

A physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically serves multiple databases.

database command

Any MongoDB operation other than an insert, update, remove, or query. MongoDB exposes commands as queries against the special $cmd collection. For example, the implementation of count for MongoDB is a command.

See also

Database Commands Quick Reference for a full list of database commands in MongoDB

database profiler

A tool that, when enabled, keeps a record of all long-running operations in a database’s system.profile collection. The profiler is most often used to diagnose slow queries.

See also

Monitoring Database Systems.

dbpath

Refers to the location of MongoDB’s data file storage. The default dbpath is /data/db. Other common data paths include /srv/mongodb and /var/lib/mongodb.

See also

dbpath or --dbpath.

delayed member

A member of a replica set that cannot become primary and applies operations at a specified delay. This delay is useful for protecting data from human error (e.g. unintentionally deleted databases) or updates that have unforeseen effects on the production database.

See also

Delayed Members

diagnostic log

mongod can create a verbose log of operations with the mongod --diaglog option or through the diagLogging command. The mongod creates this log in the directory specified to mongod --dbpath. The name of the file is diaglog.<time-in-hex>, where “<time-in-hex>” reflects the initiation time of logging as a hexadecimal string.

Warning

Setting the diagnostic level to 0 will cause mongod to stop writing data to the diagnostic log file. However, the mongod instance will continue to keep the file open, even if it is no longer writing data to the file. If you want to rename, move, or delete the diagnostic log you must cleanly shut down the mongod instance before doing so.

document

A record in a MongoDB collection, and the basic unit of data in MongoDB. Documents are analogous to JSON objects, but exist in the database in a more type-rich format known as BSON.

dot notation

MongoDB uses the dot notation to access the elements of an array and to access the fields of a subdocument.

To access an element of an array by the zero-based index position, you concatenate the array name with the dot (.) and zero-based index position:

'<array>.<index>'

To access a field of a subdocument with dot-notation, you concatenate the subdocument name with the dot (.) and the field name:

'<subdocument>.<field>'
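
To make the two forms concrete, the sketch below resolves a dotted path against a plain JavaScript object. The `getByDotPath` helper and the sample document are hypothetical; MongoDB's own matching rules for arrays are richer than this, so treat it only as an illustration of the two cases described above.

```javascript
// Hypothetical helper: walks a document one path segment at a time.
// Array indexes work because "1" is a valid property key on an array.
function getByDotPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const sampleDoc = { name: { first: "Ada" }, tags: ["db", "nosql"] };

const firstName = getByDotPath(sampleDoc, "name.first"); // '<subdocument>.<field>'
const secondTag = getByDotPath(sampleDoc, "tags.1");     // '<array>.<index>'
```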

draining

The process of removing or “shedding” chunks from one shard to another. Administrators must drain shards before removing them from the cluster.

See also

removeShard, sharding.

driver

A client implementing the communication protocol required for talking to a server. The MongoDB drivers provide language-idiomatic methods for interfacing with MongoDB.

See also

MongoDB Drivers and Client Libraries

election

In the context of replica sets, an election is the process by which members of a replica set select primaries on startup and in the event of failures.

See also

Replica Set Elections and priority.

eventual consistency

A property of a distributed system allowing changes to the system to propagate gradually. In a database system, this means that readable members are not required to reflect the latest writes at all times. In MongoDB, reads to a primary have strict consistency; reads to secondaries have eventual consistency.

expression

In the context of the aggregation framework, expressions are the stateless transformations that operate on the data that passes through the pipeline.

See also

Aggregation Framework.

failover

The process that allows one of the secondary members in a replica set to become primary in the event of a failure.

See also

Replica Set Failover.

field

A name-value pair in a document. Documents have zero or more fields. Fields are analogous to columns in relational databases.

firewall

A system level networking filter that restricts access based on, among other things, IP address. Firewalls form part of effective network security strategy.

fsync

A system call that flushes all dirty, in-memory pages to disk. MongoDB calls fsync() on its database files at least every 60 seconds.

Geohash

A binary representation of a location on a coordinate grid.

geospatial

Data that relates to geographical location. In MongoDB, you may index or store geospatial data according to geographical parameters and reference specific coordinates in queries.

GridFS

A convention for storing large files in a MongoDB database. All of the official MongoDB drivers support this convention, as does the mongofiles program.

See also

mongofiles.

haystack index

In the context of geospatial queries, haystack indexes enhance searches by creating “buckets” of objects grouped by a second criterion. For example, you might want all geospatial searches to first select along a non-geospatial dimension and then match on location.

hidden member

A member of a replica set that cannot become primary and is not advertised as part of the set in the database command isMaster, which prevents it from receiving read-only queries depending on read preference.

idempotent

When calling an idempotent operation on a value or state, the operation only affects the value once. Thus, the operation can safely run multiple times without unwanted side effects. In the context of MongoDB, oplog entries must be idempotent to support initial synchronization and recovery from certain failure situations. Thus, MongoDB can safely apply oplog entries more than once without any ill effects.
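
The difference between an idempotent and a non-idempotent change can be seen in a small sketch. The helpers `applySet` and `applyIncrement` are hypothetical stand-ins for "assign a value" and "add to a value"; they are not MongoDB's oplog format.

```javascript
// Assigning a fixed value is idempotent: repeating it leaves the same state.
function applySet(doc, field, value) {
  return { ...doc, [field]: value };
}

// Adding to a value is not idempotent: every repetition changes the result.
function applyIncrement(doc, field, amount) {
  return { ...doc, [field]: (doc[field] || 0) + amount };
}

const startDoc = { qty: 5 };

const setOnce = applySet(startDoc, "qty", 6);
const setTwice = applySet(setOnce, "qty", 6);

const incOnce = applyIncrement(startDoc, "qty", 1);
const incTwice = applyIncrement(incOnce, "qty", 1);
```

Replaying the assignment yields the same document both times, while replaying the increment drifts further from the intended value, which is why a log that may be applied more than once needs entries of the first kind.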

index

A data structure that optimizes queries. See Indexing Overview for more information.

initial sync

The replica set operation that replicates data from an existing replica set member to a new or restored replica set member.

IPv6

A revision to the IP (Internet Protocol) standard that provides a significantly larger address space to more effectively support the number of hosts on the contemporary Internet.

ISODate

The international date format used by mongo to display dates, e.g. YYYY-MM-DD HH:MM.SS.millis.

JavaScript

A popular scripting language originally designed for web browsers. The MongoDB shell and certain server-side functions use a JavaScript interpreter.

journal

A sequential, binary transaction log used to bring the database into a consistent state in the event of a hard shutdown. MongoDB enables journaling by default for 64-bit builds of MongoDB version 2.0 and newer. Journal files are pre-allocated and will exist as three 1GB files in the data directory. To make journal files smaller, use smallfiles.

When enabled, MongoDB writes data first to the journal and then to the core data files. MongoDB commits to the journal within 100ms, which is configurable using the journalCommitInterval runtime option.

To force mongod to commit to the journal more frequently, you can specify j:true. When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value.

See also

The Journaling page.

JSON

JavaScript Object Notation. A human-readable, plain text format for expressing structured data with support in many programming languages.

JSON document

A JSON document is a collection of fields and values in a structured format. The following is a sample JSON document with two fields:

{ name: "MongoDB",
  type: "database" }

JSONP

JSON with Padding. Refers to a method of injecting JSON into applications. Presents potential security concerns.

LVM

Logical volume manager. LVM is a program that abstracts disk images from physical devices, and provides a number of raw disk manipulation and snapshot capabilities useful for system management.

map-reduce

A data processing and aggregation paradigm consisting of a “map” phase that selects data, and a “reduce” phase that transforms the data. In MongoDB, you can run arbitrary aggregations over data using map-reduce.

See also

The Map-Reduce page for more information regarding MongoDB’s map-reduce implementation, and Aggregation Framework for another approach to data aggregation in MongoDB.
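
The two phases can be sketched in plain JavaScript over an in-memory array. The `mapReduce` function below is a hypothetical simplification: its map function returns a single [key, value] pair per document, and it does not use or imitate the server's map-reduce command.

```javascript
// Hypothetical in-memory map-reduce: group mapped values by key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    // Map phase: select a key and a value from each document.
    const [key, value] = mapFn(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const results = {};
  for (const [key, values] of groups) {
    // Reduce phase: collapse all values for a key into one result.
    results[key] = reduceFn(key, values);
  }
  return results;
}

const orders = [
  { customer: "a", total: 10 },
  { customer: "b", total: 5 },
  { customer: "a", total: 7 },
];

const totalsByCustomer = mapReduce(
  orders,
  (doc) => [doc.customer, doc.total],
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
```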

master

In conventional master/slave replication, the master database receives all writes. The slave instances replicate from the master instance in real time.

md5

md5 is a hashing algorithm used to efficiently provide reproducible unique strings to identify and checksum data. MongoDB uses md5 to identify chunks of data for GridFS.

MIME

“Multipurpose Internet Mail Extensions.” A standard set of type and encoding definitions used to declare the encoding and type of data in multiple data storage, transmission, and email contexts.

mongo

The MongoDB Shell. mongo connects to mongod and mongos instances, allowing administration, management, and testing. mongo has a JavaScript interface.

mongod

The program implementing the MongoDB database server. This server typically runs as a daemon.

See also

mongod.

MongoDB

The document-based database server described in this manual.

mongos

The routing and load balancing process that acts as an interface between an application and a MongoDB sharded cluster.

See also

mongos.

multi-master replication

A replication method where multiple database instances can accept write operations to the same data set at any time. Multi-master replication exchanges increased concurrency and availability for a relaxed consistency semantic. MongoDB ensures consistency and, therefore, does not provide multi-master replication.

namespace

The canonical name for a collection or index in MongoDB. The namespace is a combination of the database name and the name of the collection or index, like so: [database-name].[collection-or-index-name]. All documents belong to a namespace.
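
A short sketch of taking a namespace apart, assuming the [database-name].[collection-or-index-name] layout described above. The `splitNamespace` helper is hypothetical; it splits at the first dot on the assumption that the database name itself contains no dots while the collection name may.

```javascript
// Hypothetical helper: separates a namespace into its database and collection parts.
function splitNamespace(namespace) {
  const dot = namespace.indexOf(".");
  if (dot === -1) {
    throw new Error("namespace must contain a '.' separator");
  }
  return {
    database: namespace.slice(0, dot),
    // Everything after the first dot, which may itself contain dots.
    collection: namespace.slice(dot + 1),
  };
}

const parsedNamespace = splitNamespace("records.users.archive");
```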

natural order

The order in which a database stores documents on disk. Typically, the order of documents on disk reflects insertion order, except when documents move internally because of document growth due to update operations. However, capped collections guarantee that insertion order and natural order are identical.

When you execute find() with no parameters, the database returns documents in forward natural order. When you execute find() and include sort() with a parameter of $natural:-1, the database returns documents in reverse natural order.

ObjectId

A special 12-byte BSON type that has a high probability of being unique when generated. The most significant bytes of an ObjectId represent the time of the ObjectId’s creation. MongoDB uses ObjectId values as the default values for _id fields.
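
As an illustration of the embedded creation time, the sketch below reads it from the 24-character hexadecimal form of an ObjectId. It assumes the usual layout in which the first four bytes hold seconds since the Unix epoch; `objectIdTimestamp` is a hypothetical helper written in plain JavaScript, not a driver or shell method.

```javascript
// Hypothetical helper: extracts the creation time from an ObjectId hex string.
// Assumption: the first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdTimestamp(hexString) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexString)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hexString.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const createdAt = objectIdTimestamp("507f1f77bcf86cd799439011");
```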

operator

A keyword beginning with a $ used to express a complex query, update, or data transformation. For example, $gt is the query language’s “greater than” operator. See the Query, Update, and Projection Operators Quick Reference for more information about the available operators.

oplog

A capped collection that stores an ordered history of logical writes to a MongoDB database. The oplog is the basic mechanism enabling replication in MongoDB.

ordered query plan

Query plan that returns results in the order consistent with the sort() order.

See also

Query Optimization

padding

The extra space allocated to a document on disk to prevent moving the document when it grows as the result of update() operations.

padding factor

An automatically-calibrated constant used to determine how much extra space MongoDB should allocate per document container on disk. A padding factor of 1 means that MongoDB will allocate only the amount of space needed for the document. A padding factor of 2 means that MongoDB will allocate twice the amount of space required by the document.
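
The arithmetic can be written out as a small sketch. The `allocatedBytes` helper is hypothetical and simply multiplies the document size by the padding factor; any rounding or record overhead that MongoDB applies in practice is not modeled here.

```javascript
// Hypothetical helper: space reserved for a document under a given padding factor.
// Assumption: plain multiplication, rounded up to a whole byte.
function allocatedBytes(documentBytes, paddingFactor) {
  return Math.ceil(documentBytes * paddingFactor);
}

const noPadding = allocatedBytes(1000, 1);     // only the space the document needs
const doublePadding = allocatedBytes(1000, 2); // twice the space the document needs
```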

page fault

The event that occurs when a process requests stored data (i.e. a page) from memory that the operating system has moved to disk.

See also

Storage FAQ: What are page faults?

partition

A distributed system architecture that splits data into ranges. Sharding is a kind of partitioning.

pcap

A packet capture format used by mongosniff to record packets captured from network interfaces and display them as human-readable MongoDB operations.

PID

A process identifier. On UNIX-like systems, a unique integer PID is assigned to each running process. You can use a PID to inspect a running process and send signals to it.

pipe

A communication channel in UNIX-like systems allowing independent processes to send and receive data. In the UNIX shell, piped operations allow users to direct the output of one command into the input of another.

pipeline

The series of operations in the aggregation process.

See also

Aggregation Framework.

polygon

MongoDB’s geospatial indexes and querying system allow you to build queries around multi-sided polygons on two-dimensional coordinate systems. These queries use the $within operator and a sequence of points that define the corners of the polygon.

powerOf2Sizes

A per-collection setting that changes and normalizes the way that MongoDB allocates space for each document in an effort to maximize storage reuse and reduce fragmentation. This is the default for TTL Collections. See collMod and usePowerOf2Sizes for more information.

New in version 2.2.

pre-splitting

An operation, performed before inserting data, that divides the range of possible shard key values into chunks to facilitate easy insertion and high write throughput. When deploying a sharded cluster, in some cases pre-splitting will expedite the initial distribution of documents among shards by manually dividing the collection into chunks rather than waiting for the MongoDB balancer to create chunks during the course of normal operation.

primary

In a replica set, the primary member is the current master instance, which receives all write operations.

primary key

A record’s unique, immutable identifier. In an RDBMS, the primary key is typically an integer stored in each row’s id field. In MongoDB, the _id field holds a document’s primary key which is usually a BSON ObjectId.

primary shard

For a database where sharding is enabled, the primary shard holds all un-sharded collections.

priority

In the context of replica sets, priority is a configurable value that helps determine which members in a replica set are most likely to become primary.

See also

Replica Set Member Priority

projection

A document given to a query that specifies which fields MongoDB will return from the documents in the result set.
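
The effect of a projection can be approximated on a plain object. The `applyInclusionProjection` helper below is hypothetical and handles only top-level fields marked with 1, keeping _id unless it is explicitly set to 0; it is a sketch of the idea, not the server's projection logic.

```javascript
// Hypothetical helper: keeps only the fields a projection document asks for.
// Assumptions: inclusion-style projection, top-level fields only.
function applyInclusionProjection(doc, projection) {
  const result = {};
  // _id is kept unless the projection excludes it with 0.
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const field of Object.keys(projection)) {
    if (field !== "_id" && projection[field] === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

const userDoc = { _id: 1, name: "x", age: 3 };
const withId = applyInclusionProjection(userDoc, { name: 1 });
const withoutId = applyInclusionProjection(userDoc, { name: 1, _id: 0 });
```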

query

A read request. MongoDB queries use a JSON-like query language that includes a variety of query operators with names that begin with a $ character. In the mongo shell, you can issue queries using the db.collection.find() and db.collection.findOne() methods.

query optimizer

For each query, the MongoDB query optimizer generates a query plan that matches the query to the index that produces the fastest results. The optimizer then uses the query plan each time the mongod receives the query. If a collection changes significantly, the optimizer creates a new query plan.

See also

Query Optimization

RDBMS

Relational Database Management System. A database management system based on the relational model, typically using SQL as the query language.

read preference

A setting on the MongoDB drivers that determines how the clients direct read operations. Read preference affects all replica sets including shards. By default, drivers direct all reads to primaries for strict consistency. However, you may also direct reads to secondaries for eventually consistent reads.

See also

Read Preference

read-lock

In the context of a reader-writer lock, a lock that while held allows concurrent readers, but no writers.

record size

The space allocated for a document including the padding.

recovering

A replica set member status indicating that a member is not ready to begin normal activities of a secondary or primary. Recovering members are unavailable for reads.

replica pairs

The precursor to the MongoDB replica sets.

Deprecated since version 1.6.

replica set

A cluster of MongoDB servers that implements master-slave replication and automated failover. MongoDB’s recommended replication strategy.

replication

A feature allowing multiple database servers to share the same data, thereby ensuring redundancy and facilitating load balancing. MongoDB supports two flavors of replication: master-slave replication and replica sets.

replication lag

The length of time between the last operation in the primary’s oplog and the last operation applied to a particular secondary or slave. In general, you want to keep replication lag as small as possible.

See also

Replication Lag

resident memory

The subset of an application’s memory currently stored in physical RAM. Resident memory is a subset of virtual memory, which includes memory mapped to physical RAM and to disk.

REST

An API design pattern centered around the idea of resources and the CRUD operations that apply to them. Typically implemented over HTTP. MongoDB provides a simple HTTP REST interface that allows HTTP clients to run commands against the server.

rollback

A process that, in certain replica set situations, reverts write operations to ensure the consistency of all replica set members.

secondary

In a replica set, the secondary members are the current slave instances that replicate the contents of the master database. Secondary members may handle read requests, but only the primary members can handle write operations.

secondary index

A database index that improves query performance by minimizing the amount of work that the query engine must perform to fulfill a query.

set name

In the context of a replica set, the set name refers to an arbitrary name given to a replica set when it’s first configured. All members of a replica set must have the same name specified with the replSet setting (or the --replSet option for mongod).

shard

A single replica set that stores some portion of a sharded cluster’s total data set. See sharding.

See also

The documents in the Sharding section of the manual.

shard key

In a sharded collection, a shard key is the field that MongoDB uses to distribute documents among members of the sharded cluster.

sharded cluster

The set of nodes comprising a sharded MongoDB deployment. A sharded cluster consists of three config processes, one or more replica sets, and one or more mongos routing processes.

See also

The documents in the Sharding section of the manual.

sharding

A database architecture that enables horizontal scaling by splitting data into key ranges among two or more replica sets. This architecture is also known as “range-based partitioning.” See shard.

See also

The documents in the Sharding section of the manual.

shell helper

A number of database commands have “helper” methods in the mongo shell that provide a more concise syntax and improve the general interactive experience.

single-master replication

A replication topology where only a single database instance accepts writes. Single-master replication ensures consistency and is the replication topology employed by MongoDB.

slave

In conventional master/slave replication, slaves are read-only instances that replicate operations from the master database. Data read from slave instances may not be completely consistent with the master. Therefore, applications requiring consistent reads must read from the master database instance.

split

The division between chunks in a sharded cluster.

SQL

Structured Query Language (SQL) is a common special-purpose programming language used for interaction with a relational database, including access control as well as inserting, updating, querying, and deleting data. There are some similar elements in the basic SQL syntax supported by different database vendors, but most implementations have their own dialects, data types, and interpretations of proposed SQL standards. Complex SQL is generally not directly portable between major RDBMS products. SQL is often used as a metonym for relational databases.

SSD

Solid State Disk. A high-performance disk drive that uses solid state electronics for persistence, as opposed to the rotating platters and movable read/write heads used by traditional mechanical hard drives.

standalone

In MongoDB, a standalone is an instance of mongod that is running as a single server and not as part of a replica set.

strict consistency

A property of a distributed system requiring that all members always reflect the latest changes to the system. In a database system, this means that any system that can provide data must reflect the latest writes at all times. In MongoDB, reads to a primary have strict consistency; reads to secondary members have eventual consistency.

sync

The replica set operation where members replicate data from the primary. Replica sets synchronize data at two different points:

Initial sync occurs when MongoDB creates new databases on a new or restored replica set member, populating the member with the replica set’s data.
“Replication” occurs continually after initial sync and keeps the member updated with changes to the replica set’s data.

syslog

On UNIX-like systems, a logging process that provides a uniform standard for servers and processes to submit logging information.

tag

One or more labels applied to a given replica set member that clients may use to issue data-center aware operations.

TSV

A text-based data format consisting of tab-separated values. This format is commonly used to exchange data between relational databases, since the format is well-suited to tabular data. You can import TSV files using mongoimport.

TTL

Stands for “time to live,” and represents an expiration time or period for a given piece of information to remain in a cache or other temporary storage system before the system deletes it or ages it out.

unique index

An index that enforces uniqueness for a particular field across a single collection.

unordered query plan

Query plan that returns results in an order inconsistent with the sort() order.

See also

Query Optimization

upsert

A kind of update that either updates the first document matched in the provided query selector or, if no document matches, inserts a new document having the fields implied by the query selector and the update operation.
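
The update-or-insert decision can be sketched against an in-memory array. The `upsert` function here is hypothetical: it supports only equality selectors and plain field assignment, and it does not generate an _id, so it illustrates the control flow rather than the behavior of any MongoDB method.

```javascript
// Hypothetical in-memory upsert: update the first match, otherwise insert.
function upsert(collection, selector, changes) {
  const match = collection.find((doc) =>
    Object.keys(selector).every((key) => doc[key] === selector[key])
  );
  if (match) {
    // A document matched the selector: apply the changes to it.
    Object.assign(match, changes);
    return "updated";
  }
  // Nothing matched: build a new document from the selector and the changes.
  collection.push({ ...selector, ...changes });
  return "inserted";
}

const inventory = [{ sku: "abc", qty: 1 }];
const firstResult = upsert(inventory, { sku: "abc" }, { qty: 5 });
const secondResult = upsert(inventory, { sku: "xyz" }, { qty: 2 });
```

The first call finds the existing "abc" document and changes its quantity; the second finds nothing and adds a new document carrying both the selector field and the changed field.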

virtual memory

An application’s working memory, typically residing both on disk and in physical RAM.

working set

The collection of data that MongoDB uses regularly. This data is typically (or preferably) held in RAM.

write concern

Specifies whether a write operation has succeeded. Write concern allows your application to detect insertion errors or unavailable mongod instances. For replica sets, you can configure write concern to confirm replication to a specified number of members.

write-lock

A lock on the database for a given writer. When a process writes to the database, it takes an exclusive write-lock to prevent other processes from writing or reading.

writeBacks

The process within the sharding system that ensures that writes issued to a shard that isn’t responsible for the relevant chunk get applied to the proper shard.
