Read Operations
Read operations include all operations that return a cursor in response to application requests for data (i.e. queries), and also include a number of aggregation operations that do not return a cursor but have similar properties as queries. These commands include aggregate, count, and distinct.
This document describes the syntax and structure of the queries applications use to request data from MongoDB and how different factors affect the efficiency of reads.
Note
All of the examples in this document use the mongo shell interface. All of these operations are available in an idiomatic interface for each language through the MongoDB drivers. See your driver documentation for full API documentation.
Queries in MongoDB
In the mongo shell, the find() and findOne() methods perform read operations. The find() method has the following syntax [1]:
    db.collection.find( <query>, <projection> )
- The db.collection object specifies the database and collection to query. All queries in MongoDB address a single collection. You can enter db in the mongo shell to return the name of the current database. Use the show collections operation in the mongo shell to list the current collections in the database.
- Queries in MongoDB are BSON objects that use a set of query operators to describe query parameters. The <query> argument of the find() method holds this query document. A read operation without a query document will return all documents in the collection.
- The <projection> argument describes the result set in the form of a document. Projections specify or limit the fields to return. Without a projection, the operation will return all fields of the documents. Specify a projection if your documents are large, or when your application only needs a subset of available fields.
The order of documents returned by a query is not defined and is not necessarily consistent unless you specify a sort with sort().
For example, the following operation on the inventory collection selects all documents where the type field equals 'food' and the price field has a value less than 9.95. The projection limits the response to the item, qty, and _id fields:
    db.inventory.find( { type: 'food', price: { $lt: 9.95 } }, { item: 1, qty: 1 } )
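To make the matching and projection rules concrete, the following sketch evaluates the example query and projection against a small in-memory array in plain JavaScript. The helpers matches(), project(), and find() and the sample documents are hypothetical illustrations, not MongoDB or driver APIs; they support only equality, $lt, and inclusion projections.

```javascript
// Hypothetical in-memory model of:
//   db.inventory.find( { type: 'food', price: { $lt: 9.95 } }, { item: 1, qty: 1 } )
// Supports only plain equality, $lt, and inclusion projections.

const inventory = [
  { _id: 1, type: 'food', item: 'apple', qty: 50, price: 1.5 },
  { _id: 2, type: 'food', item: 'caviar', qty: 5, price: 120 },
  { _id: 3, type: 'snacks', item: 'chips', qty: 200, price: 2.25 },
];

// A document matches when every top-level clause matches (implicit AND).
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === 'object' && '$lt' in cond) {
      return doc[field] < cond.$lt;
    }
    return doc[field] === cond;
  });
}

// Inclusion projection: keep the listed fields, plus _id unless _id: 0.
function project(doc, projection) {
  const out = {};
  if (projection._id !== 0) out._id = doc._id;
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== '_id' && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

function find(collection, query, projection) {
  return collection
    .filter((doc) => matches(doc, query))
    .map((doc) => (projection ? project(doc, projection) : doc));
}

const result = find(inventory, { type: 'food', price: { $lt: 9.95 } }, { item: 1, qty: 1 });
console.log(result); // [ { _id: 1, item: 'apple', qty: 50 } ]
```

Only the first document satisfies both clauses; the second is food but too expensive, and the third has the wrong type.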
The findOne() method is similar to the find() method except that findOne() returns a single document from a collection rather than a cursor. The method has the following syntax:
    db.collection.findOne( <query>, <projection> )
For additional documentation and examples of the main MongoDB read operators, refer to the Read page of the Core MongoDB Operations (CRUD) section.
[1] db.collection.find() is a wrapper for the more formal query structure that uses the $query operator.
Query Document
This section provides an overview of the query document for MongoDB queries. See the preceding section for more information on queries in MongoDB.
The following examples demonstrate the key properties of the query document in MongoDB queries, using the find() method from the mongo shell and a collection of documents named inventory:
An empty query document ({}) selects all documents in the collection:
    db.inventory.find( {} )
Not specifying a query document to find() is equivalent to specifying an empty query document. Therefore the following operation is equivalent to the previous operation:
    db.inventory.find()
A single-clause query selects all documents in a collection where a field has a certain value. These are simple “equality” queries. In the following example, the query selects all documents in the collection where the type field has the value snacks:
    db.inventory.find( { type: "snacks" } )
A single-clause query document can also select all documents in a collection given a condition or set of conditions for one field in the collection’s documents. Use the query operators to specify conditions in a MongoDB query. In the following example, the query selects all documents in the collection where the value of the type field is either 'food' or 'snacks':
    db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
A compound query can specify conditions for more than one field in the collection’s documents. Implicitly, a logical AND conjunction connects the clauses of a compound query so that the query selects the documents in the collection that match all the conditions. In the following example, the query document specifies an equality match on a single field, followed by a range of values for a second field using a comparison operator:
    db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
This query selects all documents where the type field has the value 'food' and the value of the price field is less than ($lt) 9.95.
Using the $or operator, you can specify a compound query that joins each clause with a logical OR conjunction so that the query selects the documents in the collection that match at least one condition. In the following example, the query document selects all documents in the collection where the field qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:
    db.inventory.find( { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
With additional clauses, you can specify precise conditions for matching documents. In the following example, the compound query document selects all documents in the collection where the value of the type field is 'food' and either the qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:
    db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
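The implicit AND between top-level clauses and the OR semantics of $or can be checked with a small evaluator. The sketch below is a hypothetical model covering only $in, $gt, $lt, and $or on top-level fields; the function names and sample documents are illustrative, and the real query engine handles many more cases.

```javascript
// Hypothetical evaluator for a small subset of query operators
// ($in, $gt, $lt, $or) with implicit AND between top-level clauses.

function matchField(value, cond) {
  if (cond === null || typeof cond !== 'object' || Array.isArray(cond)) {
    return value === cond; // plain equality
  }
  // Every operator in an operator document must hold (AND within a field).
  return Object.entries(cond).every(([op, arg]) => {
    switch (op) {
      case '$in': return arg.includes(value);
      case '$gt': return value > arg;
      case '$lt': return value < arg;
      default: throw new Error(`unsupported operator ${op}`);
    }
  });
}

function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) =>
    key === '$or'
      ? cond.some((clause) => matches(doc, clause)) // at least one clause
      : matchField(doc[key], cond));
}

const inventory = [
  { _id: 1, type: 'food', qty: 150, price: 12.0 },
  { _id: 2, type: 'food', qty: 20, price: 5.0 },
  { _id: 3, type: 'snacks', qty: 300, price: 20.0 },
  { _id: 4, type: 'utensil', qty: 10, price: 15.0 },
];

const ids = (query) => inventory.filter((d) => matches(d, query)).map((d) => d._id);

const inResult = ids({ type: { $in: ['food', 'snacks'] } });                       // [1, 2, 3]
const andResult = ids({ type: 'food', price: { $lt: 9.95 } });                     // [2]
const orResult = ids({ $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }] }); // [1, 2, 3]
const andOrResult = ids({ type: 'food', $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }] }); // [1, 2]
```

Document 3 drops out of the last query only because the type: 'food' clause is ANDed with the $or clause.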
Subdocuments
When the field holds an embedded document (i.e. subdocument), you can either specify the entire subdocument as the value of a field, or “reach into” the subdocument using dot notation, to specify values for individual fields in the subdocument:
Equality matches within subdocuments select documents if the subdocument matches exactly the specified subdocument, including the field order. In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:
    db.inventory.find( { producer: { company: 'ABC123', address: '123 Street' } } )
Equality matches for specific fields within subdocuments select documents when the field in the subdocument contains a field that matches the specified value. In the following example, the query uses dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields:
    db.inventory.find( { 'producer.company': 'ABC123' } )
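The difference between an exact subdocument match and a dot-notation match can be modeled in a few lines. In this hypothetical sketch, exactEquals() compares fields in insertion order to mirror the field-order rule stated above, and getPath() resolves a dotted path; neither is a MongoDB function, and the sample documents are invented.

```javascript
// Hypothetical model contrasting exact subdocument equality with dot notation.
// Exact equality compares fields pairwise in insertion order, so
// { address, company } is not equal to { company, address }.

function exactEquals(a, b) {
  if (a === null || b === null || typeof a !== 'object' || typeof b !== 'object') {
    return a === b;
  }
  const ka = Object.keys(a);
  const kb = Object.keys(b);
  return ka.length === kb.length &&
    ka.every((k, i) => k === kb[i] && exactEquals(a[k], b[k]));
}

// Resolve a dotted path such as 'producer.company' against a document.
function getPath(doc, path) {
  return path.split('.').reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

const docs = [
  { _id: 1, producer: { company: 'ABC123', address: '123 Street' } },
  { _id: 2, producer: { address: '123 Street', company: 'ABC123' } },
  { _id: 3, producer: { company: 'ABC123', address: '456 Avenue', rating: 4 } },
];

const exact = docs
  .filter((d) => exactEquals(d.producer, { company: 'ABC123', address: '123 Street' }))
  .map((d) => d._id); // [1]
const dotted = docs
  .filter((d) => getPath(d, 'producer.company') === 'ABC123')
  .map((d) => d._id); // [1, 2, 3]
```

Document 2 fails the exact match only because of field order, and document 3 fails it because of the different address and extra field; dot notation ignores both.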
Arrays
When the field holds an array, you can query for values in the array, and if the array holds subdocuments, you can query for specific fields within the subdocuments using dot notation:
Equality matches can specify an entire array, to select an array that matches exactly. In the following example, the query matches all documents where the value of the field tags is an array that holds exactly three elements, 'fruit', 'food', and 'citrus', in this order:
    db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } )
Equality matches can also specify a single element in the array. In the following example, the query matches all documents where the value of the field tags is an array that contains, as one of its elements, the element 'fruit':
    db.inventory.find( { tags: 'fruit' } )
Equality matches can also select documents by values in an array using the array index (i.e. position) of the element in the array. In the following example, the query uses dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit':
    db.inventory.find( { 'tags.0': 'fruit' } )
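The three array equality forms can be compared side by side with a simplified model. The helpers sameArray() and matchesTags() and the sample documents are hypothetical; the model ignores edge cases such as arrays nested inside arrays.

```javascript
// Hypothetical model of three array equality forms:
//   { tags: [ 'fruit', 'food', 'citrus' ] }  whole array: same elements, same order
//   { tags: 'fruit' }                        any element equals the value
//   { 'tags.0': 'fruit' }                    element at index 0 equals the value

const docs = [
  { _id: 1, tags: ['fruit', 'food', 'citrus'] },
  { _id: 2, tags: ['food', 'fruit', 'citrus'] },
  { _id: 3, tags: ['fruit', 'food'] },
  { _id: 4, tags: ['food', 'snack'] },
];

const sameArray = (a, b) => a.length === b.length && a.every((x, i) => x === b[i]);

// An array value must match the whole array; a scalar may match any element.
function matchesTags(doc, value) {
  return Array.isArray(value) ? sameArray(doc.tags, value) : doc.tags.includes(value);
}

const ids = (pred) => docs.filter(pred).map((d) => d._id);

const wholeArray = ids((d) => matchesTags(d, ['fruit', 'food', 'citrus'])); // [1]
const anyElement = ids((d) => matchesTags(d, 'fruit'));                    // [1, 2, 3]
const firstElement = ids((d) => d.tags[0] === 'fruit');                    // [1, 3]
```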
In the following examples, consider documents in which the memos field holds an array of subdocuments, each of which has a memo field and a by field.
If you know the array index of the subdocument, you can specify the document using the subdocument’s position. The following example selects all documents where the memos field contains an array whose first element (i.e. index 0) is a subdocument with the field by with the value 'shipping':
    db.inventory.find( { 'memos.0.by': 'shipping' } )
If you do not know the index position of the subdocument, concatenate the name of the field that contains the array, a dot (.), and the name of the field in the subdocument. The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value 'shipping':
    db.inventory.find( { 'memos.by': 'shipping' } )
To match by multiple fields in the subdocument, you can use either dot notation or the $elemMatch operator, but the two forms are not equivalent. The following example uses dot notation to query for documents where the memos array has at least one subdocument with the field memo equal to 'on time' and at least one subdocument, not necessarily the same one, with the field by equal to 'shipping':
    db.inventory.find( { 'memos.memo': 'on time', 'memos.by': 'shipping' } )
The following example uses $elemMatch to query for documents where the memos array has at least one subdocument that contains both the field memo equal to 'on time' and the field by equal to 'shipping':
    db.inventory.find( { memos: { $elemMatch: { memo: 'on time', by: 'shipping' } } } )
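The following hypothetical model compares how the two multi-field forms treat conditions spread across different array elements: with dot notation each condition is evaluated independently across the array, while $elemMatch requires one element to satisfy all conditions. The functions and sample documents are illustrative only.

```javascript
// Hypothetical model of the difference between
//   { 'memos.memo': 'on time', 'memos.by': 'shipping' }            (dot notation)
//   { memos: { $elemMatch: { memo: 'on time', by: 'shipping' } } } ($elemMatch)

const docs = [
  { _id: 1, memos: [{ memo: 'on time', by: 'shipping' }, { memo: 'approved', by: 'billing' }] },
  { _id: 2, memos: [{ memo: 'on time', by: 'billing' }, { memo: 'delayed', by: 'shipping' }] },
];

// Dot notation: each condition is checked independently, so different
// array elements may satisfy different conditions.
function dotNotationMatch(doc, conds) {
  return Object.entries(conds).every(([field, value]) =>
    doc.memos.some((m) => m[field] === value));
}

// $elemMatch: a single array element must satisfy every condition.
function elemMatch(doc, conds) {
  return doc.memos.some((m) =>
    Object.entries(conds).every(([field, value]) => m[field] === value));
}

const conds = { memo: 'on time', by: 'shipping' };
const viaDots = docs.filter((d) => dotNotationMatch(d, conds)).map((d) => d._id); // [1, 2]
const viaElemMatch = docs.filter((d) => elemMatch(d, conds)).map((d) => d._id);   // [1]
```

Document 2 has 'on time' in its first memo and 'shipping' in its second, so only the dot-notation form selects it.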
Refer to the Query, Update, Projection, and Aggregation Operators document for the complete list of query operators.
Result Projections
The projection specification limits the fields to return for all matching documents. Restricting the fields to return can minimize network transit costs and the costs of deserializing documents in the application layer.
The second argument to the find() method is a projection, and it takes the form of a document with a list of fields for inclusion or exclusion from the result set. You can either specify the fields to include (e.g. { field: 1 }) or specify the fields to exclude (e.g. { field: 0 }). The _id field is implicitly included, unless explicitly excluded.
Note
You cannot combine inclusion and exclusion semantics in a single
projection with the exception of the _id
field.
Consider the following projection specifications in find() operations:
If you specify no projection, the find() method returns all fields of all documents that match the query:
    db.inventory.find( { type: 'food' } )
This operation will return all documents in the inventory collection where the value of the type field is 'food'.
A projection can explicitly include several fields. In the following operation, the find() method returns all documents that match the query, with only the item and qty fields. The results also include the _id field:
    db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )
You can remove the _id field by excluding it from the projection, as in the following example:
    db.inventory.find( { type: 'food' }, { item: 1, qty: 1, _id: 0 } )
This operation returns all documents that match the query, and only includes the item and qty fields in the result set.
To exclude a single field or group of fields you can use a projection in the following form:
    db.inventory.find( { type: 'food' }, { type: 0 } )
This operation returns all documents where the value of the type field is food, but does not include the type field in the output.
With the exception of the _id field you cannot combine inclusion and exclusion statements in projection documents.
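The inclusion, exclusion, and _id rules above can be expressed as a small function. applyProjection() is a hypothetical illustration rather than a driver API, and it ignores the $elemMatch and $slice projection operators.

```javascript
// Hypothetical model of projection rules: a projection is either all
// inclusions or all exclusions, except that _id may be excluded from an
// inclusion projection. _id is returned unless explicitly excluded.

function applyProjection(doc, projection) {
  const entries = Object.entries(projection).filter(([field]) => field !== '_id');
  const including = entries.some(([, v]) => v === 1);
  const excluding = entries.some(([, v]) => v === 0);
  if (including && excluding) {
    throw new Error('cannot mix inclusion and exclusion (except for _id)');
  }
  let out;
  if (including) {
    out = {};
    if ('_id' in doc) out._id = doc._id;
    for (const [field] of entries) if (field in doc) out[field] = doc[field];
  } else {
    out = { ...doc };
    for (const [field] of entries) delete out[field];
  }
  if (projection._id === 0) delete out._id;
  return out;
}

const doc = { _id: 7, type: 'food', item: 'apple', qty: 50, price: 1.5 };

const withId = applyProjection(doc, { item: 1, qty: 1 });           // { _id: 7, item: 'apple', qty: 50 }
const withoutId = applyProjection(doc, { item: 1, qty: 1, _id: 0 }); // { item: 'apple', qty: 50 }
const excluded = applyProjection(doc, { type: 0 });                  // { _id: 7, item: 'apple', qty: 50, price: 1.5 }
```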
The $elemMatch
and $slice
projection
operators provide more control when projecting only a portion of an
array.
Indexes
Indexes improve the efficiency of read operations by reducing the amount of data that query operations need to process and thereby simplifying the work associated with fulfilling queries within MongoDB. The indexes themselves are a special data structure that MongoDB maintains when inserting or modifying documents, and any given index can: support and optimize specific queries, sort operations, and allow for more efficient storage utilization. For more information about indexes in MongoDB see: Indexes and Indexing Overview.
You can create indexes using the db.collection.ensureIndex() method in the mongo shell, as in the following prototype operation:
    db.collection.ensureIndex( { <field>: <order> } )
- The field specifies the field to index. The field may be a field from a subdocument, using dot notation to specify subdocument fields. You can create an index on a single field or a compound index that includes multiple fields in the index.
- The order option specifies either ascending (1) or descending (-1). MongoDB can read the index in either direction. In most cases, you only need to specify indexing order to support sort operations that use compound indexes.
Covering a Query
An index covers a query, making it a covered query, when:
- all the fields in the query are part of that index, and
- all the fields returned in the documents that match the query are in the same index.
For these queries, MongoDB does not need to inspect documents outside of the index, which is often more efficient than inspecting entire documents.
Example
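Given a collection inventory with the following index on the type and item fields:
    db.inventory.ensureIndex( { type: 1, item: 1 } )
This index will cover the following query on the type and item fields, which returns only the item field:
    db.inventory.find( { type: "food", item: /^c/ }, { item: 1, _id: 0 } )
However, this index will not cover the following query, which returns the item field and the _id field:
    db.inventory.find( { type: "food", item: /^c/ }, { item: 1 } )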
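The two covering conditions, together with the implicit _id rule, can be encoded as a check over field names. isCovered() is a hypothetical helper that models only the conditions listed in this section; it ignores the other restrictions described in Create Indexes that Support Covered Queries.

```javascript
// Hypothetical check of the two covered-query conditions listed above.

function isCovered(indexFields, queryFields, projection) {
  const inIndex = (field) => indexFields.includes(field);
  const included = Object.keys(projection).filter((f) => f !== '_id' && projection[f] === 1);
  // With no inclusion projection, every field is returned, so no index covers it.
  if (included.length === 0) return false;
  // _id is returned implicitly unless the projection excludes it.
  const returned = projection._id === 0 ? included : [...included, '_id'];
  return queryFields.every(inIndex) && returned.every(inIndex);
}

// Models the index created by db.inventory.ensureIndex( { type: 1, item: 1 } ).
const index = ['type', 'item'];

const covered = isCovered(index, ['type', 'item'], { item: 1, _id: 0 }); // true
const notCovered = isCovered(index, ['type', 'item'], { item: 1 });      // false: _id is returned
```

The second query fails only because _id is returned by default and is not part of the { type: 1, item: 1 } index.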
See Create Indexes that Support Covered Queries for more information on the behavior and use of covered queries.
Measuring Index Use
The explain() cursor method allows you to inspect the operation of the query system, and is useful for analyzing the efficiency of queries and for determining how the query uses the index. Call the explain() method on a cursor returned by find(), as in the following example:
    db.inventory.find( { type: 'food' } ).explain()
Note
Only use explain()
to test the query
operation, and not the timing of query performance. Because
explain()
attempts multiple query
plans, it does not reflect accurate query performance.
If the above operation could not use an index, the output of explain() would resemble the following excerpt:
    { "cursor" : "BasicCursor", "n" : 5, "nscannedObjects" : 4000006, ... }
The BasicCursor value in the cursor field confirms that this query does not use an index. The explain.nscannedObjects value shows that MongoDB must scan 4,000,006 documents to return only 5 documents. To increase the efficiency of the query, create an index on the type field, as in the following example:
    db.inventory.ensureIndex( { type: 1 } )
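Run the explain() operation, as follows, to test the use of the index:
    db.inventory.find( { type: 'food' } ).explain()
Consider the results, shown here as an excerpt:
    { "cursor" : "BtreeCursor type_1", "n" : 5, "nscannedObjects" : 5, "nscanned" : 5, "indexOnly" : false, ... }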
The BtreeCursor value of the cursor field indicates that the query used an index. This query:
- returned 5 documents, as indicated by the n field;
- scanned 5 documents from the index, as indicated by the nscanned field;
- then read 5 full documents from the collection, as indicated by the nscannedObjects field.
Although the query uses an index to find the matching documents, if indexOnly is false then an index could not cover the query: MongoDB could not both match the query conditions and return the results using only this index. See Create Indexes that Support Covered Queries for more information.
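To relate the explain() fields discussed above, the following hypothetical helper summarizes an explain() result as a few derived values. The sample objects contain only the values quoted in this section (the name type_1 assumes the default naming for an index on { type: 1 }); real explain() output has many more fields, and summarize() is not a MongoDB API.

```javascript
// Hypothetical helper that reads the explain() fields discussed above:
// cursor, n, nscanned, nscannedObjects, and indexOnly.

function summarize(plan) {
  const usesIndex = plan.cursor.startsWith('BtreeCursor');
  return {
    usesIndex,
    covered: usesIndex && plan.indexOnly === true,
    // Documents read per document returned; 1 means no wasted reads.
    docsExaminedPerResult: plan.nscannedObjects / plan.n,
  };
}

const before = { cursor: 'BasicCursor', n: 5, nscannedObjects: 4000006 };
const after = { cursor: 'BtreeCursor type_1', n: 5, nscanned: 5, nscannedObjects: 5, indexOnly: false };

const beforeSummary = summarize(before); // { usesIndex: false, covered: false, docsExaminedPerResult: 800001.2 }
const afterSummary = summarize(after);   // { usesIndex: true, covered: false, docsExaminedPerResult: 1 }
```

The ratio makes the improvement explicit: without the index, MongoDB read roughly 800,000 documents for every document it returned.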
Query Optimization
The MongoDB query optimizer processes queries and chooses the most efficient query plan for a query given the available indexes. The query system then uses this query plan each time the query runs. The query optimizer occasionally reevaluates query plans as the content of the collection changes to ensure optimal query plans.
To create a new query plan, the query optimizer:
1. runs the query against several candidate indexes in parallel.
2. records the matches in a common results buffer or buffers.
   - If the candidate plans include only ordered query plans, there is a single common results buffer.
   - If the candidate plans include only unordered query plans, there is a single common results buffer.
   - If the candidate plans include both ordered query plans and unordered query plans, there are two common results buffers, one for the ordered plans and the other for the unordered plans.
   If an index returns a result already returned by another index, the optimizer skips the duplicate match. In the case of the two buffers, both buffers are de-duplicated.
3. stops the testing of candidate plans and selects an index when one of the following events occurs:
   - An unordered query plan has returned all the matching results; or
   - An ordered query plan has returned all the matching results; or
   - An ordered query plan has returned a threshold number of matching results:
     - Version 2.0: Threshold is the query batch size. The default batch size is 101.
     - Version 2.2: Threshold is 101.
The selected index becomes the index specified in the query plan; future iterations of this query or queries with the same query pattern will use this index. Query pattern refers to query select conditions that differ only in the values, as in the following two queries with the same query pattern:
    db.inventory.find( { type: 'food' } )
    db.inventory.find( { type: 'utensil' } )
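One way to picture "differ only in the values" is to reduce each query document to a key that keeps field names and operators and replaces every value with a placeholder. queryPattern() below is a hypothetical sketch of that idea, not how the optimizer actually computes patterns; for instance, it maps arrays element by element, so $in lists of different lengths would get different keys here.

```javascript
// Hypothetical sketch: reduce a query document to its "pattern" by keeping
// field names and operators and replacing every value with '?'.

function queryPattern(query) {
  if (Array.isArray(query)) return query.map(queryPattern);
  if (query !== null && typeof query === 'object' && !(query instanceof RegExp)) {
    const out = {};
    for (const [key, value] of Object.entries(query)) out[key] = queryPattern(value);
    return out;
  }
  return '?'; // any scalar value or regular expression
}

const patternKey = (query) => JSON.stringify(queryPattern(query));

const samePattern = patternKey({ type: 'food' }) === patternKey({ type: 'utensil' });      // true
const extraField = patternKey({ type: 'food' }) === patternKey({ type: 'food', qty: 10 }); // false
const otherOperator =
  patternKey({ price: { $lt: 9.95 } }) === patternKey({ price: { $gt: 9.95 } });           // false
```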
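To manually compare the performance of a query using more than one index, you can use the hint() and explain() methods in conjunction, as in the following prototype:
    db.collection.find().hint().explain()
The following operations each run the same query but will reflect the use of the different indexes:
    db.inventory.find( { type: 'food' } ).hint( { type: 1 } ).explain()
    db.inventory.find( { type: 'food' } ).hint( { type: 1, item: 1 } ).explain()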
This returns the statistics regarding the execution of the query. For more information on the output of explain(), see Explain Output.
Note
If you run explain() without including hint(), the query optimizer reevaluates the query and runs it against multiple indexes before returning the query statistics.
As collections change over time, the query optimizer deletes a query plan and reevaluates it after any of the following events:
- the collection receives 1,000 write operations.
- the reIndex command rebuilds the index.
- you add or drop an index.
- the mongod process restarts.
For more information, see Indexing Strategies.
Query Operations that Cannot Use Indexes Effectively
Some query operations cannot use indexes effectively or cannot use indexes at all. Consider the following situations:
- The inequality operators $nin and $ne are not very selective, as they often match a large portion of the index. As a result, in most cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.
- Queries that specify regular expressions, with inline JavaScript regular expressions or $regex operator expressions, cannot use an index effectively. However, a regular expression anchored to the beginning of a string (for example, /^c/) can use an index.
Cursors
The find() method returns a cursor to the results; however, in the mongo shell, if the returned cursor is not assigned to a variable, then the cursor is automatically iterated up to 20 times [2] to print up to the first 20 documents that match the query, as in the following example:
    db.inventory.find( { type: 'food' } )
When you assign the result of find() to a variable:
    var myCursor = db.inventory.find( { type: 'food' } );
- you can call the cursor variable in the shell to iterate up to 20 times [2] and print the matching documents, as in the following example:
    myCursor
- you can use the cursor method next() to access the documents, as in the following example:
    while (myCursor.hasNext()) { print(tojson(myCursor.next())); }
  As an alternative print operation, consider the printjson() helper method to replace print(tojson()):
    while (myCursor.hasNext()) { printjson(myCursor.next()); }
- you can use the cursor method forEach() to iterate the cursor and access the documents, as in the following example:
    myCursor.forEach(printjson);
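The access patterns above all consume the same cursor in order. The sketch below is a hypothetical in-memory cursor with the hasNext()/next()/forEach() shape of the shell cursor; makeCursor(), shellDisplay(), and SHELL_BATCH_SIZE are illustrative names, and nothing here contacts a server.

```javascript
// Hypothetical in-memory cursor with the hasNext()/next()/forEach() shape of
// the mongo shell cursor. It only illustrates consumption order.
// SHELL_BATCH_SIZE mirrors the default of 20 noted above.

const SHELL_BATCH_SIZE = 20;

function makeCursor(docs) {
  let pos = 0;
  return {
    hasNext: () => pos < docs.length,
    next() {
      if (pos >= docs.length) throw new Error('cursor exhausted');
      return docs[pos++];
    },
    forEach(fn) {
      while (this.hasNext()) fn(this.next());
    },
  };
}

// Roughly what evaluating a bare cursor in the shell does: show up to
// SHELL_BATCH_SIZE documents and leave the rest unread in the cursor.
function shellDisplay(cursor) {
  const shown = [];
  while (cursor.hasNext() && shown.length < SHELL_BATCH_SIZE) shown.push(cursor.next());
  return shown;
}

const docs = Array.from({ length: 25 }, (_, i) => ({ _id: i }));
const myCursor = makeCursor(docs);
const firstPage = shellDisplay(myCursor);   // documents 0..19
const rest = [];
myCursor.forEach((doc) => rest.push(doc));   // documents 20..24
```

Because the display step and forEach() share one position, forEach() resumes where the display stopped rather than starting over.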
See JavaScript cursor methods and your driver documentation for more information on cursor methods.
[2] You can use DBQuery.shellBatchSize to change the number of iterations from the default value of 20. See Executing Queries for more information.
Iterator Index
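In the mongo shell, you can use the toArray() method to iterate the cursor and return the documents in an array, as in the following:
    var myCursor = db.inventory.find( { type: 'food' } );
    var documentArray = myCursor.toArray();
    var myDocument = documentArray[3];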
The toArray()
method loads into RAM all
documents returned by the cursor; the toArray()
method exhausts the cursor.
Additionally, some drivers provide
access to the documents by using an index on the cursor (i.e.
cursor[index]
). This is a shortcut for first calling the
toArray()
method and then using an index
on the resulting array.
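Consider the following example:
    var myCursor = db.inventory.find( { type: 'food' } );
    var myDocument = myCursor[3];
The myCursor[3] expression is equivalent to the following example:
    myCursor.toArray() [3];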
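The equivalence can be modeled with a JavaScript Proxy that routes numeric property access to toArray(). This is a hypothetical sketch: makeCursor() is not a shell function, and keeping the drained array so repeated indexing returns the same documents is a choice of this model.

```javascript
// Hypothetical model: cursor[index] as a shortcut for toArray()[index].
// toArray() drains the cursor; this model keeps the resulting array so
// repeated indexing returns the same documents.

function makeCursor(docs) {
  let pos = 0;
  let loaded = null;
  const cursor = {
    hasNext: () => pos < docs.length,
    next: () => docs[pos++],
    toArray() {
      if (loaded === null) {
        loaded = [];
        while (cursor.hasNext()) loaded.push(cursor.next());
      }
      return loaded;
    },
  };
  // The Proxy routes numeric property access (cursor[3]) to toArray().
  return new Proxy(cursor, {
    get(target, prop) {
      if (typeof prop === 'string' && /^\d+$/.test(prop)) {
        return target.toArray()[Number(prop)];
      }
      return target[prop];
    },
  });
}

const myCursor = makeCursor([{ _id: 0 }, { _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }]);
const myDocument = myCursor[3];     // same as myCursor.toArray()[3]
const drained = !myCursor.hasNext(); // true: indexing exhausted the cursor
```

As with toArray(), indexing loads every remaining document into memory, so it suits small result sets.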
Cursor Behaviors
Consider the following behaviors related to cursors:
- By default, the server will automatically close the cursor after 10 minutes of inactivity or if the client has exhausted the cursor. To override this behavior, you can specify the noTimeout wire protocol flag in your query; however, you should either close the cursor manually or exhaust the cursor. In the mongo shell, you can set the noTimeout flag:
    var myCursor = db.inventory.find().addOption(DBQuery.Option.noTimeout);
  See your driver documentation for information on setting the noTimeout flag. See Cursor Flags for a complete list of available cursor flags.
- Because the cursor is not isolated during its lifetime, intervening write operations may result in a cursor that returns a single document [3] more than once. To handle this situation, see the information on snapshot mode.
- The MongoDB server returns the query results in batches:
  - For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batches are 4 megabytes. To override the default size of the batch, see batchSize() and limit().
  - For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort and will return all documents in the first batch.
  - Batch size will not exceed the maximum BSON document size.
  - As you iterate through the cursor and reach the end of the returned batch, if there are more results, cursor.next() will perform a getmore operation to retrieve the next batch.
  To see how many documents remain in the batch as you iterate the cursor, you can use the objsLeftInBatch() method, as in the following example:
    var myCursor = db.inventory.find();
    var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;
    myCursor.objsLeftInBatch();
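The default batching rule is easiest to see as arithmetic over document sizes. planBatches() is a hypothetical sketch: it assumes that later batches, like the first, fill until they just exceed their byte limit (4 MB), and it does not model batchSize(), limit(), unindexed sorts, or the maximum BSON document size.

```javascript
// Hypothetical arithmetic sketch of default batching: the first batch holds
// 101 documents or just enough documents to exceed 1 MB, whichever comes
// first; later batches are assumed to fill until they exceed 4 MB.

const MB = 1024 * 1024;

function planBatches(docSizes) {
  const batches = [];
  let i = 0;
  while (i < docSizes.length) {
    const isFirst = batches.length === 0;
    const maxBytes = isFirst ? 1 * MB : 4 * MB;
    let count = 0;
    let bytes = 0;
    while (i < docSizes.length) {
      count += 1;
      bytes += docSizes[i];
      i += 1;
      // Stop once the document limit (first batch only) or byte limit is passed.
      if ((isFirst && count === 101) || bytes > maxBytes) break;
    }
    batches.push(count);
  }
  return batches;
}

// 300 documents of 1 KB: the 101-document limit ends the first batch.
const smallDocs = planBatches(Array(300).fill(1024));      // [101, 199]
// 10 documents of 512 KB: 3 documents exceed 1 MB; the remaining 7 (3.5 MB) follow.
const mediumDocs = planBatches(Array(10).fill(512 * 1024)); // [3, 7]
```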
- You can use the cursorInfo command to retrieve the following information on cursors:
  - total number of open cursors
  - size of the client cursors in current use
  - number of timed out cursors since the last server restart
  Consider the following example:
    db.runCommand( { cursorInfo: 1 } )
  The command returns a document of the following form:
    { "totalOpen" : <number>, "clientCursors_size" : <number>, "timedOut" : <number>, "ok" : 1 }
[3] A single document relative to the value of the _id field. A cursor cannot return the same document more than once if the document has not changed.
Cursor Flags
The mongo
shell provides the following cursor flags:
- DBQuery.Option.tailable
- DBQuery.Option.slaveOk
- DBQuery.Option.oplogReplay
- DBQuery.Option.noTimeout
- DBQuery.Option.awaitData
- DBQuery.Option.exhaust
- DBQuery.Option.partial
Aggregation
Changed in version 2.2.
MongoDB can perform some basic data aggregation operations on results before returning data to the application. These operations are not queries; they use database commands rather than queries, and they do not return a cursor. However, they still require MongoDB to read data.
Running aggregation operations on the database side can be more efficient than running them in the application layer and can reduce the amount of data MongoDB needs to send to the application. These aggregation operations include basic grouping, counting, and even processing data using a map reduce framework. Additionally, in 2.2 MongoDB provides a complete aggregation framework for richer aggregation operations.
The aggregation framework provides users with a “pipeline” like
framework: documents enter from a collection and then pass through a
series of steps by a sequence of pipeline operators that manipulate and
transform the documents until they’re output at the end. The
aggregation framework is accessible via the aggregate
command or the db.collection.aggregate()
helper in the
mongo
shell.
For more information on the aggregation framework see Aggregation.
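The pipeline idea, documents flowing through an ordered series of transforming stages, can be modeled with ordinary functions. The sketch below is a hypothetical in-memory illustration, not the aggregation framework itself; match(), groupSum(), runPipeline(), and the sample documents are invented names, and the shell call in the comment is only a rough equivalent.

```javascript
// Hypothetical in-memory model of a pipeline: each stage maps an array of
// documents to a new array, and stages run in order. Roughly comparable to:
//   db.inventory.aggregate( [ { $match: { type: 'food' } },
//                             { $group: { _id: '$item', total: { $sum: '$qty' } } } ] )

const match = (pred) => (docs) => docs.filter(pred);

const groupSum = (keyField, sumField) => (docs) => {
  const totals = new Map(); // Map keeps first-seen key order
  for (const d of docs) totals.set(d[keyField], (totals.get(d[keyField]) ?? 0) + d[sumField]);
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const inventory = [
  { item: 'apple', type: 'food', qty: 10 },
  { item: 'apple', type: 'food', qty: 5 },
  { item: 'pear', type: 'food', qty: 7 },
  { item: 'fork', type: 'utensil', qty: 100 },
];

const totals = runPipeline(inventory, [
  match((d) => d.type === 'food'),
  groupSum('item', 'qty'),
]); // [ { _id: 'apple', total: 15 }, { _id: 'pear', total: 7 } ]
```

Putting the filtering stage first means the grouping stage only sees the documents it needs, which is the same reason to place $match early in a real pipeline.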
Additionally, MongoDB provides a number of simpler operations for basic data aggregation:
- count (count())
- distinct (db.collection.distinct())
- group (db.collection.group())
- mapReduce (also consider mapReduce() and Map-Reduce)
Architecture
Read Operations from Sharded Clusters
Sharded clusters allow you to partition a data set among a cluster of mongod instances in a way that is nearly transparent to the application. See the Sharding section of this manual for additional information about these deployments.
For a sharded cluster, you issue all operations to one of the mongos instances associated with the cluster. mongos instances route operations to the mongod instances in the cluster and behave like mongod instances to the application. Read operations to a sharded collection in a sharded cluster are largely the same as operations to a replica set or standalone instances. See the section on Read Operations in Sharded Clusters for more information.
In sharded deployments, the mongos
instance routes
the queries from the clients to the mongod
instances that
hold the data, using the cluster metadata stored in the config
database.
For sharded collections, if queries do not include the shard key, the mongos must direct the query to all shards that hold data for the collection. These scatter-gather queries can be inefficient, particularly on larger clusters, and are unfeasible for routine operations.
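The routing decision can be sketched with range-based chunks on a shard key. This is a hypothetical model: the shard key sku, the shard names, and the chunk boundaries are invented, and only equality matches on the shard key are targeted; everything else is sent to every shard.

```javascript
// Hypothetical model of mongos routing with range-based chunks on a shard
// key. An equality match on the shard key targets one shard; a query
// without the shard key goes to every shard (scatter-gather).

const chunks = [
  { min: -Infinity, max: 100, shard: 'shardA' },
  { min: 100, max: 500, shard: 'shardB' },
  { min: 500, max: Infinity, shard: 'shardC' },
];

function routeQuery(query, shardKey) {
  const value = query[shardKey];
  if (typeof value === 'number') {
    // Chunk ranges include min and exclude max.
    const chunk = chunks.find((c) => value >= c.min && value < c.max);
    return [chunk.shard];
  }
  return [...new Set(chunks.map((c) => c.shard))];
}

const targeted = routeQuery({ sku: 250 }, 'sku');        // ['shardB']
const scattered = routeQuery({ type: 'food' }, 'sku');   // ['shardA', 'shardB', 'shardC']
```

The cost of the second query grows with the number of shards, which is why scatter-gather queries become less practical on larger clusters.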
For more information on read operations in sharded clusters, see the Sharding section of this manual.
Read Operations from Replica Sets
Replica sets use read preferences to determine where and how to route read operations to members of the replica set. By default, MongoDB always reads data from a replica set’s primary. You can modify that behavior by changing the read preference mode.
You can configure the read preference mode on a per-connection or per-operation basis to allow reads from secondaries to:
- reduce latency in multi-data-center deployments,
- improve read throughput by distributing high read volumes (relative to write volume),
- support backup operations, and/or
- allow reads during failover situations.
Read operations from secondary members of replica sets are not guaranteed to reflect the current state of the primary, and the state of secondaries will trail the primary by some amount of time. Often, applications don’t rely on this kind of strict consistency, but application developers should always consider the needs of their application before setting read preference.
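How a read preference mode selects a member can be sketched as a small decision function. The mode names primary, primaryPreferred, secondary, secondaryPreferred, and nearest are MongoDB's read preference modes; the selection logic, the member objects, and selectMember() are a hypothetical simplification (no tag sets, no latency window, first available secondary wins).

```javascript
// Hypothetical sketch of choosing a replica set member for a read.
// Returns null when no eligible member is available.

function selectMember(mode, members) {
  const primary = members.find((m) => m.state === 'PRIMARY' && m.up);
  const secondaries = members.filter((m) => m.state === 'SECONDARY' && m.up);
  switch (mode) {
    case 'primary': return primary ?? null;
    case 'primaryPreferred': return primary ?? secondaries[0] ?? null;
    case 'secondary': return secondaries[0] ?? null;
    case 'secondaryPreferred': return secondaries[0] ?? primary ?? null;
    case 'nearest': {
      // Lowest measured round-trip time among available members.
      const all = [primary, ...secondaries].filter(Boolean);
      return all.reduce((best, m) => (best === null || m.pingMs < best.pingMs ? m : best), null);
    }
    default: throw new Error(`unknown mode ${mode}`);
  }
}

const members = [
  { host: 'a:27017', state: 'PRIMARY', up: true, pingMs: 40 },
  { host: 'b:27017', state: 'SECONDARY', up: true, pingMs: 5 },
  { host: 'c:27017', state: 'SECONDARY', up: true, pingMs: 12 },
];
// During failover the primary is unavailable.
const failover = members.map((m) => (m.state === 'PRIMARY' ? { ...m, up: false } : m));

const normalRead = selectMember('primary', members).host;           // 'a:27017'
const nearestRead = selectMember('nearest', members).host;          // 'b:27017'
const failoverPrimary = selectMember('primary', failover);          // null
const failoverPreferred = selectMember('primaryPreferred', failover).host; // 'b:27017'
```

Any mode other than primary may return a secondary, and that secondary's data may trail the primary's.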
For more information on read preferences or on the read preference modes, see Read Preference and Read Preference Modes.