Navigation
This version of the documentation is archived and no longer supported.

Aggregation Framework Reference

New in version 2.1.0.

On this page

The aggregation framework provides the ability to project, process, and/or control the output of the query, without using map-reduce. Aggregation uses a syntax that resembles the same syntax and form as “regular” MongoDB database queries.

These aggregation operations are all accessible by way of the aggregate() method. While all examples in this document use this method, aggregate() is merely a wrapper around the database command aggregate. The following prototype aggregation operations are equivalent:

db.people.aggregate( <pipeline> )
db.people.aggregate( [<pipeline>] )
db.runCommand( { aggregate: "people", pipeline: [<pipeline>] } )

These operations perform aggregation routines on the collection named people. <pipeline> is a placeholder for the aggregation pipeline definition. aggregate() accepts the stages of the pipeline (i.e. <pipeline>) as an array, or as arguments to the method.

This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as well as details regarding their use and behavior.

See also

Aggregation Framework overview, the Aggregation Framework Documentation Index, and the Aggregation Framework Examples for more information on the aggregation functionality.

Pipeline

Warning

The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.

Pipeline operators appear in an array. Conceptually, documents pass through these operators in a sequence. All examples in this section assume that the aggregation pipeline begins with a collection named article that contains documents that resemble the following:

{
  title : "this is my title" ,
  author : "bob" ,
  posted : new Date() ,
  pageViews : 5 ,
  tags : [ "fun" , "good" , "fun" ] ,
  comments : [
      { author :"joe" , text : "this is cool" } ,
      { author :"sam" , text : "this is bad" }
  ],
  other : { foo : 5 }
}

The current pipeline operators are:

$project

Reshapes a document stream by renaming, adding, or removing fields. Also use $project to create computed values or sub-documents. Use $project to:

  • Include fields from the original document.
  • Insert computed fields.
  • Rename fields.
  • Create and populate fields that hold sub-documents.

Use $project to quickly select the fields that you want to include or exclude from the response. Consider the following aggregation framework operation.

db.article.aggregate(
    { $project : {
        title : 1 ,
        author : 1 ,
    }}
 );

This operation includes the title field and the author field in the document that returns from the aggregation pipeline.

Note

The _id field is always included by default. You may explicitly exclude _id as follows:

db.article.aggregate(
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }}
);

Here, the projection excludes the _id field but includes the title and author fields.

Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators. Consider the following example:

db.article.aggregate(
    { $project : {
        title : 1,
        doctoredPageViews : { $add:["$pageViews", 10] }
    }}
);

Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add.

Note

You must enclose the expression that defines the computed field in braces, so that the expression is a valid object.

You may also use $project to rename fields. Consider the following example:

db.article.aggregate(
    { $project : {
        title : 1 ,
        page_views : "$pageViews" ,
        bar : "$other.foo"
    }}
);

This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents.

Finally, you can use the $project to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:

db.article.aggregate(
    { $project : {
        title : 1 ,
        stats : {
            pv : "$pageViews",
            foo : "$other.foo",
            dpv : { $add:["$pageViews", 10] }
        }
    }}
);

This projection includes the title field and places $project into “inclusive” mode. Then, it creates the stats documents with the following fields:

  • pv which includes and renames the pageViews from the top level of the original documents.
  • foo which includes the value of other.foo from the original documents.
  • dpv which is a computed field that adds 10 to the value of the pageViews field in the original document using the $add aggregation expression.
$match

Provides a query-like interface to filter documents out of the aggregation pipeline. The $match drops documents that do not match the condition from the aggregation pipeline, and it passes documents that match along the pipeline unaltered.

The syntax passed to the $match is identical to the query syntax. Consider the following prototype form:

db.article.aggregate(
    { $match : <match-predicate> }
);

The following example performs a simple field equality test:

db.article.aggregate(
    { $match : { author : "dave" } }
);

This operation only returns documents where the author field holds the value dave. Consider the following example, which performs a range test:

db.article.aggregate(
    { $match : { score  : { $gt : 50, $lte : 90 } } }
);

Here, all documents return when the score field holds a value that is greater than 50 and less than or equal to 90.

Note

Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of later processing. If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() or db.collection.findOne().

Warning

You cannot use $where or geospatial operations in $match queries as part of the aggregation pipeline.

$limit

Restricts the number of documents that pass through the $limit in the pipeline.

$limit takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents pass through the pipeline operator, no more will. Consider the following example:

db.article.aggregate(
    { $limit : 5 }
);

This operation returns only the first 5 documents passed to it from by the pipeline. $limit has no effect on the content of the documents it passes.

$skip

Skips over the specified number of documents that pass through the $skip in the pipeline before passing all of the remaining input.

$skip takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:

db.article.aggregate(
    { $skip : 5 }
);

This operation skips the first 5 documents passed to it by the pipeline. $skip has no effect on the content of the documents it passes along the pipeline.

$unwind

Peels off the elements of an array individually, and returns a stream of documents. $unwind returns one document for every member of the unwound array within every source document. Take the following aggregation command:

db.article.aggregate(
    { $project : {
        author : 1 ,
        title : 1 ,
        tags : 1
    }},
    { $unwind : "$tags" }
);

Note

The dollar sign (i.e. $) must proceed the field specification handed to the $unwind operator.

In the above aggregation $project selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items.

{
     "result" : [
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "fun"
             },
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "good"
             },
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "fun"
             }
     ],
     "OK" : 1
}

A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original “tags” array.

Note

$unwind has the following behaviors:

  • $unwind is most useful in combination with $group.
  • You may undo the effects of unwind operation with the $group pipeline operator.
  • If you specify a target field for $unwind that does not exist in an input document, the pipeline ignores the input document, and will generate no result documents.
  • If you specify a target field for $unwind that is not an array, db.collection.aggregate() generates an error.
  • If you specify a target field for $unwind that holds an empty array ([]) in an input document, the pipeline ignores the input document, and will generates no result documents.
$group

Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, group often supports tasks such as average page views for each page in a website on a daily basis.

The output of $group depends on how you define groups. Begin by specifying an identifier (i.e. a _id field) for the group you’re creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. Aggregate keys may resemble the following document:

{ _id : { author: '$author', pageViews: '$pageViews', posted: '$posted' } }

With the exception of the _id field, $group cannot output nested documents.

Important

The output of $group is not ordered.

Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.

Note

Use $project as needed to rename the grouped field after an $group operation, if necessary.

Consider the following example:

db.article.aggregate(
    { $group : {
        _id : "$author",
        docsPerAuthor : { $sum : 1 },
        viewsPerAuthor : { $sum : "$pageViews" }
    }}
);

This groups by the author field and computes two fields, the first docsPerAuthor is a counter field that adds one for each document with a given author field using the $sum function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group.

Each field defined for the $group must use one of the group aggregation function listed below to generate its composite value:

Warning

The aggregation system currently stores $group operations in memory, which may cause problems when processing a larger number of groups.

$sort

The $sort pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:

db.<collection-name>.aggregate(
    { $sort : { <sort-key> } }
);

This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document.

Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:

db.users.aggregate(
    { $sort : { age : -1, posts: 1 } }
);

This operation sorts the documents in the users collection, in descending order according by the age field and then in ascending order according to the value in the posts field.

When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:

  1. MinKey (internal type)
  2. Null
  3. Numbers (ints, longs, doubles)
  4. Symbol, String
  5. Object
  6. Array
  7. BinData
  8. ObjectID
  9. Boolean
  10. Date, Timestamp
  11. Regular Expression
  12. MaxKey (internal type)

Note

MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.

Note

The $sort cannot begin sorting documents until previous operators in the pipeline have returned all output.

$sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators:

Warning

Unless the $sort operator can use an index, in the current release, the sort must fit within memory. This may cause problems when sorting large numbers of documents.

Expressions

These operators calculate values within the aggregation framework.

$group Operators

The $group pipeline stage provides the following operations:

$addToSet

Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.

$first

Returns the first value it encounters for its group .

Note

Only use $first when the $group follows an $sort operation. Otherwise, the result of this operation is unpredictable.

$last

Returns the last value it encounters for its group.

Note

Only use $last when the $group follows an $sort operation. Otherwise, the result of this operation is unpredictable.

$max

Returns the highest value among all values of the field in all documents selected by this group.

$min

Returns the lowest value among all values of the field in all documents selected by this group.

$avg

Returns the average of all the values of the field in all documents selected by this group.

$push

Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one field in the grouped documents has that value.

$sum

Returns the sum of all the values for a specified field in the grouped documents, as in the second use above.

Alternately, if you specify a value as an argument, $sum will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.

Boolean Operators

The three boolean operators accept Booleans as arguments and return Booleans as results.

Note

These operators convert non-booleans to Boolean values according to the BSON standards. Here, null, undefined, and 0 values become false, while non-zero numeric values, and all other types, such as strings, dates, objects become true.

$and

Takes an array one or more values and returns true if all of the values in the array are true. Otherwise $and returns false.

Note

$and uses short-circuit logic: the operation stops evaluation after encountering the first false expression.

$or

Takes an array of one or more values and returns true if any of the values in the array are true. Otherwise $or returns false.

Note

$or uses short-circuit logic: the operation stops evaluation after encountering the first true expression.

$not

Returns the boolean opposite value passed to it. When passed a true value, $not returns false; when passed a false value, $not returns true.

Comparison Operators

These operators perform comparisons between two values and return a Boolean, in most cases, reflecting the result of that comparison.

All comparison operators take an array with a pair of values. You may compare numbers, strings, and dates. Except for $cmp, all comparison operators return a Boolean value. $cmp returns an integer.

$cmp

Takes two values in an array and returns an integer. The returned value is:

  • A negative number if the first value is less than the second.
  • A positive number if the first value is greater than the second.
  • 0 if the two values are equal.
$eq

Takes two values in an array and returns a boolean. The returned value is:

  • true when the values are equivalent.
  • false when the values are not equivalent.
$gt

Takes two values in an array and returns a boolean. The returned value is:

  • true when the first value is greater than the second value.
  • false when the first value is less than or equal to the second value.
$gte

Takes two values in an array and returns a boolean. The returned value is:

  • true when the first value is greater than or equal to the second value.
  • false when the first value is less than the second value.
$lt

Takes two values in an array and returns a boolean. The returned value is:

  • true when the first value is less than the second value.
  • false when the first value is greater than or equal to the second value.
$lte

Takes two values in an array and returns a boolean. The returned value is:

  • true when the first value is less than or equal to the second value.
  • false when the first value is greater than the second value.
$ne

Takes two values in an array returns a boolean. The returned value is:

  • true when the values are not equivalent.
  • false when the values are equivalent.

Arithmetic Operators

These operators only support numbers.

$add

Takes an array of one or more numbers and adds them together, returning the sum.

$divide

Takes an array that contains a pair of numbers and returns the value of the first number divided by the second number.

$mod

Takes an array that contains a pair of numbers and returns the remainder of the first number divided by the second number.

See also

$mod

$multiply

Takes an array of one or more numbers and multiples them, returning the resulting product.

$subtract

Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.

String Operators

These operators manipulate strings within projection expressions.

$strcasecmp

Takes in two strings. Returns a number. $strcasecmp is positive if the first string is “greater than” the second and negative if the first string is “less than” the second. $strcasecmp returns 0 if the strings are identical.

Note

$strcasecmp may not make sense when applied to glyphs outside the Roman alphabet.

$strcasecmp internally capitalizes strings before comparing them to provide a case-insensitive comparison. Use $cmp for a case sensitive comparison.

$substr

$substr takes a string and two numbers. The first number represents the number of bytes in the string to skip, and the second number specifies the number of bytes to return from the string.

Note

$substr is not encoding aware and if used improperly may produce a result string containing an invalid UTF-8 character sequence.

$toLower

Takes a single string and converts that string to lowercase, returning the result. All uppercase letters become lowercase.

Note

$toLower may not make sense when applied to glyphs outside the Roman alphabet.

$toUpper

Takes a single string and converts that string to uppercase, returning the result. All lowercase letters become uppercase.

Note

$toUpper may not make sense when applied to glyphs outside the Roman alphabet.

Date Operators

All date operators take a “Date” typed value as a single argument and return a number.

$dayOfYear

Takes a date and returns the day of the year as a number between 1 and 366.

$dayOfMonth

Takes a date and returns the day of the month as a number between 1 and 31.

$dayOfWeek

Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday.)

$year

Takes a date and returns the full year.

$month

Takes a date and returns the month as a number between 1 and 12.

$week

Takes a date and returns the week of the year as a number between 0 and 53.

Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday of the year are in week 0. This behavior is the same as the “%U” operator to the strftime standard library function.

$hour

Takes a date and returns the hour between 0 and 23.

$minute

Takes a date and returns the minute between 0 and 59.

$second

Takes a date and returns the second between 0 and 59, but can be 60 to account for leap seconds.

Conditional Expressions

$cond

Use the $cond operator with the following syntax:

{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }

Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond returns the value of the second expression. If the first expression evaluates to false, $cond evaluates and returns the third expression.

$ifNull

Use the $ifNull operator with the following syntax:

{ $ifNull: [ <expression>, <replacement-if-null> ] }

Takes an array with two expressions. $ifNull returns the first expression if it evaluates to a non-null value. Otherwise, $ifNull returns the second expression’s value.