- Aggregation >
- Aggregation Framework Examples
Aggregation Framework Examples¶
On this page
MongoDB provides flexible data aggregation functionality with the
aggregate
command. For additional information about
aggregation consider the following resources:
This document provides a number of practical examples that display the capabilities of the aggregation framework. All examples use a publicly available data set of all zipcodes and populations in the United States.
Aggregations using the Zip Code Data Set¶
To run you will need the zipcode data set. These data are available at:
media.mongodb.org/zips.json.
Use mongoimport
to load this data set into your
mongod
instance.
Data Model¶
Each document in this collection has the following form:
In these documents:
- The
_id
field holds the zipcode as a string. - The
city
field holds the city. - The
state
field holds the two letter state abbreviation. - The
pop
field holds the population. - The
loc
field holds the location as a latitude longitude pair.
All of the following examples use the aggregate()
helper in the mongo
shell. aggregate()
provides a
wrapper around the aggregate
database command. See the
documentation for your driver for a
more idiomatic interface for data aggregation operations.
States with Populations Over 10 Million¶
To return all states with a population greater than 10 million, use the following aggregation operation:
Aggregations operations using the aggregate()
helper, process all documents on the
zipcodes
collection. aggregate()
connects a number of pipeline operators, which define the aggregation
process.
In the above example, the pipeline passes all documents in the
zipcodes
collection through the following steps:
the
$group
operator collects all documents and creates documents for each state.These new per-state documents have one field in addition the
_id
field:totalPop
which is a generated field using the$sum
operation to calculate the total value of allpop
fields in the source documents.After the
$group
operation the documents in the pipeline resemble the following:the
$match
operation filters these documents so that the only documents that remain are those where the value oftotalPop
is greater than or equal to 10 million.The
$match
operation does not alter the documents, which have the same format as the documents output by$group
.
The equivalent SQL for this operation is:
Average City Population by State¶
To return the average populations for cities in each state, use the following aggregation operation:
Aggregations operations using the aggregate()
helper, process all documents on the
zipcodes
collection. aggregate()
a number of pipeline operators that define the aggregation
process.
In the above example, the pipeline passes all documents in the
zipcodes
collection through the following steps:
the
$group
operator collects all documents and creates new documents for every combination of thecity
andstate
fields in the source document.After this stage in the pipeline, the documents resemble the following:
the second
$group
operator collects documents by thestate
field and use the$avg
expression to compute a value for theavgCityPop
field.
The final output of this aggregation operation is:
Largest and Smallest Cities by State¶
To return the smallest and largest cities by population for each state, use the following aggregation operation:
Aggregations operations using the aggregate()
helper, process all documents on the
zipcodes
collection. aggregate()
a number of pipeline operators that define the aggregation
process.
All documents from the zipcodes
collection pass into the pipeline,
which consists of the following steps:
the
$group
operator collects all documents and creates new documents for every combination of thecity
andstate
fields in the source documents.By specifying the value of
_id
as a sub-document that contains both fields, the operation preserves thestate
field for use later in the pipeline. The documents produced by this stage of the pipeline have a second field,pop
, which uses the$sum
operator to provide the total of thepop
fields in the source document.At this stage in the pipeline, the documents resemble the following:
$sort
operator orders the documents in the pipeline based on the value of thepop
field from smallest to largest. This operation does not alter the documents.the second
$group
operator collects the documents in the pipeline by thestate
field, which is a field inside the nested_id
document.Within each per-state document this
$group
operator specifies four fields: Using the$last
expression, the$group
operator creates thebiggestcity
andbiggestpop
fields that store the city with the largest population and that population. Using the$first
expression, the$group
operator creates thesmallestcity
andsmallestpop
fields that store the city with the smallest population and that population.The documents, at this stage in the pipeline resemble the following:
The final operation is
$project
, which renames the_id
field tostate
and moves thebiggestCity
,biggestPop
,smallestCity
, andsmallestPop
intobiggestCity
andsmallestCity
sub-documents.
The final output of this aggregation operation is:
Aggregation with User Preference Data¶
Data Model¶
Consider a hypothetical sports club with a database that contains a
user
collection that tracks user’s join dates, sport preferences,
and stores these data in documents that resemble the following:
Normalize and Sort Documents¶
The following operation returns user names in upper case and in
alphabetical order. The aggregation includes user names for all documents in
the users
collection. You might do this to normalize user names for
processing.
All documents from the users
collection passes through the
pipeline, which consists of the following operations:
The results of the aggregation would resemble the following:
Return Usernames Ordered by Join Month¶
The following aggregation operation returns user names sorted by the month they joined. This kind of aggregation could help generate membership renewal notices.
The pipeline passes all documents in the users
collection through
the following operations:
- The
$project
operator:- Creates two new fields:
month_joined
andname
. - Suppresses the
id
from the results. Theaggregate()
method includes the_id
, unless explicitly suppressed.
- Creates two new fields:
- The
$month
operator converts the values of thejoined
field to integer representations of the month. Then the$project
operator assigns those values to themonth_joined
field. - The
$sort
operator sorts the results by themonth_joined
field.
The operation returns results that resemble the following:
Return Total Number of Joins per Month¶
The following operation shows how many people joined each month of the year. You might use this aggregated data for such information for recruiting and marketing strategies.
The pipeline passes all documents in the users
collection through
the following operations:
- The
$project
operator creates a new field calledmonth_joined
. - The
$month
operator converts the values of thejoined
field to integer representations of the month. Then the$project
operator assigns the values to themonth_joined
field. - The
$group
operator collects all documents with a givenmonth_joined
value and counts how many documents there are for that value. Specifically, for each unique value,$group
creates a new “per-month” document with two fields:_id
, which contains a nested document with themonth_joined
field and its value.number
, which is a generated field. The$sum
operator increments this field by 1 for every document containing the givenmonth_joined
value.
- The
$sort
operator sorts the documents created by$group
according to the contents of themonth_joined
field.
The result of this aggregation operation would resemble the following:
Return the Five Most Common “Likes”¶
The following aggregation collects top five most “liked” activities in the data set. In this data set, you might use an analysis of this to help inform planning and future development.
The pipeline begins with all documents in the users
collection,
and passes these documents through the following operations:
The
$unwind
operator separates each value in thelikes
array, and creates a new version of the source document for every element in the array.Example
Given the following document from the
users
collection:The
$unwind
operator would create the following documents:The
$group
operator collects all documents the same value for thelikes
field and counts each grouping. With this information,$group
creates a new document with two fields:_id
, which contains thelikes
value.number
, which is a generated field. The$sum
operator increments this field by 1 for every document containing the givenlikes
value.
The
$sort
operator sorts these documents by thenumber
field in reverse order.The
$limit
operator only includes the first 5 result documents.
The results of aggregation would resemble the following: