Introduction to the MongoDB Aggregation Framework

Ken W. Alger • Published Feb 01, 2022 • Updated May 09, 2022
One of the difficulties when storing any data is knowing how it will be accessed in the future. What reports need to be run on it? What information is "hidden" in there that will allow for meaningful insights for your business? After spending the time to design your data schema in an appropriate fashion for your application, you need to be able to retrieve the data. In MongoDB, there are two basic ways to retrieve data: through queries with the find() command, and through analytics using the aggregation framework and the aggregate() command.
find() allows for the querying of data based on a condition. One can filter results, do basic document transformations, sort the documents, limit the document result set, etc. The aggregate() command opens the door to a whole new world with the aggregation framework. In this series of posts, I'll take a look at some of the reasons why using the aggregation framework is so powerful, and how to harness that power.

Why Aggregate with MongoDB?

A frequently asked question is why do aggregation inside MongoDB at all? From the MongoDB documentation:
Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.
By using the built-in aggregation operators available in MongoDB, we are able to do analytics on a cluster of servers we're already using without having to move the data to another platform, like Apache Spark or Hadoop. While those, and similar, platforms are fast, the data transfer from MongoDB to them can be slow and potentially expensive. By using the aggregation framework, the work is done inside MongoDB and only the final results are sent to the application, which typically means a smaller amount of data being moved around. It also allows for the querying of the live version of the data and not an older copy of data from a batch.
Aggregation in MongoDB allows for the transforming of data and results in a more powerful fashion than using the find() command. Through the use of multiple stages and expressions, you are able to build a "pipeline" of operations on your data to perform analytic operations. What do I mean by a "pipeline"? The aggregation framework is conceptually similar to the *nix command line pipe, |. In a *nix command line pipeline, a pipe sends the standard output of one command to another command for further processing.
Figure: *nix pipeline example
In the aggregation framework, we think of stages instead of commands. And the stage "output" is documents. Documents go into a stage, some work is done, and documents come out. From there they can move onto another stage or provide output.
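To make the analogy concrete, here is a minimal pure-Python sketch of the same idea. It does not use MongoDB at all: the helper names (match, count_by, sort_desc) and the toy documents are invented for illustration, and each helper plays the role of one stage that takes documents in and passes documents out.

```python
from collections import Counter

def match(docs, **criteria):
    """Stage 1: let through only documents whose fields equal the criteria."""
    return (d for d in docs if all(d.get(k) == v for k, v in criteria.items()))

def count_by(docs, field):
    """Stage 2: emit one document per distinct value of `field`, with a count."""
    counts = Counter(d[field] for d in docs)
    return ({"_id": value, "count": n} for value, n in counts.items())

def sort_desc(docs, field):
    """Stage 3: emit the documents ordered from largest to smallest `field`."""
    return iter(sorted(docs, key=lambda d: d[field], reverse=True))

# Toy input documents (hypothetical, not from any real collection).
docs = [
    {"kind": "a", "ok": True},
    {"kind": "b", "ok": True},
    {"kind": "a", "ok": True},
    {"kind": "b", "ok": False},
]

# Each stage consumes the previous stage's output, like `cmd1 | cmd2 | cmd3`.
result = list(sort_desc(count_by(match(docs, ok=True), "kind"), "count"))
print(result)  # [{'_id': 'a', 'count': 2}, {'_id': 'b', 'count': 1}]
```

In MongoDB the stages run on the server rather than in application code, but the flow of documents from one stage to the next follows the same pattern.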

Aggregation Stages

At the time of this writing, there are twenty-eight different aggregation stages available. These different stages provide the ability to do a wide variety of tasks. For example, we can build an aggregation pipeline that matches a set of documents based on a set of criteria, groups those documents together, sorts them, then returns that result set to us.
Figure: Aggregation pipeline example
Or perhaps our pipeline is more complicated and the documents flow through $match, $unwind, $group, $sort, $limit, $project, and finally a $skip stage.
This can be confusing and some of these concepts are worth repeating. Therefore, let's break this down a bit further:
  • A pipeline starts with documents
  • These documents come from a collection, a view, or a specially designed stage
  • In each stage, documents enter, work is done, and documents exit
  • The stages themselves are defined using the document syntax
Let's take a look at an example pipeline. Our documents are from the Sample Data that's available in MongoDB Atlas and the routes collection in the sample_training database. Here's a sample document:
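As a rough guide, a document in that collection is assumed to have the shape sketched below as a Python dictionary. Only src_airport, stops, and the airline name are mentioned in this article; the remaining field names and all of the values are assumptions based on the Atlas sample dataset and should be checked against your own cluster.

```python
# Assumed shape of one document in sample_training.routes.
# Values are illustrative only; the _id field is omitted.
sample_route = {
    "airline": {"id": 410, "name": "Aerocondor", "alias": "2B", "iata": "ARD"},
    "src_airport": "CEK",  # source airport code
    "dst_airport": "KZN",  # destination airport code
    "codeshare": "",
    "stops": 0,            # 0 means a direct flight
    "airplane": "CR2",
}
```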
If you haven't yet set up your free cluster on MongoDB Atlas, now is a great time to do so. You have all the instructions in this blog post.
For this example query, let's find the top three airlines that offer the most direct flights out of the airport in Portland, Oregon, USA (PDX). To start with, we'll use a $match stage so that we work only on those documents that meet a base set of conditions. In this case, we'll look for documents with a src_airport, or source airport, of PDX and that are direct flights, i.e. that have zero stops.
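As a minimal sketch, this $match stage could be written as a Python dictionary like the one below. The field names src_airport and stops come from the description above; the variable name match_stage is only illustrative.

```python
# $match keeps only the documents that satisfy every listed condition.
match_stage = {
    "$match": {
        "src_airport": "PDX",  # flights departing from Portland, Oregon
        "stops": 0,            # direct flights only
    }
}
```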
That reduces the number of documents in our pipeline down from 66,985 to 113. Next, we'll group by the airline name and count the number of flights:
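One way to express this grouping step is sketched here. It assumes the airline name is stored in a nested field, airline.name, which this article does not show, and it uses count as an illustrative name for the tally.

```python
# $group collapses the incoming documents into one document per distinct _id.
group_stage = {
    "$group": {
        "_id": "$airline.name",  # group key: the airline's name (assumed field path)
        "count": {"$sum": 1},    # add 1 for every route in the group
    }
}
```

Each output document then carries the airline name in _id and the number of matching routes in count.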
With the addition of the $group stage, we're down to 16 documents. Let's sort those with a $sort stage and sort in descending order:
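A sketch of such a sort stage follows, assuming the preceding $group stage stored its tally in a field named count (an illustrative name, not confirmed by the article).

```python
# $sort orders documents by the given field: -1 is descending, 1 is ascending.
sort_stage = {"$sort": {"count": -1}}
```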
Then we can add a $limit stage to just have the top three airlines that are servicing Portland, Oregon:
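The $limit stage takes a single positive integer, so a sketch of this step is short. The variable name limit_stage is illustrative.

```python
# $limit passes through only the first N documents it receives.
limit_stage = {"$limit": 3}
```

Because it runs after the sort, the three documents that remain are the ones with the highest counts.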
After putting the documents in the sample_training.routes collection through this aggregation pipeline, our results show us that the top three airlines offering non-stop flights departing from PDX are Alaska, American, and United Airlines with 39, 17, and 13 flights, respectively.
How does this look in code? It's fairly straightforward using the db.collection.aggregate() method. For example, in Python you would do something like:
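The following is a sketch rather than a definitive listing. It assumes the PyMongo driver, a placeholder connection string, and an airline.name field path that this article does not show; the names PIPELINE, get_routes_collection, and top_airlines are hypothetical helpers introduced for the example.

```python
# The four stages described in this article, in order.
PIPELINE = [
    {"$match": {"src_airport": "PDX", "stops": 0}},              # direct flights out of PDX
    {"$group": {"_id": "$airline.name", "count": {"$sum": 1}}},  # routes per airline (assumed path)
    {"$sort": {"count": -1}},                                    # most routes first
    {"$limit": 3},                                               # keep the top three
]

def get_routes_collection(connection_string):
    """Return the sample_training.routes collection for the given cluster."""
    # Imported here so the pipeline can be inspected without the driver installed.
    from pymongo import MongoClient
    return MongoClient(connection_string)["sample_training"]["routes"]

def top_airlines(routes):
    """Run the pipeline on the server and return the result documents as a list."""
    return list(routes.aggregate(PIPELINE))
```

With a real cluster, calling top_airlines(get_routes_collection("YOUR-ATLAS-CONNECTION-STRING")) would return at most three documents, each holding an airline name in _id and its route count in count.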
The aggregation code is pretty similar in other languages as well.

Wrap Up

The MongoDB aggregation framework is an extremely powerful set of tools. The processing is done on the server itself, which results in less data being sent over the network. In the example used here, instead of pulling all of the documents into an application and processing them there, the aggregation framework sends back only the three documents we wanted from our query.
This was just a brief introduction to some of the operators available. Over the course of this series, I'll take a closer look at some of the most popular aggregation framework operators as well as some interesting, but less-used, ones. I'll also take a look at performance considerations of using the aggregation framework.
