
MongoDB Aggregation

When working with data in MongoDB, you may quickly need to run complex, multi-stage operations to gather metrics for your project. Generating reports and displaying useful metadata are just two major use cases where MongoDB aggregation operations can prove incredibly useful, powerful, and flexible.


What is aggregation?

In programming, we often run a series of operations on a collection of items. Take the following JavaScript sample:

let numbers = [{val: 1}, {val: 2}, {val: 3}, {val: 4}];
numbers = numbers
    .map(obj => obj.val) // [1, 2, 3, 4]
    .reduce((prev, curr) => prev + curr, 0) // 10

In this example, we have two operations that are being run on the numbers array:

  • First, map(): we take the objects and convert them down to their numerical values.

  • Second, reduce(): We consolidate the output to a single number — the sum of the numbers.

Aggregation operations process data records and return computed results.

Not only do we have the ability to aggregate data on the client side with JavaScript, but we can use MongoDB to run operations on the server against our collections stored in the database before the result is returned to the client.

Single-purpose aggregation

MongoDB provides two methods to perform aggregation. The simplest is single-purpose aggregation.

Single-purpose aggregation operations are helper methods you call on a collection to calculate a result. These helpers provide simple access to common aggregation tasks.

Two of these helper methods are:

  • distinct(): returns the unique values of a specified field across a collection.

  • countDocuments(): returns the number of documents in a collection that match a query.
Let's use a collection named “sales” that stores purchases:

{
    _id: ObjectId("5bd761dcae323e45a93ccfea"),
    saleDate: ISODate("2017-06-22T09:54:14.185+00:00"),
    items: [
      {
        "name": "printer paper",
        "price": 17.3,
        // ...
      },
    ],
    storeLocation: "Denver",
    customer: {
        age: 40,
        satisfaction: 5,
       // ...
    },
    couponUsed: false,
    purchaseMethod: "In store"
}

If we wanted to determine what the different purchasing methods are, we could call distinct() in our Node.js script:

const collection = client.db("sample_supplies").collection("sales");
const distinctPurchaseMethods = await collection.distinct("purchaseMethod");

distinctPurchaseMethods is an array that contains all of the unique purchase methods stored in the “sales” collection.

["In store", "Online", "Phone"]
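For intuition, distinct() behaves like deduplicating a single field on the client. Here's a plain-JavaScript sketch (not how the server computes it) over a hypothetical in-memory docs array:

```javascript
// Hypothetical in-memory stand-in for documents in the "sales" collection.
const docs = [
  { purchaseMethod: "In store" },
  { purchaseMethod: "Online" },
  { purchaseMethod: "In store" },
  { purchaseMethod: "Phone" },
];

// distinct("purchaseMethod") ~ collect the field, then deduplicate with a Set.
const distinctPurchaseMethods = [...new Set(docs.map(d => d.purchaseMethod))];
console.log(distinctPurchaseMethods); // ["In store", "Online", "Phone"]
```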

If we wanted to see how many sales in total were made, we could run:

const totalNumberOfSales = await collection.countDocuments();

countDocuments() will aggregate the total number of documents in the collection and return that number for us to use. When one of these helper methods covers your use case, single-purpose aggregation is the simplest way to get your answer.
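countDocuments() also accepts an optional query filter, and it is conceptually equivalent to filtering and then counting. A plain-JavaScript sketch with a hypothetical docs array (against the real collection, this would be collection.countDocuments({ couponUsed: true })):

```javascript
// Hypothetical in-memory stand-in for documents in the "sales" collection.
const docs = [
  { purchaseMethod: "In store", couponUsed: false },
  { purchaseMethod: "Online", couponUsed: true },
  { purchaseMethod: "Phone", couponUsed: true },
];

// countDocuments() with no filter ~ counting every document.
const totalNumberOfSales = docs.length; // 3

// countDocuments({ couponUsed: true }) ~ filter, then count.
const couponSales = docs.filter(d => d.couponUsed).length; // 2
```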

How do I use MongoDB to aggregate data?

When you need to do more complex aggregation, you can use the MongoDB aggregation pipeline (check out our more detailed tutorial). Aggregation pipelines are sequences of stages that can query, filter, alter, and process our documents. It's a Turing-complete implementation that can be used as a (rather inefficient) programming language.

Before we dive into the code, let's understand what the aggregation pipeline does and how it works. In an aggregation pipeline, you define a series of instructions called stages. MongoDB executes the stages one after another, each stage passing its output to the next, until a finalized result is produced. Let's look at an example usage of the aggregate command:

collection.aggregate([
   { $match: { status: "A" } },
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])

In this example, we run a stage called $match. Once that stage is run, it passes its output to the $group stage.

$match filters the collection so that only the documents with a status value of "A" continue down the pipeline.

Afterward, we use $group to group documents by the cust_id field. As part of the $group stage, we calculate the sum of each group's amount fields and store it in total.
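For intuition, the $match and $group stages above mirror a client-side filter followed by a grouped reduction. A plain-JavaScript sketch over a hypothetical orders array (the server does this far more efficiently):

```javascript
// Hypothetical orders with the fields the pipeline above expects.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// $match: { status: "A" } ~ keep only matching documents.
// $group: { _id: "$cust_id", total: { $sum: "$amount" } } ~ sum amount per cust_id.
const totals = orders
  .filter(o => o.status === "A")
  .reduce((acc, o) => {
    acc[o.cust_id] = (acc[o.cust_id] ?? 0) + o.amount;
    return acc;
  }, {});

console.log(totals); // { A123: 750, B212: 200 }
```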

In addition to $sum, MongoDB provides a myriad of other operators you can use in your aggregations.
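Several of those accumulator operators map onto familiar client-side reductions; $min, $max, and $avg, for instance, behave like the following plain-JavaScript equivalents (intuition only, not server code):

```javascript
// A group's collected values, as if gathered by $group.
const amounts = [500, 250, 200, 300];

// $min / $max ~ Math.min / Math.max over the group's values.
const min = Math.min(...amounts); // 200
const max = Math.max(...amounts); // 500

// $avg ~ sum divided by count.
const avg = amounts.reduce((p, c) => p + c, 0) / amounts.length; // 312.5
```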

The aggregation pipeline method

Let's return to the sales collection we were using earlier. Below is a document from this collection:

{
  "_id": "5bd761dcae323e45a93ccffb",
  "items": [
    {
      "name": "printer paper",
      "tags": [
        "office"
      ],
      "price": 17.3,
      "quantity": 1
    },
    {
      "name": "binder",
      "tags": [
        "school"
      ],
      "price": 23.36,
      "quantity": 3
    }
  ],
  "couponUsed": false,
  "purchaseMethod": "In store"
}

Given that we have a list of items sold for each transaction, we can calculate the average cost of all purchased items using the aggregation pipeline.

We can start by using $set to add a field to each document. Combined with $sum, we're able to add a field called itemsTotal to each of the documents.

{ '$set': { 'itemsTotal': { '$sum': '$items.price' } } }

Now the documents in the pipeline have been transformed to contain a new property named itemsTotal.

[
 {
   "_id": "5bd761dcae323e45a93ccffb",
   "items": [
     // ...
   ],
   "itemsTotal": 360.33,
   "couponUsed": false,
   "purchaseMethod": "In store"
 }
]
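The $sum expression above walks the items array and adds up every price. For the two items visible in the truncated sample document, the client-side equivalent looks like this (the stored itemsTotal of 360.33 comes from the full item list, which is elided above):

```javascript
// The two items visible in the truncated sample document.
const items = [
  { name: "printer paper", price: 17.3, quantity: 1 },
  { name: "binder", price: 23.36, quantity: 3 },
];

// { $sum: "$items.price" } ~ summing the price field of each array element.
// Note that it sums price alone, not price * quantity.
const itemsTotal = items.reduce((sum, item) => sum + item.price, 0);
console.log(itemsTotal); // ≈ 40.66
```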

Next, we can pass the documents from the $set stage into a $group stage. Inside $group, we can use the $avg operator to calculate the average transaction price across all documents.

{ '$group': {
    'averageTransactionPrice': { '$avg': '$itemsTotal' },
    '_id': null
} }

Once this stage is completed, we'll be left with a single document that gives us the finalized output:

[{
 "_id": null,
 "averageTransactionPrice": 620.511328
}]

The output tells us that the average price across all transactions is $620.511328.


The finalized code for this aggregation should look something like this in Node.js:

const aggCursor = collection.aggregate([
       { '$set': { 'itemsTotal': { '$sum': '$items.price' } } },
       { '$group': { 'averageTransactionPrice': { '$avg': '$itemsTotal' }, '_id': null } }
]);

Updating with the aggregation pipeline

aggregate isn't the only function that gets to enjoy the benefits of the aggregation syntax. As of MongoDB 4.2, a variety of commands support using aggregation pipelines to update documents.

Let's take a look at just one command that does so: updateMany.

We might want to persist itemsTotal as a permanent field on our documents so that reads don't have to recompute it.

Let's use updateMany with an aggregation pipeline to add a new field called itemsTotal.

await collection.updateMany({}, [
   { '$set': { 'itemsTotal': { '$sum': '$items.price' } } },
])

As you can tell, we've reused the $set stage from the previous example. Now, if we check our collection, we can see the new field in each document.

  {
    "_id": "5bd761dcae323e45a93ccffb",
    "items": [
      {
        "name": "printer paper",
        "price": 17.3,
        // ...
      }
    ],
    "itemsTotal": 360.33,
    "couponUsed": false,
    "purchaseMethod": "In store"
  }

How fast is MongoDB aggregation?

While our examples have been realistic and useful in the right context, they've also been relatively small; we've only used two stages in the aggregation pipeline.

This isn't the full potential of the aggregate pipeline, though—far from it.

The aggregation pipeline allows you to perform complex operations that can surface a wide range of insights into your collections. There are dozens of pipeline stages, as well as a wide range of operators, that you can combine to build almost any analysis of your data you can imagine.

While the aggregation pipeline is extremely powerful, how performant is it compared to doing these types of analytics on our own?

Let's use the example aggregation query from before:

const { performance } = require('perf_hooks');
const startTime = performance.now();
const totalAvg = collection.aggregate([
   {
       '$set': {
           'itemsTotal': {
               '$sum': '$items.price'
           }
       }
   }, {
       '$group': {
           '_id': null,
           'total': {
               '$avg': '$itemsTotal'
           }
       }
   }
]);
await totalAvg.toArray();
const endTime = performance.now();
console.log("Aggregation took:", endTime - startTime);

In our MongoDB example, we're using two stages: one to add an itemsTotal field, and the other to calculate the average of itemsTotal across all documents.

To match this behavior in Node.js, we'll use Array.prototype.map and Array.prototype.reduce as relevant stand-ins:

const { performance } = require('perf_hooks');
const startTime = performance.now();
const allItems = await collection.find({}).toArray();
const itemsSum = allItems
   .map(item => {
       item.itemsTotal = item.items.reduce((p, c) => p + parseFloat(c.price), 0);
       return item;
   })
   .reduce((p, item) => {
       return p + item.itemsTotal;
   }, 0);
const itemAvg = itemsSum / allItems.length;

const endTime = performance.now();
console.log("Manual took:", endTime - startTime);

Running each of the code snippets above against a collection of 5,000 documents yielded the following timing results:

Aggregation took 103.46ms.

Manual iteration through the cursor took 881.32ms.

That's a difference of over 8.5x! And while the difference is measured in milliseconds here, we used an extremely small collection; it's not difficult to imagine how drastic the gap would be if our collection held a million or more documents. Remember that an aggregation pipeline runs on the MongoDB server, where it can be optimized before execution, while iterating over a cursor to process data client-side adds significant latency from fetching pages of data over the network. In practice, a mix of both often works best: push heavy computation into the pipeline and keep client-side processing for logic the pipeline can't express.

Conclusion

The aggregation pipeline has enabled us to do a lot with this example, from determining how many documents are in a collection and being able to run complex operations against that collection, to gathering an average across multiple data points and modifying the collection in the database.

While we've learned a lot about the aggregation pipeline today, it's just the beginning. The aggregation pipeline is incredibly powerful and contains many in-depth elements. If you want to read more about the pipeline and its usage, you can read through our documentation.

And if you need a book, you can always refer to Practical MongoDB Aggregations.

MongoDB Atlas also allows you to create and run aggregation pipelines via the aggregation pipeline builder. Enterprise Advanced and on-premises users can also use Compass.

This makes it possible to export your finished pipeline to one of the supported driver languages.

FAQs

What is aggregate data?

Aggregate data is high-level data formed through the combination of numerical or non-numerical data from multiple sources.

What is data aggregation?

Data aggregation is the process of compiling a large group of data for high-level examination.

What are aggregators?

Aggregators are organizations, websites, or software applications that collect information from different sources and consolidate it in one place.

Ready to get started?

Launch a new cluster or migrate to MongoDB Atlas with zero downtime.