Java - Aggregation Pipeline

Maxime Beugnet • 8 min read • Published Feb 01, 2022 • Updated Mar 01, 2024

Updates

The MongoDB Java quickstart repository is available on GitHub.

February 28th, 2024

  • Update to Java 21
  • Update Java Driver to 5.0.0
  • Update logback-classic to 1.2.13

November 14th, 2023

  • Update to Java 17
  • Update Java Driver to 4.11.1
  • Update mongodb-crypt to 1.8.0

March 25th, 2021

  • Update Java Driver to 4.2.2.
  • Added Client Side Field Level Encryption example.

October 21st, 2020

  • Update Java Driver to 4.1.1.
  • The Java Driver logging is now enabled via the popular SLF4J API, so I added logback to the pom.xml along with a logback.xml configuration file.

What's the Aggregation Pipeline?

The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines, just like the "pipe" in the Linux Shell. Documents enter a multi-stage pipeline that transforms the documents into aggregated results.
It's the most powerful way to work with your data in MongoDB. It lets us run advanced queries that group documents, manipulate arrays, reshape document models, and more.
Let's see how we can harness this power using Java.

Getting Set Up

I will use the same repository as usual in this series. If you don't have a copy of it yet, you can clone it; if you already have it, just update it.
If you haven't set up your free cluster on MongoDB Atlas yet, now is a great time to do so. You will find all the instructions in this blog post.

First Example with Zips

Let's start by exploring the zips collection in the sample_training database, which is part of the MongoDB Sample Dataset in MongoDB Atlas.
As you can see, we have one document for each zip code in the USA and for each, we have the associated population.
To calculate the population of New York, I would have to sum the population of each zip code to get the population of the entire city.
Let's try to find the 3 biggest cities in the state of Texas. Let's design this on paper first.
  • I don't need to work with the entire collection. I need to keep only the zip codes in Texas.
  • Once this is done, I can group all the zip codes of the same city together to get the total population of each city.
  • Then I can sort my cities by descending order of population.
  • Finally, I can keep the first 3 cities of my list.
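Before moving to MongoDB, the four steps above can be sketched in plain Java on an in-memory list. This is only an analogy to check the logic: the real pipeline runs inside MongoDB, not in the JVM. The class, the records, and the sample numbers are all made up for illustration, and the field names (city, state, pop) assume the shape of the documents in the zips sample collection. The comments name the pipeline stage each step corresponds to.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory analogy only (hypothetical names), not the MongoDB driver API.
class TopCitiesSketch {

    // Hypothetical stand-in for one document of the zips collection.
    record Zip(String city, String state, int pop) {}

    // Hypothetical result shape: one entry per city with its summed population.
    record CityPop(String city, int totalPop) {}

    static List<CityPop> topCities(List<Zip> zips, String state, int n) {
        Map<String, Integer> popByCity = zips.stream()
                // Step 1 (what $match does): keep only the zip codes of the given state.
                .filter(z -> z.state().equals(state))
                // Step 2 (what $group does): group by city and sum the populations.
                .collect(Collectors.groupingBy(Zip::city, Collectors.summingInt(Zip::pop)));

        return popByCity.entrySet().stream()
                .map(e -> new CityPop(e.getKey(), e.getValue()))
                // Step 3 (what $sort does): largest population first.
                .sorted(Comparator.comparingInt(CityPop::totalPop).reversed())
                // Step 4 (what $limit does): keep only the first n cities.
                .limit(n)
                .toList();
    }

    public static void main(String[] args) {
        // Made-up numbers, not taken from the real dataset.
        List<Zip> sample = List.of(
                new Zip("HOUSTON", "TX", 100),
                new Zip("HOUSTON", "TX", 200),
                new Zip("DALLAS", "TX", 250),
                new Zip("AUSTIN", "TX", 150),
                new Zip("EL PASO", "TX", 50),
                new Zip("NEW YORK", "NY", 900));

        // Prints "HOUSTON 300", "DALLAS 250", "AUSTIN 150", one per line.
        topCities(sample, "TX", 3)
                .forEach(c -> System.out.println(c.city() + " " + c.totalPop()));
    }
}
```

The order of the steps matters: filtering first means the grouping only has to process the Texas zip codes, and the limit only makes sense once the cities are sorted.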
The easiest way to build this pipeline in MongoDB is to use the aggregation pipeline builder that is available in MongoDB Compass or in MongoDB Atlas in the Collections tab.
Once this is done, you can export your pipeline to Java using the export button.
After a little code refactoring, here is what I have:
The MongoDB driver provides a lot of helpers to make the code easy to write and to read.
As you can see, I solved this problem with:
  • A $match stage to filter my documents and keep only the zip codes in Texas,
  • A $group stage to group my zip codes by city,
  • A $project stage to rename the field _id to city for a clean output (not mandatory but I'm classy),
  • A $sort stage to sort by population descending,
  • A $limit stage to keep only the 3 most populated cities.
Here is the output we get:
In MongoDB 4.2, there are 30 different aggregation pipeline stages that you can use to manipulate your documents. If you want to know more, I encourage you to follow this course on MongoDB University: M121: The MongoDB Aggregation Framework.

Second Example with Posts

This time, I'm using the collection posts in the same database.
This collection of 500 posts has been generated artificially, but it contains arrays and I want to show you how we can manipulate arrays in a pipeline.
Let's try to find the three most popular tags and, for each tag, the list of titles of the posts that use it.
Here is my solution in Java.
Here I'm using the very useful $unwind stage to break down my array of tags: it outputs one document per element of the array.
It allows me, in the following $group stage, to group by tag, count the posts, and collect the titles in a new array named titles.
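To make the effect of $unwind followed by $group more concrete, here is a minimal in-memory sketch in plain Java. It is an analogy, not the driver code: the class, the records, and the sample posts are hypothetical, and it assumes each post has a title and an array of tags. The inner loop plays the role of $unwind (one pair per tag), and the map plays the role of the $group stage that pushes titles into an array.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory analogy only (hypothetical names), not the MongoDB driver API.
class TopTagsSketch {

    // Hypothetical stand-in for one document of the posts collection.
    record Post(String title, List<String> tags) {}

    // Hypothetical result shape: a tag, its number of posts, and their titles.
    record TagStats(String tag, int count, List<String> titles) {}

    static List<TagStats> topTags(List<Post> posts, int n) {
        Map<String, List<String>> titlesByTag = new LinkedHashMap<>();
        for (Post post : posts) {
            // Like $unwind: handle each element of the tags array on its own.
            for (String tag : post.tags()) {
                // Like $group with $push: append the title to the tag's list,
                // duplicates included.
                titlesByTag.computeIfAbsent(tag, t -> new ArrayList<>()).add(post.title());
            }
        }
        return titlesByTag.entrySet().stream()
                // One title was pushed per unwound document, so the list size is the count.
                .map(e -> new TagStats(e.getKey(), e.getValue().size(), List.copyOf(e.getValue())))
                // Most used tags first, then keep the first n.
                .sorted(Comparator.comparingInt(TagStats::count).reversed())
                .limit(n)
                .toList();
    }

    public static void main(String[] args) {
        // Made-up posts, not taken from the real collection.
        List<Post> sample = List.of(
                new Post("Post 1", List.of("java", "mongodb")),
                new Post("Post 2", List.of("java")),
                new Post("Post 3", List.of("java", "mongodb", "atlas")));

        // Prints "java 3 [Post 1, Post 2, Post 3]" then "mongodb 2 [Post 1, Post 3]".
        topTags(sample, 2).forEach(t ->
                System.out.println(t.tag() + " " + t.count() + " " + t.titles()));
    }
}
```

Without the unwinding step, grouping would operate on whole tag arrays, so two posts would only land in the same group if their arrays were identical.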
Here is the final output I get.
As you can see, some titles are repeated. As I said earlier, the collection was generated artificially, so the post titles are not unique. If this were really an issue, I could solve this "problem" by using the $addToSet operator instead of $push.
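The difference between the two accumulators can be illustrated with plain Java collections. This is a rough analogy with hypothetical method names: collecting into a list behaves like $push, and collecting into a set behaves like $addToSet. One caveat: the sketch uses a LinkedHashSet to keep the output readable, whereas MongoDB does not guarantee any order for the array produced by $addToSet.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// In-memory analogy only (hypothetical names), not the MongoDB driver API.
class PushVsAddToSetSketch {

    // Like $push: every value is kept, duplicates included.
    static List<String> collectWithPush(List<String> titles) {
        return new ArrayList<>(titles);
    }

    // Like $addToSet: each distinct value is kept once.
    // Insertion order is kept here for readability only.
    static Set<String> collectWithAddToSet(List<String> titles) {
        return new LinkedHashSet<>(titles);
    }

    public static void main(String[] args) {
        // Made-up titles with one duplicate.
        List<String> titles = List.of("Title A", "Title B", "Title A");

        System.out.println(collectWithPush(titles));     // [Title A, Title B, Title A]
        System.out.println(collectWithAddToSet(titles)); // [Title A, Title B]
    }
}
```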

Final Code

Wrapping Up

The aggregation pipeline is very powerful. We have just scratched the surface with these two examples, but trust me when I tell you that it's your best ally if you can master it.
I encourage you to follow the M121 course on MongoDB University to become an aggregation pipeline jedi.
If you want to learn more and deepen your knowledge faster, I recommend you check out the M220J: MongoDB for Java Developers training available for free on MongoDB University.
In the next blog post, I will explain Change Streams in Java.
