How to Use Custom Aggregation Expressions in MongoDB 4.4
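The upcoming release of MongoDB 4.4 makes it easier than ever to work with, transform, access, and make sense of your data. This release, the beta of which you can try today, introduces custom aggregation expressions: JavaScript functions that run as part of an aggregation pipeline stage.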
The MongoDB Query Language has many operators, or functions, that allow you to manipulate and transform your data to fit your application's use case. These operators make it easy for developers to query, manipulate, and transform their dataset directly at the database level, rather than writing additional code and transforming the data elsewhere. While there are operators for almost anything you can think of, there are a few edge cases where a provided operator or series of operators won't be sufficient, and that's where custom aggregation expressions come in.
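For this tutorial you'll need a MongoDB 4.4 cluster with the sample datasets loaded, plus either MongoDB Compass or the mongo shell to connect to it.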
Before we get into the code, I want to briefly talk about why you would care about this feature in the first place. The first reason is delivering higher performance to your users. If you can get the exact data you need directly out of the database in one trip, without having to do additional processing and manipulating, you will be able to serve and fulfill requests more quickly. Second, custom aggregation expressions allow you to take care of edge cases directly in your aggregation pipeline stage. If you've worked with the aggregation pipeline in the past, you'll feel right at home and be productive in no time. If you're new to the aggregation pipeline, you'll only have to learn it once. By the time you find yourself with a use case for the $function and $accumulator operators, all of your previous knowledge will transfer over. I think those are two solid reasons to care about custom aggregation expressions: better performance for your users and increased developer productivity.
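The one caveat to the liberal use of the $function and $accumulator operators is performance: running custom code inside an aggregation may degrade it. Only consider using $function and $accumulator if an existing operator cannot fulfill your application's needs.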
The first operator we'll take a look at is called $function. The $function operator has three properties: a body property containing the function itself, an args array containing the arguments we want to pass into our function, and a lang property specifying the language of our function. The function takes in n number of arguments, and the function returns a result. The arguments within the body property will be mapped to the arguments provided in the args array property, so you'll need to make sure you pass in and capture all of the provided arguments.
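As a rough sketch of the three properties described above, the operator has roughly the following shape. The function name func, its parameters, and the field paths in args are placeholders chosen for illustration; they do not refer to any real dataset.

```javascript
// Shape of the $function operator (placeholder names throughout).
const functionSketch = {
  $function: {
    // body: the JavaScript function to run for each document
    body: function func(arg1, arg2) {
      return arg1 + arg2;
    },
    // args: values or field paths mapped, in order, onto the parameters of body
    args: ["$fieldOne", "$fieldTwo"],
    // lang: the language of the function ("js" for JavaScript)
    lang: "js"
  }
};

// Outside the database we can still call body by hand to see how
// positional arguments line up with the function's parameters.
console.log(functionSketch.$function.body(1, 2));
```

Note that the number of entries in args matches the number of parameters the function declares, which is the mapping the paragraph above describes.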
Now that we know the properties of the $function operator, let's use it in an aggregation pipeline. To get started, let's choose a data set to work from. We'll use one of the provided MongoDB sample datasets that you can find on MongoDB Atlas. If you don't already have a cluster set up, you can do so by creating a MongoDB Atlas account. Loading the sample datasets is as simple as clicking the "..." button on your cluster and selecting the "Load Sample Dataset" option.
Once you have the sample dataset loaded, let's go ahead and connect to our MongoDB cluster. Whenever learning something new, I prefer to use a visual approach, so for this tutorial, I'll rely on MongoDB Compass. If you already have MongoDB Compass installed, connect to your cluster that has the sample dataset loaded; otherwise, install it first, and then connect.
Whether you are using MongoDB Compass or connecting via the mongo shell, you can find your MongoDB Atlas connection string by clicking the "Connect" button on your cluster, choosing the type of app you'll be using to connect with, and copying the string, which has the general form mongodb+srv://<username>:<password>@<cluster-host>/<database>.
Once you are connected, the dataset that we will work with is called sample_mflix and the collection movies. Go ahead and connect to that collection and then navigate to the "Aggregations" tab. To ensure that everything works fine, let's write a very simple aggregation pipeline using the new $function operator. From the dropdown, select the $addFields operator and add the following code as its implementation:
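A minimal sketch of that stage might look like the following. Only the object literal belongs in the Compass stage editor; the variable names helloStage and hello, and the local check at the end, are assumptions added so the snippet can be run on its own. The body is given as a string here, which $function accepts alongside a function.

```javascript
// Body of the $addFields stage: adds a fromFunction field to every document.
const helloStage = {
  fromFunction: {
    $function: {
      body: "function() { return 'hello'; }", // function source as a string
      args: [],                               // the function takes no arguments
      lang: "js"
    }
  }
};

// Local check outside the database: turn the source string back into
// a function and call it by hand.
const hello = new Function("return " + helloStage.fromFunction.$function.body)();
console.log(hello());
```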
If you are using the mongo shell to execute these queries the code will look like this:
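For the mongo shell, a sketch along these lines should work, assuming you have already switched to the sample_mflix database. The variable name helloPipeline is made up, and the extra $limit stage is there only to keep the printed output short. The guard around db lets the snippet load outside the shell without errors.

```javascript
// Full pipeline with a single $addFields stage.
const helloPipeline = [
  {
    $addFields: {
      fromFunction: {
        $function: {
          body: function () { return "hello"; },
          args: [],
          lang: "js"
        }
      }
    }
  }
];

// `db` and `printjson` only exist inside the mongo shell.
if (typeof db !== "undefined") {
  // Assumption: $limit is appended purely to print one document instead of all of them.
  db.movies.aggregate(helloPipeline.concat([{ $limit: 1 }])).forEach(printjson);
}
```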
If you look at the output in MongoDB Compass and scroll to the bottom of each returned document, you'll see that each document now has a field called fromFunction with the text hello as its value. We could have simply passed the string "hello" instead of using the $function operator, but the reason I wanted to do this was to ensure that your version of MongoDB Compass supports the $function operator and this is a minimal way to test it.
Next, let's implement a custom function that actually does some work. Let's add a new field to every movie that has Ado's review score, or perhaps your own?
I'll name my field adoScore. Now, my rating system is unique. Depending on the day and my mood, I may like a movie more or less, so we'll start figuring out Ado's score of a movie by randomly assigning it a value between 0 and 5. So we'll have a base that looks like this:
let base = Math.floor(Math.random() * 6);
Next, if critics like the movie, then I do too, so let's say that if a movie has an IMDB score of over 8, we'll give it +1 to Ado's score. Otherwise, we'll leave it as is. For this, we'll pass the imdb.rating field into our function.
Finally, movies that have won awards also get a boost in Ado's scoring system. So for every award nomination a movie receives, the total Ado score will increase by 0.25, and for every award won, the score will increase by 0.5. To calculate this, we'll have to pass the awards field into our function as well.
Since nothing is perfect, we'll add a custom rule to our function: if the total score exceeds 10, we'll just output the final score to be 9.9. Let's see what this entire function looks like:
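One way to put these rules together is sketched below. The names adoScoreBody and adoScoreStage are made up for illustration. Treating missing award counts as 0 and capping at "10 or more" (so the result stays within the 0 to 9.9 range) are assumptions of this sketch. The function is passed by reference, which suits the mongo shell; if your tool only accepts text, $function also accepts the function's source as a string.

```javascript
// Sketch of Ado's scoring rules as a plain JavaScript function.
function adoScoreBody(imdbRating, awards) {
  // Random base score between 0 and 5.
  let base = Math.floor(Math.random() * 6);

  // +1 if critics like the movie (IMDB rating over 8).
  let imdbBonus = imdbRating > 8 ? 1 : 0;

  // Assumption: a missing awards document or count is treated as 0.
  let nominations = (awards && awards.nominations) || 0;
  let wins = (awards && awards.wins) || 0;

  let score = base + imdbBonus + nominations * 0.25 + wins * 0.5;

  // Assumption: cap at 10 or more so the output never goes above 9.9.
  if (score >= 10) {
    score = 9.9;
  }
  return score;
}

// Body of the $addFields stage: both fields are passed in via args.
const adoScoreStage = {
  adoScore: {
    $function: {
      body: adoScoreBody,
      args: ["$imdb.rating", "$awards"],
      lang: "js"
    }
  }
};

// Local dry run with made-up input values; the result varies because of Math.random().
console.log(adoScoreBody(8.5, { nominations: 2, wins: 1 }));
```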
And again, if you are using the mongo shell, wrap the same code in an $addFields stage and pass it to db.movies.aggregate().
Running the above $addFields aggregation, which uses the $function operator, will produce a result that adds a new adoScore field to the end of each document. This field will contain a numeric value ranging from 0 to 9.9. In the $function operator, our custom JavaScript function lives in the body property. As the pipeline iterates through our documents, the $imdb.rating and $awards fields from each document are passed into our custom function.
Using dot notation, we've seen how to specify any sub-document you may want to use in an aggregation. We also learned how to use an entire field and its subfields in an aggregation, as we've seen with the $awards parameter in our earlier example.
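The next operator we'll look at is $accumulator. The same considerations apply to the $accumulator operator as they do to $function. We'll start by taking a look at the syntax for the $accumulator operator.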
We have a couple of additional fields to discuss. Rather than just one body field, we have four:
The init field that initializes the state of the accumulator.
The accumulate field that accumulates documents coming through the pipeline.
The merge field that is used to merge multiple states.
The finalize field that is used to update the result of the accumulation.
For arguments, we have two places to provide them: the initArgs that get passed into our init function, and the accumulateArgs that get passed into our accumulate function. The process for defining and passing the arguments is the same here as it is for the $function operator. It's important to note that for the accumulate function the first argument is the state rather than the first item in the accumulateArgs array.
Finally, we have to specify the lang field. As before, it will be js as that's the only supported language as of the MongoDB 4.4 release.
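Put together, the fields described above can be sketched as follows. This is an illustrative running-sum accumulator, not something from the tutorial's dataset: the names accumulatorSketch, acc, and state, the path $someField, and the starting value 0 are all placeholders.

```javascript
// Shape of the $accumulator operator (placeholder names and values).
const accumulatorSketch = {
  $accumulator: {
    // init: builds the initial state; receives the values listed in initArgs
    init: function (start) { return start; },
    initArgs: [0],
    // accumulate: called per document; state comes first, then accumulateArgs
    accumulate: function (state, value) { return state + value; },
    accumulateArgs: ["$someField"],
    // merge: combines two partial states into one
    merge: function (state1, state2) { return state1 + state2; },
    // finalize: last chance to adjust the accumulated result
    finalize: function (state) { return state; },
    lang: "js"
  }
};

// Local walk-through of the lifecycle with made-up values 2 and 3.
const acc = accumulatorSketch.$accumulator;
let state = acc.init.apply(null, acc.initArgs);
state = acc.accumulate(state, 2);
state = acc.accumulate(state, 3);
console.log(acc.finalize(state));
```

Calling accumulate by hand makes the argument order visible: the state is always the first parameter, and the entries of accumulateArgs follow it.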
To see a concrete example of the $accumulator operator in action, we'll continue to use our sample_mflix dataset. We'll also build on top of the adoScore we added with the $function operator. We'll pair our $accumulator with a $group operator and return the number of movies released each year from our dataset, as well as how many movies are deemed watchable by Ado's scoring system (meaning they have a score greater than 8). Our $accumulator function will look like this:
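A sketch of such a $group stage follows. The output field name consensus, the state keys total and worthWatching, and the variable names are assumptions made for illustration. The stage also assumes an earlier $addFields stage has already added adoScore to each document.

```javascript
// Body of the $group stage: one output document per release year.
const groupStage = {
  _id: "$year",
  consensus: {
    $accumulator: {
      // Start each group with zero movies counted.
      init: function () {
        return { total: 0, worthWatching: 0 };
      },
      // state comes first; adoScore is the single entry of accumulateArgs.
      accumulate: function (state, adoScore) {
        return {
          total: state.total + 1,
          worthWatching: state.worthWatching + (adoScore > 8 ? 1 : 0)
        };
      },
      accumulateArgs: ["$adoScore"],
      // Combine two partial states for the same year.
      merge: function (state1, state2) {
        return {
          total: state1.total + state2.total,
          worthWatching: state1.worthWatching + state2.worthWatching
        };
      },
      lang: "js"
    }
  }
};

// Local dry run: fold made-up scores for a single year.
const accumulator = groupStage.consensus.$accumulator;
const sampleScores = [9, 5, 8.5, 8];
const yearState = sampleScores.reduce(accumulator.accumulate, accumulator.init());
console.log(yearState);
```

No finalize function is used in this sketch, so the merged state object itself becomes the value of the output field.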
If you are running the above aggregation using the mongo shell, wrap the same code in a $group stage and pass it to db.movies.aggregate(), after the $addFields stage that computes adoScore.
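Running this query on the sample_mflix database returns one document per release year, each holding the total number of movies released that year and the number of those movies deemed watchable by Ado's scoring system.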
Note: Since the adoScore function relies on Math.random() for part of its calculation, you may get varying results each time you run the aggregation.
Just like the $function operator, writing a custom accumulator and using the $accumulator operator should only be done when existing operators cannot fulfill your application's use case. Similarly, we are also just scratching the surface of what is achievable by writing your own accumulator. Check out the MongoDB documentation for more.
Before we close out this blog post, let's take a look at what our completed aggregation pipeline will look like combining both our $function and $accumulator operators. If you are using the sample_mflix dataset, you should be able to run both examples with the following aggregation pipeline code:
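The two stages can be combined along the following lines. As before, this is a sketch under assumptions: the names pipeline and consensus are illustrative, missing award counts are treated as 0, and the cap applies at 10 or more. The guard around db lets the snippet load outside the mongo shell.

```javascript
const pipeline = [
  {
    // Stage 1: compute adoScore for every movie with $function.
    $addFields: {
      adoScore: {
        $function: {
          body: function (imdbRating, awards) {
            let base = Math.floor(Math.random() * 6);
            let imdbBonus = imdbRating > 8 ? 1 : 0;
            // Assumption: missing award counts are treated as 0.
            let nominations = (awards && awards.nominations) || 0;
            let wins = (awards && awards.wins) || 0;
            let score = base + imdbBonus + nominations * 0.25 + wins * 0.5;
            return score >= 10 ? 9.9 : score;
          },
          args: ["$imdb.rating", "$awards"],
          lang: "js"
        }
      }
    }
  },
  {
    // Stage 2: per year, count all movies and those scoring above 8.
    $group: {
      _id: "$year",
      consensus: {
        $accumulator: {
          init: function () {
            return { total: 0, worthWatching: 0 };
          },
          accumulate: function (state, adoScore) {
            return {
              total: state.total + 1,
              worthWatching: state.worthWatching + (adoScore > 8 ? 1 : 0)
            };
          },
          accumulateArgs: ["$adoScore"],
          merge: function (state1, state2) {
            return {
              total: state1.total + state2.total,
              worthWatching: state1.worthWatching + state2.worthWatching
            };
          },
          lang: "js"
        }
      }
    }
  }
];

// `db` and `printjson` only exist inside the mongo shell (after `use sample_mflix`).
if (typeof db !== "undefined") {
  db.movies.aggregate(pipeline).forEach(printjson);
}
```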
The $function and $accumulator operators released in MongoDB 4.4 improve developer productivity and allow MongoDB to handle many more edge cases out of the box. Just remember that these new operators, while powerful, should only be used if existing operators cannot get the job done as they may degrade performance!
Whether you are trying to use new functionality with these operators, fine-tuning your MongoDB cluster to get better performance, or are just trying to get more done with less, MongoDB 4.4 is sure to provide a few new and useful things for you. You can try all of these features out today by deploying a MongoDB 4.4 beta cluster on MongoDB Atlas.
Safe Harbor Statement
The development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB's sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.