Joining Collections in MongoDB with .NET Core and an Aggregation Pipeline
Rate this tutorial
If you've been keeping up with my .NET Core series on MongoDB, you'll remember that we explored creating a simple console application as well as building a RESTful API with basic CRUD support. In both examples, we used basic filters when interacting with MongoDB from our applications.
But what if we need to do something a bit more complex, like join data from two different MongoDB collections?
In this tutorial, we're going to take a look at aggregation pipelines and some of the ways that you can work with them in a .NET Core application.
Before we get started, there are a few requirements that must be met to be successful:
- Have a MongoDB Atlas cluster deployed and configured.
- Install .NET Core 6+.
- Install the MongoDB sample data sets.
We will be using .NET Core 6.0 for this particular tutorial. Older or newer versions might work, but there's a chance that some of the commands may be a little different. The expectation is that you already have a MongoDB Atlas cluster ready to go. This could be a free M0 cluster or better, but you'll need it properly configured with user roles and network access rules. You'll also need the MongoDB sample data sets to be attached.
Because we're expecting to accomplish some fairly complicated things in this tutorial, it's probably a good idea to break down the data going into it and the data that we're expecting to come out of it.
In this tutorial, we're going to be using the sample_mflix database and the movies collection. We're also going to be using a custom playlist collection that we're going to add to the sample_mflix database.
To give you an idea of the data that we're going to be working with, take the following document from the movies collection:
Alright, so I didn't include the entire document because it is actually quite huge. Knowing every single field is not going to help or hurt the example as long as we're familiar with the
_id
field.Next, let's look at a document in the proposed playlist collection:
Knowing the fields in the above document is important as they'll be used throughout our aggregation pipelines.
One of the most important things to take note of between the two collections is the fact that the
_id
fields are ObjectId
and the values in the items
field are strings. More on this as we progress.Now that we know our input documents, let's take a look at what we're expecting as a result of our queries. If I were to query for a playlist, I don't want the id values for each of the movies. I want them fully expanded, like the following:
This is where the aggregation pipelines come in and some joining because we can't just do a normal filter on a
Find
operation, unless we wanted to perform multiple Find
operations.To keep things simple, we're going to be building a console application that uses our aggregation pipeline. You can take the logic and apply it towards a web application if that is what you're interested in.
From the CLI, execute the following:
The above commands will create a new .NET Core project and install the latest MongoDB driver for C#. Everything we do next will happen in the project's "Program.cs" file.
Open the "Program.cs" file and add the following C# code:
The above code will connect to a MongoDB cluster, get a reference to our playlist collection, and dump all the documents from that collection into the console. Finding and returning all the documents in the collection is not a requirement for the aggregation pipeline, but it might help with the learning process.
The
ATLAS_URI_HERE
string can be obtained from the MongoDB Atlas Dashboard after clicking "Connect" for a particular cluster.We're going to explore a few different options towards creating an aggregation pipeline query with .NET Core. The first will use raw
BsonDocument
type data.We know our input data and we know our expected outcome, so we need to come up with a few pipeline stages to bring it together.
Let's start with the first stage:
The first stage of this pipeline uses the
$match
operator to find only documents where the username
is "nraboy." This could be more than one because we're not treating username
as a unique field.With the filter in place, let's move to the next stage:
Remember how the document
_id
fields were ObjectId and the items
array were strings? For the join to be successful, they need to be of the same type. The second pipeline stage is more of a manipulation stage with the $project
operator. We're defining the fields we want passed to the next stage, but we're also modifying some of the fields, in particular the items
field. Using the $map
operator we can take the string values and convert them to ObjectId values.If your
items
array contained ObjectId instead of string values, this particular stage wouldn't be necessary. It might also not be necessary if you're using POCO classes instead of BsonDocument
types. That is a lesson for another day though.With our item values mapped correctly, we can push them to the next stage in the pipeline:
The above pipeline stage is where the JOIN operation actually happens. We're looking into the movies collection and we're using the ObjectId fields from our playlist collection to join them to the
_id
field of our movies collection. The output from this JOIN will be stored in a new movies
field.The
$lookup
is like saying the following:Of course there is more to it than the above SQL statement because
items
is an array, something you can't natively work with in most SQL databases.So as of right now, we have our joined data. However, its not quite as elegant as what we wanted in our final outcome. This is because the
$lookup
output is an array which will leave us with a multidimensional array. Remember, items
was an array and each movies
is an array. Not the most pleasant thing to work with, so we probably want to further manipulate the data in another stage.The above stage will take our new
movies
field and flatten it out with the $unwind
operator. The $unwind
operator basically takes each element of an array and creates a new result item to sit adjacent to the rest of the fields of the parent document. So if you have, for example, one document that has an array with two elements, after doing an $unwind
, you'll have two documents.Our end goal, though, is to end up with a single dimension array of movies, so we can fix this with another pipeline stage.
The above stage will group our documents and add our unwound movies to a new
movies
field, one that isn't multidimensional.So let's bring the pipeline stages together so they can be run in our application.
Executing the code thus far should give us our expected outcome in terms of data and format.
Now, you might be thinking that the above five-stage pipeline was a lot to handle for a JOIN operation. There are a few things that you should be aware of:
- Our id values were not of the same type, which resulted in another stage.
- Our values to join were in an array, not a one-to-one relationship.
What I'm trying to say is that the length and complexity of your pipeline is going to depend on how you've chosen to model your data.
Let's look at another way to accomplish our desired outcome. We can make use of the Fluent API that MongoDB offers instead of creating an array of pipeline stages.
Take a look at the following:
In the above example, we used methods such as
Match
, Project
, Lookup
, Unwind
, and Group
to get our final result. For some of these methods, we didn't need to use a BsonDocument
like we saw in the previous example.You just saw two ways to do a MongoDB aggregation pipeline for joining collections within a .NET Core application. Like previously mentioned, there are a few ways to accomplish what we want, all of which are going to be dependent on how you've chosen to model the data within your collections.
There is a third way, which we'll explore in another tutorial, and this uses LINQ to get the job done.
If you have questions about anything you saw in this tutorial, drop by the MongoDB Community Forums and get involved!