Quick Start: Golang & MongoDB - Data Aggregation Pipeline

Nicolas Raboy
February 20, 2020 | Updated: May 20, 2021

If you've been following along with my getting started series on MongoDB and Golang, you might remember the tutorial where we looked at finding documents in a collection. In that tutorial, we saw how to use the Find and FindOne functions to filter documents within a collection. This is essentially querying a single collection, where the filter parameters are fields within that collection's schema.

So what happens if you need to do something a little more complex, like return data that isn't within the schema, perform complex manipulations before returning a response, or look across collections in a single command?

This is where the MongoDB aggregation framework becomes valuable.

In this tutorial, we're going to look at a few MongoDB aggregation framework examples using the Go programming language, examples that can't really be done with a basic Find or FindOne operation.

The Requirements

To be successful with this tutorial, you'll need the following:

Go 1.10+ installed and configured
MongoDB Atlas with an M0 cluster or better
The MongoDB Go driver

It is advisable to complete the other tutorials in the getting started with MongoDB and Go series first, as they cover the schema we're using and how to connect to a cluster. However, if you feel comfortable with MongoDB and Go, it isn't an absolute requirement.

Use promotional code NICRABOY200 to receive $200 in premium credit towards your MongoDB Atlas cluster if you'd like something more powerful than the forever free M0 cluster.

Make sure that your MongoDB Atlas cluster's IP access list (formerly called the IP whitelist) allows your Go application to connect to it. For information on installing the MongoDB Go driver and connecting to a cluster, check out my previous tutorial on the subject.

Leveraging the MongoDB Aggregation Framework in Golang

We're going to be working with a simple application for this example. To get us up to speed, create a new project within the $GOPATH with a main.go file, and include the following in that file:

package main

import (
	"context"
	"fmt"
	"os"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// Podcast represents the schema for the "Podcasts" collection
type Podcast struct {
	ID     primitive.ObjectID `bson:"_id,omitempty"`
	Title  string             `bson:"title,omitempty"`
	Author string             `bson:"author,omitempty"`
	Tags   []string           `bson:"tags,omitempty"`
}

// Episode represents the schema for the "Episodes" collection
type Episode struct {
	ID          primitive.ObjectID `bson:"_id,omitempty"`
	Podcast     primitive.ObjectID `bson:"podcast,omitempty"`
	Title       string             `bson:"title,omitempty"`
	Description string             `bson:"description,omitempty"`
	Duration    int32              `bson:"duration,omitempty"`
}

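func main() {
	// Bound the connection and later operations to 10 seconds; calling
	// cancel releases the timer's resources when main returns.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI(os.Getenv("ATLAS_URI")))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)

	database := client.Database("quickstart")
	episodesCollection := database.Collection("episodes")

	// The aggregation snippets from the rest of this tutorial go here. Until
	// they are added, Go reports bson, fmt, and episodesCollection as unused.
}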

You'll recall that we created the Go data structures with BSON annotations in a previous tutorial titled Modeling MongoDB Documents with Native Go Data Structures. The logic used for connecting to a cluster and getting a handle to a particular database and collection was last seen in How to Get Connected to Your MongoDB Cluster with Go.

If you haven't already installed the Go driver for MongoDB, it can be installed by executing the following:

dep init
dep ensure -add "go.mongodb.org/mongo-driver/mongo"

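If you don't have the Go dependency manager (dep) installed and configured, you can learn more about it in the dep documentation. Note that dep has since been deprecated in favor of Go modules; in a project that uses modules, running go get go.mongodb.org/mongo-driver/mongo adds the driver instead.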

What's important in our boilerplate code are the native Go data structures that represent the document schema for each of the collections. The goal here is to use the aggregation framework to work with the data in those collections while adding groupings, manipulations, and so on.

Let's assume that you have the following documents in your episodes collection:

// Document #1
{
    "_id": ObjectId("5e3b381c1c9d4400004117e7"),
    "podcast": ObjectId("5e3b37e51c9d4400004117e6"),
    "title": "Episode #1",
    "description": "The first episode",
    "duration": 25
}

// Document #2
{
    "_id": ObjectId("5e3b38511c9d4400004117e8"),
    "podcast": ObjectId("5e3b37e51c9d4400004117e6"),
    "title": "Episode #2",
    "description": "The second episode",
    "duration": 30
}

The first aggregation that we're going to look at will take all the episodes for a particular podcast and get the total duration of that podcast. To be clear, I don't mean the duration of a particular episode, but the duration of the podcast as a whole.

For this aggregation query, we're going to focus on the podcast field as well as the duration field of our documents. Take the following code:

id, err := primitive.ObjectIDFromHex("5e3b37e51c9d4400004117e6")
if err != nil {
    panic(err)
}

// Keep only this podcast's episodes, then total their durations per podcast.
matchStage := bson.D{{"$match", bson.D{{"podcast", id}}}}
groupStage := bson.D{{"$group", bson.D{{"_id", "$podcast"}, {"total", bson.D{{"$sum", "$duration"}}}}}}

showInfoCursor, err := episodesCollection.Aggregate(ctx, mongo.Pipeline{matchStage, groupStage})
if err != nil {
    panic(err)
}
// All decodes every result document and then closes the cursor.
var showsWithInfo []bson.M
if err = showInfoCursor.All(ctx, &showsWithInfo); err != nil {
    panic(err)
}
fmt.Println(showsWithInfo)

Because we care about one particular podcast, we take that podcast's id and convert it into a primitive.ObjectID that MongoDB and the Go driver can understand. Next, we define the stages of the aggregation pipeline, in this case a $match stage and a $group stage. The $match stage keeps only the episodes whose podcast field equals that ObjectID. The $group stage then groups the matching episodes by their podcast field, so all episodes of the same show land in one group, and uses $sum to add each episode's duration into a new total field. Finally, the Aggregate operation runs the stages in order and returns a cursor over the results.

The result would look something like this:

[map[_id:ObjectID("5e3b37e51c9d4400004117e6") total:55]]

Had we altered the aggregation to include more podcast values, we could have ended up with several different podcast groups and their total minutes.
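If it helps to see what those two stages compute, here is a minimal plain-Go sketch that runs without a database or the driver. It models documents as maps, ObjectIDs as hex strings, and a pipeline as a list of functions applied in order. The doc, stage, run, match, groupSum, and sampleEpisodes names are hypothetical, and the third episode (from a podcast called otherPodcast) is made-up data added only to show the multi-podcast case. It illustrates the semantics; it is not how MongoDB executes a pipeline.

```go
package main

import "fmt"

// doc is a hypothetical stand-in for a BSON document; ObjectIDs are plain
// hex strings here and no MongoDB driver is involved.
type doc map[string]interface{}

// stage turns one batch of documents into the next, like a pipeline stage.
type stage func([]doc) []doc

// run applies the stages in order, feeding each one the previous output.
func run(docs []doc, stages ...stage) []doc {
	for _, s := range stages {
		docs = s(docs)
	}
	return docs
}

// match keeps documents whose field equals want (a simplified equality $match).
func match(field string, want interface{}) stage {
	return func(in []doc) []doc {
		var out []doc
		for _, d := range in {
			if d[field] == want {
				out = append(out, d)
			}
		}
		return out
	}
}

// groupSum groups documents by keyField and adds up sumField into "total",
// a simplified {$group: {_id: "$keyField", total: {$sum: "$sumField"}}}.
// Non-int values count as 0. Groups come out in first-seen order, whereas
// MongoDB does not guarantee any particular order for $group output.
func groupSum(keyField, sumField string) stage {
	return func(in []doc) []doc {
		totals := map[interface{}]int{}
		var order []interface{}
		for _, d := range in {
			key := d[keyField]
			if _, seen := totals[key]; !seen {
				order = append(order, key)
			}
			n, _ := d[sumField].(int)
			totals[key] += n
		}
		out := make([]doc, 0, len(order))
		for _, key := range order {
			out = append(out, doc{"_id": key, "total": totals[key]})
		}
		return out
	}
}

// sampleEpisodes holds the two episodes from this tutorial plus one
// hypothetical episode of a different podcast ("otherPodcast").
func sampleEpisodes() []doc {
	return []doc{
		{"_id": "5e3b381c1c9d4400004117e7", "podcast": "5e3b37e51c9d4400004117e6", "duration": 25},
		{"_id": "5e3b38511c9d4400004117e8", "podcast": "5e3b37e51c9d4400004117e6", "duration": 30},
		{"_id": "hypotheticalEpisode", "podcast": "otherPodcast", "duration": 40},
	}
}

func main() {
	// Equivalent of the $match + $group pipeline: one group for the podcast.
	fmt.Println(run(sampleEpisodes(),
		match("podcast", "5e3b37e51c9d4400004117e6"),
		groupSum("podcast", "duration"),
	))
	// Without the $match stage, every podcast gets its own group.
	fmt.Println(run(sampleEpisodes(), groupSum("podcast", "duration")))
}
```

With the match stage, the sketch yields a single group whose total is 55 (25 + 30), mirroring the driver result shown earlier; without it, each podcast gets its own group and total.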

Let's look at another example. For this scenario, let's say we want to "join" documents from different collections similar to how you would in a relational database. Based on our document schema, we already have a podcast field in the episodes collection referencing a document in the podcasts collection. That document might look like this:

{
    "_id": ObjectId("5e3b37e51c9d4400004117e6"),
    "name": "The Polyglot Developer Podcast",
    "author": "Nic Raboy",
    "tags": ["development", "programming", "coding"]
}

So what would our aggregation query look like if we wanted to include the podcast information with the episode information? We might try to do something like this:

lookupStage := bson.D{{"$lookup", bson.D{{"from", "podcasts"}, {"localField", "podcast"}, {"foreignField", "_id"}, {"as", "podcast"}}}}
unwindStage := bson.D{{"$unwind", bson.D{{"path", "$podcast"}, {"preserveNullAndEmptyArrays", false}}}}

showLoadedCursor, err := episodesCollection.Aggregate(ctx, mongo.Pipeline{lookupStage, unwindStage})
if err != nil {
    panic(err)
}
var showsLoaded []bson.M
if err = showLoadedCursor.All(ctx, &showsLoaded); err != nil {
    panic(err)
}
fmt.Println(showsLoaded)

In the above example, we are using the $lookup operator to join documents from the podcasts collection, matching the podcast field in our episodes collection against the _id field in the foreign podcasts collection. The output of the $lookup operation is an array of the matching podcast documents, stored as podcast. Because that name matches an existing field, the array replaces the original ObjectID.

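After the $lookup operation, we use the $unwind operator to flatten the array that we just created. Flattening, or deconstructing, an array means outputting one document per element of that array, with the array field replaced by that element. Because each episode references exactly one podcast, every array here has a single element, so each episode comes out with its podcast embedded as a single document. Setting preserveNullAndEmptyArrays to false means any episode whose podcast couldn't be found is dropped from the results.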
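The same idea applies to $lookup and $unwind. The sketch below is again plain Go with no driver, using hypothetical names (doc, copyDoc, lookup, unwind, and the sample helpers) and hex strings in place of ObjectIDs. It shows the equality form of $lookup attaching an array of matching podcasts to each episode, and $unwind replacing that array with each of its elements. The orphaned episode is made-up data that references a podcast which doesn't exist, so its array is empty and it is dropped, as with preserveNullAndEmptyArrays set to false.

```go
package main

import "fmt"

// doc is a hypothetical stand-in for a BSON document (no driver involved).
type doc map[string]interface{}

// copyDoc returns a shallow copy so the stages never mutate their input.
func copyDoc(d doc) doc {
	c := make(doc, len(d))
	for k, v := range d {
		c[k] = v
	}
	return c
}

// lookup mimics an equality $lookup: every local document gets an array of the
// foreign documents whose foreignField equals its localField, stored under as.
// Like $lookup, this overwrites an existing field with the same name.
func lookup(local, foreign []doc, localField, foreignField, as string) []doc {
	out := make([]doc, 0, len(local))
	for _, l := range local {
		matches := []interface{}{}
		for _, f := range foreign {
			if f[foreignField] == l[localField] {
				matches = append(matches, f)
			}
		}
		joined := copyDoc(l)
		joined[as] = matches
		out = append(out, joined)
	}
	return out
}

// unwind emits one document per element of the array at field, with the array
// replaced by that element. Documents whose array is missing or empty are
// dropped, which is the preserveNullAndEmptyArrays: false behavior.
func unwind(in []doc, field string) []doc {
	var out []doc
	for _, d := range in {
		arr, _ := d[field].([]interface{})
		for _, elem := range arr {
			u := copyDoc(d)
			u[field] = elem
			out = append(out, u)
		}
	}
	return out
}

func samplePodcasts() []doc {
	return []doc{{"_id": "5e3b37e51c9d4400004117e6", "name": "The Polyglot Developer Podcast", "author": "Nic Raboy"}}
}

// sampleEpisodes holds the tutorial's two episodes plus a hypothetical one
// whose podcast does not exist, to show $unwind dropping it.
func sampleEpisodes() []doc {
	return []doc{
		{"_id": "5e3b381c1c9d4400004117e7", "podcast": "5e3b37e51c9d4400004117e6", "title": "Episode #1"},
		{"_id": "5e3b38511c9d4400004117e8", "podcast": "5e3b37e51c9d4400004117e6", "title": "Episode #2"},
		{"_id": "hypotheticalOrphan", "podcast": "missingPodcast", "title": "Orphaned episode"},
	}
}

func main() {
	joined := lookup(sampleEpisodes(), samplePodcasts(), "podcast", "_id", "podcast")
	for _, ep := range unwind(joined, "podcast") {
		podcast := ep["podcast"].(doc)
		fmt.Println(ep["title"], "->", podcast["name"])
	}
}
```

Running it prints the title and podcast name for the two real episodes only. Setting preserveNullAndEmptyArrays to true in the real pipeline would instead keep the orphaned episode, without a podcast field.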

If we were to run our aggregation, we might end up with results that look like this:

[map[_id:ObjectID("5e3b381c1c9d4400004117e7") description:The first episode duration:25 podcast:map[_id:ObjectID("5e3b37e51c9d4400004117e6") author:Nic Raboy name:The Polyglot Developer Podcast tags:[development programming coding]] title:Episode #1] map[_id:ObjectID("5e3b38511c9d4400004117e8") description:The second episode duration:30 podcast:map[_id:ObjectID("5e3b37e51c9d4400004117e6") author:Nic Raboy name:The Polyglot Developer Podcast tags:[development programming coding]] title:Episode #2]]

Notice that each podcast episode in the above results is now printed with the show information. This saves you from having to execute multiple Find operations within your Go code.

The above query that we saw is great, but I think we can do better.

In the previous example that used $lookup and $unwind, we were using a []bson.M to work with the results. That's not the end of the world, but if we wanted to access particular fields, use autocomplete, and so on, things might get a little messy. Instead, we can create a native Go data structure to represent the results of our aggregation.

// PodcastEpisode represents an aggregation result-set for two collections
type PodcastEpisode struct {
	ID          primitive.ObjectID `bson:"_id,omitempty"`
	Podcast     Podcast            `bson:"podcast,omitempty"`
	Title       string             `bson:"title,omitempty"`
	Description string             `bson:"description,omitempty"`
	Duration    int32              `bson:"duration,omitempty"`
}

For the most part, the above data structure will look familiar. However, pay attention to the Podcast field. In this example it is no longer a primitive.ObjectID but a Podcast, the data structure we defined earlier, so the embedded podcast document produced by $unwind decodes directly into it.

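With this new data structure available, we only need to change what we decode the results into; the pipeline stages stay the same. If you add this to the same main function as the previous snippet, remove the repeated lookupStage and unwindStage declarations, because Go doesn't allow redeclaring them with :=.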

lookupStage := bson.D{{"$lookup", bson.D{{"from", "podcasts"}, {"localField", "podcast"}, {"foreignField", "_id"}, {"as", "podcast"}}}}
unwindStage := bson.D{{"$unwind", bson.D{{"path", "$podcast"}, {"preserveNullAndEmptyArrays", false}}}}

showLoadedStructCursor, err := episodesCollection.Aggregate(ctx, mongo.Pipeline{lookupStage, unwindStage})
if err != nil {
    panic(err)
}
var showsLoadedStruct []PodcastEpisode
if err = showLoadedStructCursor.All(ctx, &showsLoadedStruct); err != nil {
    panic(err)
}
fmt.Println(showsLoadedStruct)

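Notice that we're now using a []PodcastEpisode to store the results rather than a []bson.M. While we don't demonstrate it in this example, we now have typed access to each field within that data structure, such as showsLoadedStruct[0].Podcast.Author. One caveat: the sample podcast document stores its name in a name field, while the Podcast struct maps Title to title, so Title will stay empty for that document unless the two names agree.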
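To see typed decoding into nested structs without a running cluster, the sketch below decodes one joined episode into copies of the Podcast and PodcastEpisode types. It swaps in the standard library's encoding/json, with json tags standing in for the bson tags and ObjectIDs simplified to hex strings, because both decoders map stored keys to fields by tag name and by default skip keys that have no matching field; treat it as an analogy rather than a test of the driver. The joinedEpisode value is a hypothetical JSON rendering of one pipeline result (its podcast keeps the name key from the sample document), and decode is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Podcast mirrors the tutorial's struct, with json tags standing in for bson tags.
type Podcast struct {
	ID     string   `json:"_id,omitempty"`
	Title  string   `json:"title,omitempty"`
	Author string   `json:"author,omitempty"`
	Tags   []string `json:"tags,omitempty"`
}

// PodcastEpisode holds the joined podcast as a nested struct, like the tutorial's type.
type PodcastEpisode struct {
	ID          string  `json:"_id,omitempty"`
	Podcast     Podcast `json:"podcast,omitempty"`
	Title       string  `json:"title,omitempty"`
	Description string  `json:"description,omitempty"`
	Duration    int32   `json:"duration,omitempty"`
}

// joinedEpisode is a hypothetical JSON rendering of one $lookup + $unwind result.
const joinedEpisode = `{
	"_id": "5e3b381c1c9d4400004117e7",
	"podcast": {
		"_id": "5e3b37e51c9d4400004117e6",
		"name": "The Polyglot Developer Podcast",
		"author": "Nic Raboy",
		"tags": ["development", "programming", "coding"]
	},
	"title": "Episode #1",
	"description": "The first episode",
	"duration": 25
}`

// decode unmarshals one raw result into the typed struct.
func decode(raw string) (PodcastEpisode, error) {
	var ep PodcastEpisode
	err := json.Unmarshal([]byte(raw), &ep)
	return ep, err
}

func main() {
	ep, err := decode(joinedEpisode)
	if err != nil {
		panic(err)
	}
	// Typed field access instead of map lookups and type assertions.
	// Podcast.Title stays empty because the stored key is "name", not "title".
	fmt.Printf("%s by %s (podcast title %q)\n", ep.Title, ep.Podcast.Author, ep.Podcast.Title)
}
```

The nested author, tags, and duration decode as expected, while Podcast.Title comes back empty, which is why the struct tag and the stored key need to agree.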

Conclusion

You just saw a few aggregation examples within MongoDB using the Go programming language (Golang). The aggregation framework offers many more operators, and you can learn more about them in the official documentation. While the examples I demonstrated were short and used only a few operators, your pipelines could become considerably more advanced depending on your needs.

To catch up on other tutorials in the getting started with Golang series, check out these:

To bring the series to a close, we'll be looking at change streams and transactions using the Go programming language and MongoDB.
