Aggregation

Overview

In this guide, you can learn how to perform aggregation operations in the Rust driver.

Aggregation operations process data in your MongoDB collections based on specifications you can set in an aggregation pipeline. An aggregation pipeline consists of one or more stages. Each stage performs an operation based on its expression operators. After the driver executes the aggregation pipeline, it returns an aggregated result.

This guide includes the following sections:

Compare Aggregation and Find Operations describes the functionality differences between aggregation and find operations
Server Limitations describes the server limitations on memory usage for aggregation operations
Examples provides examples of aggregations for different use cases
Additional Information provides links to resources and API documentation for types and methods mentioned in this guide

Tip

Complete Aggregation Tutorials

You can find tutorials that provide detailed explanations of common aggregation tasks in the Complete Aggregation Pipeline Tutorials section of the Server manual. Select a tutorial, and then pick Rust from the Select your language drop-down menu in the upper-right corner of the page.

Analogy

Aggregation operations function similarly to car factories with assembly lines. The assembly lines have stations with specialized tools to perform specific tasks. For example, when building a car, the assembly line begins with the frame. Then, as the car frame moves through the assembly line, each station assembles a separate part. The result is a transformed final product, the finished car.

The assembly line represents the aggregation pipeline, the individual stations represent the aggregation stages, the specialized tools represent the expression operators, and the finished product represents the aggregated result.
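The mapping above can be sketched in plain Rust. This is only an illustration of the idea that each stage consumes the output of the previous one; it is not how the server or the driver implements pipelines. The `Stage` type, `run_pipeline`, and `example_stages` are hypothetical names, and the documents are reduced to bare ages to keep the sketch short.

```rust
/// A stage receives the documents produced by the previous stage and returns
/// the documents handed to the next one. Hypothetical type, for illustration only.
type Stage = Box<dyn Fn(Vec<i64>) -> Vec<i64>>;

/// Runs the stages in order, feeding each stage's output into the next stage.
fn run_pipeline(input: Vec<i64>, stages: &[Stage]) -> Vec<i64> {
    stages.iter().fold(input, |docs, stage| stage(docs))
}

/// Builds a two-stage pipeline: keep ages of 18 or more, then sort descending.
/// The threshold is an arbitrary value chosen for this sketch.
fn example_stages() -> Vec<Stage> {
    let keep_adults: Stage =
        Box::new(|docs: Vec<i64>| docs.into_iter().filter(|age| *age >= 18).collect());
    let sort_descending: Stage = Box::new(|mut docs: Vec<i64>| {
        docs.sort_by(|a, b| b.cmp(a));
        docs
    });
    vec![keep_adults, sort_descending]
}
```

Calling `run_pipeline(vec![23, 16, 45, 18], &example_stages())` passes the input through both stations in turn, which is the same shape as the `vec![doc! { ... }, doc! { ... }]` pipelines used later in this guide.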

Compare Aggregation and Find Operations

The following table lists the different tasks you can perform with find operations, compared to what you can achieve with aggregation operations. The aggregation framework provides expanded functionality that allows you to transform and manipulate your data.

Find Operations	Aggregation Operations
Select certain documents to return	Select certain documents to return
Select which fields to return	Select which fields to return
Sort the results	Sort the results
Limit the results	Limit the results
Count the results	Count the results
	Rename fields
	Compute new fields
	Summarize data
	Connect and merge data sets

Server Limitations

When performing aggregation operations, consider the following limitations:

Returned documents must not violate the BSON document size limit of 16 megabytes.
Pipeline stages have a memory limit of 100 megabytes by default. If required, you can exceed this limit by setting the allow_disk_use field in your AggregateOptions.
The $graphLookup stage has a strict memory limit of 100 megabytes and ignores the allow_disk_use setting.

Examples

The examples in this section use the following sample documents. Each document represents a user profile on a book review website and contains the user's name, age, genre interests, and the date on which they were last active on the website:

{ "name": "Sonya Mehta", "age": 23, "genre_interests": ["fiction", "mystery", "memoir"], "last_active": { "$date": "2023-05-13T00:00:00.000Z" } },
{ "name": "Selena Sun", "age": 45, "genre_interests": ["fiction", "literary", "theory"], "last_active": { "$date": "2023-05-25T00:00:00.000Z" } },
{ "name": "Carter Johnson", "age": 56, "genre_interests": ["literary", "self help"], "last_active": { "$date": "2023-05-31T00:00:00.000Z" } },
{ "name": "Rick Cortes", "age": 18, "genre_interests": ["sci-fi", "fantasy", "memoir"], "last_active": { "$date": "2023-07-01T00:00:00.000Z" } },
{ "name": "Belinda James", "age": 76, "genre_interests": ["literary", "nonfiction"], "last_active": { "$date": "2023-06-11T00:00:00.000Z" } },
{ "name": "Corey Saltz", "age": 29, "genre_interests": ["fiction", "sports", "memoir"], "last_active": { "$date": "2023-01-23T00:00:00.000Z" } },
{ "name": "John Soo", "age": 16, "genre_interests": ["fiction", "sports"], "last_active": { "$date": "2023-01-03T00:00:00.000Z" } },
{ "name": "Lisa Ray", "age": 39, "genre_interests": ["poetry", "art", "memoir"], "last_active": { "$date": "2023-05-30T00:00:00.000Z" } },
{ "name": "Kiran Murray", "age": 20, "genre_interests": ["mystery", "fantasy", "memoir"], "last_active": { "$date": "2023-01-30T00:00:00.000Z" } },
{ "name": "Beth Carson", "age": 31, "genre_interests": ["mystery", "nonfiction"], "last_active": { "$date": "2023-08-04T00:00:00.000Z" } },
{ "name": "Thalia Dorn", "age": 21, "genre_interests": ["theory", "literary", "fiction"], "last_active": { "$date": "2023-08-19T00:00:00.000Z" } },
{ "name": "Arthur Ray", "age": 66, "genre_interests": ["sci-fi", "fantasy", "fiction"], "last_active": { "$date": "2023-11-27T00:00:00.000Z" } }

Age Insights by Genre

The following example calculates the average, minimum, and maximum age of users interested in each genre.

The aggregation pipeline contains the following stages:

An $unwind stage to separate each array entry in the genre_interests field into a new document.
A $group stage to group documents by the value of the genre_interests field. This stage finds the average, minimum, and maximum user age by using the $avg, $min, and $max operators.

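// Assumes `use mongodb::bson::doc;` and `use futures::TryStreamExt;` (for `try_next`) are in scope
let age_pipeline = vec![
    // Emit one document per entry in the genre_interests array
    doc! { "$unwind": doc! { "path": "$genre_interests" } },
    // Group by genre and compute the age statistics for each group
    doc! { "$group": doc! {
        "_id": "$genre_interests",
        "avg_age": doc! { "$avg": "$age" },
        "min_age": doc! { "$min": "$age" },
        "max_age": doc! { "$max": "$age" }
    } }
];
let mut results = my_coll.aggregate(age_pipeline).await?;
// Iterate over the cursor and print each grouped document
while let Some(result) = results.try_next().await? {
    println!("* {:?}", result);
}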

* { "_id": "memoir", "avg_age": 25.8, "min_age": 18, "max_age": 39 }
* { "_id": "sci-fi", "avg_age": 42, "min_age": 18, "max_age": 66 }
* { "_id": "fiction", "avg_age": 33.333333333333336, "min_age": 16, "max_age": 66 }
* { "_id": "nonfiction", "avg_age": 53.5, "min_age": 31, "max_age": 76 }
* { "_id": "self help", "avg_age": 56, "min_age": 56, "max_age": 56 }
* { "_id": "poetry", "avg_age": 39, "min_age": 39, "max_age": 39 }
* { "_id": "literary", "avg_age": 49.5, "min_age": 21, "max_age": 76 }
* { "_id": "fantasy", "avg_age": 34.666666666666664, "min_age": 18, "max_age": 66 }
* { "_id": "mystery", "avg_age": 24.666666666666668, "min_age": 20, "max_age": 31 }
* { "_id": "theory", "avg_age": 33, "min_age": 21, "max_age": 45 }
* { "_id": "art", "avg_age": 39, "min_age": 39, "max_age": 39 }
* { "_id": "sports", "avg_age": 22.5, "min_age": 16, "max_age": 29 }
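To make the effect of the two stages concrete, the following is a minimal plain-Rust sketch that mimics them on in-memory data. It is not the driver API and does not reflect how the server executes the pipeline. The names `AgeStats`, `sample_users`, and `age_stats_by_genre` are hypothetical, and the data is the age and genre_interests values copied from the sample documents.

```rust
use std::collections::BTreeMap;

/// Statistics for one genre, mirroring the fields produced by the `$group` stage.
#[derive(Debug, PartialEq)]
struct AgeStats {
    avg_age: f64,
    min_age: u32,
    max_age: u32,
}

/// The (age, genre_interests) values from the sample documents.
fn sample_users() -> Vec<(u32, Vec<&'static str>)> {
    vec![
        (23, vec!["fiction", "mystery", "memoir"]),
        (45, vec!["fiction", "literary", "theory"]),
        (56, vec!["literary", "self help"]),
        (18, vec!["sci-fi", "fantasy", "memoir"]),
        (76, vec!["literary", "nonfiction"]),
        (29, vec!["fiction", "sports", "memoir"]),
        (16, vec!["fiction", "sports"]),
        (39, vec!["poetry", "art", "memoir"]),
        (20, vec!["mystery", "fantasy", "memoir"]),
        (31, vec!["mystery", "nonfiction"]),
        (21, vec!["theory", "literary", "fiction"]),
        (66, vec!["sci-fi", "fantasy", "fiction"]),
    ]
}

/// Mimics `$unwind` on genre_interests followed by `$group` with `$avg`, `$min`, and `$max`.
fn age_stats_by_genre(users: &[(u32, Vec<&str>)]) -> BTreeMap<String, AgeStats> {
    // "$unwind": emit one (genre, age) pair per array entry
    let unwound = users
        .iter()
        .flat_map(|(age, genres)| genres.iter().map(move |genre| (*genre, *age)));

    // "$group": accumulate (sum, count, min, max) for each genre
    let mut acc: BTreeMap<&str, (u32, u32, u32, u32)> = BTreeMap::new();
    for (genre, age) in unwound {
        let entry = acc.entry(genre).or_insert((0, 0, u32::MAX, u32::MIN));
        entry.0 += age;
        entry.1 += 1;
        entry.2 = entry.2.min(age);
        entry.3 = entry.3.max(age);
    }

    // Turn the accumulators into the final per-genre statistics
    acc.into_iter()
        .map(|(genre, (sum, count, min_age, max_age))| {
            let avg_age = f64::from(sum) / f64::from(count);
            (genre.to_string(), AgeStats { avg_age, min_age, max_age })
        })
        .collect()
}
```

Calling `age_stats_by_genre(&sample_users())` returns one entry per genre. Unlike the server output, which has no guaranteed order without a `$sort` stage, the `BTreeMap` used in this sketch orders the genres alphabetically.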

Group by Time Component

The following example finds how many users were last active in each month.

The aggregation pipeline contains the following stages:

A $project stage to extract the month from the last_active field as a number into the month_last_active field.
A $group stage to group documents by the month_last_active field and count the number of documents for each month.
A $sort stage to set an ascending sort on the month.

let last_active_pipeline = vec![
    doc! { "$project": doc! { "month_last_active": doc! { "$month": "$last_active" } } },
    doc! { "$group": doc! {
        "_id": doc! { "month_last_active": "$month_last_active" },
        "number": doc! { "$sum": 1 }
    } },
    doc! { "$sort": doc! { "_id.month_last_active": 1 } }
];
let mut results = my_coll.aggregate(last_active_pipeline).await?;
while let Some(result) = results.try_next().await? {
    println!("* {:?}", result);
}

* { "_id": { "month_last_active": 1 }, "number": 3 }
* { "_id": { "month_last_active": 5 }, "number": 4 }
* { "_id": { "month_last_active": 6 }, "number": 1 }
* { "_id": { "month_last_active": 7 }, "number": 1 }
* { "_id": { "month_last_active": 8 }, "number": 2 }
* { "_id": { "month_last_active": 11 }, "number": 1 }
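The same grouping can be sketched in plain Rust to show what each stage contributes. This is an illustration only: the server's `$month` operator works on BSON dates, whereas this sketch reads the month from ISO 8601 strings. `LAST_ACTIVE`, `month_of`, and `count_by_month` are hypothetical names, and the dates are copied from the sample documents.

```rust
use std::collections::BTreeMap;

/// The last_active values from the sample documents, as ISO 8601 strings.
const LAST_ACTIVE: [&str; 12] = [
    "2023-05-13T00:00:00.000Z", "2023-05-25T00:00:00.000Z", "2023-05-31T00:00:00.000Z",
    "2023-07-01T00:00:00.000Z", "2023-06-11T00:00:00.000Z", "2023-01-23T00:00:00.000Z",
    "2023-01-03T00:00:00.000Z", "2023-05-30T00:00:00.000Z", "2023-01-30T00:00:00.000Z",
    "2023-08-04T00:00:00.000Z", "2023-08-19T00:00:00.000Z", "2023-11-27T00:00:00.000Z",
];

/// Stands in for `$month`: reads the two month digits of a "YYYY-MM-DD..." string.
/// Returns None if the string is too short or the digits do not parse.
fn month_of(iso_date: &str) -> Option<u32> {
    iso_date.get(5..7)?.parse().ok()
}

/// Mimics `$project` + `$group` + `$sort`: counts the dates that fall in each month
/// and returns (month, count) pairs in ascending month order.
fn count_by_month(dates: &[&str]) -> Vec<(u32, u32)> {
    // A BTreeMap keeps its keys sorted, which plays the role of the ascending `$sort`
    let mut counts: BTreeMap<u32, u32> = BTreeMap::new();
    for month in dates.iter().filter_map(|date| month_of(date)) {
        *counts.entry(month).or_insert(0) += 1;
    }
    counts.into_iter().collect()
}
```

Calling `count_by_month(&LAST_ACTIVE)` yields the pairs `(1, 3)`, `(5, 4)`, `(6, 1)`, `(7, 1)`, `(8, 2)`, and `(11, 1)`, matching the month and number values in the output above.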

Calculate Popular Genres

The following example finds the three most popular genres based on how often they appear in users' interests.

The aggregation pipeline contains the following stages:

An $unwind stage to separate each array entry in the genre_interests field into a new document.
A $group stage to group documents by the genre_interests field and count the number of documents for each genre.
A $sort stage to set a descending sort on the genre popularity.
A $limit stage to show only the first three genres.

let popularity_pipeline = vec![
    doc! { "$unwind" : "$genre_interests" },
    doc! { "$group" : doc! { "_id" : "$genre_interests" , "number" : doc! { "$sum" : 1 } } },
    doc! { "$sort" : doc! { "number" : -1 } },
    doc! { "$limit": 3 }
];
let mut results = my_coll.aggregate(popularity_pipeline).await?;
while let Some(result) = results.try_next().await? {
    println!("* {:?}", result);
}

* { "_id": "fiction", "number": 6 }
* { "_id": "memoir", "number": 5 }
* { "_id": "literary", "number": 4 }
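As a rough illustration of how the four stages combine, the following plain-Rust sketch performs the same count, sort, and limit on in-memory data. It is not the driver API. `top_genres` is a hypothetical name, and breaking ties alphabetically is an assumption of this sketch: the pipeline above sorts only on number, so it does not define an order among genres with equal counts.

```rust
use std::collections::HashMap;

/// Mimics `$unwind` + `$group` + `$sort` + `$limit` on a list of genre_interests arrays.
fn top_genres(interests: &[Vec<&str>], limit: usize) -> Vec<(String, usize)> {
    // "$unwind" and "$group": flatten the arrays and count each genre
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for genre in interests.iter().flatten() {
        *counts.entry(*genre).or_insert(0) += 1;
    }

    let mut ranked: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(genre, number)| (genre.to_string(), number))
        .collect();

    // "$sort": descending by count; ties are broken by name (an assumption of this sketch)
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));

    // "$limit": keep only the first `limit` entries
    ranked.truncate(limit);
    ranked
}
```

Passing the twelve genre_interests arrays from the sample documents with a limit of 3 returns fiction, memoir, and literary, in that order. Truncating after the sort matters: applying the limit before sorting would keep an arbitrary three genres rather than the most popular ones.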

Additional Information

To learn more about the concepts mentioned in this guide, see the following Server manual entries:

To learn more about the behavior of the aggregate() method, see the Aggregation Operations section of the Retrieve Data guide.

To learn more about sorting results within an aggregation pipeline, see the Sort guide.

MongoDB Vector Search

You can perform similarity searches on vector embeddings by using the MongoDB Vector Search feature. To learn more, see the MongoDB Vector Search guide.

API Documentation

To learn more about the methods and types mentioned in this guide, see the following API documentation:
