Docs Menu

Docs HomeDevelop ApplicationsMongoDB Manual

$percentile (aggregation)

On this page

  • Definition
  • Syntax
  • Command Fields
  • Behavior
  • Examples
  • Learn More
$percentile

New in version 7.0.

Returns an array of scalar values that correspond to specified percentile values.

You can use $percentile as an accumulator in the $group stage or as an aggegation expression.

The syntax for $percentile is:

{
$percentile: {
input: <expression>,
p: [ <expression1>, <expression2>, ... ],
method: <string>
}
}

$percentile takes the following fields:

Field
Type
Necessity
Description
input
Expression
Required
$percentile calculates the percentile values of this data. input must be a field name or an expression that evaluates to a numeric type. If the expression cannot be converted to a numeric type, the $percentile calculation ignores it.
p
Expression
Required

$percentile calculates a percentile value for each element in p. The elements represent percentages and must evaluate to numeric values in the range 0.0 to 1.0, inclusive.

$percentile returns results in the same order as the elements in p.

method
String
Required
The method that mongod uses to calculate the percentile value. The method must be 'approximate'.

You can use $percentile in:

  • $group stages as an accumulator

  • $setWindowFields stages as an accumulator

  • $project stages as an aggregation expression

$percentile has the following characteristics as an accumulator, it:

  • Calculates a single result for all the documents in the stage.

  • Uses the t-digest algorithm to calculate approximate, percentile based metrics.

  • Uses approximate methods to scale to large volumes of data.

$percentile has the following characteristics as an aggregation expression, it:

  • Accepts an array as input

  • Calculates a separate result for each input document

In a $group stage, $percentile is an accumulator and calculates a value for all documents in the window.

In a $project stage, $percentile is an aggregation expression and calculates values for each document.

In $setWindowFields stages, $percentile returns a result for each document like an aggregation expression, but the results are computed over groups of documents like an accumulator.

In $group stages, $percentile always uses an approximate calculation method.

In $project stages, $percentile might use the discrete calculation method even when the approximate method is specified.

In $setWindowFields stages, the workload determines the calculation method that $percentile uses.

The computed percentiles $percentile returns might vary, even on the same datasets. This is because the algorithm calculates approximate values.

Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values may not represent the actual sample distribution. Consider a data set where all the samples are the same. All of the values in the data set fall at or below any percentile. A "50th percentile" value would actually represent either 0 or 100 percent of the samples.

$percentile returns the minimum value for p = 0.0.

$percentile returns the maximum value for p = 1.0.

If you use $percentile as an aggregation expression in a $project stage, you can use an array as input. The syntax is:

{
$percentile: {
input: [ <expression1, <expression2>, .., <expressionN> ],
p: [ <expression1>, <expression2>, ... ],
method: <string>
}
}

A window function lets you calculate results over a moving "window" of neighboring documents. As each document passes though the pipeline, the $setWindowFields stage:

  • Recomputes the set of documents in the current window

  • calculates a value for all documents in the set

  • returns a single value for that document

You can use $percentile in a $setWindowFields stage to calculate rolling statistics for time series or other related data.

When you use $percentile in a $setWindowField stage, the input value must be a field name. If you enter an array instead of a field name, the operation fails.

The following examples use the testScores collection. Create the collection:

db.testScores.insertMany( [
{ studentId: "2345", test01: 62, test02: 81, test03: 80 },
{ studentId: "2356", test01: 60, test02: 83, test03: 79 },
{ studentId: "2358", test01: 67, test02: 82, test03: 78 },
{ studentId: "2367", test01: 64, test02: 72, test03: 77 },
{ studentId: "2369", test01: 60, test02: 53, test03: 72 }
] )

Create an accumulator that calculates a single percentile value:

db.testScores.aggregate( [
{
$group: {
_id: null,
test01_percentiles: {
$percentile: {
input: "$test01",
p: [ 0.95 ],
method: 'approximate'
}
},
}
}
] )

Output:

{ _id: null, test01_percentiles: [ 67 ] }

The _id field value is null so $group selects all the documents in the collection.

The percentile accumulator takes its input data from the test01 field.

In this example, the percentiles array, p, has one value so the $percentile operator only calculates one term for the test01 data. The 95th percentile value is 67.

Create an accumulator that calculates multiple percentile values:

db.testScores.aggregate( [
{
$group: {
_id: null,
test01_percentiles: {
$percentile: {
input: "$test01",
p: [ 0.5, 0.75, 0.9, 0.95 ],
method: 'approximate'
}
},
test02_percentiles: {
$percentile: {
input: "$test02",
p: [ 0.5, 0.75, 0.9, 0.95 ],
method: 'approximate'
}
},
test03_percentiles: {
$percentile: {
input: "$test03",
p: [ 0.5, 0.75, 0.9, 0.95 ],
method: 'approximate'
}
},
test03_percent_alt: {
$percentile: {
input: "$test03",
p: [ 0.9, 0.5, 0.75, 0.95 ],
method: 'approximate'
}
},
}
}
] )

Output:

{
_id: null,
test01_percentiles: [ 62, 64, 67, 67 ],
test02_percentiles: [ 81, 82, 83, 83 ],
test03_percentiles: [ 78, 79, 80, 80 ],
test03_percent_alt: [ 80, 78, 79, 80 ]
}

The _id field value is null so $group selects all the documents in the collection.

The percentile accumulator calculates values for three fields, test01, test02, and test03.

The accumulator calculates the 50th, 75th, 90th, and 95th percentile values for each input field.

The percentile values are returned in the same order as the elements of p. The values in test03_percentiles and test03_percent_alt are the same, but their order is different. The order of elements in each result array matches the corresponding order of elements in p.

In a $project stage, $percentile is an aggregation expression and calculates values for each document.

You can use a field name or an array as input in a $project stage.

db.testScores.aggregate( [
{
$project: {
_id: 0,
studentId: 1,
testPercentiles: {
$percentile: {
input: [ "$test01", "$test02", "$test03" ],
p: [ 0.5, 0.95 ],
method: 'approximate'
}
}
}
}
] )

Output:

{ studentId: '2345', testPercentiles: [ 80, 81 ] },
{ studentId: '2356', testPercentiles: [ 79, 83 ] },
{ studentId: '2358', testPercentiles: [ 78, 82 ] },
{ studentId: '2367', testPercentiles: [ 72, 77 ] },
{ studentId: '2369', testPercentiles: [ 60, 72 ] }

When $percentile is an aggregation expression there is a result for each studentId.

To base your percentile values on local data trends, use $percentile in a $setWindowField aggregation pipeline stage.

This example creates a window to filter scores:

db.testScores.aggregate( [
{
$setWindowFields: {
sortBy: { test01: 1 },
output: {
test01_95percentile: {
$percentile: {
input: "$test01",
p: [ 0.95 ],
method: 'approximate'
},
window: {
range: [ -3, 3 ]
}
}
}
}
},
{
$project: {
_id: 0,
studentId: 1,
test01_95percentile: 1
}
}
] )

Output:

{ studentId: '2356', test01_95percentile: [ 62 ] },
{ studentId: '2369', test01_95percentile: [ 62 ] },
{ studentId: '2345', test01_95percentile: [ 64 ] },
{ studentId: '2367', test01_95percentile: [ 67 ] },
{ studentId: '2358', test01_95percentile: [ 67 ] }

In this example, the percentile calculation for each document also incorporates data from the three documents before and after it.

The $median operator is a special case of the $percentile operator that uses a fixed value of p: [ 0.5 ].

For more information on window functions, see: $setWindowFields.

← $or (aggregation)