Definition
$fillNew in version 5.3.
Populates
nulland missing field values within documents.You can use
$fillto populate missing data points:In a sequence based on surrounding values.
With a fixed value.
Syntax
The $fill stage has this syntax:
{ $fill: { partitionBy: <expression>, partitionByFields: [ <field 1>, <field 2>, ... , <field n> ], sortBy: { <sort field 1>: <sort order>, <sort field 2>: <sort order>, ..., <sort field n>: <sort order> }, output: { <field 1>: { value: <expression> }, <field 2>: { method: <string> }, ... } } }
The $fill stage takes a document with these fields:
Field | Necessity | Description |
|---|---|---|
Optional | Specifies an expression to group
the documents. In the If you omit partitionBy and
partitionByFields,
partitionBy and partitionByFields are mutually exclusive. See an example. | |
Optional | Specifies an array of fields as the compound key to group the
documents. In the If you omit partitionBy and
partitionByFields,
partitionBy and partitionByFields are mutually exclusive. | |
Required if method is specified in at least one output.<field>. Otherwise, optional. | Specifies the field or fields to sort the documents within each
partition. Uses the same syntax as the | |
Required | Specifies an object containing each field for which to fill missing values. You can specify multiple fields in the output object. The object name is the name of the field to fill. The object value specifies how the field is filled. | |
Required | Specifies an object indicating how to fill missing values in the target field. The object name must be either
|
Behavior and Restrictions
partitionByFields Restrictions
$fill returns an error if any field name in the
partitionByFields array:
Evaluates to a non-string value.
Begins with
$.
linear Behavior
The linear fill method fills null and missing fields using
linear interpolation based on the surrounding non-null values in
the sequence.
For each document where the field is
nullor missing,linearFillfills those fields in proportion to the missing value range between surrounding non-nullvalues according to the sortBy order. To determine the values for missing fields,linearFilluses:The difference of surrounding non-
nullvalues.The number of
nullfields to fill between the surrounding values.
The
linearmethod can fill multiple consecutivenullvalues if those values are preceded and followed by non-nullvalues according to the sortBy order.Example
If a collection contains these documents:
{ index: 0, value: 0 }, { index: 1, value: null }, { index: 2, value: null }, { index: 3, value: null }, { index: 4, value: 10 } After using the
linearfill method to fill thenullvalues, the documents become:{ index: 0, value: 0 }, { index: 1, value: 2.5 }, { index: 2, value: 5 }, { index: 3, value: 7.5 }, { index: 4, value: 10 } nullvalues that are not preceded and followed by non-nullvalues remainnull.To use the
linearfill method, you must also use the sortBy field to sort your data.
For a complete example using the linear fill method, see
Fill Missing Field Values with Linear Interpolation.
locf Behavior
locf stands for last observation carried forward.
If a field being filled contains both
nulland non-null values,locfsets thenulland missing values to the field's last known non-null value according to the sortBy order.If the field contains only
nullor missing values in a partition,locfsets the field value tonullfor that partition.nulland missing field values that appear before non-null values in the sort order remainnull.
To use the
locffill method, you must also use the sortBy field to sort your data.
For a complete example using the locf fill method, see
Fill Missing Field Values Based on the Last Observed Value.
Comparison of $fill and Aggregation Operators
To fill null and missing field values within a document you can use:
The
$fillstage.When you use the
$fillstage, the field you specify in the output is the same field used as the source data.The
$linearFilland$locfaggregation operators.When you
$linearFillor$locf, you can set values for a different field than the field used as the source data.
Examples
The examples in this section show how to use $fill to
fill missing values:
Fill Missing Field Values with a Constant Value
A shoe store maintains a dailySales collection that contains a
document summarizing each day's sales. The shoe store sells these types
of shoes:
bootssandalssneakers
Create the following dailySales collection:
db.dailySales.insertMany( [ { "date": ISODate("2022-02-02"), "bootsSold": 10, "sandalsSold": 20, "sneakersSold": 12 }, { "date": ISODate("2022-02-03"), "bootsSold": 7, "sneakersSold": 18 }, { "date": ISODate("2022-02-04"), "sneakersSold": 5 } ] )
Not all of the documents in the dailySales collection contain each
shoe type. If a shoe type is missing, it means there were no shoes of
that type sold on the corresponding date.
The following example uses $fill to set the quantities sold
to 0 for the missing shoe types for each day's sales:
db.dailySales.aggregate( [ { $fill: { output: { "bootsSold": { value: 0 }, "sandalsSold": { value: 0 }, "sneakersSold": { value: 0 } } } } ] )
In the preceding pipeline:
$fillfills in values for missing fields.output specifies:
The names of the fields to fill in.
- The value to set the filled in fields to. In this example, the
- output specifies a constant value of
0.
Example output:
[ { _id: ObjectId("6202df9f394d47411658b51e"), date: ISODate("2022-02-02T00:00:00.000Z"), bootsSold: 10, sandalsSold: 20, sneakersSold: 12 }, { _id: ObjectId("6202df9f394d47411658b51f"), date: ISODate("2022-02-03T00:00:00.000Z"), bootsSold: 7, sneakersSold: 18, sandalsSold: 0 }, { _id: ObjectId("6202df9f394d47411658b520"), date: ISODate("2022-02-04T00:00:00.000Z"), sneakersSold: 5, bootsSold: 0, sandalsSold: 0 } ]
Fill Missing Field Values with Linear Interpolation
Create a stock collection that contains the following documents, which
track a single company's stock price at hourly intervals:
db.stock.insertMany( [ { time: ISODate("2021-03-08T09:00:00.000Z"), price: 500 }, { time: ISODate("2021-03-08T10:00:00.000Z"), }, { time: ISODate("2021-03-08T11:00:00.000Z"), price: 515 }, { time: ISODate("2021-03-08T12:00:00.000Z") }, { time: ISODate("2021-03-08T13:00:00.000Z") }, { time: ISODate("2021-03-08T14:00:00.000Z"), price: 485 } ] )
The price field is missing for some of the documents in the
collection.
To populate the missing price values by using linear interpolation,
use $fill with the linear fill method:
db.stock.aggregate( [ { $fill: { sortBy: { time: 1 }, output: { "price": { method: "linear" } } } } ] )
In the preceding pipeline:
$fillfills in values for missing fields.sortBy: { time: 1 }sorts the documents by thetimefield in ascending order, from earliest to latest.output specifies:
priceas the field for which to fill in missing values.{ method: "linear" }as the fill method. Thelinearfill- method fills missing
pricevalues by using linear interpolation based on the surroundingpricevalues in the sequence.
Example output:
[ { _id: ObjectId("620ad41c394d47411658b5e9"), time: ISODate("2021-03-08T09:00:00.000Z"), price: 500 }, { _id: ObjectId("620ad41c394d47411658b5ea"), time: ISODate("2021-03-08T10:00:00.000Z"), price: 507.5 }, { _id: ObjectId("620ad41c394d47411658b5eb"), time: ISODate("2021-03-08T11:00:00.000Z"), price: 515 }, { _id: ObjectId("620ad41c394d47411658b5ec"), time: ISODate("2021-03-08T12:00:00.000Z"), price: 505 }, { _id: ObjectId("620ad41c394d47411658b5ed"), time: ISODate("2021-03-08T13:00:00.000Z"), price: 495 }, { _id: ObjectId("620ad41c394d47411658b5ee"), time: ISODate("2021-03-08T14:00:00.000Z"), price: 485 } ]
Fill Missing Field Values Based on the Last Observed Value
Create a restaurantReviews collection that contains the following
documents, which store review scores for a single restaurant over time:
db.restaurantReviews.insertMany( [ { date: ISODate("2021-03-08"), score: 90 }, { date: ISODate("2021-03-09"), score: 92 }, { date: ISODate("2021-03-10") }, { date: ISODate("2021-03-11") }, { date: ISODate("2021-03-12"), score: 85 }, { date: ISODate("2021-03-13") } ] )
The score field is missing for some of the documents in the
collection.
To populate the missing score fields and ensure that there are no
gaps in the data, use $fill. In the following example,
$fill uses the locf fill method to fill the missing
score values with the previous score in the sequence:
db.restaurantReviews.aggregate( [ { $fill: { sortBy: { date: 1 }, output: { "score": { method: "locf" } } } } ] )
In the preceding pipeline:
$fillfills in missingscorevalues.sortBy: { date: 1 }sorts the documents by thedatefield in ascending order, from earliest to latest.output specifies:
scoreas the field for which to fill in missing values.{ method: "locf" }as the fill method. Thelocffill- method fills missing
scorevalues with the last observedscorein the sequence.
Example output:
[ { _id: ObjectId("62040bc9394d47411658b553"), date: ISODate("2021-03-08T00:00:00.000Z"), score: 90 }, { _id: ObjectId("62040bc9394d47411658b554"), date: ISODate("2021-03-09T00:00:00.000Z"), score: 92 }, { _id: ObjectId("62040bc9394d47411658b555"), date: ISODate("2021-03-10T00:00:00.000Z"), score: 92 }, { _id: ObjectId("62040bc9394d47411658b556"), date: ISODate("2021-03-11T00:00:00.000Z"), score: 92 }, { _id: ObjectId("62040bc9394d47411658b557"), date: ISODate("2021-03-12T00:00:00.000Z"), score: 85 }, { _id: ObjectId("62040bc9394d47411658b558"), date: ISODate("2021-03-13T00:00:00.000Z"), score: 85 } ]
Fill Data for Distinct Partitions
Consider the previous example with restaurant reviews but instead of tracking a single restaurant, the collection now contains reviews for multiple restaurants.
Create a collection named restaurantReviewsMultiple and populate the
collection with these documents:
db.restaurantReviewsMultiple.insertMany( [ { date: ISODate("2021-03-08"), restaurant: "Joe's Pizza", score: 90 }, { date: ISODate("2021-03-08"), restaurant: "Sally's Deli", score: 75 }, { date: ISODate("2021-03-09"), restaurant: "Joe's Pizza", score: 92 }, { date: ISODate("2021-03-09"), restaurant: "Sally's Deli" }, { date: ISODate("2021-03-10"), restaurant: "Joe's Pizza" }, { date: ISODate("2021-03-10"), restaurant: "Sally's Deli", score: 68 }, { date: ISODate("2021-03-11"), restaurant: "Joe's Pizza", score: 93 }, { date: ISODate("2021-03-11"), restaurant: "Sally's Deli" } ] )
The score field is missing for some of the documents in the
collection.
To populate the missing score fields and ensure that there are no
gaps in the data, use $fill. In the following example,
$fill uses the locf fill method to fill the missing
score values with the previous score in the sequence:
db.restaurantReviewsMultiple.aggregate( [ { $fill: { sortBy: { date: 1 }, partitionBy: { "restaurant": "$restaurant" }, output: { "score": { method: "locf" } } } } ] )
In the preceding pipeline:
$fillfills in missingscorevalues.sortBy: { date: 1 }sorts the documents by thedatefield in ascending order, from earliest to latest.partitionBy: { "restaurant": "$restaurant" }partitions the data byrestaurant. There are two restaurants:Joe's PizzaandSally's Deli.output specifies:
scoreas the field for which to fill in missing values.{ method: "locf" }as the fill method. Thelocffill- method fills missing
scorevalues with the last observedscorein the sequence.
Example output:
[ { _id: ObjectId("620559f4394d47411658b58f"), date: ISODate("2021-03-08T00:00:00.000Z"), restaurant: "Joe's Pizza", score: 90 }, { _id: ObjectId("620559f4394d47411658b591"), date: ISODate("2021-03-09T00:00:00.000Z"), restaurant: "Joe's Pizza", score: 92 }, { _id: ObjectId("620559f4394d47411658b593"), date: ISODate("2021-03-10T00:00:00.000Z"), restaurant: "Joe's Pizza", score: 92 }, { _id: ObjectId("620559f4394d47411658b595"), date: ISODate("2021-03-11T00:00:00.000Z"), restaurant: "Joe's Pizza", score: 93 }, { _id: ObjectId("620559f4394d47411658b590"), date: ISODate("2021-03-08T00:00:00.000Z"), restaurant: "Sally's Deli", score: 75 }, { _id: ObjectId("620559f4394d47411658b592"), date: ISODate("2021-03-09T00:00:00.000Z"), restaurant: "Sally's Deli", score: 75 }, { _id: ObjectId("620559f4394d47411658b594"), date: ISODate("2021-03-10T00:00:00.000Z"), restaurant: "Sally's Deli", score: 68 }, { _id: ObjectId("620559f4394d47411658b596"), date: ISODate("2021-03-11T00:00:00.000Z"), restaurant: "Sally's Deli", score: 68 } ]
Indicate if a Field was Populated Using $fill
When you populate missing values, the output does not indicate if a
value was populated with the $fill operator or if the value existed
in the document originally. To distinguish between filled and
preexisting values, you can use a $set stage before
$fill and set a new field based on whether the value exists.
For example, create a restaurantReviews collection that contains the
following documents, which store review scores for a restaurant over time:
db.restaurantReviews.insertMany( [ { date: ISODate("2021-03-08"), score: 90 }, { date: ISODate("2021-03-09"), score: 92 }, { date: ISODate("2021-03-10") }, { date: ISODate("2021-03-11") }, { date: ISODate("2021-03-12"), score: 85 }, { date: ISODate("2021-03-13") } ] )
The score field is missing for some of the documents in the
collection. You can populate missing score values by using the
$fill operator.
Create a pipeline to perform the following actions:
Use
$setto add a new field to each document indicating if the document'sscorefield exists prior to the$filloperator populating values. This new field is calledvalueExisted.Populate missing
scorevalues with the last observedscorein the sequence. The fill methodlocfstands for "last observation carried forward".
The pipeline resembles the following code:
db.restaurantReviews.aggregate( [ { $set: { "valueExisted": { "$ifNull": [ { "$toBool": { "$toString": "$score" } }, false ] } } }, { $fill: { sortBy: { date: 1 }, output: { "score": { method: "locf" } } } } ] )
Note
Handling Values of Zero
Output:
[ { _id: ObjectId("63595116b1fac2ee2e957f15"), date: ISODate("2021-03-08T00:00:00.000Z"), score: 90, valueExisted: true }, { _id: ObjectId("63595116b1fac2ee2e957f16"), date: ISODate("2021-03-09T00:00:00.000Z"), score: 92, valueExisted: true }, { _id: ObjectId("63595116b1fac2ee2e957f17"), date: ISODate("2021-03-10T00:00:00.000Z"), valueExisted: false, score: 92 }, { _id: ObjectId("63595116b1fac2ee2e957f18"), date: ISODate("2021-03-11T00:00:00.000Z"), valueExisted: false, score: 92 }, { _id: ObjectId("63595116b1fac2ee2e957f19"), date: ISODate("2021-03-12T00:00:00.000Z"), score: 85, valueExisted: true }, { _id: ObjectId("63595116b1fac2ee2e957f1a"), date: ISODate("2021-03-13T00:00:00.000Z"), valueExisted: false, score: 85 } ]
The Node.js examples on this page use the sample_weatherdata.data collection
from the Atlas sample datasets. To learn how to create a
free MongoDB Atlas cluster and load the sample datasets, see Get
Started in the MongoDB Node.js driver documentation.
To use the MongoDB Node.js driver to add a $fill stage to an
aggregation pipeline, use the $fill operator in a pipeline object.
The following example creates a pipeline that fills null or missing values. The pipeline includes the following stages:
The
$groupstage groups input documents by theirtsfield and computes each group's averageseaSurfaceTemperature.value.The
$fillstage sorts the grouped data by the_idfield in ascending order and fills null or missingseaSurfaceTemperaturevalues by using linear interpolation.
const pipeline = [ { $group: { _id: "$ts", seaSurfaceTemperature: { $avg: "$seaSurfaceTemperature.value" }, } }, { $fill: { sortBy: { _id: 1 }, output: { seaSurfaceTemperature: { method: "linear" } } } } ]; const cursor = collection.aggregate(pipeline); return cursor;