Aggregation Pipeline Facet Memory Issue

We have a database with many thousands of documents and we are using an aggregation pipeline with a facet stage.
The facet stage throws an error: “document constructed by $facet is 61104 bytes, which exceeds the limit of 57600 bytes”

We know we can set ‘allowDiskUse: true’, which we’ve already done.
We know we can set the ‘internalQueryFacetMaxOutputDocSizeBytes’ higher to accommodate this, which we’ve already done a couple of times.

What we would like to know:

Is there is a way to modify a facet stage or stream the data in such a way that as our data footprint grows, it wont continually keep consuming larger amounts of resources?

Hi @Dan_Alverth - Welcome to the community :slight_smile:

Currently, as you may be aware of, the $facet stage does not use any indexes and will perform a COLLSCAN. To perhaps provide any suggestions, would you be able to provide the following details:

  1. The $facet operation being executed
  2. The db.collection.explain(“executionStats”) output of the aggregation executed
  3. Sample input documents
  4. MongoDB version

Please also note the following from SERVER-40317:

Note that the $facet's output document is allowed to be up to 100MB large if it is an intermediate result. However, all documents produced by an aggregation pipeline’s result set must be 16MB or less due to the BSON size limit. So the 100MB limit applies to the output document produced by $facet, but the pipeline will still fail unless the size of this document is subsequently reduced to 16MB or less.

Interestingly the error message you have provided does contain values which appear relatively small (57600 bytes for example). In terms of the error message, in the past, $facet may consume an unlimited amount of memory. This was fixed in SERVER-40317, so it is possible that your pipeline is consuming an excess amount of resources to execute.

Regards,
Jason

Hello Jason,

Here is the facet stage we are trying to process:

/**
 * outputFieldN: The first output field.
 * stageN: The first aggregation stage.
 */
{
    "measures": [
      {
        "$unwind": {
          "path": "$timeSeriesCustomFieldValues.27405",
          "preserveNullAndEmptyArrays": true
        }
      },
      
      {
        "$unwind": {
          "path": "$timeSeriesCustomFieldValues.27407",
          "preserveNullAndEmptyArrays": true
        }
      },
      
      {
        "$group": {
          "_id": {
            "display-by_id": {
              "$filter": {
                "input": "$timeSeriesCustomFieldValues",
                "cond": { "$eq": [ "$$this.definitionId", "27405" ] }
              }
            },
            "display-by_display": {
              "$filter": {
                "input": "$timeSeriesCustomFieldValues",
                "cond": { "$eq": [ "$$this.definitionId", "27405" ] }
              }
            },
            "display-by_sort": {
              "$filter": {
                "input": "$timeSeriesCustomFieldValues",
                "cond": { "$eq": [ "$$this.definitionId", "27405" ] }
              }
            },
            "group-by_id": {
              "$filter": {
                "input": "$timeSeriesCustomFieldValues",
                "cond": { "$eq": [ "$$this.definitionId", "27407" ] }
              }
            },
            "group-by_display": {
              "$filter": {
                "input": "$timeSeriesCustomFieldValues",
                "cond": { "$eq": [ "$$this.definitionId", "27407" ] }
              }
            },
            "group-by_sort": {
              "$filter": {
                "input": "$timeSeriesCustomFieldValues",
                "cond": { "$eq": [ "$$this.definitionId", "27407" ] }
              }
            }
          },
          "result": {
            "$sum": 1
          }
        }
      },
      
      {
        "$sort": {
          "_id.display-by_sort.value.value": 1
        }
      },
      
      {
        "$sort": {
          "_id.group-by_sort.value.value": 1
        }
      },
      
      {
        "$group": {
          "_id": null,
          "display-by_labels": {
            "$push": {
              "$ifNull": [
                "$_id.display-by_display.value", null
              ]
            }
          },
          "display-by_ids": {
            "$push": {
              "$ifNull": [
                "$_id.display-by_id.value", null
              ]
            }
          },
          "group-by_labels": {
            "$push": {
              "$ifNull": [
                "$_id.group-by_display.value", null
              ]
            }
          },
          "group-by_ids": {
            "$push": {
              "$ifNull": [
                "$_id.group-by_id.value", null
              ]
            }
          },
          "result_push": {
            "$push": "$result"
          }
        }
      },
      {
        "$project": {
          "_id": 0,
          "x-group": "$display-by_labels",
          "x-group-ids": "$display-by_ids",
          "x-name": "$group-by_labels",
          "x-name-ids": "$group-by_ids",
          "data": [
            {
              "result": "$result_push",
              "option": "count"
            }
          ]
        }
      }
    ]
}

As for the explain plan, it says:

{
 "stage": "COLLSCAN",
 "nReturned": 236,
 "executionTimeMillisEstimate": 0,
 "works": 238,
 "advanced": 236,
 "needTime": 1,
 "needYield": 0,
 "saveState": 0,
 "restoreState": 0,
 "isEOF": 1,
 "direction": "forward",
 "docsExamined": 236
}

Here is an example document:

{  "_id": "3892",  "type": "opportunities",  "weightedAllocatedValue": 0,  "probability": 0.01,  "createdTimestamp": {    "$date": {      "$numberLong": "1278086242000"    }  },  "classificationType": "New Investment",  "weightedValue": 15000,  "isErisa": false,  "expectedInvestmentDate": {    "$date": {      "$numberLong": "1291183200000"    }  },  "allocatedAmount": 0,  "name": "Allen Investments Opp",  "requestedAmount": 1500000,  "currencyCode": "USD",  "effectiveDate": {    "$date": {      "$numberLong": "1278086242000"    }  },  "timeSeriesCustomFieldValues": [    {      "definitionId": "27405",      "name": "Element",      "fieldType": "select",      "values": [        {          "id": "92171",          "effectiveDate": {            "$date": {              "$numberLong": "1664600400000"            }          },          "value": [            {              "lovSet": 13124,              "code": "1",              "value": "Earth"            }          ]        },        {          "id": "92173",          "effectiveDate": {            "$date": {              "$numberLong": "1667278800000"            }          },          "value": [            {              "lovSet": 13124,              "code": "2",              "value": "Wind"            }          ]        },        {          "id": "92175",          "effectiveDate": {            "$date": {              "$numberLong": "1669874400000"            }          },          "value": [            {              "lovSet": 13124,              "code": "3",              "value": "Water"            }          ]        }      ]    }  ],  "investor": {    "_id": "788709",    "type": "contacts",    "birthday": {      "$date": {        "$numberLong": "-491857200000"      }    },    "lastName": "Allen",    "isEmployee": false,    "website": "http://www.allenfunds.com",    "gender": "UNSPECIFIED",    "otherId": "1010",    "jobTitle": "CIO",    "createdTimestamp": {      "$date": {        "$numberLong": "1272386954000"      }    },    "firstName": "George",    "name": "Allen, George",    "contactSource": {      "_id": "3668",      "type": "contact-sources",      "name": "3P-MKT 3",      "description": "3P-MKT 3"    },    "contactLocations": [      {        "_id": "25203",        "type": "contact-locations",        "country": "United States",        "city": "New York",        "postalCode": "10011",        "locationTitle": "Business",        "isPrimaryLocation": true,        "state": "NY"      }    ],    "clientDefinedEntityType": {      "_id": "52",      "type": "entity-types",      "pluralName": "People",      "name": "Person",      "resourceType": "people"    },    "permissionBucket": {      "_id": "2451",      "type": "permission-buckets",      "name": "Public"    }  },  "stage": {    "_id": "974",    "type": "opportunity-stages",    "sortOrder": {      "$numberLong": "3"    },    "name": "Committed/Processing",    "closed": false  },  "createdBy": {    "_id": "51443",    "type": "system-users",    "lastName": "SuperAdmin",    "firstName": "Training",    "fullName": "Training SuperAdmin",    "disabled": false,    "userName": "SuperAdmin"  },  "primaryContact": {    "_id": "788709",    "type": "contacts",    "birthday": {      "$date": {        "$numberLong": "-491857200000"      }    },    "lastName": "Allen",    "isEmployee": false,    "website": "http://www.allenfunds.com",    "gender": "UNSPECIFIED",    "otherId": "1010",    "jobTitle": "CIO",    "createdTimestamp": {      "$date": {        "$numberLong": "1272386954000"      }    },    "firstName": "George",    "name": "Allen, George",    "contactSource": {      "_id": "3668",      "type": "contact-sources",      "name": "3P-MKT 3",      "description": "3P-MKT 3"    },    "contactLocations": [      {        "_id": "25203",        "type": "contact-locations",        "country": "United States",        "city": "New York",        "postalCode": "10011",        "locationTitle": "Business",        "isPrimaryLocation": true,        "state": "NY"      }    ],    "clientDefinedEntityType": {      "_id": "52",      "type": "entity-types",      "pluralName": "People",      "name": "Person",      "resourceType": "people"    },    "permissionBucket": {      "_id": "2451",      "type": "permission-buckets",      "name": "Public"    }  },  "clientDefinedEntityType": {    "_id": "56",    "type": "entity-types",    "pluralName": "Opportunities",    "name": "Opportunity",    "resourceType": "opportunities"  },  "permissionBucket": {    "_id": "2451",    "type": "permission-buckets",    "name": "Public"  },  "investorType": {    "_id": "12106",    "type": "investor-types",    "classificationType": "Endowment / Foundation",    "investorType": "endowment"  },  "_index": [    {      "k": "_id",      "v": "3892"    },    {      "k": "type",      "v": "opportunities"    },    {      "k": "investor._id",      "v": "788709"    },    {      "k": "investor.type",      "v": "contacts"    },    {      "k": "stage._id",      "v": "974"    },    {      "k": "stage.type",      "v": "opportunity-stages"    },    {      "k": "createdBy._id",      "v": "51443"    },    {      "k": "createdBy.type",      "v": "system-users"    },    {      "k": "primaryContact._id",      "v": "788709"    },    {      "k": "primaryContact.type",      "v": "contacts"    },    {      "k": "clientDefinedEntityType._id",      "v": "56"    },    {      "k": "clientDefinedEntityType.type",      "v": "entity-types"    },    {      "k": "permissionBucket._id",      "v": "2451"    },    {      "k": "permissionBucket.type",      "v": "permission-buckets"    },    {      "k": "investorType._id",      "v": "12106"    },    {      "k": "investorType.type",      "v": "investor-types"    },    {      "k": "effectiveDate",      "v": {        "$date": {          "$numberLong": "1278086242000"        }      }    },    {      "k": "probability",      "v": 0.01    },    {      "k": "currencyCode",      "v": "USD"    },    {      "k": "createdTimestamp",      "v": {        "$date": {          "$numberLong": "1278086242000"        }      }    },    {      "k": "expectedInvestmentDate",      "v": {        "$date": {          "$numberLong": "1291183200000"        }      }    },    {      "k": "requestedAmount",      "v": 1500000    },    {      "k": "allocatedAmount",      "v": 0    },    {      "k": "name",      "v": "Allen Investments Opp"    },    {      "k": "isErisa",      "v": false    }  ]}

The Mongo version is 4.4.4 Community Edition.

As for the low document size, we actually set the facet document size low so we could test locally. The original problem was happening in production, on a much larger dataset, several thousand documents.

Thanks for providing those details Dan.

The pipeline is relatively complex (multiple unwinds, groups & sorts) and in addition to the fact that $facet cannot make use of indexes, the intermediate values will need to be stored in memory which could be the main issue. As you have mentioned earlier, the data footprint will grow and with that, the pipeline’s memory needs would as well. If this is a frequent operation, perhaps you could consider using materialized views.

The typical solution to a query’s performance & memory issue is to utilize proper indexing. Perhaps the page Create Indexes to Support Your Queries would be useful in this regard?

The Mongo version is 4.4.4 Community Edition.

As noted in the Release Notes for MongoDB Version 4.4, MongoDB version 4.4.4 is not recommended for production use due to critical issue WT-7995, fixed in later versions. Use the latest available patch release version.

Regards,
Jasone

Thank you very much Jason!
I will look into these suggestions!

1 Like