How to improve the performance of $facet stage in Aggregation Pipeline in MongoDb C#

I had written a MongoDB aggregation pipeline in C#. I was able to obtain the correct result initially. However, after a couple of months, the dataset grew larger, and the aggregation started to perform very slowly. While the $match stage, benefiting from an index, continued to deliver fast results, the issue was observed with the $facet stage.

{
    $match: {
      $or: [
        {
          $and: [
            {
              PropertyAddressState: {
                $eq: "OH",
              },
              PropertyAddressCity: {
                $eq: "DAYTON",
              },
            },
          ],
        },
        {
          $and: [
            {
              PropertyAddressState: {
                $eq: "CA",
              },
              PropertyAddressCity: {
                $eq: "LOS ANGELES",
              },
            },
          ],
        },
      ],
    },
  },
{
    $facet: {
      results: [
        {
          $group: {
            _id: null,
            tax_delinquent: {
              $sum: {
                $toInt: "$IsTaxDelinquent",
              },
            },
            absentee_owners: {
              $sum: {
                $toInt: "$IsAbsenteeOwners",
              },
            },
            out_of_state_owners: {
              $sum: {
                $toInt: "$IsOutOfStateOwners",
              },
            },
            purchased_2010_2012: {
              $sum: {
                $toInt: "$IsPurchasedin20102012",
              },
            },
            multi_family_owners: {
              $sum: {
                $toInt: "$IsMultiFamilyOwners",
              },
            },
            properties_in_a_trust: {
              $sum: {
                $toInt: "$IsPropertiesInATrust",
              },
            },
            vacant: {
              $sum: {
                $toInt: "$IsVacant",
              },
            },
            land_residential: {
              $sum: {
                $toInt: "$IsLand",
              },
            },
            pre_foreclosure: {
              $sum: {
                $toInt: "$IsPreforecloure",
              },
            },
            pre_foreclosure_purchased_2010_2012: {
              $sum: {
                $toInt: {
                  $and: [
                    "$IsPreforecloure",
                    "$IsPurchasedin20102012",
                  ],
                },
              },
            },
            pre_foreclosure_out_of_state: {
              $sum: {
                $toInt: {
                  $and: [
                    "$IsPreforecloure",
                    "$IsOutOfStateOwners",
                  ],
                },
              },
            },
            pre_foreclosure_vacant: {
              $sum: {
                $toInt: {
                  $and: [
                    "$IsPreforecloure",
                    "$IsVacant",
                  ],
                },
              },
            },
            absentee_owners_purchased_2010_2012: {
              $sum: {
                $toInt: {
                  $and: [
                    "$IsAbsenteeOwners",
                    "$IsPurchasedin20102012",
                  ],
                },
              },
            },
            out_of_state_purchased_2010_2012: {
              $sum: {
                $toInt: {
                  $and: [
                    "$IsOutOfStateOwners",
                    "$IsPurchasedin20102012",
                  ],
                },
              },
            },
          },
        },
      ],
      absentee_owners_with_multiple_properties_vacant:
        [
          {
            $match: {
              IsAbsenteeOwners: true,
              IsVacant: true,
            },
          },
          {
            $group: {
              _id: "$PartyOwner1NameFull",
              count: {
                $sum: 1,
              },
            },
          },
          {
            $match: {
              _id: {
                $ne: null,
              },
              count: {
                $gte: 2,
                $lte: 10000,
              },
            },
          },
          {
            $count: "count",
          },
        ],
      absentee_owners_multiple_properties: [
        {
          $match: {
            IsAbsenteeOwners: true,
          },
        },
        {
          $group: {
            _id: "$PartyOwner1NameFull",
            count: {
              $sum: 1,
            },
          },
        },
        {
          $match: {
            _id: {
              $ne: null,
            },
            count: {
              $gte: 2,
              $lte: 10000,
            },
          },
        },
        {
          $count: "count",
        },
      ],
    },
  }

This is the execution stats for the above aggregation.

"executionStats" : {
					"executionSuccess" : true,
					"nReturned" : 518601,
					"executionTimeMillis" : 34743,
					"totalKeysExamined" : 582076,
					"totalDocsExamined" : 518601,

The response for the above aggregation is like this.

results: 
     Array (1) 
       Object _id: null 
tax_delinquent: 2619 
absentee owners: 32681
out_of_state_owners: 2415 
purchased_2010_2012: 11543 
multi_family_owners: 20944
properties_in_a_trust: 10484
vacant: 1230
land residential: 4173
pre_foreclosure: 38854
pre_foreclosure_purchased_201: 4893
absentee_owners_with_multiple: 
    Array (1) 
       Object count: 18
absentee_owners_multiple_prop:
    Array (1) 
      Object count: 1801

In the provided aggregation, I employed a $group stage within the $facet. Consequently, I removed the $group stage and attempted parallel processing by separating each category.

IsTaxDelinquent: Array (1)
    • 0: Object count: 2619 
IsAbsenteeOwners: Array (1)
    • 0: Object count: 32681
IsOut0f5tateOwners: Array (1) 
    • 0: Object count: 2415 
IsPurchasedin20102012: Array (1) 
    • 0: Object count: 11543 
IsMultiFamilyOwners: Array (1) 
IsPropertiesInATrust: Array (1)
IsVacant: Array (1)
IsLand: Array (1) 
IsPreforecloure: Array (1) 
pre_foreclosure_purchased_2010_2012: Array (1) 
pre_foreclosure_out_of_state: Array (1) 
pre_foreclosure_vacant: Array (1) 
absentee_owners_vacant: Array (1) 
out_of_state_vacant: Array (1) 
vacant_purchased_2010_2012: Array (1) absentee_owners_with_multiple_properties_vacant: Array (1)
{
    $facet: {
      IsTaxDelinquent: [
        {
          $match: {
            IsTaxDelinquent: true,
          },
        },
        {
          $count: "count",
        },
      ],
      IsAbsenteeOwners: [
        {
          $match: {
            IsAbsenteeOwners: true,
          },
        },
        {
          $count: "count",
        },
      ],
      IsOutOfStateOwners: [
        {
          $match: {
            IsOutOfStateOwners: true,
          },
        },
        {
          $count: "count",
        },
      ],
      IsPurchasedin20102012: [
        {
          $match: {
            IsPurchasedin20102012: true,
          },
        },
        {
          $count: "count",
        },
      ],
      IsMultiFamilyOwners: [
        {
          $match: {
            IsMultiFamilyOwners: true,
          },
        },
        {
          $count: "count",
        },
      ],
      IsPropertiesInATrust: [
        {
          $match: {
            IsPropertiesInATrust: true,
          },
        },
        {
          $count: "count",
        },
      ],
      IsVacant: [
        {
          $match: {
            IsVacant: true,
          },
        },
        {
          $count: "count",
        },
      ],
      IsLand: [
        {
          $match: {
            IsLand: true,
          },
        },
        {
          $count: "count",
        },
      ],
      IsPreforecloure: [
        {
          $match: {
            IsPreforecloure: true,
          },
        },
        {
          $count: "count",
        },
      ],
      pre_foreclosure_purchased_2010_2012: [
        {
          $match: {
            IsOutOfStateOwners: true,
          },
        },
        {
          $count: "count",
        },
      ],
      pre_foreclosure_purchased_2010_2012: [
        {
          $match: {
            $and: [
              {
                IsPreforecloure: true,
              },
              {
                IsPurchasedin20102012: true,
              },
            ],
          },
        },
        {
          $count: "count",
        },
      ],
      absentee_owners_with_multiple_properties_vacant:
        [
          {
            $match: {
              IsAbsenteeOwners: true,
              IsVacant: true,
            },
          },
          {
            $group: {
              _id: "$PartyOwner1NameFull",
              count: {
                $sum: 1,
              },
            },
          },
          {
            $match: {
              _id: {
                $ne: null,
              },
              count: {
                $gte: 2,
                $lte: 10000,
              },
            },
          },
          {
            $count: "count",
          },
        ],
      absentee_owners_multiple_properties: [
        {
          $match: {
            IsAbsenteeOwners: true,
          },
        },
        {
          $group: {
            _id: "$PartyOwner1NameFull",
            count: {
              $sum: 1,
            },
          },
        },
        {
          $match: {
            _id: {
              $ne: null,
            },
            count: {
              $gte: 2,
              $lte: 10000,
            },
          },
        },
        {
          $count: "count",
        },
      ],
    },
  }

Despite producing accurate results, this approach still resulted in slow loading times. I’m inquiring whether there exists a more effective method to achieve results using the $facet stage.

I’m using MongoDB version 4.4.

Below I have set up a sample data set with the necessary fields for testing purposes

temp_tax_data2

Hi @Shehan_Vanderputt and welcome to MongoDB community forums!!

Thank you for sharing the details but it would be helpful if you share the index definition as well which would help me analyse the slow query in more detail.
However, in saying so, the query performance depends on various factors.

  1. As mentioned, with the increase in the size of the collection, the performance is degraded. An under provisioned deployment could be one of the possible reasons resulting in slow query. Could you please confirm the deployment type(standalone node, sharded cluster, replica set etc) and the hardware on which the server is deployed.
  2. As mentioned in the documentation for Index use for $facet, the $facet will not use the index and perform a COLLSCAN instead. The issue here might be the use of $facet on such a large collection compared to the provisioned hardware. One possible solution is to align the schema design with the query.
  3. Also, could you also confirm if this is a frequent workload? If possible, redesigning of the schema would be recommended. In saying so, if you have query which is to be used frequently, and redesigning the schema does not work for you, you can make use of the materialised views which might prove to be helpful in improving the performance of the query.

Warm regards
Aasawari