I had written a MongoDB aggregation pipeline in C#. I was able to obtain the correct result initially. However, after a couple of months, the dataset grew larger, and the aggregation started to perform very slowly. While the $match
stage, benefiting from an index, continued to deliver fast results, the issue was observed with the $facet
stage.
{
$match: {
$or: [
{
$and: [
{
PropertyAddressState: {
$eq: "OH",
},
PropertyAddressCity: {
$eq: "DAYTON",
},
},
],
},
{
$and: [
{
PropertyAddressState: {
$eq: "CA",
},
PropertyAddressCity: {
$eq: "LOS ANGELES",
},
},
],
},
],
},
},
{
$facet: {
results: [
{
$group: {
_id: null,
tax_delinquent: {
$sum: {
$toInt: "$IsTaxDelinquent",
},
},
absentee_owners: {
$sum: {
$toInt: "$IsAbsenteeOwners",
},
},
out_of_state_owners: {
$sum: {
$toInt: "$IsOutOfStateOwners",
},
},
purchased_2010_2012: {
$sum: {
$toInt: "$IsPurchasedin20102012",
},
},
multi_family_owners: {
$sum: {
$toInt: "$IsMultiFamilyOwners",
},
},
properties_in_a_trust: {
$sum: {
$toInt: "$IsPropertiesInATrust",
},
},
vacant: {
$sum: {
$toInt: "$IsVacant",
},
},
land_residential: {
$sum: {
$toInt: "$IsLand",
},
},
pre_foreclosure: {
$sum: {
$toInt: "$IsPreforecloure",
},
},
pre_foreclosure_purchased_2010_2012: {
$sum: {
$toInt: {
$and: [
"$IsPreforecloure",
"$IsPurchasedin20102012",
],
},
},
},
pre_foreclosure_out_of_state: {
$sum: {
$toInt: {
$and: [
"$IsPreforecloure",
"$IsOutOfStateOwners",
],
},
},
},
pre_foreclosure_vacant: {
$sum: {
$toInt: {
$and: [
"$IsPreforecloure",
"$IsVacant",
],
},
},
},
absentee_owners_purchased_2010_2012: {
$sum: {
$toInt: {
$and: [
"$IsAbsenteeOwners",
"$IsPurchasedin20102012",
],
},
},
},
out_of_state_purchased_2010_2012: {
$sum: {
$toInt: {
$and: [
"$IsOutOfStateOwners",
"$IsPurchasedin20102012",
],
},
},
},
},
},
],
absentee_owners_with_multiple_properties_vacant:
[
{
$match: {
IsAbsenteeOwners: true,
IsVacant: true,
},
},
{
$group: {
_id: "$PartyOwner1NameFull",
count: {
$sum: 1,
},
},
},
{
$match: {
_id: {
$ne: null,
},
count: {
$gte: 2,
$lte: 10000,
},
},
},
{
$count: "count",
},
],
absentee_owners_multiple_properties: [
{
$match: {
IsAbsenteeOwners: true,
},
},
{
$group: {
_id: "$PartyOwner1NameFull",
count: {
$sum: 1,
},
},
},
{
$match: {
_id: {
$ne: null,
},
count: {
$gte: 2,
$lte: 10000,
},
},
},
{
$count: "count",
},
],
},
}
This is the execution stats for the above aggregation.
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 518601,
"executionTimeMillis" : 34743,
"totalKeysExamined" : 582076,
"totalDocsExamined" : 518601,
The response for the above aggregation is like this.
results:
Array (1)
Object _id: null
tax_delinquent: 2619
absentee owners: 32681
out_of_state_owners: 2415
purchased_2010_2012: 11543
multi_family_owners: 20944
properties_in_a_trust: 10484
vacant: 1230
land residential: 4173
pre_foreclosure: 38854
pre_foreclosure_purchased_201: 4893
absentee_owners_with_multiple:
Array (1)
Object count: 18
absentee_owners_multiple_prop:
Array (1)
Object count: 1801
In the provided aggregation, I employed a $group
stage within the $facet
. Consequently, I removed the $group
stage and attempted parallel processing by separating each category.
IsTaxDelinquent: Array (1)
• 0: Object count: 2619
IsAbsenteeOwners: Array (1)
• 0: Object count: 32681
IsOut0f5tateOwners: Array (1)
• 0: Object count: 2415
IsPurchasedin20102012: Array (1)
• 0: Object count: 11543
IsMultiFamilyOwners: Array (1)
IsPropertiesInATrust: Array (1)
IsVacant: Array (1)
IsLand: Array (1)
IsPreforecloure: Array (1)
pre_foreclosure_purchased_2010_2012: Array (1)
pre_foreclosure_out_of_state: Array (1)
pre_foreclosure_vacant: Array (1)
absentee_owners_vacant: Array (1)
out_of_state_vacant: Array (1)
vacant_purchased_2010_2012: Array (1) absentee_owners_with_multiple_properties_vacant: Array (1)
{
$facet: {
IsTaxDelinquent: [
{
$match: {
IsTaxDelinquent: true,
},
},
{
$count: "count",
},
],
IsAbsenteeOwners: [
{
$match: {
IsAbsenteeOwners: true,
},
},
{
$count: "count",
},
],
IsOutOfStateOwners: [
{
$match: {
IsOutOfStateOwners: true,
},
},
{
$count: "count",
},
],
IsPurchasedin20102012: [
{
$match: {
IsPurchasedin20102012: true,
},
},
{
$count: "count",
},
],
IsMultiFamilyOwners: [
{
$match: {
IsMultiFamilyOwners: true,
},
},
{
$count: "count",
},
],
IsPropertiesInATrust: [
{
$match: {
IsPropertiesInATrust: true,
},
},
{
$count: "count",
},
],
IsVacant: [
{
$match: {
IsVacant: true,
},
},
{
$count: "count",
},
],
IsLand: [
{
$match: {
IsLand: true,
},
},
{
$count: "count",
},
],
IsPreforecloure: [
{
$match: {
IsPreforecloure: true,
},
},
{
$count: "count",
},
],
pre_foreclosure_purchased_2010_2012: [
{
$match: {
IsOutOfStateOwners: true,
},
},
{
$count: "count",
},
],
pre_foreclosure_purchased_2010_2012: [
{
$match: {
$and: [
{
IsPreforecloure: true,
},
{
IsPurchasedin20102012: true,
},
],
},
},
{
$count: "count",
},
],
absentee_owners_with_multiple_properties_vacant:
[
{
$match: {
IsAbsenteeOwners: true,
IsVacant: true,
},
},
{
$group: {
_id: "$PartyOwner1NameFull",
count: {
$sum: 1,
},
},
},
{
$match: {
_id: {
$ne: null,
},
count: {
$gte: 2,
$lte: 10000,
},
},
},
{
$count: "count",
},
],
absentee_owners_multiple_properties: [
{
$match: {
IsAbsenteeOwners: true,
},
},
{
$group: {
_id: "$PartyOwner1NameFull",
count: {
$sum: 1,
},
},
},
{
$match: {
_id: {
$ne: null,
},
count: {
$gte: 2,
$lte: 10000,
},
},
},
{
$count: "count",
},
],
},
}
Despite producing accurate results, this approach still resulted in slow loading times. I’m inquiring whether there exists a more effective method to achieve results using the $facet
stage.
I’m using MongoDB version 4.4.
Below I have set up a sample data set with the necessary fields for testing purposes