How to improve the performance of Aggregation Pipeline in MongoDb C#

I have a MongoDB aggregation pipeline, which I have written in C#.

$geoNear{
  near: {
    type: "Point",
    coordinates: [-110.29665, 31.535699],
  },
  distanceField: "distance",
  maxDistance: 100,
  query: {
    $and: [
      {
        IsResidential: true,
      },
      {
        DaysSinceLastSale: {
          $gt: 10,
        },
      },
    ],
  },
  spherical: true,
},

$lookup:
{
  from: "tax_assessor",
  let: {
    propertyCity: "Chicago",
    propertyState: "IL"
    ownerName: "$OwnerName",
  },
  pipeline: [
    {
      $match: {
        $expr: {
          $and: [
            {$eq: ["$PropertyCity", "$$propertyCity"]},
            {$eq: ["$OwnerName", "$$ownerName"]},
            {$eq: ["$PropertyState", "$$propertyState"]},
          ],
        },
      },
    },
  ],
  as: "NumberOfProperties",
},

$project:
{
  _id: 0,
  FullAddress: 1,
  OwnerName: 1,
  distance: 1,
  YearBuilt: 1,
  NumberOfProperties: {
    $size: "$NumberOfProperties"
  }
}

Here what I need is something similar to this SQL:

select res1.owner_name, res1.full_address, res1.distance, res1.year_built, count(res2.owner_name) as property_count from 
(select * from properties where geolocation is <within a given range> and <some filters>) res1
left join 
(select * from properties where city=<given city> and state=<given state>) res2
on res1.owner_name = res2.owner_name
group by res1.owner_name
order by res1.distance

I could get the correct result but this aggregation is very slow.

When checking the execution plan, I saw the first stage - GeoNear has used an index. But in the second stage - lookup, it has not used any of the indexes.

"stages" : [
    {
      "$geoNearCursor" : {
        "queryPlanner" : {
          "plannerVersion" : 1,
          "namespace" : "tax_assessor",
          "indexFilterSet" : false,
          "parsedQuery" : {..},
          "queryHash" : "4B38534E",
          "planCacheKey" : "2328FDE9",
          "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {..},
            "inputStage" : {
              "stage" : "GEO_NEAR_2DSPHERE",
              "keyPattern" : {
                "PropertyGeoPoint" : "2dsphere",
                "DaysSinceLastSale" : 1,
                "IsResidential" : 1
              },
              "indexName" : "sta_geo_idx",
              "indexVersion" : 2,
              "inputStages" : [..]
            }
          },
          "rejectedPlans" : [ ]
        },
        "executionStats" : {
          "executionSuccess" : true,
          "nReturned" : 2,
          "executionTimeMillis" : 1911,
          "totalKeysExamined" : 450,
          "totalDocsExamined" : 552,
          "executionStages" : {..},
                  }
      },
      "nReturned" : NumberLong(2),
      "executionTimeMillisEstimate" : NumberLong(0)
    },
    {
      "$lookup" : {
        "from" : "tax_assessor",
        "as" : "NumberOfProperties",
        "let" : {
          "propertyCity" : {
            "$const" : "COLUMBUS"
          },
          "propertyState" : {
            "$const" : "OH"
          },
          "ownerName" : "$OwnerName"
        },
        "pipeline" : [
          {
            "$match" : {
              "$expr" : {
                "$and" : [
                  {
                    "$eq" : [
                      "$PropertyCity",
                      "$$propertyCity"
                    ]
                  },
                  {
                    "$eq" : [
                      "$PropertyState",
                      "$$propertyState"
                    ]
                  },
                  {
                    "$eq" : [
                      "$OwnerName",
                      "$$ownerName"
                    ]
                  }
                ]
              }
            }
          }
        ]
      },
      "nReturned" : NumberLong(2),
      "executionTimeMillisEstimate" : NumberLong(1910)
    },
    {
      "$project" : {
        "OwnerName" : true,
        "FullAddress" : true,
        "distance" : true,
        "YearBuilt" : true,
        "NumberOfProperties" : {
          "$size" : [
            "$NumberOfProperties"
          ]
        },
        "_id" : false
      },
      "nReturned" : NumberLong(2),
      "executionTimeMillisEstimate" : NumberLong(1910)
    },
    {
      "$sort" : {
        "sortKey" : {
          "Distance" : 1
        }
      },
      "nReturned" : NumberLong(2),
      "executionTimeMillisEstimate": NumberLong(1910)
    }
  ]

Based on the above stats, $lookup is why it doesn’t use any indexes. Does it give me an optimized result by rearranging the stages or applying a proper index? Or is there a better way to get the NumberOfProperties without using $lookup.

Hi @Shehan_Vanderputt and welcome to the MongoDB Community forum!!

In order to understand the requirement better, could you help me with some information to replicate in my local environment.

  1. A sample document with the necessary fields.
  2. The index and type of index created on those fields.
  3. The expected response.
  4. The MongoDB version you are on.

Regards
Aasawari

Hi, @Aasawari I have created sample data set with the necessary fields. You can find this using the below link.

temp_data

Hi @Shehan_Vanderputt and thank you for sharing the sample data here.

Taking reference from the sample data shared, I tried to create sample document inside the collection as:

test> db.sample.findOne()
{
  _id: ObjectId("642e45dda86d7c5ba907fedc"),
  DaysSinceLastSale: 59,
  IsResidential: true,
  Owner1NameFull: 'Elijah Fisher',
  AddressCity: 'New Bedford',
  AddressState: 'Arkansas',
  AddressZIP: 132895,
  position: [ -72.29083, 39.13419 ],
  year: '2055'
}

and the indexes defined on the collection are:

test> db.sample.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  {
    v: 2,
    key: { position: '2dsphere' },
    name: 'position_2dsphere',
    '2dsphereIndexVersion': 3
  },
  { v: 2, key: { AddressCity: 1 }, name: 'AddressCity_1' }
]

As mentioned, I tried to use the query similar to the one mentioned in the first post:

db.sample.aggregate([{
	$geoNear: {
		near: {
			type: "Point",
			coordinates: [-72.29083, 39.13419]
		},
		distanceField: "distance",
		"maxDistance": 200000
	}
}, {
	$lookup: {
		from: "location",
		localField: "AddressCity",
		foreignField: "AddressCity",
		let: {
			addressCity: "Portsmouth",
			addressState: "Idaho",
			ownerName: "Owner1NameFull"
		},
		pipeline: [{
			$match: {
				AddressCity: '$$addressCity',
				AddressState: "$$addressState",
				Owner1NameFull: '$$ownerName'
			}
		}],
		as: "newFields"
	}
}, {
	$project: {
		_id: 0,
		"AddressCity": 1,
		"AddressState": 1,
		"Owner1NameFull": 1,
		"newFields": {
			$size: "$newFields"
		}
	}
}])

and it makes use of the Index for the geoNear and the match stage of the pipeline:

For the $geoNear

inputStage: {
                  stage: 'IXSCAN',
                  nReturned: 8,
                  executionTimeMillisEstimate: 0,
                  works: 27,
                  advanced: 8,
                  needTime: 18,
                  needYield: 0,
                  saveState: 0,
                  restoreState: 0,
                  isEOF: 1,
                  keyPattern: { position: '2dsphere' },
                  indexName: 'position_2dsphere',

and for the $match stage:

totalDocsExamined: Long("0"),
      totalKeysExamined: Long("0"),
      collectionScans: Long("0"),
      indexesUsed: [ 'AddressCity_1' ],
      nReturned: Long("28"),
      executionTimeMillisEstimate: Long("5")

Please note that, for index to be used after the lookup stage, the joined collection need to have the index created on the fields.

Please visit the documentation on Query Optimisation for further understanding.

Let us know if you have any further questions.

Regards
Aasawari

@Aasawari Thank you very much. That worked!!!