Slow performance when Using Aggregation along with $geoNear with multiple query condition

4_Bits_Solutions · June 4, 2022, 7:23am

Hi, I hope you can help.
I’m trying to use an aggregation query for $geoNear with multiple conditions in the query key.
the query is super slow (~800ms). But If I use the same query conditions without $geoNear but using $match with exact same condition it works well (~ 25ms). the query has mostly ( $in, $lte, $gte) operators. I don’t understand it works well without $geoNear.

I would appreciate it if you guys could help.
Query: https://gitlab.com/-/snippets/2343327

Tarun_Gaur · June 17, 2022, 10:44am

Hello @4_Bits_Solutions,

Welcome to the community!!

Could you please help me with below things to look into this issue?

What is your deployment topology (e.g. standalone, replica set, sharded cluster)?
What is the exact version of your MongoDB?
Please run both queries with explain in executionStats mode and share the output.

Regards,
Tarun Gaur

4_Bits_Solutions · June 19, 2022, 2:33pm

Thanks, Tarun for the response. I really appreciate that. Here are the things that might help you look into the issue more closely.

It’s a replica set (on Aws)
MongoDB server version - 4.2.20
QueryPlan-$geoNear QueryPlan-$match

Thanks,
Amit

Tarun_Gaur · June 29, 2022, 7:02am

Hi Amit,

In the explain query output shared, we can see that in the match query specific indexes are being used such as

"indexName" : "minEnergySpectrum_1"
"indexName" : "maxEnergySpectrum_1"
"indexName" : "totalAvgRating_1"
.
.
so on

where as in geoNear query only below index is being used

"indexName" : "location.coordinates_2dsphere"

This changes how the query planner works and basically we cannot say that both query are same or work similarly hence you can see the difference between the outputs in executionStats section of the output

Below are the parameters from $match query
  			"executionSuccess" : true,
  			"nReturned" : 223,
  			"executionTimeMillis" : 90,
  			"totalKeysExamined" : 4517,
  			"totalDocsExamined" : 4187
Below are the parameters from $geoNear query
            "executionSuccess" : true,
  			"nReturned" : 223,
  			"executionTimeMillis" : 809,
  			"totalKeysExamined" : 89476,
  			"totalDocsExamined" : 177470
explain.executionStats.nReturned

Number of documents that match the query condition. nReturned corresponds to the n field returned by cursor.explain() in earlier versions of MongoDB.
Here we can see that number of returned documents are same.
explain.executionStats.executionTimeMillis

Total time in milliseconds required for query plan selection and query execution. executionTimeMillis corresponds to the millis field returned by cursor.explain() in earlier versions of MongoDB.
Here we can see that time taken to execute both queries differ.
explain.executionStats.totalKeysExamined

Number of index entries scanned. totalKeysExamined corresponds to the nscanned field returned by cursor.explain() in earlier versions of MongoDB.
Here we can see the difference in index entries scanned of both queries.
explain.executionStats.totalDocsExamined

Number of documents examined during query execution. Common query execution stages that examine documents are COLLSCAN and FETCH .
Here we can see documents scanned for $geoNear Query are way higher than $match query.

NOTE:
totalDocsExamined refers to the total number of documents examined and not to the number of documents returned. For example, a stage can examine a document in order to apply a filter. If the document is filtered out, then it has been examined but will not be returned as part of the query result set.

If a document is examined multiple times during query execution, totalDocsExamined counts each examination. That is, totalDocsExamined is not a count of the total number of unique documents examined.

Generally, MongoDB only uses one index to fulfill most queries. However, each clause of an $or query may use a different index, and in addition, MongoDB can use an intersection of multiple indexes. Reference document to check Indexing Strategies.

Materialized view is one possibility for performance improvement. It does this by pre-aggregating the complex query into a different collection, and thus the filter part of the final FETCH stage of the geo query is basically pre-calculated so it may improve performance at the expense of disk space. Materialized view is the recommended method for certain workloads where a regular non-materialized view cannot be used due to how it currently works in MongoDB (see View creation ). One of which is that views currently do not support $geoNear stage.

Although superficially both queries look similar at a glance, these are two very different queries, with very different execution plan, using very different indexes. Thus they cannot be compared apple-to-apple. The geo query does a lot more work, which is reflected in the explain output.

Thanks,
Tarun

system · July 6, 2022, 8:49am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.