How to optimize an aggregation pipeline that uses the $facet operator?

Alex_Bjorlig · October 18, 2021, 9:05am

In our project we use the aggregation pipeline and $facet operator extensively.

When we use the explain: true option, we look at the query plan. But the output related to $facet is really limited, and does not mention anything about indexes etc.

How do you effectively analyze aggregation pipelines with the $facet step?

Marcus · October 19, 2021, 12:34am

Hi @Alex_Bjorlig thanks for your interest and glad to see you back in the forum. While I cannot say much just yet, we will be releasing something that I think your team will be able to use that leverages search indexes and facets in Lucene very soon. Stay tuned.

Stennie_X · October 19, 2021, 12:41am

Hi @Alex_Bjorlig,

Can you provide an example of your aggregation pipeline and confirm your version of MongoDB server? The actual explain output would also be helpful if you have a specific query about outcomes.

Per the documentation on $facet Index Use (as at MongoDB 5.0):

The $facet stage, and its sub-pipelines, cannot make use of indexes, even if its sub-pipelines use $match or if $facet is the first stage in the pipeline. The $facet stage will always perform a COLLSCAN during execution.

If you have initial $match & $sort stages before your $facet, those can be candidates for index usage and should be covered in the explain output. For more information, see Pipeline Operators and Indexes and Aggregation Pipeline Optimization.
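As a rough illustration of that point, here is a minimal Python sketch that puts $match and $sort ahead of $facet and wraps the pipeline in an explain command document. The field names (`status`, `createdAt`), the facet names, and both helper functions are hypothetical and not taken from this thread; whether an index is actually used depends on the indexes that exist on your collection.

```python
from typing import Any, Dict, List


def build_pipeline(status: str, page_size: int) -> List[Dict[str, Any]]:
    """Filter and sort BEFORE $facet, so those stages are index candidates."""
    return [
        # Hypothetical fields: an index on {status: 1, createdAt: -1} could serve these.
        {"$match": {"status": status}},
        {"$sort": {"createdAt": -1}},
        # Everything inside $facet runs on the documents produced above.
        {"$facet": {
            "rows": [{"$limit": page_size}],
            "totalCount": [{"$count": "n"}],
        }},
    ]


def build_explain_command(collection: str,
                          pipeline: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build an explain command document for an aggregate (the command name goes first)."""
    return {
        "explain": {"aggregate": collection, "pipeline": pipeline, "cursor": {}},
        "verbosity": "queryPlanner",
    }


pipeline = build_pipeline("active", 20)
command = build_explain_command("orders", pipeline)  # "orders" is a made-up collection name
```

With PyMongo, something like `db.command(command)` should return the plan, where you can check whether the leading stages report an index scan or a COLLSCAN.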

As @Marcus notes, there is also some development progress toward Faster Faceting in Atlas Search which may also be of interest when available.

Regards,
Stennie

Alex_Bjorlig · October 19, 2021, 7:36am

Thanks for the awesome answers. It now makes more sense that we get a bunch of warnings about collection scans, because we use facets for every operation. We implement cursor-based pagination, and in a facet operation we:

Look up the actual rows to return
The total count
The paginated count
Optional facet results.
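
For readers following along, a $facet stage of the shape listed above might look roughly like this Python sketch. It is a guess at the structure, not our actual resolver code: the facet names, the `_id`-based cursor, and the reading of "paginated count" as the number of rows in the current page are all assumptions, and the input is assumed to be sorted by `_id` already.

```python
from typing import Any, Dict, List, Optional


def build_page_facet(after_id: Optional[Any],
                     page_size: int,
                     extra_facets: Optional[Dict[str, List[Dict[str, Any]]]] = None
                     ) -> Dict[str, Any]:
    """Build one $facet stage returning rows, counts, and optional extra facets."""
    # Cursor condition: only documents after the last seen _id (assumed cursor field).
    page_filter: List[Dict[str, Any]] = (
        [{"$match": {"_id": {"$gt": after_id}}}] if after_id is not None else []
    )
    facets: Dict[str, List[Dict[str, Any]]] = {
        # The actual rows to return for this page.
        "rows": page_filter + [{"$limit": page_size}],
        # Count of all documents reaching the $facet stage.
        "totalCount": [{"$count": "n"}],
        # Count of rows in the current page (assumed meaning of "paginated count").
        "pageCount": page_filter + [{"$limit": page_size}, {"$count": "n"}],
    }
    # Optional facet results, e.g. {"byStatus": [{"$sortByCount": "$status"}]}.
    facets.update(extra_facets or {})
    return {"$facet": facets}


stage = build_page_facet(after_id=None, page_size=10)
```

Note that the $match inside each sub-pipeline here is exactly the kind of stage that, per the documentation quoted above, will not use an index.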

Are there any feature requests or issues tracking the fact that $facet sub-pipelines can’t use an index?

Marcus · October 19, 2021, 4:09pm

Maybe @Stennie_X can pull strings to get you in the private preview, given that every query has this syntax!

Alex_Bjorlig · October 21, 2021, 7:21am

We are already thinking about how to rewrite the resolvers, because this is not something we do in Atlas Search, but “just” $facet stages in the aggregation pipeline.