Count then find vs aggregate, which one is faster?

Ariel_Ariel · February 20, 2021, 2:38am

I’m using mongodb go driver for my application and I’m confused on querying and returning meta with the results:

1st option

collection.countDocuments(ctx, filter)
// add more filter
collection.Find(ctx, filter, limit(200))

vs

2nd option

collection.Aggregate(
  mongo.Pipeline{
    // filter first
    // then sort
    // do a facet:
      // 1st pipeline: count
      // 2nd pipeline: apply added filter, limit
  }
)

I have attempted to translate it into the mongodb commands and run them on my shell

db.collection.explain().aggregate(...)

but I’m still confused:

There’s a lot of slow query message on my mongodb log, does that certainly mean my commands are slow? I have applied index and it shows IXSCAN

{"t":{"$date":"2021-02-18T14:03:17.180+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn3","msg":"Slow query","attr":{"type":"command","ns":"TaskManagement.task","appName":"MongoDB Shell","command":{"aggregate":"task","pipeline":[{"$match":{"$or":[{"client_time_utc":{"$lt":{"$date":"2020-08-17T19:20:30.450Z"}}},{"$and":[{"client_time_utc":{"$date":"2020-08-17T19:20:30.450Z"}},{"_id":{"$lt":"1beecc71-c550-4ff8-8644-891e61e4e8a2"}}]}],"org_id":"2323"}},{"$facet":{"count":[{"$count":"value"}],"data":[{"$sort":{"client_time_utc":-1.0,"_id":-1.0}},{"$limit":5.0}]}}],"explain":true,"cursor":{},"lsid":{"id":{"$uuid":"3ec30c03-1de1-47aa-b912-d47878c32e88"}},"$db":"TaskManagement"},"planSummary":"IXSCAN { _id: 1 }, IXSCAN { client_time_utc: -1, _id: -1 }","numYields":0,"reslen":2714,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":2}},"Global":{"acquireCount":{"r":2}},"Database":{"acquireCount":{"r":2}},"Collection":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":2}}},"storage":{},"protocol":"op_msg","durationMillis":1}}

I’m not sure what to make use of rejectedPlan (explain output). Does that mean I have to modify my query?
How do I know which one is faster for count and then find vs aggregate ?

Thank you

Prasad_Saya · February 20, 2021, 10:31am

Hello @Ariel_Ariel, I will try to answer your questions.

(1)

There’s a lot of slow query message on my mongodb log, does that certainly mean my commands are slow? I have applied index and it shows IXSCAN.

This is about the aggregate query (the 2nd option):

The indexes are applied alright, probably not to the best possible extent. The $match and $sort within the $facet don’t use the indexes (that is the behaviour of $facet stage - see note below). But, you can try applying the $sort stage from the $facet stage’s data pipeline to before the $facet stage (immediately after the first $match stage) - {"$sort": {"client_time_utc":-1.0,"_id":-1.0}}. This sort operation will utilize the index defined on the { client_time_utc: -1, _id: -1 }.

This will likely benefit the query performance.

See:

Pipeline Operators and Indexes

The $match stage can use an index to filter documents if it occurs at the beginning of a pipeline.

The $sort stage can use an index as long as it is not preceded by a $project, $unwind, or $group stage.

$facet and Index Use

The $facet stage, and its sub-pipelines, cannot make use of indexes, even if its sub-pipelines use $match or if $facet is the first stage in the pipeline. The $facet stage will always perform a COLLSCAN during execution.

(2)

I’m not sure what to make use of rejectedPlan (explain output). Does that mean I have to modify my query?

It is normal to have a rejectedPlan sub-document within the explain output. It only means, that the query optimizer generated multiple plans and one of the plans was used (winningPlan) and the other was the rejected one. Sometimes the rejected plans are empty documents. See explain.queryPlanner.rejectedPlans.

(3)

How do I know which one is faster for count and then find vs aggregate ?

I think, the two queries (options) are not comparable, in terms of which is faster. Because, the aggregate runs as a single query and the find+count as two separate queries. Is it better to run two queries, rather than one? It is mostly operational related issue, and how it fits into your operations.

You can run db.collection.explain() with the the "allPlansExecution" verbosity mode, for the find+count methods (individually) and then the aggregate query.

The explain returns the queryPlanner and executionStats information for the evaluated method. The executionStats includes the completed query execution information for the winning plan. See Explain Results - executionStats for the output info which includes the "executionStats.executionTimeMillis".

Ariel_Ariel · February 20, 2021, 11:42am

thanks a lot, can you plz help answer my another qn Count and filter pipelines using mongodb go driver aggregate

Prasad_Saya · February 20, 2021, 1:32pm

Hello @Ariel_Ariel . Is the aggregation in the linked post same or similar to the aggregation query in this post?

Ariel_Ariel · February 20, 2021, 1:48pm

no, i have changed it. i want the count before i filter the query for the 2nd time but idk how