First query very slow, second - fast

Hello, I have an interesting problem:
I built a query that in aggregation $match stage gets a list of “someIDs” and I try to match data by it. What happens is that when I run the query first time, it takes about 90 seconds but then the second run takes about 5 seconds (and that’s using indexes). I assume it has something to do with query plans being cached.
I have tried a couple indexes and by eliminatig other stages to leaving just a $match stage, I determined that it is the slowest (or query plan selection caching is slow).

Example:

db.collection.aggregate([
{
$match: {
searchedID: {
$in: [“someID1”, “someID2”,…], // may contain up to several thousand
},
},
},
{
$sort: {
searchedID: -1,
otherField: 1,
},
},
{
$group: {
_id: ‘$searchedID’,
required_array: {
$push: {
field1: ‘$someField1’,
field2: ‘$someField2’,
field3: ‘$otherField’,
},
},
},
},
])

Indexes I tried using:
{ searchedID: -1 }
{ searchedID: -1, otherField: 1 }

  • By eliminating $group and $sort stages I determined that these do not cause trouble. When running the query a second time, it goes fast.
  • Collection contains 7 million documents, from which I only need to retreive a couple thousand.
  • Tried using $or operator instead of $in - similar results.

Questions:

  • If it’s possible, how can I speed up query plan selection?
  • Is the $in operator the troublesome place? (When it sometimes has a list of a couple thousand strings)
  • Is it possible to speed up the initial/first query? (I actually need the first one to be fast)
  • Any other suggestions on how I could increase speed of my query?
  • Is it possible to speed up my $match stage even more?

Would very much appreciate any help from you guys!

I would suggest that you post your question to https://www.mongodb.com/community/forums// as it does not look related to lessons or labs for this course.

Some thoughts…

Empirical calculations are performed for each query you run in an attempt to build a query plan that best suits the query shape. Once the query plan is in the Active state, subsequent queries that have the same query shape will use this cached winning plan to fulfil the query hence the speedup you noticed.

So as long as the query shape of subsequent queries is the same, they will use the same query plan. And I would imagine that a change to the $in filter arguments does not constitute a change in query shape, but a substitution of fields may however be seen as a change in query shape.
Note, however that the query plan cache is lost following a server restart.

Suggestions / things to try:

  1. Use the index below because it meets all your query+sort needs:
    { searchedID: -1, otherField: 1 }

  2. Drop the sort stage because you’re already using a sorted index

  3. find() is typically more performant than aggregate() so convert your query:
    db.collection.find(<query>).group(<query>)

  4. Force the query to use a hint()… might speed up the first query but use with caution:
    db.collection.find(<query>).group(<query>).hint('index')
    db.collection.aggregate([$match, $group], hint: {'index'})

  5. Create another collection to hold two individually indexed fields, the searchID and searchName. Use a $lookup to join both collections. The collection will look something like this:

    searchName searchID
    filter1 someID1
    filter1 someID2
    filter1 someID3

    Within the from collection of the $lookup, filter by searchName: filter1 and join on searchID.

  6. Probably not the best suggestion, but if it will fit into each array, try a $facet stage, where each pipeline holds a subset of the searchID. Group within each pipeline and finally combine all facet pipelines in a $project.
    Include/exclude an index hint.

  7. Lastly, and most importantly, verify each query performance by analysing their Explain Plans

PS: Sorry @steevej-1495, I had to respond because @Julius_65872 said it was…

:wink:

2 Likes

Do not worry, I am not offended at all.

I will bookmark your answer and read it carefully when I have time.

1 Like

I forgot to add that the working set plays an important part in this too.

Thank you! Good notice on sorting. I am using hint with the index you told about. I think I will try out the findMany method since I am passing a large array as filter parameter.

Sorry, will keep In mind, just the discussion came to mind first :slight_smile:

I would actually be inclined to try out all of them. You’d be surprised what the outcome may be.
Let us know how it goes or if you need further clarity on any of the suggestions.

What king of index would you suggest using then? with find().group() methods?. Shouldn’t it be only the first field, that was in match stage?

If you don’t need (or not using) the otherField, then I’d suggest using the single searchID index.