Hello, I have an interesting problem:
I built a query that in aggregation $match stage gets a list of “someIDs” and I try to match data by it. What happens is that when I run the query first time, it takes about 90 seconds but then the second run takes about 5 seconds (and that’s using indexes). I assume it has something to do with query plans being cached.
I have tried a couple indexes and by eliminatig other stages to leaving just a $match stage, I determined that it is the slowest (or query plan selection caching is slow).
Example:
db.collection.aggregate([
{
$match: {
searchedID: {
$in: [“someID1”, “someID2”,…], // may contain up to several thousand
},
},
},
{
$sort: {
searchedID: -1,
otherField: 1,
},
},
{
$group: {
_id: ‘$searchedID’,
required_array: {
$push: {
field1: ‘$someField1’,
field2: ‘$someField2’,
field3: ‘$otherField’,
},
},
},
},
])
Indexes I tried using:
{ searchedID: -1 }
{ searchedID: -1, otherField: 1 }
- By eliminating $group and $sort stages I determined that these do not cause trouble. When running the query a second time, it goes fast.
- Collection contains 7 million documents, from which I only need to retreive a couple thousand.
- Tried using $or operator instead of $in - similar results.
Questions:
- If it’s possible, how can I speed up query plan selection?
- Is the $in operator the troublesome place? (When it sometimes has a list of a couple thousand strings)
- Is it possible to speed up the initial/first query? (I actually need the first one to be fast)
- Any other suggestions on how I could increase speed of my query?
- Is it possible to speed up my $match stage even more?
Would very much appreciate any help from you guys!