I have a MongoDB performance problem that I’m having trouble wrapping my head around. While I have plenty of experience with SQL tuning, Mongo tuning is new to me. A lot of the concepts translate, including the approach to troubleshooting, so hopefully my approach to debugging this makes sense, and some of you with more experience might spot what’s going on.
The application that exposes this data is meant to be a search tool for the end user. It is too early to know which attributes will be searched most and in what combinations, as I don’t have any users yet. The end user will also be able to sort by one attribute per search.
To provide cursor-style pagination, I believe I must also have a secondary sort on “_id” to keep the sort stable. Users can “load more” in the UI, which will make use of a cursor that provides the sort value.
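To make the pagination concrete, the “load more” query I have in mind looks roughly like this (just a sketch, assuming the default “_id” sort; lastId is the “_id” of the last document returned on the previous page):
db.myCollection.find({ /* ...same filter as the search... */ _id: { $gt: lastId } })
  .sort({ _id: 1 })
  .limit(100)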
Collection"myCollection" - approximately 500k documents
Fields
Filter Fields
“field1” - optionally present in document, only filter on existence
“field2” - optionally present in document, only filter on existence
“field3” - enum of 5 strings, filter using $in
“field4” - number, only filter using $lte
“field5” - number, filter using range ($lte, $gte)
“field6” - number, filter using range ($lte, $gte)
“field7” - number, filter using range ($lte, $gte)
“field8” - number, only filter using $lte
Sort Fields
“_id” - default sort
“field3” - same as field in filtering, enum of 5 strings
“field5” - same as field in filtering
“field6” - same as field in filtering
“field7” - same as field in filtering
“field9” - number
“field10” - enum of 4 strings
In the future there may be other fields, but the short of it is: lots of fields to filter on and one of potentially several fields to sort on.
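To make that concrete, a representative document looks roughly like this (values, and the types of field1/field2, are made up for illustration):
{
  "_id": ObjectId("..."),
  "field1": "...",      // optional, only its existence matters
  "field2": "...",      // optional, only its existence matters
  "field3": "val2",     // one of 5 enum strings
  "field4": 85000,
  "field5": 12,
  "field6": 3,
  "field7": 250,
  "field8": 40,
  "field9": 1999,
  "field10": "typeA"    // one of 4 enum strings
}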
My understanding of how to index this effectively is:
- use a compound index for each combination that is expected to be commonly used for filtering and sorting
- use single-field indexes or a wildcard index to cover the less common cases (sketched below)
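For the wildcard option I’m picturing something along these lines (just a sketch, I haven’t committed to it):
// a single wildcard index to cover the less common single-field filters
db.myCollection.createIndex({ "$**": 1 })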
However, in testing a compound index, I’m noticing the performance is not any better. Details below.
Let’s assume the user will search on
{
"field1": {
"$exists": true
},
"field4": {
"$lte": 100000
},
"field3": {
"$in": [
"val1",
"val2",
"val3",
"val5"
]
},
"field2": {
"$exists": true
}
}
and that the user enters no sort, so the system will sort by “_id” by default for predictable ordering. The compound index I created for this case is:
db.myCollection.createIndex({ field1: 1, field4: 1, field3: 1, field2: 1, _id: 1 })
Sort should go last, right?
I also created single-field indexes for these 4 fields (“_id” is indexed automatically):
db.myCollection.createIndex({ field1: 1 })
db.myCollection.createIndex({ field2: 1 })
db.myCollection.createIndex({ field3: 1 })
db.myCollection.createIndex({ field4: 1 })
However, when running the following query, Mongo doesn’t use the compound index and the query takes between 5 and 15 seconds to complete. If I remove the sort, Mongo uses the single-field (field1) index and the query completes in under 200 ms.
db.myCollection.find(
{"field1":{"$exists":true},"field4":{"$lte":100000},"field3":{"$in":["val1","val2","val3","val5"]},"field2":{"$exists":true}}
).sort({'_id': 1}).limit(100).explain('executionStats')
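For comparison, the fast variant is the same query with the sort removed:
db.myCollection.find(
{"field1":{"$exists":true},"field4":{"$lte":100000},"field3":{"$in":["val1","val2","val3","val5"]},"field2":{"$exists":true}}
).limit(100).explain('executionStats')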
Before I add more specifics, like the output of the “explain” above, please let me know if there is something obvious that an engineer with very little Mongo experience would miss. I’ve tried my best to experiment with different indexes, review the official docs on this subject, and search around for recommendations/explanations, and it’s still stumping me.