Hi Keren - Thanks for providing those requested details.
since attributes is the array field that causes the index to be multikey, at least my first type of query (the one that doesn’t filter by attributes) should have still be covered? (I realized looking more closely at the explain output it indeed isn’t, but wondering why, considering that part of the documentation?)
I did some testing with MongoDB version 4.0 and it seems even if the projection is covered, the stage from the execution stats output will still show as 'PROJECTION'
so my apologies there for any confusion caused (My test environment using version 5.0.13 displays the particular stage as PROJECTION_COVERED
). However, in saying so, you can possibly try inspecting the totalDocsExamined
value for your queries to determine if the query is covered or not. If the totalDocsExamined
value is 0, then the query was most likely covered (there is an exception for when the query returns no results). This can be seen with the "no-attributes-index"
output for the query that does not contain "attributes"
:
db.getCollection('COLL').find({name: { $regex: "one" }}, { field_a: 1, field_b: 1, _id: 1 })
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 2048,
"executionTimeMillis" : 6023,
"totalKeysExamined" : 5118976,
"totalDocsExamined" : 0
Compared with db.getCollection('COLL').find({name: { $regex: "one" }, attributes: "a"}, { field_a: 1, field_b: 1, _id: 1 }).sort({name: 1})
which isn’t covered
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 2048,
"executionTimeMillis" : 18527,
"totalKeysExamined" : 5118976,
"totalDocsExamined" : 2048
the queries (both types of queries) take ~100sec with the full index, which is a multikey index, and apparently not covered - perhaps my misunderstanding of the documentation there
If we inspect the execution stats output for full index output for both queries, we can see that the server scanned 5.1M index keys and also 5.1M documents. Interestingly enough, this number sounds like the whole collection and if this is the case, my guess is that performing a collection scan may be faster in this particular scenario if the same 5.1M are needed to be scanned (without needing to inspect any index keys).:
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 2048,
"executionTimeMillis" : 98361,
"totalKeysExamined" : 5118976,
"totalDocsExamined" : 5118976
both queries take ~6 seconds with the no-attributes-index, despite it needing to fetch attributes for the query that filters according to it
both queries take ~4 seconds with the only-name index, despite it needing to fetch all the projection fields
I do see why this would appear quite strange as inspecting the execution stats you can see the query / index combinations requiring a fetch are slightly faster than the covered query. The query selectivity drastically impacts the performance. For your reference, on a test environment with 1M documents, I compared 2 $regex
queries, one with an anchor:
db.collection.find({'name':{$regex:"Est"}},{"name": 1, "field_a": 1, "field_b": 1, "_id": 1}).sort({"name":1}).explain("executionStats")
:
executionStats: {
executionSuccess: true,
nReturned: 8090,
executionTimeMillis: 994,
totalKeysExamined: 1000000,
totalDocsExamined: 0,
executionStages: {
stage: 'PROJECTION_COVERED'
(with anchor)
db.collection.find({'name':{$regex:"^Est"}},{"name": 1, "field_a": 1, "field_b": 1, "_id": 1}).sort({"name":1}).explain("executionStats")
:
executionStats: {
executionSuccess: true,
nReturned: 6061,
executionTimeMillis: 12,
totalKeysExamined: 6062,
totalDocsExamined: 0,
executionStages: {
stage: 'PROJECTION_COVERED'
Note the executionTimeMillis
difference between an anchored an non-anchored regex queries. Also the number of totalKeysExamined
, where the non-anchored regex query is scanning the whole keyspace: 1000000 scanned vs 8000 returned, an average of 0.008 document returned per index key scanned, and the anchored query is scanning 6062 keys and returns 6061 documents, almost 1:1 ratio, which means that the server does not do unnecessary work.
On an additional test with anchors, I found that a full index was faster than the single field index.
Although I do understand you have stated your particular use case requires the full expression to be searched. If this is a frequent operation, you may wish to consider using Atlas Search (although I presume you are on-prem due to the MongoDB version stated ).
Regards,