Exists query with index very slow

kevinadi · June 11, 2020, 6:08am

Hi Karl,

I see what you mean. Well, I have a “hack” that may work with your specific use case. You might be able to use sparse indexes to achieve a quick count. Using the example you provided, I created a sparse index on consumer.consumed:

> db.test.find()
{ "_id" : 0, "consumer" : { "name" : "karl", "consumed" : "2019-10-10" } }
{ "_id" : 1, "consumer" : { "name" : "kevin", "consumed" : null } }
{ "_id" : 2, "consumer" : { "name" : "iceman" } }

> db.test.createIndex({'consumer.consumed':1}, {sparse:true})

A feature of a sparse index is that it doesn’t create an index key for documents that doesn’t have the indicated field. This can be verified by doing a find() by hint():

> db.test.find().hint({'consumer.consumed':1})
{ "_id" : 1, "consumer" : { "name" : "kevin", "consumed" : null } }
{ "_id" : 0, "consumer" : { "name" : "karl", "consumed" : "2019-10-10" } }

Note that the third document is missing here.

For most other queries, MongoDB knows that the sparse index does not cover the whole collection, and would avoid using it if it thinks that it can return the wrong result. That is, unless you force it to use the index by hint(), which can work to your advantage in this specific case.

Since the sparse index doesn’t include the 3rd document, for your count query to return the correct count, you just have to hint() it, and provide an empty parameter for find(), since that {$exists:true} parameter is already implicit in the index itself:

> db.test.find().hint({'consumer.consumed':1}).count()
2

and since you’re forcing it to use that index, it doesn’t load the documents from disk:

> db.test.explain('executionStats').find().hint({'consumer.consumed':1}).count()
...
		"totalKeysExamined" : 3,
		"totalDocsExamined" : 0,
...

Please have a read though Sparse Indexes and its superset Partial Indexes (which is a more flexible version of sparse indexes) for more details.

However, a caveat worth repeating is that since sparse indexes don’t cover the whole collection, other queries may behave differently, e.g. some queries that should use the index may not use the index and end up being a collection scan instead. This is explained in the linked pages above.

Best regards,
Kevin