I have a document structure with the default _id and three other elements. I need to determine an IDs set of the data. The query I use for this includes all document fields (except the _id) . I made the following observation:
If you create an compound index from all fields (incl. _id!) to aggregate the result to a list of _ids you save the FETCH phase and get a PROJECTION_COVERED phase (in comparison to an index without the _id). This seems to be a good swap.
But for me it feels weird to put an index on all fields. Is this a smell?
I don’t think it’s a smell because you are trying to convert a regular query into a covered query by creating an index that will satisfy the query and the fields returned. This is efficient because the database can get all the work done by just doing an index lookup. Also, you are well within the limit of number of indexed fields in a compound index. So, not an issue on that front. However, the tradeoffs to be aware of here would be the index size and growth, and the index updates during insert, update, and delete operations which will have an impact on compute services.
Another way to approach this would be to explicitly exclude _id if it’s not needed - db.coll.find({}, { a: 1, b: 1, c: 1, _id: 0 }). It will save you from creating a compound index with _id field.
You might be questioning this because documents only have four fields and now your index also has four fields. But with flexible schema the database doesn’t know that the fields in the index are the only fields present in the documents. So while it may be true now as your application evolves and you might start adding other fields to the documents, the index will still only have these four fields.
Now, adding the _id into the index made your query faster but it’s already faster than collection scan because the index can more quickly let you find a small subset of documents that satisfy your criteria. So adding _id because you need to return it is just fine. Also, indexes don’t store field names in them, and they utilize prefix compression so your index is likely much smaller than your collection even if it seems like the information in the index is the same as the information in the collection.