I don't want to read the entire document, so I am using $project to keep only the required fields.
Before I go to the other stages, I want to filter down to the docs that match { field2: { $gt: 0 } }.
field2 is not indexed. So should I put the above condition in the first $match stage, or should I have a separate $match after the $project?
Will MongoDB have to read the entire doc from disk if I put it in the first stage? Or will it do that anyway and filter out the fields after $project?
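For concreteness, the two placements being compared might look like this (a sketch reusing the field names from the question; the equality and range values are hypothetical):

```javascript
// Option A: merge the field2 filter into the first $match stage.
const pipelineA = [
  { $match: { field1: "sourceA", ts: { $gte: 100, $lt: 200 }, field2: { $gt: 0 } } },
  { $project: { field1: 1, ts: 1, field2: 1 } },
  // ...later stages
];

// Option B: project first, then filter field2 in a separate $match.
const pipelineB = [
  { $match: { field1: "sourceA", ts: { $gte: 100, $lt: 200 } } },
  { $project: { field1: 1, ts: 1, field2: 1 } },
  { $match: { field2: { $gt: 0 } } },
  // ...later stages
];
```

Either variant would be run with db.collection.aggregate(pipelineA) in mongosh.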
First, if you have a set of "and" expressions that can be placed in the first stage, you should do that, so as to pass as little data as possible to the next stage. Now, if this query recurs often, why not have a compound index on all 3 fields? This will speed up the query regardless of whether documents are accessed or not.
Additionally, the order of the fields in the index matters. We call it the Equality, Sort, Range (ESR) order.
So in your case an optimal index would be { field1: 1, ts: 1, field2: 1 }.
Additionally, if an index can cover all of the returned data, the query can be a covered query, which improves performance, so having all fields in the index can help avoid document scans.
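Following the ESR order above, the suggested index could be created like this (a sketch; the collection name is hypothetical):

```javascript
// ESR-ordered compound index: equality on field1, sort on ts, range on field2.
const indexSpec = { field1: 1, ts: 1, field2: 1 };

// In mongosh you would run:
//   db.mycoll.createIndex(indexSpec)
// For a covered query, project only indexed fields and exclude _id:
//   db.mycoll.find({ field1: "sourceA", field2: { $gt: 0 } },
//                  { _id: 0, field1: 1, ts: 1, field2: 1 })
```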
The ts field has the highest cardinality for me. Used along with field1, it narrows the number of docs down to about 1450.
I have multiple such queries, each with a different field in place of field2, so I cannot add a compound index for all of them.
I am fine with the extra time it might take since the field is not in the index.
What I mainly want to know is: where should I add the { $match: { field2: { $gt: 0 } } }?
Will merging it into the first stage cause the entire document to be read from disk for the filtering, or will MongoDB read just that single field?
If it reads the entire document from disk for filtering, then I feel it would be better to put the { $match: { field2: { $gt: 0 } } } after the $project stage.
The optimisation guide states that all possible filtering should be done in the earliest stage possible.
MongoDB can't read single fields if they are not indexed. All documents matched in the first stage are read into memory and passed to the next stage in memory (excluding a sort stage, which can use an index).
The cardinality within an index plays a smaller role if the index cannot support the sort. Having ts as the first field will result in a blocking in-memory sort.
You can index your documents however you want, of course, but I would recommend { field1: 1, ts: 1 }.
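In mongosh, that recommendation might look like the following (the collection name is hypothetical):

```javascript
// Recommended index: equality field first, then the sort/range field.
const indexSpec = { field1: 1, ts: 1 };

// In mongosh:
//   db.mycoll.createIndex(indexSpec)
// A query that filters on field1 and sorts on ts can then use the index
// for the sort instead of doing a blocking in-memory sort:
//   db.mycoll.find({ field1: "sourceA" }).sort({ ts: 1 })
```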
Also, with this index I didn't have to create a separate index on field1; the query explain showed MongoDB was reusing the existing index with min and max values on ts.
Sometimes an index will be chosen but its performance will be suboptimal.
Running from min to max is usually undesirable, as it means you are doing a full index scan, whereas the purpose was to use the index as a filter to reduce the number of docs to be accessed.
Good metrics to look at are totalKeysExamined and totalDocsExamined vs nReturned in the explain output.
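Those metrics appear in the executionStats section of db.collection.explain("executionStats") output. The numbers below are hypothetical, just to show the ratio you would look at:

```javascript
// Hypothetical executionStats from explain("executionStats") (illustrative values).
const executionStats = {
  totalKeysExamined: 1450, // index keys scanned
  totalDocsExamined: 1450, // documents fetched from disk/cache
  nReturned: 120           // documents actually returned
};

// A ratio far above 1 means the index scans far more than the query returns,
// i.e. the index is not filtering effectively.
const keysPerReturned = executionStats.totalKeysExamined / executionStats.nReturned;
```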
@Pavel_Duchovny I was trying to keep as few indexes as possible.
My initial indexes were { field1: 1, ts:1 } and { ts:1 }
When I read about cardinality, it made a lot of sense to use { ts: 1, field1: 1 } instead of { field1: 1, ts: 1 }.
My queries are mainly a ts range plus a specific string id of the source in field1, OR just a ts range, so I figured a single { ts: 1, field1: 1 } index would do for both.
But now it looks like I should rethink.
I know this is off-topic for this thread, but if I go back to { field1: 1, ts: 1 }, do you think I should add { ts: 1 } as well? Or should I first look at the metrics and then decide?