After adding the index, null values appeared in the result set, causing the application processing logic to change. I would like to ask if this is a bug.
This is actually not a bug. You need to create a unique index so that null values are not included in your index. The only unique field by default is _id.
While a unique index would work to ensure null are not in the distinct() result set, they would prevent adding documents with the same id. In the sample documents supplied there is 2 documents with id:2 so creating a unique index would failed with E11000-duplicate-key-error.
Thanks Reply,
db.t1.createIndex({id:1},{partialFilterExpression:{id:{$exists:true}}})
This will indeed avoid null included in the result, but it also brings new problems.
When I query for id:null , the index cannot be used.
db.t1.explain().find({id:null})
I personally think that the result set is different just because of the addition of the index. This should be considered a bug。
Is id:null a value you want to find or not? If it is, then do not use the partial index and accept the fact that null is a distinct value. If you do not accept the fact that null is a distinct value then you should not try to find( {id:null} ).
Any way, nothing stops you from having another index with {partialFIlterExpression:{id:null}}. So there is a solution.
Any way for me, the following should be true:
looking for all the distinct value should return me all the documents, so I expect null to be there
Now that you know how it works, make it work for you. I am quite happy with the way it works. I design using the specs I am given.
This appears to be a behavioral difference between the how null values are returned to the distinct command when a COLLSCAN plan is used vs. when an IXSCAN plan is used.
I’ve filed SERVER-85298 to follow up on this further.
I believe the issue here is the usage of the distinct command; not whether or not there are other ways to surface a list of distinct values (though it’s good to showcase the power/utility of aggregation here).