When working with the movies DB, I used the following to find the minimum imdb votes, and it works fine.
To find the maximum imdb votes, I used the aggregation below, which doesn't work. Is there any reason?
MongoDB's flexible schema can sometimes make things a little more complex, because documents in the same collection may be missing a field or may store the same field with different types.
For example, take the following collection, where votes is a number, a string, null, or missing.
mongosh > c.find()
{ _id: 0, votes: 1 }
{ _id: 1, votes: 'foobar' }
{ _id: 2, votes: null }
{ _id: 3 }
The $min and $max accumulators are based on the sort order. In our example, the ascending and descending sort orders are:
mongosh > c.find({}).sort( { votes : 1 })
{ _id: 2, votes: null }
{ _id: 3 }
{ _id: 0, votes: 1 }
{ _id: 1, votes: 'foobar' }
mongosh > c.find({}).sort( { votes : -1 })
{ _id: 1, votes: 'foobar' }
{ _id: 0, votes: 1 }
{ _id: 2, votes: null }
{ _id: 3 }
If you look at the $min and $max documentation you will read:
$max
Returns the maximum value. $max compares both value and type, using the specified BSON comparison order for values of different types.
It is also mentioned:
If some, but not all, documents for the $max operation have either a null value for the field or are missing the field, the $max operator only considers the non-null and the non-missing values for the field.
If all documents for the $max operation have null value for the field or are missing the field, the $max operator returns null for the maximum value.
Going back to the example, we have 2 non-null and non-missing values for votes: _id:0 with votes:1 and _id:1 with votes:'foobar'. If we $group with $min and $max, we get:
mongosh > c.aggregate( { "$group" : {
"_id" : null ,
"max" : { "$max" : "$votes" } ,
"min" : { "$min" : "$votes" }
} } )
// Result below. I used 'foobar' as the string value because it makes it easier to see
// what is happening than the empty string that you get with $imdb.votes
{ _id: null, max: 'foobar', min: 1 }
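To make the comparison rule concrete, here is a rough plain-JavaScript emulation of what the $group stage does for this collection. It is only a sketch: the helper names (typeRank, compareValues, minMaxVotes) are made up for illustration, and it covers only the two types in the example (numbers before strings), not the full BSON comparison order.

```javascript
// Rank of the two BSON types used in the example: numbers sort before strings.
// Simplification: every other type is rejected rather than ranked.
function typeRank(value) {
  if (typeof value === "number") return 1;
  if (typeof value === "string") return 2;
  throw new Error("type not covered by this sketch");
}

// Compare by type first, then by value within the same type.
function compareValues(a, b) {
  const rankDiff = typeRank(a) - typeRank(b);
  if (rankDiff !== 0) return rankDiff;
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// Emulates { $group: { _id: null, max: { $max: "$votes" }, min: { $min: "$votes" } } }.
function minMaxVotes(docs) {
  // null and missing values are skipped, as the documentation quoted above describes.
  const values = docs
    .map((doc) => doc.votes)
    .filter((v) => v !== null && v !== undefined);
  // If every document is null or missing, the result is null.
  if (values.length === 0) return { _id: null, max: null, min: null };
  const sorted = [...values].sort(compareValues);
  return { _id: null, max: sorted[sorted.length - 1], min: sorted[0] };
}

const docs = [
  { _id: 0, votes: 1 },
  { _id: 1, votes: "foobar" },
  { _id: 2, votes: null },
  { _id: 3 },
];

const result = minMaxVotes(docs);
console.log(result); // { _id: null, max: 'foobar', min: 1 }
```

The same logic explains the empty string: any string, even '', ranks above every number, so a single document with votes: '' is enough to become the maximum.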
The extra flexibility comes at a cost: sometimes you need an extra step to weed out data that does not make sense for the use case. In the $max: "$imdb.votes" use case, strings do not make sense.
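One possible form of that extra step, as a minimal sketch: filter on the BSON type before grouping. This assumes the collection is named movies and that the valid vote counts are stored as numeric types. The $type alias "number" matches double, int, long and decimal values.

```javascript
// Assumption: valid vote counts are numeric; anything else (strings, null, missing) is dropped.
const pipeline = [
  // Keep only documents whose imdb.votes is a numeric BSON type.
  { $match: { "imdb.votes": { $type: "number" } } },
  // Then compute min and max over the remaining, comparable values.
  {
    $group: {
      _id: null,
      max: { $max: "$imdb.votes" },
      min: { $min: "$imdb.votes" },
    },
  },
];

// In mongosh (collection name assumed): db.movies.aggregate(pipeline)
```

Filtering with $match first also keeps the $group stage simple, and the pipeline can be reused for other numeric accumulators such as $avg.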