How to sort documents based on number of filters matched in $or?

How to sort documents based on number of filters matched in $or?

let query = { $or: [filter1, filter2, filter3] }

return documents should be sort based on score, number of filters matched

if document 1 match only filter1 it should have score 1,
if document 2 match only filter1 and filter2 it should have score 2,
if document 3 match only filter1, filter2, and filter3 it should have score 3

Hello @maddy, welcome to the MongoDB Community forum!

You can write an Aggregation query using the $facet stage - each facet matching a filter. The output of the facet stage will result in documents matched from each facet filter which can be further processed to get the desired result.

Assuming there are documents like this:

{ _id: 1, a: 1, b: 5, c: 3 }

And the three filters are like this:

a > 5
b == 5
c < 10

Then the query would be:

db.collection.aggregate([
{ 
    $facet: {
        filter1: [ { $match: { a: { $gt: 5 } } } ],
        filter2: [ { $match: { b: { $eq: 5 } } } ],
        filter3: [ { $match: { c: { $lt: 10 } } } ]
    }
},
{ 
    $project: { 
        result: { $concatArrays: [ "$filter1", "$filter2", "$filter3" ] } 
    } 
},
{ 
    $unwind: "$result" 
},
{ 
    $group: { 
        _id: "$result._id", 
        count: { "$sum": 1 } 
    } 
},
{ 
    $sort: { count: 1 } 
}
])

Finally, you can use the $sortByCount instead of the last two stages together, to get the same result.

1 Like

Despite having a good answer from @Prasad_Saya, as usual (Most of the time I learn something when he posts), I wanted to give a shot of my own. I am afraid that the $facet and $concatArrays might require a lot of memory.

> q_a = { "$gt": [ "$a" , 5 ] }  // a > 5
> q_b = { "$eq": [ "$b" , 5 ] } // b == 5
> q_c = { "$lt": [ "$c" , 10 ] } // c < 10
// Addings fields score_a/_b/_c in such a way
// that score_x will be 1 of q_x is true and 0 otherwise
> addFields_scores = { '$addFields': 
     { "score_a": { '$cond': [ q_a, 1, 0 ] },
       "score_b": { '$cond': [ q_b, 1, 0 ] },
       "score_c": { '$cond': [ q_c, 1, 0 ] } } }
> addFields_count = { "$addFields" :
      { "count" : { "$add" : [ "$score_a" , "$score_b" , "$score_c" ] } } }
> sort = { "$sort" : { count : 1 } }
> pipeline = [ addFields_scores , addFields_count , sort_count ]
> db.collection.aggregate( pipeline )

The above is the version as I write it, with variables, so I have shorter line to edit. The following is the actual pipeline:

[ { '$addFields': 
     { score_a: { '$cond': [ { '$gt': [ '$a', 5 ] }, 1, 0 ] },
       score_b: { '$cond': [ { '$eq': [ '$b', 5 ] }, 1, 0 ] },
       score_c: { '$cond': [ { '$lt': [ '$c', 10 ] }, 1, 0 ] } } },
  { '$addFields': { count: { '$add': [ '$score_a', '$score_b', '$score_c' ] } } },
  { '$sort': { count: 1 } } ]
2 Likes

Thanks @steevej, also i wanted to sort documents based on number of items matched in $in,

let query = {
  $or: [{ array1: { $in: [1, 2, 3, 4, 5] } }, { array2: { $in: [6, 7, 8, 9, 10] } }],
};

let document1 = { array1: [1,2], array2: [6]} // so count would be 3
let document2 = { array1: [1], array2: [6,7,8]} // so count would be 4
let document3 = { array1: [4,5], array2: []} // so count would be 2
let document4 = { array1: [1,2,3], array2: [6,7,8]} // so count would be 6

You need

and

1 Like

First of all, no matter which solution you use, you must make the $match stage for your filter first. Otherwise you are doing massive unnecessary analysis of documents that won’t match any part of your filter.

Having said that, you have now added two sort criteria, first sort by documents matching the most $or clauses, but then you added that if the $or clause is $in then you want sort based on how many items inside $in were matched…

Combining all the suggestions with proper filtering, you would do something like this:

let filter1 = { array1: { $in: [1, 2, 3, 4, 5] } };
let filter2 = { array2: { $in: [6, 7, 8, 9, 10] } };
let filter3 = { array3: {$in: ["a", "b", "c"]} };
let query = { $or: [filter1, filter2, filter3] }

[ 
   {$match: query},
   {$set:{sortScore:{$sum:[ 
          {$size:{$setIntersection:[ {$ifNull:["$array1",[]]}, [1, 2, 3, 4, 5]]}}, 
          {$size:{$setIntersection:[ {$ifNull:["$array2",[]]}, [6, 7, 8, 9, 10]]}}, 
          {$size:{$setIntersection:[ {$ifNull:["$array3",[]]}, ["a", "b", "c"]]}}
   ]}}},
   {$sort:{ sortScore: -1 }}
]

I corrected a few other minor errors - you want to sort descending to get highest matches first, etc.
You can also add {$project:{sortScore:0}} at the end to remove the score you sorted by.

By the way, are the $or clauses always $in with scalar values? Or do you need a more generalized solution? This one will only work assuming they are always arrays and filters are $in clauses.

Asya

3 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.