Run aggregation pipeline and sort results by a set of match priorities

Hello,
I am trying to search and filter names in a collection, and am using an aggregation pipeline to do this. However, I would like the results to be ordered according to a set of match priorities:

  • exact matches on any name
  • matches at the beginning of a name
  • matches anywhere else within a name

For example, if I search for “ron”, I would like the results to be ordered as follows:

  • Ron Alvarez
  • David Ronald
  • Ronald King
  • Aaron Franks
  • Billie Strong
  • Byron Garretson
  • Michael Baron

Is there a clever way to do this using a single aggregation pipeline, or would I need to run multiple aggregation pipelines and then combine the results?

Thanks.

Hi @George_Kamel ,

This sounds exactly how atlas search full text scoring is working.

I believe the regex operator is what you looking into:

Have you tried using atlas and atlas search?

Using plain aggregation would need to go through complex logic to operate the way a search engine does…

Thanks
Pavel

As a challenge, as a learning experience and for people that cannot use Atlas search, I am trying to come up with something.

Yes it

but the concept should work.

pipeline = []

/* first stage of the pipeline is a simple regex match for ron anywhere */
_match = { "$match" : {
    "name" : { "$regex" : "ron" , "$options" : "i" }
} }

pipeline.push( _match )

/* then the magic stage, a $set that uses $cond to set 3 _sort_priorities for the 3 conditions.
   for demontration purpose I will use a simple $cond for the /^Ron / case */

_sort_priorities = { "$set" : {
    "_sort_priorities : {  "$cond" : [
        {  '$regexMatch': { input: '$name', regex: '^Ron ', options: 'i' } } ,
       0 
       1 , /* for other case we need another $cond for other cases */
    ] }
} }

pipeline.push( _sort_priorities )

/* then the final $sort  that uses _sort_priotity */
_sort = { "$sort" : {
    "_sort_priority" : 1 ,
    name : 1 
} }

pipeline.push( _sort )

The difficulty lies in the complex $cond that sets the appropriate _sort_priority. Much simpler with Atlas search but doable otherwise.

1 Like

Thank you @Pavel_Duchovny and @steevej for your replies.

I wasn’t familiar with MongoDB Atlas before, so @Pavel_Duchovny thank you for pointing me in the right direction!

@steevej, thank you for you elegant solution - it works brilliantly and have extended it for more complex sorting priorities. It will serve as an excellent interim solution until we are in a position to use MongoDB Atlas.

2 Likes

After implementing the solution in Go (using the latest version of the official MongoDB driver), it seems the driver does not recognise $regexMatch within $cond (it works perfectly fine in MongoDB Compass). As a workaround, I’m just wondering how I could assign the $regexMatch result to a variable outside of $cond, and use $eq in $cond to do the check?

Hi @George_Kamel ,

I think latest compass aggregation tab have an “Export to Language” for Go. Also maybe the beta has also advanced language syntax :slight_smile:

It recommends me to use:

bson.D{
                        {"$regexMatch",
                            bson.D{
                                {"input", "$name"},
                                {"regex", primitive.Regex{Pattern: "ron"}},
                                {"options", "i"},
                            },
                        },
                    },

Have you tried the compass syntax?

Thanks
Pavel

Hi @Pavel_Duchovny,
Yes, all the code I used was based on the “Export to Language” feature, but the issue is that neither $regexMatch nor $regex is being recognised by the mongo driver inside $cond. This is the code of the specific pipeline stage with the issue:

bson.D{
	{
		Key: "$addFields",
		Value: bson.D{
			bson.E{
				Key: "match_whole_name",
				Value: bson.D{
					{
						Key: "$cond",
						Value: bson.M{
							"if": bson.D{
								{
									Key: "$regexMatch",
									Value: bson.D{
										{Key: "input", Value: "$name"},
										{Key: "regex", Value: primitive.Regex{Pattern: "\bRonald\b"}},
										{Key: "options", Value: "i"},
									},
								},
							},
							"then": 0,
							"else": 1,
						},
					},
				},
			},
		},
	},
}

I raised a separate thread here about the regexMatch not recognised issue - it seems to be a bug / lack of support in the driver? If so, then I was wondering if I could use some workaround by setting a variable outside $cond and checking this with $eq inside the $cond?

Thanks,
George

@George_Kamel

There is a new $let stage:

1 Like

That is very surprising because in principal pipelines and queries are sent over and and run by the server. Yes, in principal, the driver in some occasion modify the query. For example, the short-cut query

{ a : 1 , b : 2 }

might be sent as

{ '$and': [ { a: { '$eq': 1 } }, { b: { '$eq': 2 } } ] }

But it is still possible that the driver has something to do with it.

That is the beauty of the concept of a pipeline, you may add stages to simplify followings. I often do that as it also helps debugging as you may set fields that are kept so you may find where things fails. So yes you may try to evaluated $regexMatch in a separate stage. Since I am not very go fluent, try this js version:

{ "$set" : {
    "_regex_match" : {
        "input" : "$name" ,
        "regex" : "\\bRonald\\b ,
        "options" : "i"
    }
} }

Then in your $cond of the next stage you simply use $_regex_match as the expression.

In case, you did not notice, I used 2 backslashes as it is needed in Java and JS. I do not know about go.