Hi,
I built an aggregation pipeline with a “$search” stage on several fields of a collection containing client information (name, first name, street name, locality, …).
In the “project” stage at the end of the pipeline, I include the “highlights” metadata:
{
  "$project": {
    "_id": 0,
    "object_id": "$customerId",
    "object_infos": {
      "$concat": [
        { "$toString": "$customerId" },
        " - ",
        "$firstName",
        " - ",
        "$name",
        " - ",
        "$streetName",
        " - ",
        "$locality"
      ]
    },
    "score": { "$meta": "searchScore" },
    "highlights": { "$meta": "searchHighlights" }
  }
}
During my tests, I provided two strings as input (“Becker” and “Ketangi”), so that some results match on both the name (at least partially) and the locality:
{
  "object_id": 750445,
  "object_infos": "750445 - Madelena - O'Connell and Becker - Clarendon Street - Ketangi",
  "score": 25.159177780151367,
  "highlights": [
    {
      "score": 6.748023509979248,
      "path": "name",
      "texts": [
        {
          "value": "O'Connell and ",
          "type": "text"
        },
        {
          "value": "Becker",
          "type": "hit"
        }
      ]
    },
    {
      "score": 7.059268474578857,
      "path": "locality",
      "texts": [
        {
          "value": "Ketangi",
          "type": "hit"
        }
      ]
    }
  ]
},
The highlights metadata currently provides a lot of information that I don’t actually need. I would like to reduce it to only the values for which there was a hit. For the example above, I would like to get something like this:
{
  "object_id": 750445,
  "object_infos": "750445 - Madelena - O'Connell and Becker - Clarendon Street - Ketangi",
  "score": 25.159177780151367,
  "highlights": ["Becker", "Ketangi"]
},
The goal is to help identify which value was hit when that value doesn’t exactly match my input string (as can happen with fuzzy search).
Removing “highlights.score” and “highlights.path” is easy (just add a “project” stage and set both fields to 0). However, in my example I still need three more steps, and so far I haven’t found a way to do them:
- remove the complete “highlights.texts” entry for which the type is “text”
- remove the “type” field in the 2 remaining “highlights.texts” entries
- merge the 2 remaining “highlights.texts.value” in a single array (this step could be optional if too complex to do)
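To pin down what these three steps should produce, here is the same transformation written in plain Python, outside MongoDB. It is only a sketch of the intended result, not an aggregation stage: the `hit_values` helper is a hypothetical name, and the `highlights` list is copied from the sample result above.

```python
# Sample "highlights" array copied from the result document above.
highlights = [
    {
        "score": 6.748023509979248,
        "path": "name",
        "texts": [
            {"value": "O'Connell and ", "type": "text"},
            {"value": "Becker", "type": "hit"},
        ],
    },
    {
        "score": 7.059268474578857,
        "path": "locality",
        "texts": [{"value": "Ketangi", "type": "hit"}],
    },
]


def hit_values(highlights):
    """Return the values of all "hit" texts as one flat list (hypothetical helper)."""
    return [
        text["value"]                 # step 2: keep only the value, drop "type"
        for highlight in highlights   # step 3: one flat list across all highlights
        for text in highlight["texts"]
        if text["type"] == "hit"      # step 1: skip entries of type "text"
    ]


print(hit_values(highlights))  # ['Becker', 'Ketangi']
```

What I am looking for is the equivalent of this, expressed inside the pipeline and without splitting the document.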
Adding an “unwind” stage to split the array of highlights is not an option, as I want to keep everything in one single document. I already tried conditional removal with “$$REMOVE”, like this:
{
  "$project": {
    "object_id": 1,
    "object_infos": 1,
    "highlightsNEW": {
      "$cond": {
        "if": { "$eq": [ "$highlights.texts.type", "text" ] },
        "then": "$$REMOVE",
        "else": "$highlights.texts.value"
      }
    }
  }
}
… but it doesn’t work. Here’s the result I get:
{
  "object_id": 750445,
  "object_infos": "750445 - Madelena - O'Connell and Becker - Clarendon Street - Ketangi",
  "highlightsNEW": [
    [
      "White, O'Connell and ",
      "Becker"
    ],
    [
      "Ketangi"
    ]
  ]
},
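My reading of why this happens (an assumption on my part): a dotted path such as “$highlights.texts.type” is resolved across every array it passes through, so it yields a nested array of types rather than a single string. The comparison with "text" is therefore never true, and the “else” branch returns every value. The Python sketch below imitates that behaviour; `resolve` is a hypothetical helper that simplifies the real path resolution, and `doc` reuses the sample highlights with scores and paths left out.

```python
# Sample document shaped like the search result above (scores and paths omitted).
doc = {
    "highlights": [
        {"texts": [{"value": "O'Connell and ", "type": "text"},
                   {"value": "Becker", "type": "hit"}]},
        {"texts": [{"value": "Ketangi", "type": "hit"}]},
    ]
}


def resolve(value, path):
    """Hypothetical helper: follow a dotted path, fanning out over every array on the way."""
    if not path:
        return value
    if isinstance(value, list):
        return [resolve(item, path) for item in value]
    return resolve(value[path[0]], path[1:])


types = resolve(doc, "highlights.texts.type".split("."))
print(types)            # nested list of types, one inner list per highlight
print(types == "text")  # a list never equals a string, so the "else" branch is taken
print(resolve(doc, "highlights.texts.value".split(".")))  # nested list of all values
```

If that reading is right, the condition would have to be evaluated per element of “texts” rather than once for the whole document.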
Would someone have an idea about how I could do that?