How to use the maximum should clause score

Xavier_Carlson · April 24, 2023, 8:56pm

The documentation for the compound operator states that the scores of all matching clauses in a should statement are summed together:

The returned score is the sum of the scores of all the subqueries in the clause.

Is there a way I can customize the scoring so that the maximum subquery score is used? I’m using the should operator to search through multiple fields (exactly 4) in a single query. I’m trying to customize the scoring so that documents matching on multiple fields don’t dominate over documents matching on fewer fields when the results are ranked. My should clause uses a mix of autocomplete and text operators.

This is an imaginary example similar to what I have:

[
  {
    "$search": {
      "compound": {
        "should": [
          {
            "text": {
              "query": "Honeycrisp",
              "path": "name"
            }
          }, {
            "autocomplete": {
              "query": "Honeycrisp",
              "path": "description"
            }
          }
        ]
      }
    }
  }
]

So I want records matching at least one of these fields but I want to limit the relevance score to the maximum score of the subqueries.
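To make the difference concrete, here is a minimal plain-JavaScript sketch of the two ways of combining clause scores. The per-clause scores are made-up numbers, and combineSum / combineMax are hypothetical helper names for illustration only, not part of Atlas Search.

```javascript
// Hypothetical per-clause scores for two documents (illustrative numbers only).
const docA = { name: 2.0, description: 1.5 }; // matches both fields
const docB = { name: 2.4 };                   // matches only "name"

// Sum of all matching clause scores: the documented behaviour of "should".
function combineSum(clauseScores) {
  return Object.values(clauseScores).reduce((total, s) => total + s, 0);
}

// Maximum matching clause score: the behaviour being asked about.
function combineMax(clauseScores) {
  return Math.max(...Object.values(clauseScores));
}
```

With summing, docA outranks docB only because it matched on more fields. With the maximum, docB ranks first because its single best match is stronger.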

Jason_Tran · April 26, 2023, 4:14am

Hi @Xavier_Carlson, welcome to the community.

I’d like to better understand what you want to achieve. The following is my interpretation of your post, so please correct me if any of these points are wrong:

1. The actual documents being returned are correct / as you expect.
2. The score is higher for documents that have multiple fields matching the clauses. This is not what you are after.
3. You wish to have documents score the same regardless of whether they match 1 field or 4 fields (for example).

In short, it seems you are only after score customisation, and the documents being returned are already correct?

If possible, could you also provide the output you are currently getting? That would help me understand the scenario a bit better.

Look forward to hearing from you.

Regards,
Jason

Xavier_Carlson · May 1, 2023, 4:28pm

Hi @Jason_Tran,

Thanks for your reply. Your interpretation is spot on. I cannot provide the output but I can say the search I’m trying to do is an all-in-one / omni-search through all the text fields in my collection using the should operator. The results are correct but my concern is with the ranking of the results. For instance, some records may lack a “description” but others may have both “name” and “description” and share many of the same terms. So yes I am trying to go for “3.” to account for this.

Jason_Tran · May 2, 2023, 12:40am

Thanks for confirming @Xavier_Carlson,

Will using constant for the scoring option in the compound operator work for you? I have an example below which contains 3 documents:

With 1 field containing "Honeycrisp"
With 2 fields containing "Honeycrisp"
With 3 fields containing "Honeycrisp"

Test data:

DB>db.search.find({},{_id:0})
[
  { name: 'Honeycrisp', description: 'Honeycrisp' },
  { name: 'Honeycrisp', description: 'Nothing' },
  {
    name: 'Honeycrisp',
    description: 'Honeycrisp',
    thirdfield: 'Honeycrisp'
  }
]

Based on what we discussed, and using a pipeline similar to the one you provided, I assume you want these 3 documents returned with the same score. Would the following suit your use case?

[
  {
    "$search": {
      "compound": {
        "should": [
          {
            "text": {
              "query": "Honeycrisp",
              "path": "name"
            }
          }, {
            "autocomplete": {
              "query": "Honeycrisp",
              "path": "description"
            }
          }
        ],
        "score": { "constant": {"value": 1 } }
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "name": 1,
      "description": 1,
      "thirdfield": 1,
      "score": {"$meta": "searchScore"}
    }
  }
]

Output (All same score):

[
  { name: 'Honeycrisp', description: 'Honeycrisp', score: 1 },
  { name: 'Honeycrisp', description: 'Nothing', score: 1 },
  {
    name: 'Honeycrisp',
    description: 'Honeycrisp',
    thirdfield: 'Honeycrisp',
    score: 1
  }
]

If you think this may work for you, please go ahead and test it out on a test environment / larger data set to ensure it meets all your requirements - I have only tested this on 3 sample documents on my test environment.
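For testing against your four fields, a small helper along these lines could generate the same kind of pipeline. This is only a sketch: buildConstantScorePipeline is a hypothetical function name, the field lists are placeholders for your own paths, and it assumes each autocomplete path is indexed accordingly.

```javascript
// Hypothetical helper: builds a $search pipeline with one "should" clause
// per field and a constant score on the compound operator.
function buildConstantScorePipeline(query, textPaths, autocompletePaths) {
  // One text clause per text path, one autocomplete clause per autocomplete path.
  const should = [
    ...textPaths.map((path) => ({ text: { query, path } })),
    ...autocompletePaths.map((path) => ({ autocomplete: { query, path } })),
  ];

  // Project every searched field plus the search score, hiding _id.
  const projection = { _id: 0, score: { $meta: "searchScore" } };
  for (const path of [...textPaths, ...autocompletePaths]) {
    projection[path] = 1;
  }

  return [
    {
      $search: {
        compound: {
          should,
          // Every matching document receives this score, however many clauses match.
          score: { constant: { value: 1 } },
        },
      },
    },
    { $project: projection },
  ];
}

// Same shape as the pipeline above, minus the unsearched "thirdfield" projection.
const pipeline = buildConstantScorePipeline("Honeycrisp", ["name"], ["description"]);
```

In mongosh the result could then be passed to db.search.aggregate(pipeline), where search is the collection name from my test data.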

Look forward to hearing from you.

Regards,
Jason