Assign base score for MongoDB Atlas Search

Hi team,

Is it possible to assign a base score to each document when using MongoDB Atlas Search, so that a must or should clause boosts the score starting from that base score?

Example documents could be:

{
  "_id": 1,
  "baseScore": 0.5,
  "textField": "Apple"
},
{
  "_id": 2,
  "baseScore": 0.2,
  "textField": "Banana"
}

So when I search with a should clause, the ranking would also take my baseScore into account and boost on top of it.

Hi William,

I’m a bit confused regarding the below details:

  • You’ve provided the example documents (Could be)
  • These example documents already have a baseScore field. However, you state - “May I ask if it is possible to assign a base score when using MongoDB Atlas Search”

Considering the above, is the baseScore field in the example documents the result you want to achieve, or is it a field you want to use in the actual search score calculation?

Just to clarify, would you be able to give an example search query and your expected output?

Look forward to hearing from you.

Regards,
Jason

@Jason_Tran Thank you for the quick response!!!

So the question is how to make Atlas Search use my baseScore in the actual search score calculation.

Example query could be:

    $search: {
      "compound": {
        "should": [{
          "autocomplete": {
            "path": "textField",
            "query": "b",
            "score": { "boost": { "value": 2}}
          }
        }]
      }
    }

So for the example documents above, although the document with _id: 2 is boosted, its score would be 0.2 * 2 = 0.4, which is still less than 0.5, so the return order would still be _id: 1 and then _id: 2.
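To make the requested ordering concrete, here is a minimal sketch of the arithmetic in plain JavaScript. It is not Atlas Search code: the `desiredScore` helper and the prefix check standing in for an autocomplete match are assumptions made only for illustration.

```javascript
// Example documents from the question.
const docs = [
  { _id: 1, baseScore: 0.5, textField: "Apple" },
  { _id: 2, baseScore: 0.2, textField: "Banana" },
];

// Boost value taken from the example query.
const BOOST = 2;

// Hypothetical model of the desired behaviour: every document starts from
// its baseScore, and a match on the should clause multiplies it by BOOST.
// A case-insensitive prefix check is a simplification of autocomplete.
function desiredScore(doc, query) {
  const matches = doc.textField.toLowerCase().startsWith(query.toLowerCase());
  return matches ? doc.baseScore * BOOST : doc.baseScore;
}

const ranked = docs
  .map((doc) => ({ _id: doc._id, score: desiredScore(doc, "b") }))
  .sort((a, b) => b.score - a.score);

console.log(ranked);
```

Under this model the document with _id: 1 keeps 0.5 and the document with _id: 2 ends up at 0.4, so _id: 1 is still ranked first.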

Let me know if the above makes sense to you

Curious what’s the use case here for combining Atlas Search’s scoring with your own custom scoring? Every document returned by an Atlas Search query is assigned a score based on relevance, and the documents included in a result set are returned in order from highest score to lowest.

From your example I believe only 1 document would be returned (I used "ba" as opposed to "b" for the query value). So the return order would not match what you described, as only a single document would be returned.

For instance, your example query would presumably return only the document with _id: 2 (based on the b query value you provided):

score> db.collection.find()
[
  { _id: 1, baseScore: 0.5, textField: 'Apple' },
  { _id: 2, baseScore: 0.2, textField: 'Banana' }
]

$search pipeline only returning document with _id: 2:

score> db.collection.aggregate(
[
  {
    '$search': {
      index: 'textindex',
      compound: {
        should: [
          {
            autocomplete: {
              path: 'textField',
              query: 'ba',
              score: { boost: { value: 2 } }
            }
          }
        ]
      }
    }
  },
  {
    '$project': {
      _id: 1,
      baseScore: 1,
      textField: 1,
      searchScore: { '$meta': 'searchScore' }
    }
  }
])
[
  {
    _id: 2,
    baseScore: 0.2,
    textField: 'Banana',
    searchScore: 0.9241962432861328
  }
]
score> 

Do you think the constant scoring option would help your use case?

The constant option replaces the base score with a specified number.
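For reference, a sketch of what the constant option could look like on the same autocomplete clause. The index name and query come from the pipeline earlier in the thread; the value 5 is an arbitrary number chosen for illustration.

```javascript
// Same pipeline shape as above, with boost swapped for constant.
const constantPipeline = [
  {
    $search: {
      index: "textindex",
      compound: {
        should: [
          {
            autocomplete: {
              path: "textField",
              query: "ba",
              // Every document matching this clause gets the same score
              // from it (5 is an arbitrary illustrative value).
              score: { constant: { value: 5 } },
            },
          },
        ],
      },
    },
  },
  {
    $project: {
      _id: 1,
      baseScore: 1,
      textField: 1,
      searchScore: { $meta: "searchScore" },
    },
  },
];

// In mongosh this would be run as: db.collection.aggregate(constantPipeline)
console.log(JSON.stringify(constantPipeline, null, 2));
```

Because the constant is fixed in the query rather than read from each document, it cannot carry a per-document value such as baseScore.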

@Jason_Tran Thank you for your detailed reply!!!

Interesting that it returns only 1 result, as I thought should would only boost rather than filter, but that is a separate topic I can look into later.

The potential use case for a base score is something like this:
Consider a site like LinkedIn, where there are many profiles that may or may not be complete, e.g., a missing profile photo, content that looks suspicious, etc. I would like to assign a base score so that my search sorting is a combination of the native text search ranking and how much I want to boost certain profiles.

The constant scoring option would not help, because my base score is independent of the search text.

Interesting - thanks for providing the use case details, William. Just another question: what would the base score represent here, or how is it determined? I.e., what makes one base score higher than another? I assume all of this base scoring would occur before any searching, as per your example documents.

Yes, the score would be computed before the search. I am using an offline processing flow to pre-compute a rule-based score from certain non-text factors.

Based on your description, and correct me if I am wrong, the baseScore is an independent score calculated outside of Atlas Search. Because of this, I don’t believe it’s possible to combine the Atlas Search scoring with the baseScore field you’ve mentioned to create the custom sorting you are after without additional aggregation stages. I assume you wanted all of this done in the $search stage alone, perhaps for performance reasons.
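For comparison, a rough sketch of what the "additional aggregation stages" route might look like. Multiplying the two scores is only an assumption about how they should be combined, and the `searchScore` and `combinedScore` field names are made up for this example.

```javascript
// Sketch: compute a combined score after $search and sort on it.
const combinedPipeline = [
  {
    $search: {
      index: "textindex",
      compound: {
        should: [
          {
            autocomplete: {
              path: "textField",
              query: "ba",
              score: { boost: { value: 2 } },
            },
          },
        ],
      },
    },
  },
  // Expose the relevance score as a regular field.
  { $addFields: { searchScore: { $meta: "searchScore" } } },
  // Assumed combination rule: relevance score times the stored baseScore.
  { $addFields: { combinedScore: { $multiply: ["$searchScore", "$baseScore"] } } },
  // Re-sort the matched documents by the combined value.
  { $sort: { combinedScore: -1 } },
];

// In mongosh this would be run as: db.collection.aggregate(combinedPipeline)
console.log(JSON.stringify(combinedPipeline, null, 2));
```

Note that this only re-orders documents that $search already returned, and the extra $sort runs outside the search stage, which is the performance concern mentioned above.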

However, does the following example perhaps look a bit closer to what you’re after? I have a possible workaround, although I’m not entirely sure it works for your use case. It uses the near operator with an origin of 1 (let’s say this is the baseScore we want to be closest to for sorting):

score> db.collection.aggregate(
[
  {
    '$search': {
      index: 'textindex',
      compound: {
        should: [
          {
            autocomplete: {
              path: 'textField',
              query: 'ba',
              score: { boost: { value: 2 } }
            }
          },
          { near: { path: 'baseScore', origin: 1, pivot: 0.1 } }
        ]
      }
    }
  },
  {
    '$project': {
      _id: 1,
      baseScore: 1,
      textField: 1,
      searchScore: { '$meta': 'searchScore' }
    }
  }
])

Output (which includes the document with _id: 1):

[
  {
    _id: 8,
    textField: 'Banana 7',
    baseScore: 0.7,
    searchScore: 0.5457944273948669
  },
  {
    _id: 7,
    textField: 'Banana 6',
    baseScore: 0.6,
    searchScore: 0.495794415473938
  },
  {
    _id: 6,
    textField: 'Banana 5',
    baseScore: 0.5,
    searchScore: 0.46246111392974854
  },
  {
    _id: 5,
    textField: 'Banana 4',
    baseScore: 0.4,
    searchScore: 0.43865156173706055
  },
  {
    _id: 4,
    textField: 'Banana 3',
    baseScore: 0.3,
    searchScore: 0.42079442739486694
  },
  {
    _id: 3,
    textField: 'Banana 2',
    baseScore: 0.2,
    searchScore: 0.40690553188323975
  },
  {
    _id: 2,
    baseScore: 0.2,
    textField: 'Banana',
    searchScore: 0.3748180866241455
  },
  {
    _id: 1,
    baseScore: 0.5,
    textField: 'Apple',
    searchScore: 0.1666666716337204
  }
]

I added a few extra documents while trying to understand how to achieve the ordering you were after, but perhaps this approach won’t work for your data set / use case.
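To see how much of each searchScore above comes from the near clause, the contribution can be recomputed by hand. This sketch assumes the near score for a numeric field is pivot / (pivot + distance from origin), which is consistent with the numbers in the output; the `nearScore` helper is a name made up for this example.

```javascript
// Assumed near scoring for numbers: pivot / (pivot + |value - origin|).
function nearScore(value, origin, pivot) {
  return pivot / (pivot + Math.abs(value - origin));
}

// origin and pivot as used in the near clause above.
const origin = 1;
const pivot = 0.1;

// baseScore values taken from the output above.
for (const baseScore of [0.7, 0.6, 0.5, 0.2]) {
  console.log(baseScore, nearScore(baseScore, origin, pivot));
}
```

For the Apple document (baseScore 0.5), this gives 0.1 / 0.6, roughly 0.1667, which matches its searchScore because it gets nothing from the autocomplete clause. For the "Banana N" documents, subtracting the near contribution from the searchScore leaves about 0.2958 each time, which would be the boosted autocomplete part.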

I think this would work well for me.
@Jason_Tran Thank you a lot for your suggestion!!!
