Autocomplete search results prefix match

Benny_Kachanovsky1 · November 8, 2022, 4:13pm

Hi!
I'm getting somewhat unexpected results for a simple search use case.
We want to search for vendors; each vendor document has a name property.
Most vendor names are a few words at most (i.e. not full sentences).

Given these documents:

[
   { name: "Facebook" },
   { name: "The North Face" },
   { name: "Facebook Donations" },
   { name: "Prop Face" },
   { name: "Simplifying Payments with Facebook Pay" },
   { name: "Facebook Advertising" },
   { name: "facebook pay" }
]

The expected result is something like this:

[
   { name: "Facebook" },
   { name: "facebook pay" },
   { name: "Facebook Donations" },
   { name: "Facebook Advertising" },
   { name: "Prop Face" },
   { name: "The North Face" },
   { name: "Simplifying Payments with Facebook Pay" }
]

As a first try, I set up a search index with name as a string field; the autocomplete attempt follows.
This is the index definition:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "type": "string"
      }
    }
  },
  "storedSource": true
}

This is the query:

{
  $search: {
      index: 'default',
      text: {
         path: 'name',
         query: 'face'
      }
    }
}

It returns:

[
  { name: "Prop Face" },
  { name: "The North Face" }
]

This doesn't make sense to me at all. Am I doing something wrong?
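This result is consistent with how a string field is indexed: unless another analyzer is configured, Atlas Search uses the `lucene.standard` analyzer, which lowercases the text and splits it into whole-word tokens, and the `text` operator matches documents containing a token equal to the analyzed query term. "Facebook" becomes the single token `facebook`, which is not equal to `face`. The plain JavaScript below is a simplified model under that assumption (the helpers `standardTokens` and `textMatches` are hypothetical, and the regex split only approximates Lucene's tokenizer); it reproduces the two-document result above.

```javascript
// Simplified model of lucene.standard analysis: lowercase, then split on
// anything that is not a letter or digit. Only an approximation of Lucene.
function standardTokens(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Hypothetical helper: a `text` query matches when any analyzed query term
// equals one of the document's tokens (whole tokens only, no prefixes).
function textMatches(name, query) {
  const docTokens = new Set(standardTokens(name));
  return standardTokens(query).some((term) => docTokens.has(term));
}

const vendors = [
  "Facebook",
  "The North Face",
  "Facebook Donations",
  "Prop Face",
  "Simplifying Payments with Facebook Pay",
  "Facebook Advertising",
  "facebook pay",
];

const hits = vendors.filter((name) => textMatches(name, "face"));
console.log(hits); // [ 'The North Face', 'Prop Face' ]
```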

When I switch to an autocomplete index with this definition:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type": "autocomplete"
        }
      ]
    }
  },
  "storedSource": true
}

and this query:

{
  $search: {
    index: 'default',
    autocomplete: {
        path: 'name',
        query: 'face',
     }
   }
}

I get these results:

[
  { name: "Facebook Donations" },
  { name: "Facebook Advertising" },
  { name: "facebook pay" },
  { name: "Simplifying Payments with Facebook Pay" },
  { name: "Facebook" },
  { name: "Prop Face" },
  { name: "The North Face" },
]

This is much closer to the desired result, but "Facebook" still comes after "Simplifying Payments with Facebook Pay" for some reason; I would expect it to come first.

Any ideas or suggestions?
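A rough model of why the autocomplete index returns all seven names: assuming the default `edgeGram` tokenization (with `minGrams` 2 and `maxGrams` 15), the `autocomplete` type indexes the leading prefixes of each word. Every name here contains a word beginning with "face", so each one matches, and the match by itself records nothing about whether that word is the first word of the name. The sketch below is a simplification under those assumptions; `edgeGrams` and `autocompleteMatches` are hypothetical helpers, and the real analyzer differs in detail.

```javascript
// Simplified model of autocomplete edgeGram indexing: for every word,
// index its prefixes whose length lies in [minGrams, maxGrams].
function edgeGrams(text, minGrams = 2, maxGrams = 15) {
  const grams = new Set();
  const words = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  for (const word of words) {
    const upper = Math.min(maxGrams, word.length);
    for (let len = minGrams; len <= upper; len++) {
      grams.add(word.slice(0, len));
    }
  }
  return grams;
}

// Hypothetical helper: a single-word query matches if its lowercased form
// is one of the indexed grams.
function autocompleteMatches(name, query) {
  return edgeGrams(name).has(query.toLowerCase());
}

const vendors = [
  "Facebook",
  "The North Face",
  "Facebook Donations",
  "Prop Face",
  "Simplifying Payments with Facebook Pay",
  "Facebook Advertising",
  "facebook pay",
];

const matched = vendors.filter((name) => autocompleteMatches(name, "face"));
console.log(matched.length); // 7

// Both names contain the gram "face"; the match does not say where it occurred.
console.log(
  edgeGrams("Facebook").has("face"),
  edgeGrams("Simplifying Payments with Facebook Pay").has("face")
); // true true
```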

Benny_Kachanovsky1 · November 9, 2022, 8:45am

Any input, anyone?

Jason_Tran · November 10, 2022, 1:37am

Hi @Benny_Kachanovsky1,

Benny_Kachanovsky1:

This is much closer to the desired result, but "Facebook" still comes after "Simplifying Payments with Facebook Pay" for some reason; I would expect it to come first.

Every document returned by an Atlas Search query is assigned a score based on relevance, and the documents included in a result set are returned in order from highest score to lowest.

Additionally, as per the autocomplete operator documentation:

autocomplete offers less fidelity in score in exchange for faster query execution.
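To see the scores that produce an ordering like this, the relevance score can be projected next to each document with `{ $meta: "searchScore" }` in a `$project` stage after `$search`. A minimal sketch, assuming the same `default` index and `name` path as above; `buildAutocompleteWithScore` is a hypothetical helper, and the array it returns is what you would pass to `aggregate()` in mongosh or the Node.js driver.

```javascript
// Hypothetical helper: an autocomplete $search followed by a $project that
// exposes each document's relevance score for inspection.
function buildAutocompleteWithScore(query, path = "name", index = "default") {
  return [
    {
      $search: {
        index,
        autocomplete: { path, query },
      },
    },
    {
      $project: {
        _id: 0,
        [path]: 1,
        // Atlas Search makes the relevance score available via this $meta value.
        score: { $meta: "searchScore" },
      },
    },
  ];
}

// Usage in mongosh (assuming a collection named `names`):
//   db.names.aggregate(buildAutocompleteWithScore("face"))
const pipeline = buildAutocompleteWithScore("face");
console.log(JSON.stringify(pipeline));
```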

I am a bit confused about the search term versus the expected output. Could you clarify why documents containing the term "Facebook" should rank higher than exact matches for the query term "face", such as "Prop Face" or "The North Face"? I would think that since "Facebook" is only a partial match for the term "face", it should be ranked lower.

Look forward to hearing from you.

Regards,
Jason

Benny_Kachanovsky1 · November 12, 2022, 7:18pm

Hi, thanks for answering.
Although "Facebook" and "Simplifying Payments with Facebook Pay" should have similar scores because they have the same partial match, I would like the "Facebook" result to have a higher score.
Is there a way to achieve it?

Jason_Tran · November 14, 2022, 12:08am

Hi @Benny_Kachanovsky1,

Could you share the use case details behind wanting documents containing "Facebook" to appear higher when querying for the term "face"? That is, why should the partial match rank higher than a full match? Is this, for example, because it is a sponsored vendor and therefore should be scored higher for your use case? Again, this is a partial match rather than a full match.

That said, one possible way to return this set of results with the document containing only "Facebook" at the top would be to include "facebook" as part of the query itself. You could use the compound operator and boost the score for "Facebook", depending on your use case.

The example below uses the following index definition in my test environment:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "name": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type" : "string"
        },
        {
          "type": "autocomplete"
        }
      ]
    }
  }
}

Example $search pipeline:

DB> db.names.aggregate([
{
    $search: {
        index: 'default',
        compound: {
            // Every result must match the autocomplete query "face"
            must: [
                {
                    autocomplete: {
                        path: 'name',
                        query: 'face'
                    }
                }
            ],
            // Results that also match the phrase "facebook" get a higher score
            should: [
                {
                    phrase: {
                        path: 'name',
                        query: 'facebook',
                        score: { boost: { value: 10 } } // <--- boosted score for the phrase query "facebook"
                    }
                }
            ]
        }
    }
}
])

Output:

[
  { _id: ObjectId("636c29d90e5724148fb0ea04"), name: 'Facebook' },
  {
    _id: ObjectId("636c29d90e5724148fb0ea00"),
    name: 'Facebook Donations'
  },
  {
    _id: ObjectId("636c29d90e5724148fb0ea01"),
    name: 'Facebook Advertising'
  },
  { _id: ObjectId("636c29d90e5724148fb0ea02"), name: 'facebook pay' },
  {
    _id: ObjectId("636c29d90e5724148fb0ea03"),
    name: 'Simplifying Payments with Facebook Pay'
  },
  { _id: ObjectId("636c29d90e5724148fb0ea05"), name: 'Prop Face' },
  { _id: ObjectId("636c29d90e5724148fb0ea06"), name: 'The North Face' }
]

The example index definition and $search pipeline above were used against a test collection containing only the 7 documents above. If you believe this may help, please test thoroughly and adjust the examples as needed to verify whether they suit your use case and requirements.
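If the boosted term needs to vary per request rather than being hard-coded to "facebook", the same compound query can be generated by a small function. This is only a sketch, assuming the index definition above; `buildBoostedSearch`, its options object, and the default boost of 10 are illustrative choices, not part of the original example.

```javascript
// Hypothetical builder for the compound query above: `must` requires an
// autocomplete match on `query`; `should` adds a boosted phrase match on
// `boostTerm`, so documents containing that phrase score higher.
function buildBoostedSearch(query, boostTerm, { path = "name", index = "default", boost = 10 } = {}) {
  return [
    {
      $search: {
        index,
        compound: {
          must: [{ autocomplete: { path, query } }],
          should: [
            {
              phrase: {
                path,
                query: boostTerm,
                score: { boost: { value: boost } },
              },
            },
          ],
        },
      },
    },
  ];
}

// Equivalent to the hand-written pipeline above:
//   db.names.aggregate(buildBoostedSearch("face", "facebook"))
const stage = buildBoostedSearch("face", "facebook")[0].$search;
console.log(stage.compound.should[0].phrase.score.boost.value); // 10
```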

Regards,
Jason

Benny_Kachanovsky1 · November 14, 2022, 11:06pm

Hi!
Thanks a lot!
You're right, "Facebook" is a partial match and therefore should rank higher.
But in my original autocomplete results, "Simplifying Payments with Facebook Pay" comes before "Facebook", even though both should match equally when searching for "face".
Is there a way to rank documents that start with the given search term higher?
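One way to approximate that ordering is to re-rank the autocomplete results in application code instead of relying on the search score. This is a swapped-in technique, not an Atlas Search feature, and the tier rules are assumptions: names that start with the term come first, then names containing the term as a whole word, then names where some other word starts with it, with ties broken by name length. `tierFor` and `rerankByPrefix` are hypothetical helpers; with the seven sample names they reproduce the expected ordering from the first post.

```javascript
// Hypothetical application-side re-ranking. Lower tier = better:
//   0: the whole name starts with the query ("Facebook ...")
//   1: some word equals the query ("The North Face")
//   2: some other word starts with the query ("... with Facebook Pay")
//   3: anything else the search returned
function tierFor(name, query) {
  const lowerName = name.toLowerCase();
  const q = query.toLowerCase();
  if (lowerName.startsWith(q)) return 0;
  const words = lowerName.split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  if (words.includes(q)) return 1;
  if (words.some((w) => w.startsWith(q))) return 2;
  return 3;
}

function rerankByPrefix(docs, query) {
  return docs
    .map((doc, i) => ({ doc, i, tier: tierFor(doc.name, query) }))
    .sort(
      (a, b) =>
        a.tier - b.tier ||
        a.doc.name.length - b.doc.name.length ||
        a.i - b.i // full tie: keep the order the search returned
    )
    .map((entry) => entry.doc);
}

// Autocomplete results from the first post, in the order Atlas returned them.
const results = [
  { name: "Facebook Donations" },
  { name: "Facebook Advertising" },
  { name: "facebook pay" },
  { name: "Simplifying Payments with Facebook Pay" },
  { name: "Facebook" },
  { name: "Prop Face" },
  { name: "The North Face" },
];

const ranked = rerankByPrefix(results, "face");
console.log(ranked.map((d) => d.name).join(", "));
// Facebook, facebook pay, Facebook Donations, Facebook Advertising, Prop Face, The North Face, Simplifying Payments with Facebook Pay
```

Because this runs after the search, it only reorders the documents actually fetched; if a limit is applied before re-ranking, a prefix match that scored low could still be cut off.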

Jason_Tran · November 15, 2022, 9:41pm

Thanks for the additional details.

I think you have a specific idea of how the results should be ranked when someone searches for the term "face". Unfortunately, the search algorithm is not customizable enough to provide the very specific ordering you have described. You can approximate it by using score boosting as in my previous example, but at that point you are fighting against the algorithm and may find many corner cases that don't match your idea of how the ranking should look.

If you need a very specific ranking for a specific term that goes against the search algorithm, I think an alternative way forward is to catch the term "face" before it reaches the search algorithm and return a result set that's ranked exactly as you want it. This will also be more maintainable, since you're not depending on the search algorithm.
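A minimal sketch of that idea, assuming the curated orderings live in application code: normalize the incoming term, return a hand-ranked list when one exists, and only otherwise fall back to Atlas Search. `curatedRankings`, `normalizeTerm`, `searchVendors`, and the `runSearch` callback are hypothetical names; `runSearch` stands in for whatever function issues the `$search` aggregation.

```javascript
// Hypothetical table of hand-ranked results for specific search terms.
const curatedRankings = new Map([
  [
    "face",
    [
      "Facebook",
      "facebook pay",
      "Facebook Donations",
      "Facebook Advertising",
      "Prop Face",
      "The North Face",
      "Simplifying Payments with Facebook Pay",
    ],
  ],
]);

// Normalize so that "Face", " face " and "FACE" hit the same entry.
function normalizeTerm(term) {
  return term.trim().toLowerCase();
}

// Returns the curated list when the term is known; otherwise delegates to
// `runSearch`, a stand-in for the real $search aggregation call.
async function searchVendors(term, runSearch) {
  const curated = curatedRankings.get(normalizeTerm(term));
  if (curated !== undefined) {
    return curated.map((name) => ({ name }));
  }
  return runSearch(term);
}

// Example with a stubbed search function:
searchVendors("  Face ", async () => []).then((docs) => console.log(docs[0].name)); // Facebook
```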

Regards,
Jason