Atlas Search not working properly when searching for a string with spaces in it

Dan_Muntean1 · May 25, 2023, 7:29am

Hi, I’m working on a feature in which we need to search by name in our database. The search works fine for single-word queries, but when I input a string with spaces in it (two or more words), the results are not accurate: it returns documents that match only one of the words I entered. From what I have read in a similar post on this forum, it might be a tokenization issue. The person who solved it there found a fix that works for exact matches, but I need an autocomplete solution for this issue.

amyjian · May 25, 2023, 1:52pm

Hi @Dan_Muntean1! Can you share your index definition, an example query and sample document?

Dan_Muntean1 · May 26, 2023, 6:36am

Hi, here is my index:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "groupTag": {
        "type": "autocomplete"
      },
      "name": {
        "type": "autocomplete"
      }
    }
  }
}

Here is my document:

{
  "_id": {
    "$oid": "63e4f8a261f74736f0fcc8b6"
  },
  "_v": 4,
  "groupTag": "T-2",
  "createdAt": {
    "$date": {
      "$numberLong": "1675950242663"
    }
  },
  "updatedAt": {
    "$date": {
      "$numberLong": "1676022397735"
    }
  },
  "name": "Dan test group"
}

If I search for the name “Dan test group”, the search returns other documents whose name field contains any of the words “Dan”, “test”, or “group”, and I end up getting information that I didn’t search for. I need the search to return only documents whose name contains the entire string that I typed (“Dan test group”).

amyjian · June 8, 2023, 5:23pm

Hi Dan,

This is happening because you are using the autocomplete field mapping, which returns results that partially match your search query. If you are interested in returning exact matches, you might want to consider using a string field mapping type with the lucene.keyword analyzer. Your index definition would look something like this:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "groupTag": {
        "type": "string",
        "analyzer": "lucene.keyword"
      },
      "name": {
        "type": "string",
        "analyzer": "lucene.keyword"
      }
    }
  }
}

You can learn more about exact matching in this blog post.
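
For illustration, a query against a field indexed this way could look like the sketch below. The helper name `buildExactMatchStage`, the index name `default`, and the collection name in the comment are assumptions, not taken from the thread. Because lucene.keyword indexes the whole field value as a single term, the match is expected to be exact and case-sensitive.

```javascript
// Hypothetical helper: builds a $search stage for a field that is indexed
// as type "string" with the lucene.keyword analyzer.
function buildExactMatchStage(indexName, path, value) {
  return {
    $search: {
      index: indexName, // assumed index name, adjust to your own
      text: { query: value, path: path },
    },
  };
}

const exactStage = buildExactMatchStage("default", "name", "Dan test group");
console.log(JSON.stringify(exactStage, null, 2));

// Possible usage in mongosh (the collection name "groups" is an assumption):
// db.groups.aggregate([exactStage])
```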

Dan_Muntean1 · June 15, 2023, 7:50am

Thank you for your help, but this solution does not work for me. I still need to use autocomplete, but when I type two words, I need each result to contain both words. Currently it returns different results for each word.
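
One way to express the requirement above, sketched here under assumptions rather than confirmed in this thread, is to split the input on whitespace and require every word through `compound.must`, with one `autocomplete` clause per word. The helper name `buildAllWordsAutocompleteStage` and the index name `default` are hypothetical.

```javascript
// Hypothetical helper: requires every word of the user input to match the
// autocomplete-indexed field, in any order.
function buildAllWordsAutocompleteStage(indexName, path, userInput) {
  // Split on runs of whitespace and drop empty pieces.
  const words = userInput
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0);

  return {
    $search: {
      index: indexName, // assumed index name
      compound: {
        // Each clause must match, so documents matching only one word are excluded.
        must: words.map((word) => ({
          autocomplete: { query: word, path: path },
        })),
      },
    },
  };
}

const allWordsStage = buildAllWordsAutocompleteStage("default", "name", "Dan test group");
console.log(JSON.stringify(allWordsStage, null, 2));
```

An empty input produces an empty `must` array, so the caller should skip the search in that case.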

Hiren_Thakkar · February 19, 2024, 9:15am

Hi @amyjian, this seems to work as per the documentation. However, I’m still not getting the expected results; instead, I’m getting a zero count.
I tried both of the index definitions below, and neither one works:

"task_name": {
  "analyzer": "lucene.simple",
  "multi": {
    "keywordAnalyzer": {
      "analyzer": "lucene.keyword",
      "type": "string"
    }
  },
  "searchAnalyzer": "lucene.simple",
  "type": "string"
}

"task_name": {
  "analyzer": "lucene.keyword",
  "type": "string"
}

Hiren_Thakkar · February 21, 2024, 10:22am

Hi @Dan_Muntean1, did you find a solution for your issue? I’m having the same issue and am not able to fix it.

Dan_Muntean1 · February 21, 2024, 10:39am

Hello, I have an idea of how to solve it, but I haven’t managed to implement it yet. I will come back with an answer in the following weeks. As I understand it, we need to add tokenOrder: "sequential" to the autocomplete operator. Maybe this will help you: https://www.mongodb.com/docs/atlas/atlas-search/autocomplete/
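
A minimal sketch of what that could look like, assuming the field is indexed with the autocomplete type. The helper name `buildSequentialAutocompleteStage` and the index name `default` are hypothetical, and this has not been verified against the data in this thread.

```javascript
// Hypothetical helper: builds a $search stage that asks the autocomplete
// operator to match the query tokens in order.
function buildSequentialAutocompleteStage(indexName, path, userInput) {
  return {
    $search: {
      index: indexName, // assumed index name
      autocomplete: {
        query: userInput,
        path: path,
        tokenOrder: "sequential", // the operator's default is "any"
      },
    },
  };
}

const sequentialStage = buildSequentialAutocompleteStage("default", "name", "Dan test");
console.log(JSON.stringify(sequentialStage, null, 2));
```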

Hiren_Thakkar · February 21, 2024, 12:21pm

Hi @Dan_Muntean1, I have tried that as well. However, it’s not returning any results.

{
  "analyzer": "lucene.simple",
  "searchAnalyzer": "lucene.whitespace",
  "mappings": {
    "dynamic": true,
    "fields": {
      "task_name": [
        {
          "type": "stringFacet"
        },
        {
          "type": "string"
        },
        {
          "foldDiacritics": false,
          "maxGrams": 7,
          "minGrams": 3,
          "type": "autocomplete"
        }
      ]
    }
  }
}

Can you please help? Am I doing anything wrong here?

Dan_Muntean1 · February 21, 2024, 1:10pm

I am using lucene.standard for both the analyzer and the search analyzer, with minGrams set to 2 and maxGrams set to 15. These are the defaults I get, and from what I’ve understood, a minGrams of 2 is the best practice. With minGrams set to 2, the search works for inputs of two or more letters. Also, can you share the code where you are querying this?
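
Written out as an index definition, the settings described above might look roughly like the following sketch. The field name `name` is borrowed from the index shared earlier in the thread; the exact layout is an assumption, not the definition actually in use.

```javascript
// Sketch of an index definition with the settings described above:
// lucene.standard for indexing and searching, autocomplete with 2..15 grams.
const autocompleteIndexDefinition = {
  analyzer: "lucene.standard",
  searchAnalyzer: "lucene.standard",
  mappings: {
    dynamic: true,
    fields: {
      name: {
        type: "autocomplete",
        minGrams: 2, // shortest indexed prefix, so 2+ typed letters can match
        maxGrams: 15, // longest indexed prefix
      },
    },
  },
};

// Print as JSON, e.g. to paste into the Atlas index JSON editor.
console.log(JSON.stringify(autocompleteIndexDefinition, null, 2));
```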

Hiren_Thakkar · February 21, 2024, 1:49pm

Right now I’m using the MongoDB cloud platform to check.

Dan_Muntean1 · February 22, 2024, 6:54am

Please share the query that you use on the MongoDB platform.

Hiren_Thakkar · February 22, 2024, 8:47am

db.collection.aggregate([
  {
    $search: {
      index: 'tasks_search',
      compound: {
        ...query,
      },
      count: {
        type: 'total',
      },
      sort: sort
        ? sort
        : {
            due_date: 1,
          },
    },
  },
  {
    $skip: skip || 0,
  },
  {
    $limit: limit,
  },
  {
    $group: {
      _id: null,
      data: { $push: '$$ROOT' },
      count: { $first: '$$SEARCH_META.count.total' },
    },
  },
  {
    $project: {
      _id: 0,
      data: 1,
      count: 1,
    },
  },
])

Dan_Muntean1 · February 26, 2024, 9:39am

First, try to get just the results without the item count, to identify the problem. Here is the snippet of code that I use to get the total count of items:

{
  $group: {
    _id: null,
    total: { $sum: 1 },
    items: { $push: '$$ROOT' },
  },
},
{
  $project: {
    _id: 0,
    total: 1,
    items: {
      $slice: ['$items', skip, limit],
    },
  },
}
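
The first debugging step suggested above, running the search on its own without any counting stages, could be sketched as follows. The helper name `buildDebugPipeline` and the query text are hypothetical; the index name `tasks_search` and the field `task_name` are taken from earlier posts in the thread.

```javascript
// Hypothetical helper: wraps a $search stage in a small pipeline that only
// returns a few documents with their relevance score, with no $group or count.
function buildDebugPipeline(searchStage, limit = 5) {
  return [
    searchStage,
    { $limit: limit },
    {
      $project: {
        _id: 0,
        task_name: 1,
        score: { $meta: "searchScore" },
      },
    },
  ];
}

const debugPipeline = buildDebugPipeline({
  $search: {
    index: "tasks_search",
    autocomplete: { query: "some task", path: "task_name" }, // placeholder query
  },
});
console.log(JSON.stringify(debugPipeline, null, 2));

// Possible usage in mongosh: db.collection.aggregate(debugPipeline)
```

If this pipeline already returns nothing, the problem is more likely in the index definition or the search clause than in the counting stages.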