Synonyms are ignored when using some analyzers

Oliver_Haas · March 8, 2023, 2:02pm

Say I just have one document in a collection

{
    _id: <whatever>,
    sound: 'Dong'
}

and a synonyms collection with only one mapping

{
    mappingType: 'explicit',
    input: ['Ding'],
    synonyms: ['Ding', 'Dong']
}

and I want to create a search index which uses those to return the one document when one queries for 'Ding' on the property sound.

In this minimal example I can just use the lucene.standard analyzer and everything works perfectly (lucene.english works as well). But changing just the analyzer definitions to lucene.keyword (or to custom analyzers, though there I might be making another mistake) breaks things, i.e. no document is returned. The definitions are pretty straightforward. Search index field definition:

  "sound": {
    "analyzer": "lucene.keyword",
    "searchAnalyzer": "lucene.keyword",
    "type": "string"
  },

and synonyms

  "synonyms": [
    {
      "analyzer": "lucene.keyword",
      "name": "synonym_mapping",
      "source": {
        "collection": "synonyms"
      }
    }
  ]

Using MongoDB Compass to explain the query, I can see that the explain output for lucene.standard and lucene.english (type: "DefaultQuery" and "queryType": "SafeTermAutomatonQueryWrapper", which sounds like a wrapper used for synonyms, maybe?) looks slightly different from the output for the non-working analyzers (type: "TermQuery"), but there is no documentation on what any of it means.

At this point, my best guess is that either some analyzers are not supposed to work with synonyms (I couldn’t find anything in the docs though, no error or warning either obviously), or the implementation to handle that case is missing.

Am I doing something wrong?

amyjian · March 8, 2023, 2:54pm

Hi @Oliver_Haas ! Can you share the query you are running?

Oliver_Haas · March 8, 2023, 3:02pm

Oh yes, sure. @amyjian

[
  {
    $search: {
      index: "default",
      text: {
        query: "Ding",
        path: 'sound',
        synonyms: 'synonym_mapping'
      }
    }
  }
]

Oliver_Haas · March 8, 2023, 4:17pm

I think I somewhat understand the behavior now. The following starts with the use-case of the question with the lucene.keyword analyzer. What I think happens is the following:

Query for sound: 'Ding'
'Ding' is converted to lowercase, contrary to lucene.keyword behavior, and synonyms are looked up for 'ding'
No synonyms are found for 'ding', so the search returns no results

So if I change my synonyms to

{
    mappingType: 'explicit',
    input: ['ding'],
    synonyms: ['Ding', 'Dong']
}

I can find documents with 'Ding' or 'Dong', but here the case matters again, because that is lucene.keyword behavior.
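
To make the hypothesis concrete, here is a toy model of it in plain JavaScript. No MongoDB is involved, the function names expandQuery and search are made up, and this is only my guess at the observable behavior, not how Atlas Search is actually implemented.

```javascript
// Toy model of the hypothesis above; NOT Atlas Search code.
// expandQuery and search are hypothetical helpers for illustration only.
function expandQuery(query, mappings) {
  // Assumption: the query is lowercased before the synonym lookup,
  // even though lucene.keyword itself preserves case.
  const key = query.toLowerCase();
  for (const m of mappings) {
    if (m.mappingType === "explicit" && m.input.includes(key)) {
      return m.synonyms; // explicit mapping: search for the synonyms instead
    }
  }
  return [query]; // no mapping hit: search for the query exactly as typed
}

// lucene.keyword indexes the whole field value as one case-sensitive token,
// modeled here as an exact string comparison.
function search(docs, query, mappings) {
  const terms = expandQuery(query, mappings);
  return docs.filter((d) => terms.includes(d.sound));
}

const docs = [{ sound: "Dong" }];
const original = [
  { mappingType: "explicit", input: ["Ding"], synonyms: ["Ding", "Dong"] },
];
const lowercased = [
  { mappingType: "explicit", input: ["ding"], synonyms: ["Ding", "Dong"] },
];

// With input 'Ding' the lowercased lookup key 'ding' never matches.
const missResult = search(docs, "Ding", original);
// With input 'ding' the lookup succeeds and 'Dong' is matched case-sensitively.
const hitResult = search(docs, "Ding", lowercased);
```

In this model the first search comes back empty and the second returns the 'Dong' document, which matches what I see in Atlas.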

I guess it maybe makes sense, because I read that Lucene (always?) parses queries to lowercase, but since this conflicts with the behavior of lucene.keyword it is pretty confusing, to me anyway. Honestly, I feel like this is a mistake in the implementation, since lucene.keyword can’t be used case-sensitively with synonyms this way.

What I will use in the end is a custom analyzer which behaves like a case-insensitive lucene.keyword, since I don’t care about the case but otherwise want to match multi-word queries as a whole, and use lowercase synonyms. But I won’t start with this today…
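
For whenever I get to it, here is a rough and untested sketch of what such an index definition might look like, written as a JavaScript object. The analyzer name keywordLowercase is made up, and I am assuming that a custom analyzer built from the keyword tokenizer and the lowercase token filter is accepted for synonym mappings.

```javascript
// Untested sketch of a search index definition.
// "keywordLowercase" is a made-up analyzer name.
const indexDefinition = {
  analyzers: [
    {
      name: "keywordLowercase",
      tokenizer: { type: "keyword" }, // whole field value as a single token
      tokenFilters: [{ type: "lowercase" }], // fold the token to lowercase
    },
  ],
  mappings: {
    fields: {
      sound: {
        type: "string",
        analyzer: "keywordLowercase",
        searchAnalyzer: "keywordLowercase",
      },
    },
  },
  synonyms: [
    {
      name: "synonym_mapping",
      // Assumption: the mapping should use the same analyzer as the field.
      analyzer: "keywordLowercase",
      source: { collection: "synonyms" },
    },
  ],
};
```

With this, the documents in the synonyms collection would presumably be written entirely in lowercase, e.g. input: ['ding'] and synonyms: ['ding', 'dong'].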