Atlas moreLikeThis with single similar document

Hi everyone,

I am trying to use the new moreLikeThis operator and encountered some behaviour that I cannot explain. I have an Atlas search index on a books collection that includes title, authors, genres, etc. I am using Atlas search for recommendations whenever an exact match cannot be found (given other criteria such as postcode, distance, city, etc.). I noticed that whenever the search term is encountered only once in the relevant field in my collection, I get 0 moreLikeThis results back. If it is more than once, then I get results. For example, for a search like this I get 0 results back since I have only 1 book by Frank Tallis.

[
  {
    "$search": {
      "index": "bookSearchIndex",
      "compound": {
        "must": [
          {
            "moreLikeThis": {
              "like": [
                {
                  "authors": ["Frank"]
                }
              ]
            }
          }
        ]
      }
    }
  }
]

The explain() output for the above query is:

"explain": {
  "path": "compound.must",
  "type": "DefaultQuery",
  "args": {
    "queryType": "MatchNoDocsQuery"
  }
}

When I search for a term that is encountered at least twice in the collection, I get results. For example:

[
  {
    "$search": {
      "index": "bookSearchIndex",
      "compound": {
        "must": [
          {
            "moreLikeThis": {
              "like": [
                {
                  "authors": ["Donna"]
                }
              ]
            }
          }
        ]
      }
    }
  }
]

I have two books, each by a different author with the first name Donna. The explain() output for the above query is:

"explain": {
  "path": "compound.must",
  "type": "TermQuery",
  "args": {
    "path": "authors",
    "value": "donna"
  }
}

This behaviour repeats across other fields (like title). I have two books: “Prisoner of Azkaban” and “Prisoners of Geography”. If I search for “prisoner”, I get recommendations, but I get 0 recommendations if I search for “geography”. Is it that the algorithm does not index or take into account terms that occur only once in the collection? If that is the case, I am better off using the “should” clause of the compound operator.

Many thanks for any explanations/advice.

Kind regards,
Gueorgui

Hi @Gueorgui_58194,

Your experience sounds accurate to me. The purpose of moreLikeThis is to take a large piece of text (100+ words) and extract 25 good words to query on, returning similar records. As I understand it, there are some hardcoded limitations that stop it from selecting words that appear only once. For your use case, I would recommend not using moreLikeThis when the text is short: either use a normal text query, or supply a larger document for Atlas Search to find similar records.
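To illustrate the "normal text query" option, here is a minimal sketch that builds a $search stage with the text operator instead of moreLikeThis. The helper name buildTextSearchPipeline and the collection name books are assumptions made for illustration; the index name and the authors path come from the question above.

```javascript
// Hypothetical helper (not part of any MongoDB API): builds an aggregation
// pipeline that matches a short term with the Atlas Search `text` operator.
function buildTextSearchPipeline(term, path, limit = 10) {
  return [
    {
      $search: {
        index: "bookSearchIndex", // index name taken from the question
        text: {
          query: term, // the short search term, e.g. an author's first name
          path: path,  // the indexed field to search, e.g. "authors"
        },
      },
    },
    { $limit: limit },
  ];
}

const pipeline = buildTextSearchPipeline("Frank", "authors");
// In mongosh (collection name `books` is an assumption):
// db.books.aggregate(pipeline)
```

Unlike moreLikeThis, the text operator does not select a subset of representative terms, so a term that appears in only one document should still match that document.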

I’m not sure that using should/must would make a difference here. Did you test that out?

Hi Elle,

Thanks very much for the prompt response and sorry for my late reply. The example given in the documentation for moreLikeThis (example 1) uses moreLikeThis for two short terms:

db.movies.aggregate([
  {
    "$search": {
      "moreLikeThis": {
        "like": {
          "title": "The Godfather",
          "genres": "action"
        }
      }
    }
  },
  { "$limit": 5 },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "released": 1,
      "genres": 1
    }
  }
])

In the context of that example, using moreLikeThis would mean that if there were only a single movie with “Godfather” in the title, 0 results would be returned (the example returns several results because “Godfather” appears more than once). I personally found it counterintuitive that moreLikeThis would not return a single document with an exact match. In my use case, I was hoping that where other search criteria such as distance, postcode, city, etc. did not yield search results, I could relax those criteria and use moreLikeThis on the text fields to expand the search. However, in view of the way moreLikeThis operates, I ultimately implemented my own recommendation algorithm with text/phrase queries and the compound operator.
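For readers looking for a starting point, a fallback of that kind might look roughly like the sketch below. It is not the implementation described in this post, only an assumed shape: the helper name buildRecommendationPipeline, the boost value, and the projected fields are all hypothetical, while the index name and the title and authors paths come from the earlier queries in this thread.

```javascript
// Hypothetical helper: combines phrase and text clauses under compound.should
// so that a document matching any single clause can still be returned.
function buildRecommendationPipeline({ title, authors = [] }, limit = 5) {
  const should = [];

  if (title) {
    // An exact phrase match on the title is boosted (boost value is an assumption).
    should.push({
      phrase: { query: title, path: "title", score: { boost: { value: 2 } } },
    });
    // Looser term-level match on the same field.
    should.push({ text: { query: title, path: "title" } });
  }

  for (const author of authors) {
    should.push({ text: { query: author, path: "authors" } });
  }

  if (should.length === 0) {
    throw new Error("At least one of title or authors is required");
  }

  return [
    {
      $search: {
        index: "bookSearchIndex", // index name taken from the question
        compound: { should, minimumShouldMatch: 1 },
      },
    },
    { $limit: limit },
    { $project: { _id: 0, title: 1, authors: 1, genres: 1 } },
  ];
}

const recommendationPipeline = buildRecommendationPipeline({
  title: "Prisoners of Geography",
  authors: ["Frank"],
});
// In mongosh (collection name `books` is an assumption):
// db.books.aggregate(recommendationPipeline)
```

With minimumShouldMatch set to 1, a book that matches only one clause still qualifies, which is the behaviour moreLikeThis did not provide for terms that occur once.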

Many thanks for clarifying how moreLikeThis operates.

Kind regards,
Gueorgui