Search index on array & Aggregation Pipeline

Craig_Watson · October 31, 2022, 1:10pm

Hello,

As part of a proof-of-concept to demonstrate searches and indexes, I’ve created a collection of films and assigned keywords to each. I’ve then created a search index on the keywords array, using lucene.english to take advantage of stemming. The goal is to provide a service whereby a user can enter keywords and be presented with matching films.

Collection - named "films"

{
  "_id": ObjectId("635be28f9f4d67744732c92c"),
  "name": "Kungfu Panda",
  "keywords": [
    "martial",
    "art",
    "kid",
    "animal"
  ]
},
{
  "_id": ObjectId("635be28f9f4d67744732c92c"),
  "name": "Spiderman",
  "keywords": [
    "insect",
    "super",
    "hero",
    "superhero",
    "kid"
  ]
}

Index Definition - named "filmKeywords"

{
  "analyzer": "lucene.english",
  "searchAnalyzer": "lucene.english",
  "mappings": {
    "fields": {
      "keywords": {
        "analyzer": "lucene.english",
        "searchAnalyzer": "lucene.english",
        "type": "string"
      }
    }
  }
}

In Compass, I’m trying to create an aggregation pipeline to demonstrate how different query terms return different documents. Based on the MongoDB manuals, I have so far created a single $search stage like this:

{
  index: 'filmKeywords',
  text: {
    query: 'kids',
    path: 'keywords.*'
  }
}

…however, no documents are returned.

My questions are:

1. Am I using the correct approach in this $search stage, and the correct syntax?
2. Should I be using a different type of search here due to the index being on an array of strings, rather than a string attribute on the object?

Elle_Shwer · October 31, 2022, 1:29pm

Hi there,

I was able to return results using this query:

{
  text: {
    query: 'kid',
    path: 'keywords'
  }
}

However, if you’d like to turn on fuzzy matching so that results are still returned for “kids”, you can add the fuzzy parameter like so:

{
  text: {
    query: 'kids',
    path: 'keywords',
    fuzzy: {}
  }
}
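
For reference, here is a minimal sketch of how the two stages above could be wrapped into full `$search` stages from JavaScript. It rests on a few assumptions: `buildKeywordSearch` is a hypothetical helper name, the index name `filmKeywords` is taken from the original post (when `index` is omitted, Atlas Search falls back to an index named `default`), and the `maxEdits` and `prefixLength` values are only illustrative settings for the `fuzzy` option.

```javascript
// Hypothetical helper: builds a $search stage against the "keywords" field.
// The index name is taken from the original post; adjust it to your own index.
function buildKeywordSearch(query, fuzzy) {
  const text = { query, path: 'keywords' };
  if (fuzzy) {
    // Only attach the fuzzy option when the caller asks for it.
    text.fuzzy = fuzzy;
  }
  return { $search: { index: 'filmKeywords', text } };
}

// Plain text search, relying on the index analyzer alone.
const plainStage = buildKeywordSearch('kid');

// Fuzzy search with illustrative (assumed) settings.
const fuzzyStage = buildKeywordSearch('kids', { maxEdits: 1, prefixLength: 2 });

console.log(JSON.stringify(fuzzyStage, null, 2));

// In mongosh, assuming the collection is named "films":
//   db.films.aggregate([fuzzyStage])
```

Building the stage in a helper keeps the index name in one place, which makes it harder to accidentally query the `default` index.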

Craig_Watson · October 31, 2022, 2:27pm

Thank you @Elle_Shwer for the quick reply.

I revised my query in the $search stage to:

{
  index: 'filmKeywords',
  text: {
    query: 'kids',
    path: 'keywords'
  }
}

and now it works, as the stem of “kids” matches “kid”. I was certain that didn’t work the first time!
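
To compare how different query terms behave, the working stage can be generated once per term. The following is only a sketch under stated assumptions: `keywordPipeline` is a hypothetical helper, the list of terms is illustrative, and the added `$project` stage (which exposes the relevance score via `$meta: 'searchScore'`) is an optional extra that was not part of the original pipeline.

```javascript
// Hypothetical helper: one pipeline per search term, so results can be compared.
function keywordPipeline(term) {
  return [
    // Same $search stage as above, with the term substituted in.
    { $search: { index: 'filmKeywords', text: { query: term, path: 'keywords' } } },
    // Optional: return only the fields of interest plus the search score.
    { $project: { _id: 0, name: 1, keywords: 1, score: { $meta: 'searchScore' } } },
  ];
}

// Illustrative terms; replace them with whatever a user might type.
const terms = ['kids', 'superheroes', 'animals'];
const pipelines = terms.map(keywordPipeline);

console.log(JSON.stringify(pipelines[0], null, 2));

// In mongosh, assuming the collection is named "films":
//   terms.forEach(t => printjson(db.films.aggregate(keywordPipeline(t)).toArray()));
```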

I have another question on the scores and ordering, which I’ll start a new topic for.