Mongo Lucene Regex Search

Hi,

I’m running the following query on Atlas:

[
  {
    $search: {
      compound: {
        must: [
          {
            text: {
              path: "site",
              query: "test",
            },
          },
          {
            text: {
              path: "objectKey",
              query: "product",
            },
          },
          {
            regex: {
              path: ["name"],
              query: "(.*)yearly(.*)",
            },
          },
        ],
      },
    },
  },
  { $sort: { displayOrder: -1 } },
  { $skip: 0 },
  { $limit: 10 },
]

Here I’m trying to match the name using a regular expression. The “name” field is indexed using lucene.keyword. I expected the search to be case-insensitive, but it isn’t: if I query for “Yearly” it works, but “yearly” does not match.
Do I need to change the index so that the search is case-insensitive?

Thanks,

lucene.keyword indexes the original value as-is, preserving case. For case-insensitive matching on a full string like that, create a custom analyzer that uses the keyword tokenizer followed by a lowercase token filter.
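
A minimal sketch of such an index definition and the matching regex clause follows. The analyzer name keywordLowercase and the helper nameRegexClause are hypothetical, and dynamic: true is an assumption about the rest of your index. Note that regex is a term-level operator, so the query string itself is not analyzed and has to be lowercased by the caller. The sketch also assumes the search term contains no regex metacharacters.

```javascript
// Index definition with a custom analyzer: keyword tokenizer + lowercase filter.
// "keywordLowercase" is a hypothetical name; pick any name you like.
const indexDefinition = {
  analyzers: [
    {
      name: "keywordLowercase",
      tokenizer: { type: "keyword" },
      tokenFilters: [{ type: "lowercase" }],
    },
  ],
  mappings: {
    dynamic: true, // assumption: other fields stay dynamically mapped
    fields: {
      name: { type: "string", analyzer: "keywordLowercase" },
    },
  },
};

// Hypothetical helper: builds the regex clause with a lowercased term,
// because the regex operator does not run the query through the analyzer.
function nameRegexClause(term) {
  return {
    regex: {
      path: "name",
      query: `(.*)${term.toLowerCase()}(.*)`,
      // Likely required because the field no longer uses lucene.keyword.
      allowAnalyzedField: true,
    },
  };
}
```

With this in place, nameRegexClause("Yearly") and nameRegexClause("yearly") build the same clause, which would replace the regex entry in the must array.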

I would caution you against using a regex with a leading wildcard, as that effectively performs a full index scan. If you can tokenize the text such that “yearly” is its own word/token, the query will be more performant.
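
As a rough sketch of the tokenized alternative, assuming the name values contain “yearly” as a separate word: map name with lucene.standard, which splits on word boundaries and lowercases tokens, and use the text operator instead of regex. The variable names here are illustrative only.

```javascript
// Field mapping fragment: analyze "name" into lowercased word tokens.
const nameMapping = {
  name: { type: "string", analyzer: "lucene.standard" },
};

// Replacement for the regex clause in the must array.
// The text operator analyzes the query too, so "Yearly" and "yearly" behave the same.
const nameClause = {
  text: {
    path: "name",
    query: "yearly",
  },
};
```

This only matches whole tokens, so a value such as “halfyearly” would not match; whether that is acceptable depends on your data.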

Also, I see you’re doing a $sort after $search. I recommend, instead, that you map displayOrder to a token field type (or numeric if that is a better fit) and use the sort parameter of $search. A $sort stage after $search has to consume ALL of the results before it can sort, whereas using the $search.sort option will be vastly more performant.
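
Here is a sketch of the same pipeline with the sort moved into $search, assuming displayOrder is numeric and mapped as type number in the search index. The name clause is left as in the original question.

```javascript
// Field mapping fragment (assumption: displayOrder holds numbers).
const displayOrderMapping = {
  displayOrder: { type: "number" },
};

const pipeline = [
  {
    $search: {
      compound: {
        must: [
          { text: { path: "site", query: "test" } },
          { text: { path: "objectKey", query: "product" } },
          { regex: { path: ["name"], query: "(.*)yearly(.*)" } },
        ],
      },
      // Sorting happens inside the search engine, replacing the separate $sort stage.
      sort: { displayOrder: -1 },
    },
  },
  { $skip: 0 },
  { $limit: 10 },
];
```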