C# Aggregate and Collation

I have been struggling with finding a way to include case-insensitive collation into the Match stage of an Aggregate using the C# driver. I’ve verified that there is a case-insensitive index for the first name field in the code below by listing the indexes with UserCollection.Indexes.List().ToListAsync(). It contains the correct locale and strength.

The below code does not work unless the search term matches the case of the first name. I’ve tried googling just about everything as well as using chat.openai and have not been able to find the right way to get this working.

var searchName = "test";
var searchTermRegEx = $"^{searchName}*";
var firstNameFilter = Builders<User>.Filter.Regex(f => f.FirstName, searchTermRegEx);

var collation = new Collation("en", strength: CollationStrength.Primary);
var options = new AggregateOptions()
{
	Collation = collation
};
var userList = await UserCollection
	.Aggregate(options)
	.Match(firstNameFilter)
	.ToListAsync();
return userList;

Does anyone have some suggestions of where to go with this?
Thanks!

Hi, @Jim_Owen,

Welcome to the MongoDB Community Forums. I understand that you’re having an issue with a query involving a regular expression and a case insensitive collation.

Unfortunately this is a known limitation of MongoDB’s regular expression implementation:

Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.

See the section on index use for $regex for full details.

If you want to perform a case insensitive regex, you can do something like this:

var searchTermRegEx = new Regex($"^{searchName}*", RegexOptions.IgnoreCase);

However this query cannot use the index and will perform a collection scan, which could result in suboptimal performance for a large collection.

If you are doing an exact match (rather than a prefix match), you can use Builders<User>.Filter.Eq, which would be able to leverage the case-insensitive collation. You would be matching the entire FirstName, not just the beginning of the name.

If you need to perform a prefix match, you can store SearchableFirstName where you perform the FirstName.ToLower() operation in code. Then you can create a normal index on SearchableFirstName and /^test/ can leverage the index because the contents of the field and the search term are all in lower case.

If you are using MongoDB Atlas, you can leverage Atlas Search to implement full text search capabilities. MongoDB .NET/C# Driver 2.19.0 now includes a fluent API for Atlas Search.

Hopefully that provides you some ideas to implement your solution.

Sincerely,
James

Hi @James_Kovacs

Thanks for the very prompt reply - it’s appreciated.

I had begun taking a look at Atlas Search, but didn’t find much in the way of a fluent API for it. Can you give me a link to the doc and examples? I think I’d prefer to go that direction. Thanks!

Hi, Jim,

The Fluent Atlas Search API was just added to the driver in 2.19.0, which was only released this past Friday. Our Docs Team is still in the process of writing the documentation for this feature. In the meantime, the best source of examples is our integration tests:

From a high-level, you can create a search expression using Builders<T>.Search, which can then be used with Fluent Aggregate (IMongoCollection.Aggregate().Search(searchExpression)) or LINQ (IMongoCollection.AsQuerayble().Search(searchExpression).

The API closely mirrors the $search syntax, which can be found in our $search documentation. Hopefully Intellisense and the $search documentation is enough for you to make some progress while we work on documenting this new driver feature.

Sincerely,
James

Thanks James! Very glad to hear it’s available now. I’ll get the NuGet package and give it a try.

Hi James,

Thanks for the assistance - I have case-insensitive search working when indexing on a single or multi column and that’s much more than I had before.

What I’d like to accomplish is performing a SearchPath.Multi on two fields combined with a wildcard. Something like this:

                var results2 = UserCollection
                    .Aggregate()
                    .Search(
                        Builders<User>.Search.Wildcard(
                                Builders<User>.SearchPath.Multi(x => x.FirstName, x => x.LastName), "tes*")
                        )
                    .ToListAsync();

The error I’m receiving is

Command aggregate failed: PlanExecutor error during aggregation :: caused by :: Remote error from mongot :: caused by :: Field FirstName is analyzed. 
Use a keyword analyzed field or set allowAnalyzedField to true.

But I can’t find out where allowAnalyzedField should be set.
I also have the feeling that my code is not quite right on the simple call with the wildcard and the multi-column search path.

Any suggestions you can offer would be great.

Thanks!

To apply the case-insensitive collation to the Match stage of the Aggregate pipeline, you need to use the $expr operator along with the $regexMatch expression. Here is an example of how you could modify your code:

var searchName = "test";
var searchTermRegEx = $"^{searchName}*";
var firstNameFilter = Builders<User>.Filter.Expr(
  expr => new { expr = expr.FirstName, type = "regex", pattern = searchTermRegEx, options = "i" }
);

var collation = new Collation("en", strength: CollationStrength.Primary);
var options = new AggregateOptions()
{
	Collation = collation
};
var userList = await UserCollection
	.Aggregate(options)
	.Match(firstNameFilter)
	.ToListAsync();
return userList;

The $expr operator evaluates a specified expression and returns its result. The $regexMatch expression performs a regular expression match, and the options field specifies the regular expression options, which include i for case-insensitive matching.

Hi Sumanta,

Thanks for the response although I was able to get the case-insensitive search working using the new Atlas Search features of the most recent release of the C# driver. Here’s the code that I’m using that works perfectly for the two different columns of the collection.

public async Task<List<User>> ReadListAsync(string searchName, int pageSize, int offset, bool includeActiveUsers, bool includeDeletedUsers)
{
      // pageSize
      pageSize = pageSize <= 0 ? int.MaxValue : pageSize;   // Although Mongo Doc states 0 = unlimited, zero will throw an error

      // Create the name filter for first or last name match
      var searchTermRegEx = $"{searchName}*";
      // Create the wildcard search def
      var wildcardSearchDef = Builders<User>.Search.Wildcard(                     Builders<User>.SearchPath.Multi(x => x.FirstName, x => x.LastName), 
          searchTermRegEx, 
          allowAnalyzedField: true);
      // Create the Sort Stage
      var sortStage = Builders<User>.Sort.Ascending(f => f.LastName).Ascending(f => f.FirstName);

      var userList = await UserCollection
          .Aggregate()
          .Search(wildcardSearchDef)
          .Sort(sortStage)
          .Skip(offset)
          .Limit(pageSize)
          .ToListAsync();

      return userList;
}

The only thing I have yet to include is the additional filtering needed for the two boolean values passed in the method parameters.

If you have any suggestions for that, I’d be happy to hear it.
Cheers!

1 Like