Case-Insensitive Queries Without Case-Insensitive Indexes
Lauren Schaefer, Daniel CoupalPublished Feb 12, 2022 • Updated May 31, 2022
Rate this article
We've reached the sixth and final (at least for now) MongoDB schema design anti-pattern. In the first five posts in this series, we've covered the following anti-patterns.
Today, we'll explore the wonderful world of case-insensitive indexes. Not having a case-insensitive index can create surprising query results and/or slow queries...and make you hate everything.
Once you know the details of how case-insensitive queries work, the implementation is fairly simple. Let's dive in!
Check out the video above to see the case-insensitive queries and indexes in action.
MongoDB supports three primary ways to run case-insensitive queries.
First, you can run a case-insensitive query using $regex with the
ioption. These queries will give you the expected case-insensitive results. However, queries that use
$regexcannot efficiently utilize case-insensitive indexes, so these queries can be very slow depending on how much data is in your collection.
Second, you can run a case-insensitive query by creating a case-insensitive index (meaning it has a collation strength of
2) and running a query with the same collation as the index. A collation defines the language-specific rules that MongoDB will use for string comparison. Indexes can optionally have a collation with a strength that ranges from 1 to 5. Collation strengths of
2both give you case-insensitivity. For more information on the differences in collation strengths, see the MongoDB docs. A query that is run with the same collation as a case-insensitive index will return case-insensitive results. Since these queries are covered by indexes, they execute very quickly.
Third, you can run a case-insensitive query by setting the default collation strength for queries and indexes to a strength of
2when you create a collection. All queries and indexes in a collection automatically use the default collation unless you specify otherwise when you execute a query or create an index. Therefore, when you set the default collation to a strength of
2, you'll get case-insensitive queries and indexes by default. See the
collationoption in the db.createCollection() section of the MongoDB Docs for more details.
Warning for queries that do not use
$regex: Your index must have a collation strength of
2and your query must use the same collation as the index in order for your query to be case-insensitive.
You can use MongoDB Compass (MongoDB's desktop GUI) or the MongoDB Shell (MongoDB's command-line tool) to test if a query is returning the results you'd expect, see its execution time, and determine if it's using an index.
Let's revisit the example we saw in the Unnecessary Indexes Anti-Pattern and the Bloated Documents Anti-Pattern posts. Leslie is creating a website that features inspirational women. She has created a database with information about 4,700+ inspirational women. Below are three documents in her
Leslie decides to add a search feature to her website since the website is currently difficult to navigate. She begins implementing her search feature by creating an index on the
first_namefield. Then she starts testing a query that will search for women named "Harriet."
Leslie executes the following query in the MongoDB Shell:
She is surprised to only get one document returned since she has two Harriets in her database: Harriet Tubman and Harriet Beecher Stowe. She realizes that Harriet Beecher Stowe's name was input in all uppercase in her database. Her query is case-sensitive, because it is not using a case-insensitive index.
Leslie runs the same query with .explain("executionStats") to see what is happening.
The Shell returns the following output.
She can see that the
winningPlanis using an
IXSCAN(index scan) with her
first_name_1index. In the
executionStats, she can see that only one index key was examined (
executionStats.totalKeysExamined) and only one document was examined (
executionStats.totalDocsExamined). For more information on how to interpret the output from
.explain(), see Analyze Query Performance.
Leslie opens Compass and sees similar results.
Leslie wants all Harriets—regardless of what lettercase is used—to be returned in her query. She updates her query to use
ito indicate the regular expression should be case-insensitive. She returns to the Shell and runs her new query:
This time she gets the results she expects: documents for both Harriet Tubman and Harriet Beecher Stowe. Leslie is thrilled! She runs the query again with
.explain("executionStats")to get details on her query execution. Below is what the Shell returns:
She can see that this query, like her previous one, uses an index (
IXSCAN). However, since
$regexqueries cannot efficiently utilize case-insensitive indexes, she isn't getting the typical benefits of a query that is covered by an index. All 4,704 index keys (
executionStats.totalKeysExamined) are being examined as part of this query, resulting in a slightly slower query (
executionStats.executionTimeMillis: 3) than one that fully utilizes an index.
She runs the same query in Compass and sees similar results. The query is using her
first_name_1index but examining every index key.
$regexquery is using the
first_name_1index but examining every index key.
Leslie wants to ensure that her search feature runs as quickly as possible. She uses Compass to create a new case-insensitive index named
first_name-case_insensitive. (She can easily create indexes using other tools as well like the Shell or MongoDB Atlas or even programmatically.) Her index will be on the
first_namefield in ascending order and use a custom collation with a locale of
enand a strength of
2. Recall from the previous section that the collation strength must be set to
2in order for the index to be case-insensitive.
enand a strength of
Leslie runs a query very similar to her original query in the Shell, but this time she specifies the collation that matches her newly-created index:
This time she gets both Harriet Tubman and Harriet Beecher Stowe. Success!
She runs the query with
.explain("executionStats")to double check that the query is using her index:
The Shell returns the following results.
Leslie can see that the winning plan is executing an
IXSCAN(index scan) that uses the case-insensitive index she just created. Two index keys (
executionStats.totalKeysExamined) are being examined, and two documents (
executionStats.totalDocsExamined) are being examined. The query is executing in 0 ms (
executionStats.executionTimeMillis: 0). Now that's fast!
Leslie runs the same query in Compass and specifies the collation the query should use.
She can see that the query is using her case-insensitive index and the query is executing in 0 ms. She's ready to implement her search feature. Time to celebrate!
Note: Another option for Leslie would have been to set the default collation strength of her InspirationalWomen collection to
2when she created her collection. Then all of her queries would have returned the expected, case-insensitive results, regardless of whether she had created an index or not. She would still want to create indexes to increase the performance of her queries.
You have three primary options when you want to run a case-insensitive query:
ioption. Note that this option is not as performant because
$regexcannot fully utilize case-insensitive indexes.
- Create a case-insensitive index with a collation strength of
2, and specify that your query uses the same collation.
- Set the default collation strength of your collection to
2when you create it, and do not specify a different collation in your queries and indexes.
Alternatively, MongoDB Atlas Search can be used for more complex text searches.
This post is the final anti-pattern we'll cover in this series. But, don't be too sad—this is not the final post in this series. Be on the lookout for the next post where we'll summarize all of the anti-patterns and show you a brand new feature in MongoDB Atlas that will help you discover anti-patterns in your database. You won't want to miss it!
When you're ready to build a schema in MongoDB, check out MongoDB Atlas, MongoDB's fully managed database-as-a-service. Atlas is the easiest way to get started with MongoDB and has a generous, forever-free tier.
Check out the following resources for more information:
Create a Data Pipeline for MongoDB Change Stream Using Pub/Sub BigQuery Subscription
Apr 13, 2023
User Group Meetup
LONDON, UNITED KINGDOM | IN-PERSON
MUG London: Data Localisation and Tech Evolution at Scale with MongoDB
Jun 29, 2023 | 5:00 PM - 8:00 PM UTC