How to Define a Custom Analyzer and Run a Diacritic-Insensitive Query

On this page

  • Create the Atlas Search Index
  • Search the Collection

This tutorial describes how to create an index that uses a custom analyzer and run a diacritic-insensitive query against the sample_mflix.movies collection. It takes you through the following steps:

  1. Set up an Atlas Search index on the title and genres fields in the sample_mflix.movies collection.
  2. Run an Atlas Search compound query against the title and genres fields in the sample_mflix.movies collection using the wildcard and text operators.

Before you begin, ensure that your Atlas cluster meets the requirements described in the Prerequisites.

Create the Atlas Search Index

In this section, you will create an Atlas Search index on the title and genres fields in the sample_mflix.movies collection.

1
  1. If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
  2. If it is not already displayed, select your desired project from the Projects menu in the navigation bar.
  3. Click your cluster's name.
  4. Click the Search tab.
2
3
4
  1. In the Index Name field, enter default.

    Note

    If you name your index default, you don't need to specify an index parameter when using the $search pipeline stage. Otherwise, you must specify the index name using the index parameter.

  2. In the Database and Collection section, find the sample_mflix database, and select the movies collection.
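
As a rough illustration of the note about the default index name, the following Python sketch builds a $search stage and adds the index option only when the index is not named default. The helper name search_stage and the index name titleIndex are hypothetical and not part of this tutorial.

```python
from typing import Any, Dict


def search_stage(operator: Dict[str, Any], index_name: str = "default") -> Dict[str, Any]:
    """Wrap an Atlas Search operator in a $search stage.

    Hypothetical helper: the "index" option is added only when the
    index is not named "default", mirroring the note above.
    """
    stage = dict(operator)  # shallow copy so the caller's dict is not modified
    if index_name != "default":
        stage["index"] = index_name
    return {"$search": stage}


# A text query against the genres field, used here only as an example operator.
genre_query = {"text": {"query": "Drama", "path": "genres"}}

default_stage = search_stage(genre_query)              # no "index" key needed
named_stage = search_stage(genre_query, "titleIndex")  # "index" key is required
```

With the index named default, as in this tutorial, the stage can omit the index option entirely.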
5

Use the JSON Editor in the Atlas user interface to create the index.

  1. Replace the default definition with the following:
{
  "mappings": {
    "fields": {
      "genres": {
        "type": "string"
      },
      "title": {
        "analyzer": "diacriticFolder",
        "type": "string"
      }
    }
  },
  "analyzers": [{
    "charFilters": [],
    "name": "diacriticFolder",
    "tokenizer": {
      "type": "keyword"
    },
    "tokenFilters": [{
      "type": "icuFolding"
    }]
  }]
}

This index definition specifies a custom analyzer, diacriticFolder, that uses the following:

  • The keyword tokenizer, which emits the entire input as a single token.
  • The icuFolding token filter, which applies character foldings such as accent removal and case folding.

The index definition specifies a string type for the genres and title fields. It also applies the custom analyzer named diacriticFolder on the title field.
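
To make the effect of the analyzer more concrete, here is a minimal Python sketch that approximates it with the standard library. It is not the ICU implementation that Atlas Search uses: it assumes that Unicode NFKD decomposition, removal of combining marks, and casefold() are close enough to show the idea, and the real icuFolding filter covers more foldings than this. The function names fold and analyze are hypothetical.

```python
import unicodedata
from typing import List


def fold(text: str) -> str:
    """Approximate accent removal and case folding (assumption: not real ICU folding)."""
    # Decompose characters so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that comparisons ignore upper and lower case.
    return stripped.casefold()


def analyze(text: str) -> List[str]:
    """Mimic a keyword tokenizer followed by a folding filter.

    The whole input stays a single token; only its characters are folded.
    """
    return [fold(text)]


tokens = analyze("Allègre non troppo")
```

Because the entire title is kept as one token, a query has to match the folded title as a whole or use a pattern, which is why the query in this tutorial pairs the analyzer with the wildcard operator.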

  2. Click Next.
6
7

A modal window appears to let you know your index is building. Click the Close button.

8

The index should take about one minute to build. While it is building, the Status column reads Build in Progress. When it is finished building, the Status column reads Active.

Search the Collection




You can use the compound operator to combine two or more operators into a single query. The sample query in this section uses the compound operator to query the title and genres fields in the movies collection using multiple operators.

In this section, connect to your Atlas cluster and run the sample query against the sample_mflix.movies collection using the compound operator.
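
The sample query itself is not reproduced here, so the following is only a sketch of what a compound query with the wildcard and text operators might look like in Python with PyMongo. The search pattern alle*, the genre Drama, the choice of must and should clauses, the $limit and $project stages, and the function names are all assumptions made for illustration. The connection string is a placeholder that you would replace with your own.

```python
from typing import Any, Dict, List


def build_pipeline(title_pattern: str = "alle*",
                   genre: str = "Drama",
                   limit: int = 5) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline with a compound Atlas Search query.

    Assumed shape: the wildcard clause on title is required (must) and
    the text clause on genres only affects ranking (should).
    """
    return [
        {
            "$search": {
                # No "index" option: the index in this tutorial is named default.
                "compound": {
                    "must": [{
                        "wildcard": {
                            "query": title_pattern,
                            "path": "title",
                            # title is indexed with the custom analyzer.
                            "allowAnalyzedField": True,
                        }
                    }],
                    "should": [{
                        "text": {"query": genre, "path": "genres"}
                    }],
                }
            }
        },
        {"$limit": limit},
        {"$project": {"_id": 0, "title": 1, "genres": 1}},
    ]


def run_query(connection_string: str) -> List[Dict[str, Any]]:
    """Run the pipeline against sample_mflix.movies (requires PyMongo and a cluster)."""
    from pymongo import MongoClient  # imported here so the pipeline can be built without PyMongo

    client = MongoClient(connection_string)
    try:
        movies = client["sample_mflix"]["movies"]
        return list(movies.aggregate(build_pipeline()))
    finally:
        client.close()


pipeline = build_pipeline()
```

To try it, call run_query with your own Atlas connection string after the index status reads Active.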

© 2022 MongoDB, Inc.