Custom Analyzers

Overview

A MongoDB Search analyzer prepares a set of documents for indexing by performing a series of operations that transform, filter, and group sequences of characters. You can define a custom analyzer from the Atlas UI to suit your specific indexing needs.

A custom MongoDB Search analyzer is made up of a tokenizer, an optional list of character filters, and an optional list of token filters. When MongoDB Search analyzes text with a custom analyzer, the text passes through the character filters first, then the tokenizer, and then the token filters.

Syntax

A custom analyzer has the following syntax:

"analyzers": [
  {
    "name": "<name>",
    "charFilters": [ <list-of-character-filters> ],
    "tokenizer": {
      "type": "<tokenizer-type>"
    },
    "tokenFilters": [ <list-of-token-filters> ]
  }
]
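
As a rough illustration, the fragment below fills in the placeholders with one component of each kind. The analyzer name `htmlLowercaseSketch` is hypothetical, and the choice of `htmlStrip`, `standard`, and `lowercase` is an assumption made for this sketch. See the Character Filters, Tokenizers, and Token Filters pages for the options that each type accepts.

```json
{
  "analyzers": [
    {
      "name": "htmlLowercaseSketch",
      "charFilters": [
        { "type": "htmlStrip", "ignoredTags": [ "a" ] }
      ],
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        { "type": "lowercase" }
      ]
    }
  ]
}
```

With a definition along these lines, text would first have HTML tags other than `<a>` stripped by the character filter, then be split into tokens by the tokenizer, and finally have each token lowercased by the token filter.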

Attributes

A custom analyzer has the following attributes:

Attribute	Type	Description	Required?
`name`	string	Name of the custom analyzer. Names must be unique within an index and must not start with any of the following strings: `lucene.`, `builtin.`, `mongodb.`	yes
`charFilters`	list of objects	Array containing zero or more character filters. Character filters examine text one character at a time and perform filtering operations. To learn more, see Character Filters.	no
`tokenizer`	object	Tokenizer to use to create tokens. An analyzer uses a tokenizer to split chunks of text into groups, or tokens, for indexing purposes. To learn more, see Tokenizers.	yes
`tokenFilters`	list of objects	Array containing zero or more token filters. A token filter performs operations such as stemming, which reduces related words such as "talking", "talked", and "talks" to their root word "talk", and redaction, which removes sensitive information from public documents. To learn more, see Token Filters.	no
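
The sketch below shows how these attributes might fit into a complete index definition. It assumes a hypothetical analyzer named `whitespaceOnlySketch` that sets only the required `name` and `tokenizer` attributes and omits the optional `charFilters` and `tokenFilters`. It also assumes that the analyzer is applied to a string field named `title` through the `analyzer` option of the field mapping.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "whitespaceOnlySketch"
      }
    }
  },
  "analyzers": [
    {
      "name": "whitespaceOnlySketch",
      "tokenizer": {
        "type": "whitespace"
      }
    }
  ]
}
```

The value of `analyzer` in the field mapping must match the `name` of an analyzer defined in the same index.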

Built-in Custom Analyzer Templates

The Atlas UI Visual Editor includes the option to choose between the following common-use templates to help you get started with building your custom analyzer:

Email Parser - Use this to tokenize email addresses up to 200 characters using the `uaxUrlEmail` tokenizer. For example, you can apply this analyzer on the `page_updated_by.email` field in the Example Collection.
Phone Numbers - Use this to create a single token from a US-formatted phone number using the `regexCaptureGroup` tokenizer. For example, you can apply this analyzer on the `page_updated_by.phone` field in the Example Collection.
Dash-Separated IDs - Use this to create tokens from hyphen-delimited text using the `regexSplit` tokenizer. For example, you can apply this analyzer on the `message` field in the Example Collection.

You can use these built-in custom analyzers or create your own custom analyzer using the MongoDB Search Visual Editor or JSON Editor.
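
For orientation, the following index definition sketches analyzers that resemble the three templates. It is not the exact definition that the Visual Editor generates. The analyzer names, the regular expression patterns, and the `maxTokenLength` value of 200 are assumptions chosen to match the descriptions above, and the field mappings assume the Example Collection.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "page_updated_by": {
        "type": "document",
        "dynamic": false,
        "fields": {
          "email": { "type": "string", "analyzer": "emailParserSketch" },
          "phone": { "type": "string", "analyzer": "phoneNumberSketch" }
        }
      },
      "message": { "type": "string", "analyzer": "dashSeparatedSketch" }
    }
  },
  "analyzers": [
    {
      "name": "emailParserSketch",
      "tokenizer": { "type": "uaxUrlEmail", "maxTokenLength": 200 }
    },
    {
      "name": "phoneNumberSketch",
      "tokenizer": {
        "type": "regexCaptureGroup",
        "pattern": "^\\b\\d{3}[-]?\\d{3}[-]?\\d{4}\\b$",
        "group": 0
      }
    },
    {
      "name": "dashSeparatedSketch",
      "tokenizer": { "type": "regexSplit", "pattern": "[-]+" }
    }
  ]
}
```

The phone pattern in this sketch matches ten digits optionally separated by hyphens, such as `123-456-8907`. Numbers written with parentheses, periods, or spaces would likely need a character filter or a broader pattern.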

Example Collection

The Character Filters, Tokenizers, and Token Filters pages contain sample index definitions and query examples for character filters, tokenizers, and token filters. These examples use a sample minutes collection with the following documents:

{
  "_id": 1,
  "page_updated_by": {
    "last_name": "AUERBACH",
    "first_name": "Siân",
    "email": "auerbach@example.com",
    "phone": "(123)-456-7890"
  },
  "title": "The team's weekly meeting",
  "message": "try to siGn-In",
  "text": {
    "en_US": "<head> This page deals with department meetings.</head>",
    "sv_FI": "Den här sidan behandlar avdelningsmöten",
    "fr_CA": "Cette page traite des réunions de département"
  }
}

{
  "_id": 2,
  "page_updated_by": {
    "last_name": "OHRBACH",
    "first_name": "Noël",
    "email": "ohrbach@example.com",
    "phone": "(123) 456 0987"
  },
  "title": "The check-in with sales team",
  "message": "do not forget to SIGN-IN. See ① for details.",
  "text": {
    "en_US": "The head of the sales department spoke first.",
    "fa_IR": "ابتدا رئیس بخش فروش صحبت کرد",
    "sv_FI": "Först talade chefen för försäljningsavdelningen"
  }
}

{
  "_id": 3,
  "page_updated_by": {
    "last_name": "LEWINSKY",
    "first_name": "Brièle",
    "email": "lewinsky@example.com",
    "phone": "(123).456.9870"
  },
  "title": "The regular board meeting",
  "message": "try to sign-in",
  "text": {
    "en_US": "<body>We'll head out to the conference room by noon.</body>"
  }
}

{
  "_id": 4,
  "page_updated_by": {
    "last_name": "LEVINSKI",
    "first_name": "François",
    "email": "levinski@example.com",
    "phone": "123-456-8907"
  },
  "title": "The daily huddle on tHe StandUpApp2",
  "message": "write down your signature or phone №",
  "text": {
    "en_US": "<body>This page has been updated with the items on the agenda.</body>",
    "es_MX": "La página ha sido actualizada con los puntos de la agenda.",
    "pl_PL": "Strona została zaktualizowana o punkty porządku obrad."
  }
}

Learn More

To learn more about creating your own custom analyzers, see the Character Filters, Tokenizers, and Token Filters reference pages.

Note

When you add a custom analyzer using the Visual Editor in the Atlas UI, the Atlas UI displays the following details about the analyzer in the Custom Analyzers section.

Name	Label that identifies the custom analyzer.
Used In	Fields that use the custom analyzer. Value is None if the custom analyzer isn't used to analyze any fields.
Character Filters	MongoDB Search character filters configured in the custom analyzer.
Tokenizer	MongoDB Search tokenizer configured in the custom analyzer.
Token Filters	MongoDB Search token filters configured in the custom analyzer.
Actions	Clickable icons for the actions that you can perform on the custom analyzer. Click the edit icon to edit the custom analyzer. Click the delete icon to delete the custom analyzer.
