How to Index String Fields for Efficient Filtering and Sorting
You can use the Atlas Search token type to index string fields for sorting Atlas Search results. You can then use the $search sort option in your query to sort the results by the indexed field. To learn more, see Sort Atlas Search Results.
You can also use the Atlas Search token type to index string fields for pre-filtering the data that the $vectorSearch queries analyze. To learn more, see Atlas Vector Search Overview.
To run queries against string fields using operators such as equals, in, and range, you must index the field as the Atlas Search token type. To learn more, see the documentation for each respective operator.
Review the Behavior of the token Type
When you index a field as the token type, Atlas Search indexes the entire string as a single token (searchable term) and stores it in columnar storage for efficient filtering and sort operations. You can use a normalizer to transform the token. By default, the normalizer is set to none, so Atlas Search indexes strings in their original form.
The major difference between the Atlas Search string and token types is that Atlas Search creates one or more tokens for fields indexed as the string type, whereas it creates only a single token for fields indexed as the token type.
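The difference between the two types can be illustrated with a small local simulation. This is only a sketch under stated assumptions: `index_as_token` and `index_as_string` are hypothetical helper names, and `index_as_string` merely approximates an analyzer by lowercasing and splitting on non-alphanumeric ASCII characters, which is not how a real Atlas Search analyzer is implemented.

```python
import re


def index_as_token(value, normalizer="none"):
    """Simulate the token type: the whole string becomes a single term."""
    if normalizer == "lowercase":
        value = value.lower()
    elif normalizer != "none":
        raise ValueError(f"unsupported normalizer: {normalizer!r}")
    return [value]


def index_as_string(value):
    """Rough stand-in for an analyzer (assumption): lowercase the text,
    then split it on anything that is not an ASCII letter or digit."""
    return [term for term in re.split(r"[^0-9a-z]+", value.lower()) if term]


title = "The Great Train Robbery"
print(index_as_token(title))               # ['The Great Train Robbery']
print(index_as_token(title, "lowercase"))  # ['the great train robbery']
print(index_as_string(title))              # ['the', 'great', 'train', 'robbery']
```

Because the token type keeps one term per value, an exact match or a sort compares whole strings, while the string type lets a text query match any individual word.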
If a string being indexed as the token type exceeds 8181 characters, Atlas Search truncates it to 8181 characters before indexing.
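The truncation rule can be mimicked locally to check whether long values in your data would be affected. The following is a minimal sketch: `truncate_for_token` is a hypothetical helper, and it assumes that a "character" corresponds to one Python string element (a Unicode code point), which may not match how Atlas Search counts characters.

```python
MAX_TOKEN_CHARS = 8181  # limit stated in the text above


def truncate_for_token(value):
    """Return the part of the string that would be kept for indexing.

    Assumption: the limit is counted in Unicode code points.
    """
    return value[:MAX_TOKEN_CHARS]


long_value = "a" * 9000
indexed = truncate_for_token(long_value)
print(len(indexed))  # 8181

# Two values that differ only after the limit yield the same kept prefix.
other_value = "a" * 8181 + "b"
print(truncate_for_token(other_value) == indexed)  # True
```

Under that assumption, values that share their first 8181 characters would be indistinguishable for exact matching and sorting on the token field.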
Review token Type Limitations
When you index a field as the token type, you must also index that field as the string type to query the text value using operators such as text and phrase. For operators such as equals, in, and range, you don't need to also index the field as the string type to query the text value in the field.
You can't index children of fields indexed as the embeddedDocuments type as the token type.
Define the Index for the token Type
To define the index for the token type, choose your preferred configuration method in the Atlas UI and then select the database and collection.
Click Refine Your Index to configure your index.
In the Field Mappings section, click Add Field to open the Add Field Mapping window.
Click Customized Configuration.
Select the field to index from the Field Name dropdown.
Note: You can't index fields that contain the dollar ($) sign at the start of the field name.
Click the Data Type dropdown and select Token.
(Optional) Expand and configure the Token Properties for the field. To learn more, see Configure token Field Properties.
Click Add.
The following is the JSON syntax for the token type.
Replace the default index definition with the following. To learn more
about the fields, see Field Properties.
{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "<field-name>": {
        "type": "token",
        "normalizer": "lowercase | none"
      }
    }
  }
}
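If you generate index definitions programmatically, the syntax above can be assembled as a plain dictionary. The following is a minimal sketch, not an official helper: the function name `token_index_definition` is hypothetical, and the checks only mirror rules stated on this page (the two normalizer values and the restriction on field names that start with a dollar sign).

```python
import json

ALLOWED_NORMALIZERS = {"lowercase", "none"}


def token_index_definition(field_name, normalizer="none", dynamic=False):
    """Build an index definition that maps one field to the token type."""
    if field_name.startswith("$"):
        raise ValueError("field names that start with '$' can't be indexed")
    if normalizer not in ALLOWED_NORMALIZERS:
        raise ValueError(f"normalizer must be one of {sorted(ALLOWED_NORMALIZERS)}")
    return {
        "mappings": {
            "dynamic": dynamic,
            "fields": {
                field_name: {"type": "token", "normalizer": normalizer},
            },
        }
    }


definition = token_index_definition("title", normalizer="lowercase")
print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the JSON Editor; submitting it through a driver or the Atlas Administration API is intentionally left out of this sketch.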
Configure token Field Properties
The Atlas Search token type takes the following parameters:
Option | Type | Necessity | Description | Default |
---|---|---|---|---|
type | string | Required | Human-readable label that identifies this field type. Value must be token. | |
normalizer | string | Optional | Type of transformation to perform on the field value. Value can be one of the following: lowercase (converts the field value to lowercase) or none (indexes the field value in its original form). If you don't set this option explicitly, it defaults to none. | none |
Try an Example for the token Type
The following index definition example uses the sample_mflix.movies collection. If you have the sample data already loaded on your cluster, you can use the Visual Editor or JSON Editor in the Atlas UI to configure the index. After you select your preferred configuration method, select the database and collection, and refine your index to add field mappings.
The following index definition indexes string values in the title field as the Atlas Search token type and converts the field value to lowercase, which allows you to do the following:
Perform case-insensitive sort, as specified by the normalizer, on the title field.
Run exact match queries on the title field using operators such as equals, in, and range.
In the Add Field Mapping window, select title from the Field Name dropdown.
Click the Data Type dropdown and select Token.
Expand Token Properties and select lowercase from the Normalizer dropdown.
Click Add.
Replace the default index definition with the following index definition.
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "token",
        "normalizer": "lowercase"
      }
    }
  }
}
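Once an index like this exists, a query can combine an exact match with a sort on the same field. The following sketch only builds the aggregation pipeline as Python data and does not connect to a cluster. The function name `title_query_pipeline` and the index name `default` are assumptions, and the query values are lowercased on the client side as a precaution to line up with the lowercase normalizer.

```python
import json


def title_query_pipeline(titles, index_name="default", limit=5):
    """Build a $search pipeline that matches exact titles and sorts by title."""
    # Lowercase the query values so they match the normalized tokens
    # (precaution; harmless if the server also normalizes query values).
    values = [title.lower() for title in titles]
    return [
        {
            "$search": {
                "index": index_name,
                "in": {"path": "title", "value": values},
                "sort": {"title": 1},  # 1 = ascending, -1 = descending
            }
        },
        {"$limit": limit},
        {"$project": {"_id": 0, "title": 1}},
    ]


pipeline = title_query_pipeline(["The Godfather", "Alien"])
print(json.dumps(pipeline, indent=2))
```

The resulting list can be passed to an aggregation call in the driver of your choice against the sample_mflix.movies collection.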
The following index definition indexes the genres field as the string and token types to return the following:
Search results for queries using Atlas Search operators like text, phrase, and other operators that perform text search on the genres field.
Sorted results for queries using the $search sort option on the genres field.
Exact matches for queries using Atlas Search operators like equals, in, and range.
In the Add Field Mapping window, select genres from the Field Name dropdown.
Click the Data Type dropdown and select Token.
Click Add.
Repeat step 1 and select String from the Data Type dropdown.
Review the default setting for String Properties and click Add.
Replace the default index definition with the following index definition.
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "genres": [
        { "type": "string" },
        { "type": "token" }
      ]
    }
  }
}
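With both types on the same field, one query can use a text search, an exact-match filter, and a sort together. The sketch below builds such a pipeline as Python data without running it. The function name `genres_pipeline` and the index name `default` are assumptions, and because this index sets no normalizer on the token mapping, the exact-match value is assumed to need the same casing as the stored value.

```python
import json


def genres_pipeline(text_query, exact_genre, index_name="default", limit=5):
    """Build a $search pipeline that uses both mappings of the genres field."""
    return [
        {
            "$search": {
                "index": index_name,
                "compound": {
                    # text search relies on the string mapping
                    "must": [{"text": {"query": text_query, "path": "genres"}}],
                    # exact match relies on the token mapping (case-sensitive here)
                    "filter": [{"equals": {"path": "genres", "value": exact_genre}}],
                },
                # sorting relies on the token mapping
                "sort": {"genres": 1},
            }
        },
        {"$limit": limit},
        {"$project": {"_id": 0, "title": 1, "genres": 1}},
    ]


pipeline = genres_pipeline("comedy", "Romance")
print(json.dumps(pipeline, indent=2))
```

Removing the string mapping from the index would leave the sort and the equals clause usable, but the text clause would no longer have an analyzed field to search.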