An Introduction to Indexes for MongoDB Atlas Search
Imagine reading a long book like "A Song of Ice and Fire," "The Lord of the Rings," or "Harry Potter." Now imagine that there was a specific detail in one of those books that you needed to revisit. You wouldn't want to search every page in those long books to find what you were looking for. Instead, you'd want to use some sort of book index to help you quickly locate it. This same concept of indexing content within a book carries over to Atlas Search with search indexes.
Atlas Search makes it easy to build fast, relevant, full-text search on top of your data in the cloud. It's fully integrated, fully managed, and available with every MongoDB Atlas cluster running MongoDB version 4.2 or higher.
Correctly defining your indexes is important because they are responsible for making sure that you're receiving relevant results when using Atlas Search. There is no one-size-fits-all solution and different indexes will bring you different benefits.
In this tutorial, we're going to get a gentle introduction to creating indexes that will be valuable for various full-text search use cases.
Before we get too invested in this introduction, it's important to note that Atlas Search uses Apache Lucene. This means that search indexes are not unique to Atlas Search, and if you're already comfortable with Apache Lucene, your existing knowledge of indexing will transfer. However, this tutorial could act as a solid refresher regardless.
Before we start creating indexes, we should probably define what our data model will be for the example. In an effort to cover various indexing scenarios, the data model will be complex.
Take the following for example:
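A minimal sketch of what such a document could look like. Only the field names `name`, `moves`, `pokedex_entry`, and `red` appear in the text of this tutorial; the `yellow`, `description`, and `location` fields and all of the values are assumptions made for illustration.

```javascript
// Hypothetical example document. The values, the `yellow` and `description`
// fields, and the GeoJSON `location` field are illustrative assumptions.
const examplePokemon = {
    "name": "Pikachu",
    "pokedex_entry": {
        "red": "When several of these pokemon gather, their electricity could build and cause lightning storms.",
        "yellow": "It keeps its tail raised to monitor its surroundings."
    },
    "moves": [
        { "name": "Thunder Shock", "description": "An electric attack that may paralyze the target." },
        { "name": "Thunder Wave", "description": "A weak jolt of electricity that paralyzes the target." }
    ],
    "location": {
        "type": "Point",
        "coordinates": [-121.4252, 37.7397]
    }
};

// Print the document as JSON, the form in which it would be stored.
console.log(JSON.stringify(examplePokemon, null, 4));
```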
The above example document is about Pokemon, but Atlas Search can be used on whatever documents are part of your application.
Example documents like the one above allow us to use text search, geo search, and potentially others. For each of these different search scenarios, the index might change.
When we create an index for Atlas Search, it is created at the collection level.
There are two ways to map fields within a document when creating an index:
- Dynamic Mappings
- Static Mappings
If your document schema is still changing or your use case doesn't allow for it to be rigidly defined, you might want to choose to dynamically map your document fields. A dynamic mapping will automatically assign fields when new data is inserted.
Take the following for example:
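A minimal sketch of a fully dynamic index definition, written as a JavaScript object whose literal is also valid JSON. The variable name `dynamicIndex` is only for illustration.

```javascript
// Index definition that dynamically maps every supported field.
const dynamicIndex = {
    "mappings": {
        "dynamic": true
    }
};

// Print the definition as JSON, the form used when defining the index.
console.log(JSON.stringify(dynamicIndex, null, 4));
```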
The above JSON represents a valid index. When you add it to a collection, you are essentially mapping every field that exists in the documents and any field that might exist in the future.
We can do a simple search using this index like the following:
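A sketch of such a search as an aggregation pipeline using the `$search` stage with the `text` operator. The path `moves.name` and the collection name `pokemon` in the comment are assumptions; because no `index` is named, the index called `default` would be used.

```javascript
// Aggregation pipeline with a single $search stage.
// The path "moves.name" is an assumed field inside the `moves` array.
const thunderSearch = [
    {
        "$search": {
            "text": {
                "query": "thunder",
                "path": "moves.name"
            }
        }
    }
];

// In mongosh this could be run as: db.pokemon.aggregate(thunderSearch)
// (the collection name `pokemon` is hypothetical).
console.log(JSON.stringify(thunderSearch, null, 4));
```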
We didn't explicitly define the fields for this index, but attempting to search for "thunder" within the `moves` array will give us matching results based on our example data.
To be clear, dynamic mappings can be applied at the document level or the field level. At the document level, a dynamic mapping automatically indexes all common data types. At both levels, it automatically indexes all new and existing data.
While convenient, having a dynamic mapping index on all fields of a document comes at a cost. These indexes will take up more disk space and may be less performant.
The alternative is to use a static mapping, in which case you specify the fields to map and what type of fields they are. Take the following for example:
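A minimal sketch of a static mapping that indexes a single string field. The variable name `staticIndex` is only for illustration.

```javascript
// Index definition with dynamic mapping turned off and one
// explicitly mapped string field.
const staticIndex = {
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": {
                "type": "string"
            }
        }
    }
};

console.log(JSON.stringify(staticIndex, null, 4));
```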
In the above example, the only field within our document that is being indexed is the `name` field.
The following search query would return results:
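A sketch of a query against the statically mapped `name` field. The search term "pikachu" and the collection name `pokemon` in the comment are assumptions.

```javascript
// $search stage that targets only the statically mapped `name` field.
const nameSearch = [
    {
        "$search": {
            "text": {
                "query": "pikachu",
                "path": "name"
            }
        }
    }
];

// In mongosh this could be run as: db.pokemon.aggregate(nameSearch)
// (the collection name `pokemon` is hypothetical).
console.log(JSON.stringify(nameSearch, null, 4));
```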
If we try to search on any other field within our document, we won't end up with results because those fields are not statically mapped nor is the document schema dynamically mapped.
There is, however, a way to get the best of both worlds if we need it.
Take the following which uses static and dynamic mappings:
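A minimal sketch that combines both approaches: the top level is statically mapped, while one embedded object is mapped dynamically. The variable name `mixedIndex` is only for illustration.

```javascript
// Static mapping at the top level, dynamic mapping inside `pokedex_entry`.
const mixedIndex = {
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": {
                "type": "string"
            },
            "pokedex_entry": {
                "type": "document",
                "dynamic": true
            }
        }
    }
};

console.log(JSON.stringify(mixedIndex, null, 4));
```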
In the above example, we are still using a static mapping for the `name` field. However, we are using a dynamic mapping on the `pokedex_entry` field. The `pokedex_entry` field is an object, so any field within that object will get the dynamic mapping treatment. This means all sub-fields are automatically mapped, as well as any new fields that might exist in the future. This could be useful if you want to specify which top-level fields to map, but map all fields within a particular object as well.
Take the following search query as an example:
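A sketch of a query that searches one statically mapped field and one dynamically mapped sub-field at the same time. The collection name `pokemon` in the comment is an assumption.

```javascript
// $search stage with two paths: the top-level `name` field and the
// `red` sub-field of the `pokedex_entry` object (dot notation).
const multiPathSearch = [
    {
        "$search": {
            "text": {
                "query": "pokemon",
                "path": ["name", "pokedex_entry.red"]
            }
        }
    }
];

// In mongosh this could be run as: db.pokemon.aggregate(multiPathSearch)
// (the collection name `pokemon` is hypothetical).
console.log(JSON.stringify(multiPathSearch, null, 4));
```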
The above search will return results if "pokemon" appears in the `name` field or the `red` field within the `pokedex_entry` object.
When using a static mapping, you need to specify a type for the field or have `dynamic` set to true on the field. If you only specify a type, `dynamic` defaults to false. If you only specify `dynamic` as true, then Atlas Search can automatically default certain field types (e.g., string, date, number).
With the basic dynamic versus static mapping discussion out of the way for MongoDB Atlas Search indexes, now we can focus on more complicated or specific scenarios.
Let's first take a look at what our fully mapped index would look like for the document in our example:
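A sketch of a fully static index. The tutorial text names only `name`, `moves`, `pokedex_entry`, and `red`; the sub-fields `description` and `yellow` and the GeoJSON `location` field are assumptions about the shape of the example document.

```javascript
// Fully static mapping. Sub-field names other than `red` and the
// `location` field are assumed for illustration.
const fullIndex = {
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": { "type": "string" },
            "moves": {
                "type": "document",
                "fields": {
                    "name": { "type": "string" },
                    "description": { "type": "string" }
                }
            },
            "pokedex_entry": {
                "type": "document",
                "fields": {
                    "red": { "type": "string" },
                    "yellow": { "type": "string" }
                }
            },
            "location": { "type": "geo" }
        }
    }
};

console.log(JSON.stringify(fullIndex, null, 4));
```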
In the above example, we are using a static mapping for every field within our documents. An interesting thing to note is the `moves` array and the `pokedex_entry` object in the example document. Even though one is an array and the other is an object, the index type is `document` for both. While writing searches isn't the focus of this tutorial, searching an array and object would be similar using dot notation.
Had any of the fields been nested deeper within the document, the same approach would be applied. For example, we could have something like this:
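A sketch of a mapping with one extra level of nesting. The intermediate field name `gameboy` and the `yellow` sub-field are hypothetical, chosen only to show the pattern of a `document` type inside another `document` type.

```javascript
// Nested static mapping. `gameboy` is a hypothetical intermediate object
// that wraps the per-game entries.
const nestedIndex = {
    "mappings": {
        "dynamic": false,
        "fields": {
            "pokedex_entry": {
                "type": "document",
                "fields": {
                    "gameboy": {
                        "type": "document",
                        "fields": {
                            "red": { "type": "string" },
                            "yellow": { "type": "string" }
                        }
                    }
                }
            }
        }
    }
};

console.log(JSON.stringify(nestedIndex, null, 4));
```

With a mapping like this, a search path would use dot notation through each level, for example `pokedex_entry.gameboy.red`.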
In the above example, the `pokedex_entry` field was changed slightly to have another level of objects. This is probably not a realistic way to model data for this dataset, but it should get the point across about mapping deeper nested fields.
Up until now, each of the indexes has only had its field types defined in the mapping. The default options are currently being applied to every field. Options are a way to refine the index further based on your data to ultimately get more relevant search results. Let's play around with some of the options within the mappings of our index.
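A sketch of a string field with a few options set. The choice of the `lucene.english` analyzer is an assumption for illustration: `analyzer` and `searchAnalyzer` select the analyzers used at index time and query time, and `ignoreAbove` skips indexing values longer than the given number of characters.

```javascript
// String field with explicit options instead of the defaults.
// The analyzer choice (lucene.english) is an illustrative assumption.
const optionsIndex = {
    "mappings": {
        "dynamic": false,
        "fields": {
            "name": {
                "type": "string",
                "analyzer": "lucene.english",
                "searchAnalyzer": "lucene.english",
                "ignoreAbove": 3000
            }
        }
    }
};

console.log(JSON.stringify(optionsIndex, null, 4));
```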
The 3000-character limit is just an arbitrary number for this example, but adding a limit could, depending on your use case, improve performance or reduce the index size.
In a future tutorial, we're going to explore the finer details of what the search analyzers are and what they can accomplish.
These are just some of the available options for the string data type. Each data type will have its own set of options. If you want to use the default for any particular option, it does not need to be explicitly added to the mapped field.
You just received what was hopefully a gentle introduction to creating indexes to be used in Atlas Search. To use Atlas Search, you will need at least one index on your collection, even if it is a default dynamic index. However, if you know your schema and are able to create static mappings, it is usually the better way to go to fine-tune relevancy and performance.