How to Index Fields in Arrays of Objects and Documents

On this page

  • Review embeddedDocuments Type Limitations
  • Lucene's 2,147,483,647 Index Objects Limit
  • Atlas Search Limitations
  • Define the Index for the embeddedDocument Type
  • Configure embeddedDocument Field Properties
  • Try an Example for the embeddedDocument Type

Note

The Atlas Search embeddedDocuments type, embeddedDocument operator, and embedded scoring option are in preview. When an Atlas Search index on a replica set or single MongoDB shard reaches 2,147,483,647 index objects, Atlas Search transitions the index to a failed, non-queryable state. A solution to accommodate this limitation will be in place when this feature is generally available. To troubleshoot any issues related to using this feature, contact Support. To request support for more than 2,147,483,647 index objects, upvote this request in the MongoDB Feedback Engine.

You can use the Atlas Search embeddedDocuments type to index fields in documents and objects that are elements of an array. Atlas Search indexes embedded documents independently of their parent document. Each indexed document contains only the fields that are part of the embedded document array element. You can query fields indexed as the embeddedDocuments type only with the embeddedDocument operator.

Note

Use the embeddedDocuments type to index fields inside an array of documents so that you can query each nested document individually. If you only need to query nested documents in relation to the parent document, use the document type instead (see How to Index Fields in Objects and Documents).

Atlas Search doesn't dynamically index fields of the embeddedDocuments type. You must use static mappings to index embeddedDocuments fields. You can use the Visual Editor or the JSON Editor in the Atlas UI to index fields of the embeddedDocuments type.

Before you create an index that uses the embeddedDocuments type, review Lucene's limit of 2,147,483,647 index objects and the other Atlas Search limitations.

Atlas Search doesn't support indexing more than 2,147,483,647 index objects on a replica set or single shard, where each indexed embedded document counts as a single object. Using the embeddedDocuments field type can result in indexing objects over this limit, which causes an index to transition to a failed, non-queryable state.

The exact number of index objects can vary based on the rate of document changes and deletions. The Search Max Number of Lucene Docs metric provides the upper bound of the current number of index objects across all indexes per replica set or shard. You can approximate the expected number of index objects in a single index by doing the following:

  1. Calculate the number of index objects per document. For every level of nesting, each embedded document counts as a separate index object.

    number of index objects per document = 1 + number of nested embedded documents
  2. Multiply the number of index objects per document by the total number of documents in the collection.

    total number of index objects = number of index objects per document x total number of documents in collection

Note that this approximation is a lower bound.
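
The two steps above can be sketched in Python. This is a minimal illustration rather than an Atlas API: the function names and the sample inputs (2 nested embedded documents per document, 1000 documents) are assumptions chosen for the example.

```python
# Limit on index objects per replica set or single shard, as stated above.
MAX_INDEX_OBJECTS = 2_147_483_647


def index_objects_per_document(nested_embedded_documents: int) -> int:
    """Step 1: the parent document plus each nested embedded document."""
    return 1 + nested_embedded_documents


def estimate_index_objects(nested_embedded_documents: int, total_documents: int) -> int:
    """Step 2: multiply the per-document count by the collection size."""
    return index_objects_per_document(nested_embedded_documents) * total_documents


# Hypothetical collection: 1000 documents, each with 2 nested embedded documents.
estimate = estimate_index_objects(nested_embedded_documents=2, total_documents=1000)
print(estimate)                      # 3000
print(estimate < MAX_INDEX_OBJECTS)  # True
```

Because the estimate is only an approximation, it is probably safer to compare it against the limit with some headroom than to treat it as exact.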

Example

Consider the collection named schools, described in this tutorial, and suppose the collection contains 1000 documents similar to the following:

{
  "_id": 0,
  "name": "Springfield High",
  "mascot": "Pumas",
  "teachers": [
    {
      "first": "Jane",
      "last": "Smith",
      "classes": [
        {
          "subject": "art of science",
          "grade": "12th"
        },
        ... // 2 more embedded documents
      ]
    },
    ... // 1 more embedded document
  ],
  "clubs": {
    "stem": [
      {
        "club_name": "chess",
        "description": "provides students opportunity to play the board game of chess informally and competitively in tournaments."
      },
      ... // 1 more embedded document
    ],
    ... // 1 more embedded document
  }
}

Now consider the index definition for the following fields in the schools collection:

The array of documents named teachers is indexed as the embeddedDocuments type with dynamic mappings enabled. However, the classes field isn't indexed. Use the following to calculate the index objects:

  1. Calculate the number of index objects per document.

    Number of teachers embedded documents = up to 2
    Total number of index objects per document = 1 + 2 = 3
  2. Multiply by the total number of documents in the collection.

    Number of documents in the collection = 1000
    Number of index objects per document = 3
    Total number of index objects for collection = 1000 x 3 = 3000
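
When only teachers is indexed, the same count can be taken from actual documents instead of the "up to 2" bound. The sketch below assumes that case only (a single embeddedDocuments array at the top level); the helper name and the second teacher are hypothetical stand-ins, not part of the tutorial data.

```python
def index_objects_for(document: dict, embedded_field: str) -> int:
    """1 for the parent document plus 1 per element of the indexed array."""
    return 1 + len(document.get(embedded_field, []))


# Trimmed-down stand-in for a schools document with two teachers.
school = {
    "_id": 0,
    "name": "Springfield High",
    "teachers": [
        {"first": "Jane", "last": "Smith"},
        {"first": "John", "last": "Doe"},  # hypothetical second teacher
    ],
}

# Pretend the collection holds 1000 documents shaped like the one above.
collection_sample = [school] * 1000
total = sum(index_objects_for(doc, "teachers") for doc in collection_sample)

print(index_objects_for(school, "teachers"))  # 3
print(total)                                  # 3000
```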

The arrays of documents named teachers and teachers.classes are indexed as the embeddedDocuments type with dynamic mappings enabled. Use the following to calculate the index objects:

  1. Calculate the number of index objects per document:

    Number of documents = 1
    Number of teachers embedded documents = up to 2
    Number of classes embedded documents = up to 3
    Number of index objects per document = 1 + ( 2 x 3 ) = 7
  2. Multiply by the total number of documents in the collection.

    Number of documents in the collection = 1000
    Number of index objects per document = 7
    Total number of index objects: 1000 x 7 = 7000

If your collection has large arrays that might generate 2,147,483,647 index objects, you must shard any clusters that contain indexes with the embeddedDocuments type.

The following limitations apply:

  • You can use embeddedDocuments only on fields with up to 5 levels of nesting. An embeddedDocuments field can't have more than 4 parent embeddedDocuments fields.

  • You can't use embeddedDocuments for date or numeric faceting.

  • You can't define a field inside the embeddedDocuments type as the knnVector type.

  • You can't index children of fields indexed as the embeddedDocuments type as the token type.

  • For highlighting fields within embedded documents, you must also index the parent of the field that you want to highlight as the document type.

  • You can do the following only if you index the parents of the embedded document child field as the document type:

    • Faceted search on string fields within embedded documents. You must also index the field that you want to facet on as the stringFacet type.

      Note

      When you facet on a string field inside embedded documents, Atlas Search returns facet count for only the number of matching parent documents.

      You can't facet on numeric and date fields in embedded documents.

    • Highlight fields within embedded documents. For an example, see the How to Run Atlas Search Queries Against Objects in Arrays tutorial.

    • Sort by the parent of the embedded document field. You must also index the embedded document field with string values as the token type. For child fields with number and date values, enable dynamic mapping to index those fields automatically. For an example, see Sort Example.
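
The nesting limit in the first item of the list above can be checked against an index definition before you create the index. The following is a minimal sketch, assuming that a "level" means one embeddedDocuments field in a chain of nested embeddedDocuments fields and that document-type parents don't add a level; the function name and the sample mapping are illustrative.

```python
MAX_EMBEDDED_LEVELS = 5  # limit stated in the list above


def max_embedded_depth(fields: dict, depth: int = 0) -> int:
    """Return the longest chain of nested embeddedDocuments fields."""
    deepest = depth
    for mapping in fields.values():
        # A field may carry one type definition or a list of them.
        definitions = mapping if isinstance(mapping, list) else [mapping]
        for definition in definitions:
            child_fields = definition.get("fields", {})
            if definition.get("type") == "embeddedDocuments":
                deepest = max(deepest, max_embedded_depth(child_fields, depth + 1))
            elif definition.get("type") == "document":
                # Assumption: a document-type parent does not add a level.
                deepest = max(deepest, max_embedded_depth(child_fields, depth))
    return deepest


# The "fields" section of a definition that nests classes inside teachers.
index_fields = {
    "teachers": {
        "type": "embeddedDocuments",
        "dynamic": True,
        "fields": {
            "classes": {"type": "embeddedDocuments", "dynamic": True}
        },
    }
}

depth = max_embedded_depth(index_fields)
print(depth, depth <= MAX_EMBEDDED_LEVELS)  # 2 True
```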

To define the index for the embeddedDocuments type, choose your preferred configuration method in the Atlas UI and then select the database and collection.

  1. Click Refine Your Index to configure your index.

  2. In the Field Mappings section, click Add Field to open the Add Field Mapping window.

  3. Click Customized Configuration.

  4. Select the field to index from the Field Name dropdown.

    Note

    You can't index fields that contain the dollar ($) sign at the start of the field name.

  5. Click the Data Type dropdown and select EmbeddedDocuments.

  6. Toggle the Enable Dynamic Mapping setting to enable or disable dynamic indexing of all dynamically indexable fields in the document. To learn more, see Configure document Field Properties.

  7. Click Add.

  8. If you disabled dynamic mapping, click Add Embedded Field for the EmbeddedDocuments type field to define field mappings for the fields in the document.

The following is the JSON syntax for the embeddedDocuments type. Replace the default index definition with the following. To learn more about the fields, see Field Properties.

{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "<field-name>": {
        "type": "embeddedDocuments",
        "dynamic": true|false,
        "fields": {
          "<field-name>": {
            <field-mapping-definition>
          }
        }
      }
    }
  }
}

The Atlas Search embeddedDocuments type takes the following parameters:

| Field | Type | Necessity | Description | Default |
| --- | --- | --- | --- | --- |
| type | string | Required | Human-readable label that identifies the field type. Value must be embeddedDocuments. | |
| dynamic | boolean | Optional | Flag that specifies whether to index every dynamically indexable field in the document. Value can be one of the following: true (index all indexable fields) or false (don't index all the indexable fields). | false |
| fields | document | Optional | Fields to index. If dynamic is true, Atlas Search indexes all indexable fields. If dynamic is false, you can specify the fields to index in the field definition for fields. Atlas Search doesn't support indexing facet fields as part of an embeddedDocuments field. | {} |
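
The parameters above can be assembled programmatically before pasting the result into the JSON Editor. This is a minimal sketch: the helper name is hypothetical, and items, name, and tags are illustrative field names. The defaults mirror the documented ones (dynamic is false, fields is empty).

```python
import json
from typing import Optional


def embedded_documents_mapping(dynamic: bool = False, fields: Optional[dict] = None) -> dict:
    """Build one embeddedDocuments field mapping from the documented parameters."""
    mapping = {"type": "embeddedDocuments", "dynamic": dynamic}
    if fields:
        # Only emit "fields" when static child mappings were supplied.
        mapping["fields"] = fields
    return mapping


index_definition = {
    "mappings": {
        "fields": {
            "items": embedded_documents_mapping(
                fields={"name": {"type": "string"}, "tags": {"type": "string"}}
            )
        }
    }
}

# Serialize to JSON text suitable for the JSON Editor.
print(json.dumps(index_definition, indent=2))
```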

The following index definition example uses the sample_supplies.sales collection. If you have the sample data already loaded on your cluster, you can use the Visual Editor or JSON Editor in the Atlas UI to configure the index. After you select your preferred configuration method, select the database and collection, and refine your index to add field mappings.

The following index definition indexes the array of objects in the items field. It also configures Atlas Search to automatically index all dynamically indexable fields inside the objects in the items array.

  1. In the Add Field Mapping window, select items from the Field Name dropdown.

  2. Click the Data Type dropdown and select EmbeddedDocuments.

  3. Toggle Enable Dynamic Mapping to enable dynamic mapping, if needed.

  4. Click Add.

Replace the default index definition with the following index definition.

{
  "mappings": {
    "fields": {
      "items": {
        "type": "embeddedDocuments",
        "dynamic": true
      }
    }
  }
}

Note

To index all fields in an embedded document including fields that Atlas Search doesn't dynamically index, define the fields in the index definition. For string faceting, Atlas Search counts string facets once for each document in the result set.

For example, the following index definition configures Atlas Search to automatically index all dynamically indexable fields inside the objects in the items array. It also indexes the purchaseMethod field, mapped alongside items in the definition, as the stringFacet type, which Atlas Search doesn't dynamically index, to support Atlas Search facet queries against that field:

Click Add Field in the Field Mappings section and add the following fields by clicking Add after configuring the settings for each field in the Add Field Mapping window.

| Field Name | Data Type |
| --- | --- |
| items | Click the dropdown and select EmbeddedDocuments. |
| purchaseMethod | Click the dropdown and select StringFacet. |

Replace the default index definition with the following index definition.

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "items": {
        "dynamic": true,
        "type": "embeddedDocuments"
      },
      "purchaseMethod": {
        "type": "stringFacet"
      }
    }
  }
}

The following index definition configures Atlas Search to index only the name and tags fields as the Atlas Search string type in the items array of objects.

  1. In the Add Field Mapping window, select items from the Field Name dropdown.

  2. Click the Data Type dropdown and select EmbeddedDocuments.

  3. Disable Enable Dynamic Mapping.

  4. Click Add.

  5. Click Add Embedded Field for the items field in the Field Mappings table and add the following fields by clicking Add after configuring the settings for each field in the Add Embedded Field Mapping window.

    | Field Name | Data Type |
    | --- | --- |
    | items.name | Click the Data Type dropdown and select String. |
    | items.tags | Click the Data Type dropdown and select String. |

Replace the default index definition with the following index definition.

{
  "mappings": {
    "fields": {
      "items": {
        "type": "embeddedDocuments",
        "dynamic": false,
        "fields": {
          "name": {
            "type": "string"
          },
          "tags": {
            "type": "string"
          }
        }
      }
    }
  }
}
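
Fields indexed this way are queried with the embeddedDocument operator. The sketch below builds a $search stage against items.name as a plain Python structure. It assumes the index is named default and uses laptop as an illustrative query term; the helper name is hypothetical. You would pass the resulting pipeline to your driver's aggregate method, which is not shown here.

```python
def embedded_text_query(array_path: str, child_field: str, query: str,
                        index: str = "default") -> list:
    """Build a pipeline that text-searches one child field of an embeddedDocuments array."""
    return [
        {
            "$search": {
                "index": index,  # assumed index name
                "embeddedDocument": {
                    # Path to the array indexed as embeddedDocuments.
                    "path": array_path,
                    "operator": {
                        "text": {
                            # Child fields are addressed by their full dotted path.
                            "path": f"{array_path}.{child_field}",
                            "query": query,
                        }
                    },
                },
            }
        }
    ]


pipeline = embedded_text_query("items", "name", "laptop")
print(pipeline[0]["$search"]["embeddedDocument"]["operator"]["text"]["path"])  # items.name
```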

Tip

See also: Additional Index Definition Examples
