This version of the documentation is archived and no longer supported.

Data Modeling Introduction

The key challenge in data modeling is balancing the needs of the application, the performance characteristics of the database engine, and the data retrieval patterns. When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.

Flexible Schema

Unlike SQL databases, where you must determine and declare a table’s schema before inserting data, MongoDB’s collections, by default, do not require their documents to have the same schema. That is:

  • The documents in a single collection do not need to have the same set of fields and the data type for a field can differ across documents within a collection.
  • To change the structure of the documents in a collection, such as adding new fields, removing existing fields, or changing field values to a new type, update the documents to the new structure.
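
As a rough illustration of both points, the sketch below models a collection as a plain Python list of dictionaries rather than a live MongoDB collection. The products name and its fields are hypothetical, and the loop only mimics what an update operation would do on the server.

```python
# A collection modeled as a list of dict "documents" (hypothetical data).
products = [
    {"_id": 1, "name": "pen", "price": 1.5},
    # Same collection, but price is a string here and there is an extra field.
    {"_id": 2, "name": "notebook", "price": "3.00", "tags": ["paper"]},
]

# Documents in the same collection may have different sets of fields.
field_sets = [set(doc) for doc in products]

# Changing the structure means updating the documents themselves:
# convert every price to a float and add a new "in_stock" field.
for doc in products:
    doc["price"] = float(doc["price"])
    doc.setdefault("in_stock", True)
```

Nothing is declared up front in this sketch; the new structure exists only because each document was rewritten.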

This flexibility facilitates the mapping of documents to an entity or an object. Each document can match the data fields of the represented entity, even if the document has substantial variation from other documents in the collection.

In practice, however, the documents in a collection typically share a similar structure, and you can enforce document validation rules for a collection during update and insert operations. See Schema Validation for details.
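
To give a sense of what a validation rule can look like, the sketch below writes a validator in MongoDB's $jsonSchema form as a Python dictionary. The field names are hypothetical, and the missing_required helper is only a toy local check for illustration; the server performs the actual validation.

```python
# Validator in MongoDB's $jsonSchema form (field names are hypothetical).
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
        },
    }
}


def missing_required(doc, validator):
    """Toy local check: list the required fields that are absent from doc.

    This does not replace server-side validation, which runs on insert
    and update operations.
    """
    required = validator["$jsonSchema"]["required"]
    return [field for field in required if field not in doc]
```

A document such as {"name": "a"} would be reported as missing the email field by this helper.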

Document Structure

The key decision in designing data models for MongoDB applications revolves around the structure of documents and how the application represents relationships between data. MongoDB allows related data to be embedded within a single document.

Embedded Data

Embedded documents capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document. These denormalized data models allow applications to retrieve and manipulate related data in a single database operation.

Data model with embedded fields that contain all related information.
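
A minimal sketch of such an embedded model, written as a Python dictionary rather than a stored document. All field names and values are hypothetical.

```python
# One user document with related data embedded (hypothetical fields).
user = {
    "_id": "u1",
    "username": "123xyz",
    "contact": {"phone": "123-456-7890", "email": "xyz@example.com"},
    "access": {"level": 5, "group": "dev"},
}

# Retrieving the single user document yields all related data at once.
email = user["contact"]["email"]
group = user["access"]["group"]
```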

For many use cases in MongoDB, the denormalized data model is optimal.

See Embedded Data Models for the strengths and weaknesses of embedding documents.

References

References store the relationships between data by including links or references from one document to another. Applications can resolve these references to access the related data. Broadly, these are normalized data models.

Data model using references to link documents. Both the contact document and the access document contain a reference to the user document.
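
The same idea can be sketched with plain Python dictionaries. The user_id field, the collection names, and the resolve helper are hypothetical; in a real application each call to resolve would correspond to a separate query.

```python
# Three separate documents; contact and access refer to the user by _id.
user = {"_id": "u1", "username": "123xyz"}
contact = {"_id": "c1", "user_id": "u1", "email": "xyz@example.com"}
access = {"_id": "a1", "user_id": "u1", "level": 5, "group": "dev"}

# Each list stands in for its own collection.
contacts = [contact]
access_docs = [access]


def resolve(user_id, collection):
    """Return the documents in collection that reference user_id."""
    return [doc for doc in collection if doc["user_id"] == user_id]


user_contacts = resolve(user["_id"], contacts)
user_access = resolve(user["_id"], access_docs)
```

Compared with embedding, the related data lives in one place only, at the cost of extra lookups to assemble it.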

See Normalized Data Models for the strengths and weaknesses of using references.

Atomicity of Write Operations

Single Document Atomicity

In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.

A denormalized data model with embedded data combines all related data in a single document instead of normalizing across multiple documents and collections. This data model facilitates atomic operations.

Multi-Document Transactions

When a single write operation (e.g. db.collection.updateMany()) modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic.

When performing multi-document write operations, whether through a single write operation or multiple write operations, other operations may interleave.
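
The interleaving described above can be pictured with a toy model in pure Python. This is not how the server is implemented: the update_many generator is a hypothetical stand-in that pauses between documents so that another operation can observe the intermediate state.

```python
def update_many(collection, new_status):
    """Toy model: change one document at a time, pausing between
    documents so that other operations can interleave."""
    for doc in collection:
        doc["status"] = new_status  # each per-document change is atomic
        yield


orders = [{"_id": 1, "status": "new"}, {"_id": 2, "status": "new"}]

op = update_many(orders, "shipped")
next(op)                                      # first document updated
snapshot = [doc["status"] for doc in orders]  # a read interleaves here
for _ in op:                                  # remaining documents updated
    pass
final = [doc["status"] for doc in orders]
```

In this model the interleaved read sees one updated and one unchanged document, which is the kind of partial view that a multi-document transaction is meant to prevent.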

For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions:

  • In version 4.0, MongoDB supports multi-document transactions on replica sets.
  • In version 4.2, MongoDB introduces distributed transactions, which add support for multi-document transactions on sharded clusters and incorporate the existing support for multi-document transactions on replica sets.
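
As a hedged sketch of what a multi-document transaction can look like from an application, the function below uses the session API of the PyMongo driver. It assumes client is a pymongo.MongoClient connected to a deployment that supports transactions; the bank database, the accounts collection, and the balance field are hypothetical.

```python
def transfer(client, from_id, to_id, amount):
    """Move amount between two account documents in one transaction.

    client is expected to be a pymongo.MongoClient. The database,
    collection, and field names are hypothetical.
    """
    accounts = client["bank"]["accounts"]
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id},
                {"$inc": {"balance": -amount}},
                session=session,
            )
            accounts.update_one(
                {"_id": to_id},
                {"$inc": {"balance": amount}},
                session=session,
            )
```

Both writes are passed the same session, which is what ties them to the transaction; a write issued without it would run outside the transaction.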

For details regarding transactions in MongoDB, see the Transactions page.

Important

In most cases, a multi-document transaction incurs a greater performance cost than single-document writes, and the availability of multi-document transactions should not be a replacement for effective schema design. For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. That is, for many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.

For additional transactions usage considerations (such as runtime limit and oplog size limit), see also Production Considerations.

Data Use and Performance

When designing a data model, consider how applications will use your database. For instance, if your application only uses recently inserted documents, consider using Capped Collections. Or, if your application mainly performs read operations on a collection, adding indexes to support common queries can improve performance.
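
Both suggestions can be sketched with the PyMongo driver. The function below assumes db is a pymongo Database object; the collection names, the customer_id field, and the 1 MiB size are hypothetical choices for illustration.

```python
def prepare_collections(db):
    """Create a capped collection and a supporting index.

    db is expected to be a pymongo Database. Names and sizes are
    hypothetical.
    """
    # Capped collection: fixed maximum size in bytes; once it is full,
    # the oldest documents are overwritten by new ones.
    db.create_collection("recent_events", capped=True, size=1024 * 1024)

    # Ascending index on a field that common read queries filter by.
    db["orders"].create_index([("customer_id", 1)])
```

Note that create_collection raises an error if the collection already exists, so a real application would typically run this kind of setup once.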

See Operational Factors and Data Models for more information on these and other operational considerations that affect data model designs.

Learn More

MongoDB.live 2020 Presentations

To learn how to incorporate the flexible data model into your schema, see the data modeling presentations from MongoDB.live 2020.

MongoDB University

Learn how to structure documents and define your schema in MongoDB University’s M320 Data Modeling course.

Application Modernization Guide

For more information on data modeling with MongoDB, download the MongoDB Application Modernization Guide.

The download includes the following resources:

  • Presentation on the methodology of data modeling with MongoDB
  • White paper covering best practices and considerations for migrating to MongoDB from an RDBMS data model
  • Reference MongoDB schema with its RDBMS equivalent
  • Application Modernization scorecard