Embedding MongoDB

The flexibility to define your schema in whatever way works for any given application is a defining characteristic of document databases like MongoDB, and nesting documents inside each other is a key technique for creating optimal schemas. Rather than constructing your application to match a strictly defined data model, you can construct a data model that matches your use case and application functionality.




What are embedded documents in MongoDB?

In a relational database, you store each individual entity in its own table, and link them together through foreign keys. While MongoDB certainly supports references from one document to another, and even multi-document joins, it’s a mistake to use a document database the same way you use a relational one.

For example, consider a simple structure with a user and their addresses. One way to structure the relationship between the two entities is to use references:

> db.user.findOne()
{
    _id: 111111,
    email: "email@example.com",
    name: {given: "Jane", family: "Han"}
}

> db.address.find({user_id: 111111})
{
    _id: 121212,
    user_id: 111111,
    street: "111 Elm Street",
    city: "Springfield",
    state: "Ohio",
    country: "US",
    zip: "00000"
}

However, if this address is only ever accessed in relation to this one user, or needed frequently at the same time as the user, it’s much simpler to just embed the address document in the user document, like so:


> db.user.findOne({_id: 111111})

{
    _id: 111111,
    email: "email@example.com",
    name: {given: "Jane", family: "Han"},
    address: {
        street: "111 Elm Street",
        city: "Springfield",
        state: "Ohio",
        country: "US",
        zip: "00000"
    }
}

Now, rather than having to do a separate query against the address collection to retrieve Jane Han’s address, you can just access it as a sub-document of her user record.
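The difference in read paths can be sketched in plain JavaScript. This is only an illustration: the arrays `users`, `addresses`, and `usersEmbedded` are in-memory stand-ins for collections, and the helper functions and the `lookups` counter are hypothetical names, not MongoDB API calls.

```javascript
// In-memory stand-ins for collections (hypothetical; not the MongoDB driver API).
const users = [
  { _id: 111111, email: "email@example.com", name: { given: "Jane", family: "Han" } },
];
const addresses = [
  { _id: 121212, user_id: 111111, street: "111 Elm Street", city: "Springfield" },
];
const usersEmbedded = [
  {
    _id: 111111,
    email: "email@example.com",
    name: { given: "Jane", family: "Han" },
    address: { street: "111 Elm Street", city: "Springfield" },
  },
];

// Referenced layout: one read for the user, then a second read for the address.
function addressViaReference(userId) {
  let lookups = 0;
  const user = users.find((u) => u._id === userId);
  lookups += 1;
  const address = addresses.find((a) => a.user_id === user._id);
  lookups += 1;
  return { street: address.street, lookups };
}

// Embedded layout: a single read returns the user and the address together.
function addressViaEmbedding(userId) {
  const user = usersEmbedded.find((u) => u._id === userId);
  return { street: user.address.street, lookups: 1 };
}

const referenced = addressViaReference(111111);
const embedded = addressViaEmbedding(111111);
console.log(referenced.lookups, embedded.lookups);
```

Both functions return the same street, but the referenced layout needs two reads where the embedded layout needs one.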


Storing multiple addresses is similarly simple:

> db.user.findOne({_id: 111111})

{
    _id: 111111,
    email: "email@example.com",
    name: {given: "Jane", family: "Han"},
    addresses: [
        {
            label: "Home",
            street: "111 Elm Street",
            city: "Springfield",
            state: "Ohio",
            country: "US",
            zip: "00000"
        },
        {label: "Work", ...}
    ]
}

You can even update individual addresses, using the positional operator:

> db.user.updateOne(
    {_id: 111111, "addresses.label": "Home"},
    {$set: {"addresses.$.street": "112 Elm Street"}}
)

Note that any query or update key that contains a dot (like "addresses.label") must be wrapped in quotes to be syntactically valid. The query part of the update must include the array field you’re updating; the update then applies to the first element in the array that matches.
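The "first matching element" behavior of the positional operator can be imitated on an in-memory document. The sketch below is plain JavaScript, not MongoDB itself: `setOnFirstMatch` is a hypothetical helper, and the extra "Work" and second "Home" entries are made-up sample data, added only to show that later matches are left alone.

```javascript
// Hypothetical helper that imitates the effect of
// {$set: {"addresses.$.street": ...}} on an in-memory document.
function setOnFirstMatch(doc, arrayField, matchKey, matchValue, setKey, setValue) {
  const index = doc[arrayField].findIndex((item) => item[matchKey] === matchValue);
  if (index === -1) return false; // no element matched, so nothing is updated
  doc[arrayField][index][setKey] = setValue;
  return true;
}

const user = {
  _id: 111111,
  addresses: [
    { label: "Home", street: "111 Elm Street" },
    { label: "Work", street: "1 Main Street" },  // made-up sample entry
    { label: "Home", street: "9 Oak Avenue" },   // second "Home" entry, left untouched
  ],
};

const changed = setOnFirstMatch(user, "addresses", "label", "Home", "street", "112 Elm Street");
console.log(changed, user.addresses.map((a) => a.street));
```

Only the first "Home" address changes. If you need to update every matching element, the single positional operator is not the right tool.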




Why (and when) you should prefer embedding to referencing

Embedded documents are an efficient and clean way to store related data, especially data that’s regularly accessed together. In general, when designing schemas for MongoDB, you should prefer embedding by default, and use references and application-side or database-side joins only when they’re worthwhile. The more often a given workload can retrieve a single document and have all the data it needs, the more consistent your application’s performance will be.


There are a few different patterns for embedding:

The Embedded Document Pattern

This is the general pattern of preferring to embed even complex sub-structures in the documents they’re used with. The typical rule is:

What you use together, store together.




The Embedded Subset Pattern

A hybrid case, the subset pattern comes into play when you have a separate collection for a potentially very long list of related items, but you want to keep some of those items easily at hand for display to the user. Let’s look at an example:

> db.movie.findOne()

{
    _id: 333333,
    title: "The Big Lebowski"
}

> db.review.find({movie_id: 333333})

{
    _id: 454545,
    movie_id: 333333,
    stars: 3,
    text: "it was OK"
}

{
    _id: 565656,
    movie_id: 333333,
    stars: 5,
    text: "the best"
}
...

Now imagine you have thousands of reviews, but you always show the most recent two when you display a movie. In this case, it makes sense to store that subset as a list on the movie document.

> db.movie.findOne({_id: 333333})

{
    _id: 333333,
    title: "The Big Lebowski",
    recent_reviews: [
        {_id: 454545, stars: 3, text: "it was OK"},
        {_id: 565656, stars: 5, text: "the best"}
    ]
}

If you regularly access a subset of related items, embed that subset.
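Keeping the embedded subset current means trimming it each time a new review arrives. Here is a minimal sketch of that bookkeeping in plain JavaScript, assuming a hypothetical `addRecentReview` helper and a made-up third review. In MongoDB itself, a comparable effect is typically achieved with the `$push` operator combined with its `$each` and `$slice` modifiers.

```javascript
// Hypothetical helper: append a review and keep only the newest `limit` entries.
// Returns a new movie object and leaves the original untouched.
function addRecentReview(movie, review, limit = 2) {
  const combined = [...movie.recent_reviews, review];
  return { ...movie, recent_reviews: combined.slice(-limit) };
}

const movie = {
  _id: 333333,
  title: "The Big Lebowski",
  recent_reviews: [
    { _id: 454545, stars: 3, text: "it was OK" },
    { _id: 565656, stars: 5, text: "the best" },
  ],
};

// Made-up review used only for illustration.
const updatedMovie = addRecentReview(movie, { _id: 676767, stars: 4, text: "very good" });
console.log(updatedMovie.recent_reviews.map((r) => r._id)); // [ 565656, 676767 ]
```

The full review would still be written to the review collection. The embedded list is only a convenience copy for display.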




The Extended Reference Pattern

Another hybrid case is the extended reference pattern. Like the subset pattern, it keeps a small amount of regularly accessed information on the document where it’s needed. Rather than embedding a list, though, it applies when one document refers to another in a separate collection and also stores a copy of a few of that document’s fields for ease of access.

Example:

> db.movie.findOne({_id: 444444})

{
    _id: 444444,
    title: "One Flew Over the Cuckoo's Nest",
    studio_id: 999999,
    studio_name: "Fantasy Films"
}

As you can see, the studio_id is stored so you can look up more info on the studio that made the film, but the studio’s name is also copied to this document for ease of display. Note that if you’re embedding information from documents that change regularly, you need to remember to update documents where you’ve copied that information when it does change.

If you regularly access a few fields from a referenced document, embed those fields.
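The cost of this pattern is the update you have to remember when the source document changes. The sketch below shows that propagation step in plain JavaScript over in-memory arrays. `renameStudio`, the second movie, and the new studio name are all hypothetical; against a real database you would issue an update on the studio plus a multi-document update on the movies.

```javascript
// In-memory stand-ins for the studio and movie collections (illustrative only).
const studios = [{ _id: 999999, name: "Fantasy Films" }];
const movies = [
  { _id: 444444, title: "One Flew Over the Cuckoo's Nest", studio_id: 999999, studio_name: "Fantasy Films" },
  { _id: 555555, title: "Example Movie", studio_id: 888888, studio_name: "Other Studio" }, // made-up entry
];

// Hypothetical helper: rename a studio and refresh every copied studio_name.
// Returns the number of movie documents that were touched.
function renameStudio(studioId, newName) {
  const studio = studios.find((s) => s._id === studioId);
  if (!studio) return 0;
  studio.name = newName;

  let touched = 0;
  for (const movie of movies) {
    if (movie.studio_id === studioId) {
      movie.studio_name = newName; // keep the copied field in sync
      touched += 1;
    }
  }
  return touched;
}

const touched = renameStudio(999999, "Fantasy Films Inc.");
console.log(touched, movies.map((m) => m.studio_name));
```

The fewer fields you copy, and the less often they change, the cheaper this step is.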




Unbounded Lists

Storing short lists of related information in the document they belong to makes tremendous sense, but if a list can grow unchecked, putting it in a single document is not only unwise, it’s untenable. For one, MongoDB limits the size of a single document to 16 MB. For another, if the document is accessed with any frequency, you’ll start to see a negative impact from excessive memory usage.

If a list can grow in an unbounded way, put it in its own collection.
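To get a feel for how quickly an embedded list approaches the document size limit, you can do a back-of-the-envelope estimate. The sketch below runs in Node.js and uses JSON text length as a rough proxy. This is an assumption for illustration only: real documents are stored as BSON, whose size differs from the JSON length. `roughSizeBytes` and `reviewsUntilLimit` are hypothetical helpers.

```javascript
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document size limit

// Rough estimate only: JSON text length is not the same as BSON size.
function roughSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Estimate how many reviews shaped like `sampleReview` fit in one movie document.
function reviewsUntilLimit(movie, sampleReview) {
  const base = roughSizeBytes(movie);
  const perReview = roughSizeBytes(sampleReview) + 1; // +1 for the separating comma
  return Math.floor((MAX_DOCUMENT_BYTES - base) / perReview);
}

const movieWithReviews = { _id: 333333, title: "The Big Lebowski", reviews: [] };
const sampleReview = { _id: 454545, stars: 3, text: "it was OK" };

const capacity = reviewsUntilLimit(movieWithReviews, sampleReview);
console.log(`Roughly ${capacity} short reviews would fit before the limit.`);
```

Even when the estimate looks generous, every read of the movie would pull the whole list into memory, which is usually the more practical reason to move the list into its own collection.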




Independent Access

Another time you should store sub-documents in their own collection is when you want to access them independently of the parent document you would otherwise nest them in. For instance, consider a product made by a company. If the company only sells a handful of products, you might want to store them as part of the company document. If, however, you want to access those products directly by SKU, or reuse them across companies, you’d want to store them in their own collection as well.

If you regularly access or manipulate an entity independently, put it in its own collection.
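The product example can be sketched in plain JavaScript. Everything here is illustrative: the company, the products, the `sku` values, and the `company_ids` field are made-up, and the `Map` merely stands in for what an index on a SKU field would provide in a real collection.

```javascript
// In-memory stand-ins for separate company and product collections (illustrative only).
const companies = [{ _id: 1, name: "Acme" }];
const products = [
  { _id: 10, sku: "SKU-001", name: "Widget", company_ids: [1] },
  { _id: 11, sku: "SKU-002", name: "Gadget", company_ids: [1, 2] }, // shared across companies
];

// Stand-in for an index on sku: direct lookup without going through a company.
const productsBySku = new Map(products.map((p) => [p.sku, p]));

function findProductBySku(sku) {
  return productsBySku.get(sku) ?? null;
}

// The company-centric view is still available through the reference field.
function productsForCompany(companyId) {
  return products.filter((p) => p.company_ids.includes(companyId));
}

console.log(findProductBySku("SKU-002").name);
console.log(productsForCompany(companies[0]._id).map((p) => p.sku));
```

Because each product lives in its own document, it can be fetched by SKU or shared between companies without duplicating it inside every company document.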




See also

Embedding MongoDB Charts in custom applications

Integration testing with Spring Boot and Embedded MongoDB


