
MongoDB's Performance over RDBMS

Srinivas Mutyala • 6 min read • Published Feb 14, 2024 • Updated Feb 14, 2024
Someone somewhere might be wondering why MongoDB can deliver better performance than relational databases (RDBMS). What is the secret behind it? I had this question too, until I learned about the internal workings of MongoDB: data modeling, advanced indexing methods, and how the WiredTiger storage engine works.
I want to share what I have learned and experienced so that it might be helpful to you, too.

Data modeling: embedded structure (no JOINs)

MongoDB uses a document-oriented data model, storing data in JSON-like BSON documents. This allows for efficient storage and retrieval of complex data structures.
MongoDB's model can lead to simpler and more performant queries compared to the normalization requirements of RDBMS.
[Image: Relational database tables with an arrow pointing to a document model]
The first step in improving performance is understanding your application's query patterns. That understanding lets you tailor your data model and choose suitable indexes to match those patterns.
Always keep MongoDB's maximum document size (16 MB) in mind, and avoid embedding large content such as images, audio, and video files in the same document as the rest of your data, as depicted in the image below.
[Image: Illustration saying that embedding is better than referencing]
Customizing your data model to match the query patterns of your application leads to streamlined queries, heightened throughput for insert and update operations, and better workload distribution across a sharded cluster.
While MongoDB offers a flexible schema, overlooking schema design is not advisable. Although you can adjust your schema as needed, adhering to schema design best practices from the outset of your project can prevent the need for extensive refactoring down the line.
A major advantage of BSON documents is that you have the flexibility to model your data any way your application needs. The inclusion of arrays and subdocuments within documents provides significant versatility in modeling intricate data relationships. But you can also model flat, tabular, and columnar structures, simple key-value pairs, text, geospatial and time-series data, or the nodes and edges of connected graph data structures. The ideal schema design for your application will depend on its specific query patterns.
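To make the difference concrete, here is a minimal sketch in plain Python (no database involved). The contact data, the three list-based "tables", and the function names are all made up for illustration. It shows the same contact stored as normalized rows that must be reassembled on every read, and as a single embedded document that is read in one step.

```python
# Normalized, RDBMS-style layout: one logical contact is spread over three "tables".
contacts = [{"id": 1, "name": "Ada Lovelace"}]
phones = [
    {"contact_id": 1, "type": "home", "number": "555-0100"},
    {"contact_id": 1, "type": "work", "number": "555-0101"},
]
addresses = [{"contact_id": 1, "city": "London", "country": "UK"}]


def load_contact_normalized(contact_id):
    """Reassemble a contact by scanning each table, roughly what a JOIN does."""
    contact = next(c for c in contacts if c["id"] == contact_id)
    return {
        "_id": contact["id"],
        "name": contact["name"],
        "phones": [
            {"type": p["type"], "number": p["number"]}
            for p in phones
            if p["contact_id"] == contact_id
        ],
        "addresses": [
            {"city": a["city"], "country": a["country"]}
            for a in addresses
            if a["contact_id"] == contact_id
        ],
    }


# Document-style layout: the same contact kept together with arrays of subdocuments.
contact_collection = [
    {
        "_id": 1,
        "name": "Ada Lovelace",
        "phones": [
            {"type": "home", "number": "555-0100"},
            {"type": "work", "number": "555-0101"},
        ],
        "addresses": [{"city": "London", "country": "UK"}],
    }
]


def load_contact_embedded(contact_id):
    """A single lookup returns everything the application needs."""
    return next(d for d in contact_collection if d["_id"] == contact_id)
```

Both functions return the same structure; the embedded version simply does not need the reassembly work.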

How is embedding within collections in MongoDB different from storing in multiple tables in RDBMS?

One example of a best practice for an address/contact book is to keep groups and portraits in separate collections, because they can grow large due to many-to-many relations and image size, respectively. Embedded in the contact document, they could push it toward the 16 MB maximum document size.
[Image: Splitting groups and portraits into different collections and using referencing]
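The split described above could be sketched roughly as follows in plain Python. Everything here is an illustrative assumption: the field names, the helper `split_contact`, and especially `approx_size`, which is only a crude stand-in for real BSON size calculation.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's maximum BSON document size


def approx_size(doc):
    """Very rough size estimate: raw length for bytes, JSON length otherwise."""
    total = 0
    for key, value in doc.items():
        total += len(key)
        total += len(value) if isinstance(value, bytes) else len(json.dumps(value))
    return total


def split_contact(contact):
    """Move the portrait and group memberships out, linked back by contact_id."""
    contact_doc = {
        k: v for k, v in contact.items() if k not in ("portrait", "groups")
    }
    portrait_doc = {"contact_id": contact["_id"], "image": contact["portrait"]}
    membership_docs = [
        {"contact_id": contact["_id"], "group": group}
        for group in contact["groups"]
    ]
    return contact_doc, portrait_doc, membership_docs


embedded_contact = {
    "_id": 1,
    "name": "Ada Lovelace",
    "phones": [{"type": "home", "number": "555-0100"}],
    "groups": ["family", "mathematicians"],
    "portrait": bytes(2 * 1024 * 1024),  # placeholder for a 2 MB image
}

contact_doc, portrait_doc, membership_docs = split_contact(embedded_contact)
fits_comfortably = approx_size(contact_doc) < MAX_DOCUMENT_BYTES
```

After the split, the frequently read contact document stays small, and the portrait is fetched only when the application actually needs it.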
Embedding related data in a single collection in MongoDB (or at least minimizing the number of collections), rather than spreading it across multiple tables as in an RDBMS, can offer large performance improvements. The reason is data locality: data that is read together is stored together, which reduces the number of seeks, as shown in the picture below.
[Image: Diagram representing how data locality plays a part in faster data reads]
Data locality is the major reason why MongoDB data seeks are faster.
| Difference | Tabular | Document |
| --- | --- | --- |
| Steps to create the model | 1. Define schema. 2. Develop app and queries | 1. Identify the queries. 2. Define schema |
| Initial schema | 3rd normal form. One possible solution | Many possible solutions |
| Final schema | Likely denormalized | Few changes |
| Schema evolution | Difficult and not optimal. Likely downtime | Easy. No downtime |

WiredTiger’s cache and compression

WiredTiger is an open-source, high-performance storage engine for MongoDB. WiredTiger provides features such as document-level concurrency control, compression, and support for both in-memory and on-disk storage.
WiredTiger cache architecture: WiredTiger utilizes a sophisticated caching mechanism to efficiently manage data in memory. The cache is used to store frequently accessed data, reducing the need to read from disk and improving overall performance.
Memory management: The cache dynamically manages memory usage based on the workload. It employs techniques such as eviction (removing less frequently used data from the cache) and promotion (moving frequently used data to the cache) to optimize memory utilization.
Configuration: WiredTiger allows users to configure the size of the cache based on their system's available memory and workload characteristics. Properly sizing the cache is crucial for achieving optimal performance.
Durability: WiredTiger ensures durability by flushing modified data from the cache to disk. This process helps maintain data consistency in case of a system failure.
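As a sizing aid, the small sketch below reproduces the documented default for the WiredTiger internal cache: the larger of 50% of (RAM minus 1 GB) or 256 MB. The function name is my own; in a real deployment the value is overridden with the `storage.wiredTiger.engineConfig.cacheSizeGB` setting rather than computed in application code.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes):
    """Default internal cache size: max(50% of (RAM - 1 GiB), 256 MiB)."""
    half_of_remaining = (total_ram_bytes - GIB) // 2
    return max(half_of_remaining, 256 * MIB)


# A 4 GiB host gets a 1.5 GiB cache by default; a 1 GiB host falls back to 256 MiB.
cache_on_4gib_host = default_wiredtiger_cache_bytes(4 * GIB)
cache_on_1gib_host = default_wiredtiger_cache_bytes(1 * GIB)
```

The remaining memory is not wasted: the operating system's filesystem cache still holds (compressed) data files outside the WiredTiger cache.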
Data compression: WiredTiger supports data compression to reduce the amount of storage space required. Compressing data can lead to significant disk space savings and improved I/O performance.
Configurable compression: Users can configure compression options based on their requirements. WiredTiger supports different compression algorithms, allowing users to choose the one that best suits their workload and performance goals.
Trade-offs: While compression reduces storage costs and can improve read/write performance, it may introduce additional CPU overhead during compression and decompression processes. Users need to carefully consider the trade-offs and select compression settings that align with their application's needs.
Compatibility: WiredTiger's compression features are transparent to applications and don't require any changes to the application code. The engine handles compression and decompression internally.
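WiredTiger supports the snappy (the default for collections), zlib, and zstd block compressors. The sketch below does not touch WiredTiger at all; it only uses Python's built-in `zlib` on some made-up, JSON-encoded documents to illustrate the trade-off between compression ratio and CPU time. The sample data and the helper `compress_report` are assumptions for illustration.

```python
import json
import time
import zlib

# Repetitive, document-like sample data (field names repeat in every document).
documents = [
    {"_id": i, "status": "active", "city": "New York", "tags": ["a", "b", "c"]}
    for i in range(10_000)
]
raw = json.dumps(documents).encode("utf-8")


def compress_report(data, level):
    """Compress data at the given zlib level and report ratio and CPU time."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return {
        "level": level,
        "ratio": len(data) / len(packed),
        "seconds": elapsed,
        "packed": packed,
    }


fast = compress_report(raw, 1)   # less CPU, usually a lower ratio
small = compress_report(raw, 9)  # more CPU, usually a higher ratio

# Decompression is lossless and invisible to the "application" data.
restored = json.loads(zlib.decompress(small["packed"]))
```

Comparing `fast["ratio"]` and `fast["seconds"]` against the values in `small` on your own data gives a feel for the kind of trade-off you make when choosing a compressor.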
Overall, WiredTiger's cache and compression features contribute to its efficiency and performance characteristics. By optimizing memory usage and providing configurable compression options, WiredTiger aims to meet the diverse needs of MongoDB users in terms of both speed and storage efficiency.
Many RDBMSs also employ caching, but the performance benefits vary with the database system and its configuration.

Advanced indexing capabilities

MongoDB, being a NoSQL database, offers advanced indexing capabilities to optimize query performance and support efficient data retrieval. Here are some of MongoDB's advanced indexing features:
Compound indexes
MongoDB allows you to create compound indexes on multiple fields. A compound index is an index on multiple fields in a specific order. This can be useful for queries that involve multiple criteria.
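The order of fields in a compound index is crucial. MongoDB can use the index efficiently for queries that match a prefix of the index fields, that is, the leading fields read from left to right. For example, an index on (status, city, created_at) serves queries on status alone or on status and city, but not queries on city alone.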
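The left-to-right rule can be modeled with a few lines of Python. This is a deliberately simplified sketch for equality filters only: the function name and field names are invented, and the real query planner also weighs sorts, range predicates, and other factors.

```python
def usable_index_prefix(index_fields, query_fields):
    """Return the leading index fields that the query filters on."""
    prefix = []
    for field in index_fields:
        if field not in query_fields:
            break  # the first gap ends the usable prefix
        prefix.append(field)
    return prefix


index_fields = ["status", "city", "created_at"]

full_prefix = usable_index_prefix(index_fields, {"status", "city"})
no_prefix = usable_index_prefix(index_fields, {"city"})
partial_prefix = usable_index_prefix(index_fields, {"status", "created_at"})
```

Here `full_prefix` is `["status", "city"]`, `no_prefix` is empty because the leading field is missing from the query, and `partial_prefix` is only `["status"]` because `city` leaves a gap.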
Multikey indexes
MongoDB supports indexing on arrays. When you index an array field, MongoDB creates separate index entries for each element of the array.
Multikey indexes are helpful when working with documents that contain arrays, and you need to query based on elements within those arrays.
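Conceptually, a multikey index looks like the toy structure below: one entry per array element, each pointing back to the owning document. This is a plain-Python illustration with invented data, not how MongoDB stores index entries internally.

```python
from collections import defaultdict


def build_multikey_index(documents, field):
    """Map each array element (or scalar value) to the _ids that contain it."""
    index = defaultdict(set)
    for doc in documents:
        value = doc.get(field)
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            index[element].add(doc["_id"])
    return index


articles = [
    {"_id": 1, "tags": ["mongodb", "indexing"]},
    {"_id": 2, "tags": ["mongodb", "sharding"]},
    {"_id": 3, "tags": ["python"]},
]

tag_index = build_multikey_index(articles, "tags")
mongodb_articles = tag_index["mongodb"]  # documents whose tags array contains "mongodb"
```

A lookup by a single tag becomes one index probe instead of a scan over every document's array.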
Text indexes
MongoDB provides text indexes to support full-text search. Text indexes tokenize and stem words, allowing for more flexible and language-aware text searches.
Text indexes are suitable for scenarios where users need to perform text search operations on large amounts of textual data.
Geospatial indexes
MongoDB supports geospatial indexes to optimize queries that involve geospatial data. These indexes can efficiently handle queries related to location-based information.
MongoDB provides two geospatial index types: 2d indexes for flat (planar) geometry and 2dsphere indexes for spherical, Earth-like geometry.
Wildcard indexes
MongoDB supports wildcard indexes, which index fields whose names are not known in advance or vary from document to document, either across the whole document or under a given path. This is useful when documents carry arbitrary, user-defined attributes and you cannot create a dedicated index for every possible field.
Partial indexes
Partial indexes allow you to index only the documents that satisfy a specified filter expression. This can be beneficial when you have a large collection but want to create an index for a subset of documents that meet specific criteria.
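The idea can be sketched in plain Python as below. In MongoDB, the filter is supplied through the `partialFilterExpression` index option; here a Python predicate plays that role, and the order data and function name are assumptions for illustration.

```python
def build_partial_index(documents, field, filter_fn):
    """Index only the documents that satisfy the filter."""
    index = {}
    for doc in documents:
        if filter_fn(doc):
            index.setdefault(doc[field], []).append(doc["_id"])
    return index


orders = [
    {"_id": 1, "customer": "ada", "status": "open"},
    {"_id": 2, "customer": "ada", "status": "shipped"},
    {"_id": 3, "customer": "bob", "status": "open"},
    {"_id": 4, "customer": "bob", "status": "shipped"},
]

# Only open orders are indexed, so the index stays small as shipped orders pile up.
open_orders_by_customer = build_partial_index(
    orders, "customer", lambda doc: doc["status"] == "open"
)
```

The smaller index costs less storage and less work on every write, at the price of only serving queries that include the same filter condition.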
Hashed indexes
Hashed indexes are useful for sharding scenarios. MongoDB automatically hashes the indexed field's values and distributes the data across the shards, providing a more even distribution of data and queries.
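The effect of hashing on distribution can be illustrated with the sketch below. It is only an analogy: it uses SHA-256 and a modulo over a hypothetical four shards, whereas MongoDB uses its own hash function and assigns ranges of hash values to shards.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Pick a shard from a hash of the key instead of the key's raw value."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Monotonically increasing keys (such as order numbers) spread across all shards
# instead of piling up on whichever shard owns the highest key range.
placement = Counter(hashed_shard(order_id) for order_id in range(1000))
```

The trade-off is that neighboring key values no longer live together, so range queries on a hashed key have to consult multiple shards.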
TTL (time-to-live) indexes
TTL indexes allow you to automatically expire documents from a collection after a certain amount of time. This is helpful for managing data that has a natural expiration, such as session information or log entries.
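The expiry rule behind a TTL index can be sketched as follows. In MongoDB, the lifetime is set with the `expireAfterSeconds` index option on a date field, and a background task removes expired documents periodically (about once a minute). The function and the session data below are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def remove_expired(documents, date_field, expire_after_seconds, now):
    """Keep only documents whose date field is newer than the expiry cutoff."""
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [doc for doc in documents if doc[date_field] > cutoff]


now = datetime(2024, 2, 14, 12, 0, tzinfo=timezone.utc)
sessions = [
    {"_id": "s1", "last_seen": now - timedelta(minutes=10)},
    {"_id": "s2", "last_seen": now - timedelta(hours=2)},
]

# With a one-hour lifetime, only the session seen 10 minutes ago survives.
live_sessions = remove_expired(sessions, "last_seen", 3600, now)
```

Because the database does this cleanup itself, the application needs no scheduled job to purge stale sessions or logs.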
These advanced indexing capabilities in MongoDB provide developers with powerful tools to optimize query performance for a wide range of scenarios and data structures. Properly leveraging these features can significantly enhance the efficiency and responsiveness of MongoDB databases.
[Image: Visualisation of index capabilities]
In conclusion, the superior performance of MongoDB over traditional RDBMS databases stems from its adept handling of data modeling, advanced indexing methods, and the efficiency of the WiredTiger storage engine. By tailoring your data model to match application query patterns, leveraging MongoDB's optimized document structure, and harnessing advanced indexing capabilities, you can achieve enhanced throughput and more effective workload distribution.
Remember, while MongoDB offers flexibility in schema design, it's crucial not to overlook the importance of schema design best practices from the outset of your project. This proactive approach can save you from potential refactoring efforts down the line.
For further exploration and discussion on MongoDB and database optimization strategies, consider joining our Developer Community. There, you can engage with fellow developers, share insights, and stay updated on the latest developments in database technology.
Keep optimizing and innovating with MongoDB to unlock the full potential of your applications.
