What is MongoDB?
MongoDB is the database for today's applications, enabling you to:
Leverage data and technology to maximize competitive advantage
Reduce risk for mission-critical deployments
Dramatically lower total cost of ownership
With MongoDB, you can build applications that were never possible with traditional relational databases. Here's how.
Fast, Iterative Development. Scope creep and changing business requirements no longer stand between you and successful project delivery. A flexible data model coupled with dynamic schema and idiomatic drivers makes it fast for developers to build and evolve applications. Automated provisioning and management enable continuous integration and highly productive operations. Contrast this against static relational schemas and complex operations that have hindered you in the past.
Flexible Data Model. MongoDB's document data model makes it easy for you to store and combine data of any structure, without giving up sophisticated validation rules, data access and rich indexing functionality. You can dynamically modify the schema without downtime. You spend less time prepping your data for the database, and more time putting your data to work.
Multi-Datacenter Scalability. MongoDB can be scaled within and across geographically distributed data centers, providing new levels of availability and scalability. As your deployments grow in terms of data volume and throughput, MongoDB scales easily with no downtime, and without changing your application. And as your availability and recovery goals evolve, MongoDB lets you adapt flexibly, across data centers, with tunable consistency.
Integrated Feature Set. Analytics and data visualization, text search, graph processing, geospatial, in-memory performance and global replication allow you to deliver a wide variety of real-time applications on one technology, reliably and securely. RDBMS systems require additional, complex technologies demanding separate integration overhead and expense to do this well.
Lower TCO. Application development teams are more productive when they use MongoDB. Single click management means operations teams are as well. MongoDB runs on commodity hardware, dramatically lowering costs. Finally, MongoDB offers affordable annual subscriptions, including 24x7x365 global support. Your applications can be one tenth the cost to deliver compared to using a relational database.
Long-Term Commitment. MongoDB Inc and the MongoDB ecosystem stand behind the world's fastest-growing database, with 20M+ downloads and 2,000+ customers, including more than 50% of the Fortune 100. It is backed by over 1,000 partners and greater investor funding than any other database in history. You can be sure your investment is protected.
The Nexus Architecture
MongoDB’s design philosophy is focused on combining the critical capabilities of relational databases with the innovations of NoSQL technologies. Our vision is to leverage the work that Oracle and others have done over the last 40 years to make relational databases what they are today. Rather than discard decades of proven database maturity, MongoDB is picking up where they left off by combining key relational database capabilities with the work that Internet pioneers have done to address the requirements of modern applications.
Relational databases have reliably served applications for many years, and offer features that remain critical today as developers build the next generation of applications:
Expressive query language & secondary Indexes. Users should be able to access and manipulate their data in sophisticated ways to support both operational and analytical applications. Indexes play a critical role in providing efficient access to data, supported natively by the database rather than maintained in application code.
Strong consistency. Applications should be able to immediately read what has been written to the database. It is much more complex to build applications around an eventually consistent model, imposing significant work on the developer, even for the most sophisticated engineering teams.
Enterprise Management and Integrations. Databases are just one piece of application infrastructure, and need to fit seamlessly into the enterprise IT stack. Organizations need a database that can be secured, monitored, automated, and integrated with their existing technology infrastructure, processes, and staff, including operations teams, DBAs, and data analysts.
However, modern applications impose requirements not addressed by relational databases, and this has driven the development of NoSQL databases which offer:
Flexible Data Model. NoSQL databases emerged to address the requirements for the data we see dominating modern applications. Whether document, graph, key-value, or wide-column, all of them offer a flexible data model, making it easy to store and combine data of any structure and allow dynamic modification of the schema without downtime or performance impact.
Scalability and Performance. NoSQL databases were all built with a focus on scalability, so they all include some form of sharding or partitioning. This allows the database to scale out on commodity hardware deployed on-premises or in the cloud, enabling almost unlimited growth with higher throughput and lower latency than relational databases.
Always-On Global Deployments. NoSQL databases are designed for highly available systems that provide a consistent, high quality experience for users all over the world. They are designed to run across many nodes, including replication to automatically synchronize data across servers, racks, and data centers.
While offering these innovations, NoSQL systems have sacrificed the critical capabilities that people have come to expect and rely upon from relational databases. MongoDB offers a different approach. With its Nexus Architecture, MongoDB is the only database that harnesses the innovations of NoSQL while maintaining the foundation of relational databases.
Want to go deeper into MongoDB's technology? Then read on for key highlights, or download our detailed Architecture Guide.
MongoDB Data Model
This section covers 4 topics: Multimodel Architecture, Data as Documents, Dynamic Schemas and Schema Management
MongoDB Multimodel Architecture
MongoDB uniquely allows users to mix and match multiple storage engines within a single deployment. This flexibility provides a simpler and more reliable approach to meeting diverse application needs for data. Traditionally, multiple database technologies would need to be managed to meet these needs, with complex, custom integration code to move data between the technologies, and to ensure consistent, secure access. With MongoDB’s flexible storage architecture, the database automatically manages the movement of data between storage engine technologies using native replication.
MongoDB’s flexible document data model presents a superset of other database models. It allows data to be represented as simple key-value pairs and flat, table-like structures, through to rich documents and objects with deeply nested arrays and sub-documents.
With an expressive query language, documents can be queried in many ways – from simple lookups to creating sophisticated processing pipelines for data analytics and transformations, through to faceted search, JOINs and graph traversals.
With a flexible storage architecture, application owners can deploy storage engines optimized for different workload and operational requirements.
MongoDB’s multimodel design significantly reduces developer and operational complexity when compared to running multiple distinct database technologies to meet different application needs. Users can leverage the same MongoDB query language, data model, scaling, security, and operational tooling across different parts of their application, with each powered by the optimal storage engine.
Data as Documents
MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Documents that share a similar structure are typically organized as collections. You can think of collections as being analogous to a table in a relational database: documents are similar to rows, and fields are similar to columns.
MongoDB documents tend to have all data for a given record in a single document, whereas in a relational database information for a given record is usually spread across many tables.
For example, consider the data model for a blogging application. In a relational database, the data model would comprise multiple tables such as Categories, Tags, Users, Comments and Articles. In MongoDB the data could be modeled as two collections, one for users, and the other for articles. In each blog document there might be multiple comments, multiple tags, and multiple categories, each expressed as an embedded array.
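As a rough sketch of the blogging example above, an article could be represented as a single document with embedded arrays. It is written here as a plain Python dictionary, and every field name and value is an illustrative assumption rather than a prescribed schema.

```python
# Hypothetical blog article modeled as one document.
# All field names and values are illustrative assumptions.
article = {
    "_id": 1,
    "title": "Why documents?",
    "author_id": 42,  # refers to a document in the separate users collection
    "categories": ["databases", "modeling"],
    "tags": ["mongodb", "schema"],
    "comments": [
        {"user_id": 7, "text": "Nice overview."},
        {"user_id": 9, "text": "What about joins?"},
    ],
}

# Everything needed to render the article arrives with the one document,
# so no extra lookups are needed to list its comments.
comment_texts = [comment["text"] for comment in article["comments"]]
print(comment_texts)
```

In a relational design the same data would typically be spread over several tables and reassembled with JOINs at read time.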
Data as documents: simpler for developers, faster for users.
As a result of the document model, data in MongoDB is more localized, which dramatically reduces the need to JOIN separate tables. The result is dramatically higher performance and scalability across commodity hardware as a single read to the database can retrieve the entire document.
Unlike many NoSQL databases, users don’t need to give up JOINs entirely. For additional analytics flexibility, MongoDB preserves left-outer JOIN semantics with the $lookup operator, enabling users to get the best of both relational and non-relational data modeling.
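To make the left-outer JOIN semantics concrete, here is a minimal pure-Python sketch. The `lookup_stage` dictionary follows the shape of a `$lookup` aggregation stage, while the `left_outer_join` helper and the sample collections are assumptions made for illustration; they emulate the result in memory and are not how the database executes the stage.

```python
# Hypothetical collections, represented as lists of dictionaries.
articles = [
    {"_id": 1, "title": "Intro", "author_id": 10},
    {"_id": 2, "title": "Orphan", "author_id": 99},  # no matching user
]
users = [{"_id": 10, "name": "Ada"}]

# The stage as it would appear in an aggregation pipeline.
lookup_stage = {
    "$lookup": {
        "from": "users",
        "localField": "author_id",
        "foreignField": "_id",
        "as": "author",
    }
}


def left_outer_join(left, right, local_field, foreign_field, as_field):
    """Attach all matching right-side documents to each left-side document.

    Left-side documents without a match are kept, with an empty array.
    """
    joined = []
    for doc in left:
        matches = [r for r in right if r.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


spec = lookup_stage["$lookup"]
result = left_outer_join(
    articles, users, spec["localField"], spec["foreignField"], spec["as"]
)
for doc in result:
    print(doc["title"], doc["author"])
```

The second article has no matching user, yet it still appears in the output with an empty `author` array, which is what distinguishes a left outer JOIN from an inner JOIN.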
In addition, MongoDB documents are more closely aligned to the structure of objects in the programming language. This makes it simpler and faster for developers to model how data in the application will map to data stored in the database.
MongoDB Dynamic Schema with Data Governance Control
MongoDB documents can vary in structure. For example, all documents that describe users might contain the user id and the last date they logged into the system, but only some of these documents might contain the user's identity for one or more third-party applications.
Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the system, without updating a central system catalog, and without taking the system offline.
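A small sketch of what varying structure looks like in practice, using plain Python dictionaries in place of stored documents. The field names are assumptions chosen to mirror the user example above.

```python
# Two hypothetical user documents in the same collection.
users = [
    {"user_id": 1, "last_login": "2017-01-05"},
    {"user_id": 2, "last_login": "2017-01-06", "twitter": "@example"},
]

# Adding a new field to one document does not touch the other,
# and there is no central catalog to update first.
users[0]["locale"] = "en-GB"

fields_per_doc = [sorted(user) for user in users]
print(fields_per_doc)
```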
MongoDB enables developers to design and evolve the schema through an iterative and agile approach, while enforcing data governance.
Developers can start writing code and persist the objects as they are created. And when developers add more features, MongoDB continues to store the updated objects without the need for performing costly ALTER TABLE operations or, worse, having to re-design the schema from scratch.
Dynamic schemas bring great agility, but it is also important that controls can be implemented to maintain data quality. Unlike NoSQL databases that push enforcement of these controls back into application code, MongoDB provides document validation within the database. Users can enforce checks on document structure, data types, data ranges and the presence of mandatory fields. As a result, DBAs can apply data governance standards, while developers maintain the benefits of a flexible document model.
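The kinds of checks listed above (mandatory fields, data types, data ranges) can be pictured with a short pure-Python sketch. This is only an illustration of the rules themselves: `validate_user` and its field names are hypothetical, and it does not use MongoDB's own validator syntax, which is declared on the collection and enforced by the database.

```python
def validate_user(doc):
    """Return a list of rule violations; an empty list means the document passes."""
    errors = []

    # Presence of mandatory fields.
    for field in ("user_id", "email"):
        if field not in doc:
            errors.append(f"missing field: {field}")

    # Data type check.
    if "email" in doc and not isinstance(doc["email"], str):
        errors.append("email must be a string")

    # Data range check, applied only when the optional field is present.
    if "age" in doc and not (isinstance(doc["age"], int) and 0 <= doc["age"] <= 150):
        errors.append("age must be an integer between 0 and 150")

    return errors


print(validate_user({"user_id": 1, "email": "a@example.com", "age": 30}))
print(validate_user({"user_id": 1, "age": -1}))
```

Optional fields stay optional, so the flexible model is preserved while the mandatory parts of each document are still checked.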
How does the MongoDB data model stack up to relational databases and key-value stores? Take a look at the chart below:
| |MongoDB|Relational|Key-value|
|Rich Data Model|Yes|No|No|
|Easy for Programmers|Yes|No|Not when modeling complex data structures|
MongoDB Compass is included with both MongoDB Professional and MongoDB Enterprise Advanced subscriptions used with your self-managed instances, or hosted MongoDB Atlas instances. MongoDB Compass is free to use for evaluation and in development environments.
MongoDB Query Model and Data Visualization
This section covers 4 topics: Idiomatic Drivers, Query Types, Data Visualization, and Indexing
With the intuitive document data model, dynamic schema and idiomatic drivers, you can build applications and get to market faster with MongoDB.
Unlike many NoSQL databases, MongoDB is not limited to simple Key-Value operations. Developers can build rich applications using complex queries, aggregations and secondary indexes that unlock the value in structured, semi-structured and unstructured data.
A key element of this flexibility is MongoDB's support for many types of queries. A query may return a document, a subset of specific fields within the document or complex aggregations and transformation of many documents:
Key-value queries return results based on any field in the document, often the primary key.
Range queries return results based on values defined as inequalities (e.g. greater than, less than or equal to, between).
Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon.
Search queries return results in relevance order and in faceted groups, based on text arguments using Boolean operators (e.g., `AND`, `OR`, `NOT`), and through bucketing, grouping and counting of query results. With support for collations, data comparison and sorting order can be defined for over 100 different languages and locales.
Aggregation Framework queries return aggregations and transformations of values returned by the query (e.g., count, min, max, average, similar to a SQL GROUP BY statement).
JOINs and graph traversals. Through the $lookup stage of the aggregation pipeline, documents from separate collections can be combined through a left outer JOIN operation. $graphLookup brings native graph processing within MongoDB, enabling efficient traversals across trees, graphs and hierarchical data to uncover patterns and surface previously unidentified connections.
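As an illustration of how a range query and an aggregation combine, the sketch below expresses a two-stage pipeline as data and then emulates its result in plain Python. The `orders` data and the `average_per_customer` helper are assumptions for this example; the emulation only shows what the pipeline computes, not how the database evaluates it.

```python
from collections import defaultdict

# Hypothetical order documents.
orders = [
    {"customer": "a", "total": 20},
    {"customer": "a", "total": 40},
    {"customer": "b", "total": 5},
    {"customer": "b", "total": 100},
]

# Range filter (total >= 10) followed by a per-customer average,
# comparable to WHERE ... GROUP BY in SQL.
pipeline = [
    {"$match": {"total": {"$gte": 10}}},
    {"$group": {"_id": "$customer", "avg_total": {"$avg": "$total"}}},
]


def average_per_customer(docs, min_total):
    """Emulate the pipeline above: filter by range, then average per group."""
    totals = defaultdict(list)
    for doc in docs:
        if doc["total"] >= min_total:
            totals[doc["customer"]].append(doc["total"])
    return {customer: sum(values) / len(values) for customer, values in totals.items()}


min_total = pipeline[0]["$match"]["total"]["$gte"]
averages = average_per_customer(orders, min_total)
print(averages)
```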
Additionally, the MongoDB Connector for Apache Spark exposes Spark’s Scala, Java, Python, and R libraries. MongoDB data is materialized as DataFrames and Datasets for analysis through machine learning, graph, streaming, and SQL APIs.
Using the MongoDB Connector for BI, included with MongoDB Enterprise Advanced, modern application data can be easily analyzed with industry-standard SQL-based BI and analytics platforms. Business analysts and data scientists can seamlessly analyze semi-structured and unstructured data managed in MongoDB, alongside traditional data in their SQL databases using the same BI tools deployed within millions of enterprises.
Indexes are a crucial mechanism for optimizing system performance and scalability while providing flexible access to your data. MongoDB includes support for many types of secondary indexes that can be declared on any field in the document, including fields within arrays:
You can define compound, unique, array, partial, TTL, geospatial, sparse, hash and text indexes to optimize for multiple query patterns, multi-structured data types and constraints.
Index intersection enables MongoDB to use more than one index to optimize an ad-hoc query at run-time.
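The idea behind index intersection can be sketched in a few lines of plain Python: each single-field index maps a value to the set of matching document ids, and a query on two fields intersects the two sets. The data and the `build_index` helper are assumptions for illustration, not a description of the database's internal index structures.

```python
from collections import defaultdict

# Hypothetical documents keyed by id.
docs = {
    1: {"city": "Paris", "status": "active"},
    2: {"city": "Paris", "status": "inactive"},
    3: {"city": "Rome", "status": "active"},
}


def build_index(documents, field):
    """Build a simple single-field index: value -> set of document ids."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        index[doc[field]].add(doc_id)
    return index


city_index = build_index(docs, "city")
status_index = build_index(docs, "status")

# An ad-hoc query on both fields can combine the two separate indexes.
matching_ids = city_index["Paris"] & status_index["active"]
print(matching_ids)
```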
How does the MongoDB query and indexing model stack up to relational databases and key-value stores? Take a look at the chart below:
| |MongoDB|Relational|Key-value|
|Aggregation and Transformation|Yes|Yes|No|
|Left Outer JOINs ($lookup)|Yes|Yes|No|
|Graph Processing ($graphLookup)|Yes|No|No|
To learn more about the differences in data models, download our Relational Database to MongoDB Migration Guide.
MongoDB Data Management
Auto-sharding for linear scalability
MongoDB provides horizontal scale-out for databases on low cost, commodity hardware using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application. MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases.
Sharding is transparent to applications; whether there is one or one hundred shards, the application code for querying MongoDB is the same.
Unlike relational databases, sharding is automatic and built into the database. Developers don't face the complexity of building sharding logic into their application code, which then needs to be updated as shards are migrated. Operations teams don't need to deploy additional clustering software to manage process and data distribution.
Unlike other distributed databases, multiple sharding policies are available that enable developers and administrators to distribute data across a cluster according to query patterns or data locality. As a result, MongoDB delivers much higher scalability across a diverse set of workloads:
Range Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range based queries.
Hash Sharding. Documents are distributed according to an MD5 hash of the shard key value. This approach guarantees a uniform distribution of writes across shards, but is less optimal for range-based queries.
Zone Sharding. Provides the ability for DBAs and operations teams to define specific rules governing data placement in a sharded cluster. Zones accommodate a range of deployment scenarios – for example locating data by geographic region, by hardware configuration for tiered storage architectures, or by application feature. Administrators can continuously refine data placement rules by modifying shard key ranges, and MongoDB will automatically migrate the data to its new zone.
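The difference between range and hash placement can be seen in a short pure-Python sketch. The shard count, split points and helper names are assumptions for illustration, and taking the hash modulo the shard count is a simplification: it is not how a real cluster assigns and rebalances data.

```python
import bisect
import hashlib

NUM_SHARDS = 3
# Hypothetical split points: shard 0 holds keys < 100,
# shard 1 holds 100 <= key < 200, shard 2 holds keys >= 200.
SPLIT_POINTS = [100, 200]


def range_shard(key):
    """Range placement: neighbouring key values land on the same shard."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def hash_shard(key):
    """Hash placement: an MD5 hash of the key spreads neighbouring values apart."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


keys = [5, 99, 100, 250]
print([range_shard(k) for k in keys])
print([hash_shard(k) for k in keys])
```

A query for keys between 0 and 99 touches a single shard under range placement, whereas under hash placement the same keys are scattered, which evens out writes at the cost of range-query locality.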
How do the MongoDB scaling capabilities stack up to relational databases and key-value stores? Take a look at the chart below:
| |MongoDB|Relational|Key-value|
|Scale-out Commodity Hardware|Yes|No|Yes|
|Shard by Hash|Yes|Manual|Yes|
|Shard by Range|Yes|Manual|No|
|Shard by Zone|Yes|Manual|No|
|Automatic Data Rebalancing|Yes|Manual|Limited|
MongoDB scales like crazy. Whether you are sharding to scale data volume, performance or cross-data center operations, you can do it with MongoDB.
Pluggable storage architecture for application flexibility
With MongoDB, organizations can address diverse application needs, hardware resources, and deployment designs with a single database technology. Through the use of a pluggable storage architecture, MongoDB can be extended with new capabilities, and configured for optimal use of specific hardware architectures. This approach significantly reduces developer and operational complexity compared to running multiple databases to power applications with unique requirements. Users can leverage the same MongoDB query language, data model, scaling, security and operational tooling across different applications, each powered by different pluggable MongoDB storage engines.
MongoDB ships with four supported storage engines, all of which can coexist within a single MongoDB replica set. This makes it easy to evaluate and migrate between them, and to optimize for specific application requirements – for example combining the in-memory engine for ultra-low latency operations with a disk-based engine for persistence. The supported storage engines include:
The default WiredTiger storage engine. WiredTiger's granular concurrency control and native compression will provide the best all-round performance and storage efficiency for the broadest range of applications.
The Encrypted storage engine protecting highly sensitive data, without the performance or management overhead of separate filesystem encryption. (Requires MongoDB Enterprise Advanced)
The In-Memory storage engine delivering extreme performance coupled with real time analytics for the most demanding, latency-sensitive applications. (Requires MongoDB Enterprise Advanced)
The MMAPv1 engine, an improved version of the original storage engine used in pre-3.x MongoDB releases.
Storage & network efficiency with compression
MongoDB supports native compression when configured with the WiredTiger or Encrypted storage engines, reducing physical storage footprint by as much as 80%. In addition to reduced storage space, compression enables much higher storage I/O scalability as fewer bits are read from disk. Administrators have the flexibility to configure specific compression algorithms for collections, indexes and the journal.
As a distributed database, MongoDB relies on efficient network transport during query routing and inter-node replication. MongoDB provides the option to compress the wire protocol used for intra-cluster communications. Using the snappy compression algorithm, network traffic can be compressed by up to 70%, providing major performance benefits in bandwidth-constrained environments, and reduced networking costs.
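To get a feel for why document data compresses well, the sketch below compresses a batch of similar JSON documents with Python's standard `zlib` module. The sample documents are assumptions, and JSON with `zlib` is only a stand-in: the storage engine works on its own on-disk format, so the ratio printed here says nothing about the figures quoted above.

```python
import json
import zlib

# Hypothetical documents that repeat the same field names and values,
# as records in one collection often do.
docs = [{"user_id": i, "status": "active", "country": "GB"} for i in range(1000)]

raw = json.dumps(docs).encode("utf-8")
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)

# Fraction of bytes saved by compression for this particular sample.
saving = 1 - len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({saving:.0%} smaller)")
```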
MongoDB Consistency & Availability
This section covers 4 topics: Transaction Model, Replica Sets, In-Memory Performance, and Security.
MongoDB provides ACID properties at the document level. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The ACID guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document.
Developers can use MongoDB's Write Concerns to configure operations to be acknowledged to the application only after they have been flushed to the journal file on disk. This is the same model used by many traditional relational databases to provide durability guarantees. As a distributed system, MongoDB presents additional flexibility that helps users to achieve their desired availability SLAs. Each write operation can specify the appropriate write concern, such as writing to at least two replicas in one data center and one replica in a second data center.
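The effect of a write concern can be pictured with a small pure-Python model. The `w` and `j` keys mirror the fields of a write concern document (number of acknowledging members, and whether the journal must be written); the replica states and the `write_is_acknowledged` helper are simplified assumptions, not the server's actual acknowledgment logic.

```python
def write_is_acknowledged(replica_states, concern):
    """Decide whether a write may be reported as successful to the application.

    replica_states: one dict per replica with 'applied' and 'journaled' flags.
    concern: dict with 'w' (members required) and optional 'j' (journal required).
    """
    journal_required = concern.get("j", False)

    def satisfied(state):
        return state["applied"] and (state["journaled"] or not journal_required)

    return sum(1 for state in replica_states if satisfied(state)) >= concern["w"]


# Hypothetical snapshot of a three-member replica set just after a write.
states = [
    {"applied": True, "journaled": True},    # primary
    {"applied": True, "journaled": False},   # secondary, not yet journaled
    {"applied": False, "journaled": False},  # lagging secondary
]

print(write_is_acknowledged(states, {"w": 1, "j": True}))
print(write_is_acknowledged(states, {"w": 2, "j": False}))
print(write_is_acknowledged(states, {"w": 2, "j": True}))
```

Stricter settings trade latency for durability: the last call is not yet acknowledged because only one member has journaled the write.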
MongoDB maintains multiple copies of data called replica sets using native replication. A replica set is a fully self-healing shard that helps prevent database downtime. Replica failover is fully automated, eliminating the need for administrators to intervene manually.
The number of replicas in a MongoDB replica set is configurable: a larger number of replicas provides increased data availability and protection against database downtime (e.g., in case of multiple machine failures, rack failures, data center failures, or network partitions). Optionally, operations can be configured to write to multiple replicas before returning to the application, thereby providing functionality that is similar to synchronous replication.
MongoDB replica sets deliver fault tolerance and disaster recovery. Multi-data center awareness enables global data distribution and separation between operational and analytical workloads.
Replica sets also provide operational flexibility by providing a way to upgrade hardware and software without requiring the database to go offline.
In-memory performance with on-disk capacity
With the In-Memory storage engine, MongoDB users can realize the performance advantages of in-memory computing for operational and real-time analytics workloads. The In-Memory storage engine delivers the extreme throughput and predictable latency demanded by the most performance-intensive applications in AdTech, finance, telecoms, IoT, eCommerce and more, eliminating the need for separate caching layers.
MongoDB replica sets allow for hybrid in-memory and on-disk database deployments. Data managed by the In-Memory engine can be processed and analyzed in real time, before being automatically replicated to MongoDB instances configured with one of the persistent disk-based storage engines. Lengthy ETL cycles typical when moving data between different databases are avoided, and users no longer have to trade away the scalable capacity or durability guarantees offered by disk storage.
To learn more, download our detailed Architecture Guide.
The frequency and severity of data breaches continues to escalate. Industry analysts predict cybercrime will cost the global economy $6 trillion annually by 2021. Organizations face an onslaught of new threat classes and threat actors with phishing, ransomware and intellectual property theft growing more than 50% year on year, and key infrastructure subject to increased disruption. With databases storing an organization’s most important information assets, securing them is top of mind for administrators.
MongoDB Enterprise Advanced features extensive capabilities to defend, detect and control access to data.
Authentication. Simplifying access control to the database, MongoDB offers integration with external security mechanisms including LDAP, Windows Active Directory, Kerberos and x.509 PKI certificates.
Authorization. User-defined roles enable administrators to configure granular permissions for a user or an application based on the privileges they need to do their job. These can be defined in MongoDB, or centrally within an LDAP server. Additionally, administrators can define views that expose only a subset of data from an underlying collection, for example a view that filters or masks specific fields, such as Personally Identifiable Information (PII) from customer data or health records.
Auditing. For regulatory compliance, security administrators can use MongoDB's native audit log to track access and operations performed against the database.
Encryption. MongoDB data can be encrypted on the network, on disk and in backups. With the Encrypted storage engine, protection of data at-rest is an integral feature within the database. By natively encrypting database files on disk, administrators eliminate both the management and performance overhead of external encryption mechanisms. Only those staff who have the appropriate database authorization credentials can access the encrypted data, providing additional levels of defence.
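The field-filtering views mentioned under Authorization can be pictured with a short pure-Python sketch. The `masked_view` helper, the field names and the sample data are assumptions for illustration; a real view is defined in the database and enforced there, not in application code.

```python
# Hypothetical set of fields treated as personally identifiable information.
PII_FIELDS = frozenset({"ssn", "email"})


def masked_view(collection, hidden_fields=PII_FIELDS):
    """Return copies of the documents with the hidden fields left out."""
    return [
        {key: value for key, value in doc.items() if key not in hidden_fields}
        for doc in collection
    ]


customers = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com", "ssn": "000-00-0000"},
    {"_id": 2, "name": "Bob", "email": "bob@example.com"},
]

analyst_view = masked_view(customers)
print(analyst_view)
```

Users granted access to the view alone see names and ids, while the underlying collection remains unchanged.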
To learn more, download our MongoDB Security Reference Architecture.
Management & Operations
This section covers 6 topics: Ops Manager & Cloud Manager, Deployments and Upgrades, Monitoring, Disaster Recovery, Integration, and Cost Savings.
Ops Manager is the simplest way to run MongoDB on your own infrastructure, making it easy for operations teams to deploy, monitor, back up and scale MongoDB. Many of the capabilities of Ops Manager are also available in MongoDB Cloud Manager, a tool hosted by MongoDB in the cloud. Ops Manager and Cloud Manager provide an integrated suite of applications that manage the complete lifecycle of the database:
Automated deployment and management with a single click and zero-downtime upgrades
Proactive monitoring providing visibility into the performance of MongoDB, history, and automated alerting on 100+ system metrics
Disaster recovery with continuous, incremental backup and point-in-time recovery, including the restoration of complete running clusters from your backup files.
Each of these is explained in more detail below.
Deployment and upgrades
Ops Manager helps operations teams deploy MongoDB through a powerful self-service portal or by invoking the Ops Manager RESTful API from existing enterprise tools. The deployment can be anything from a single instance to a replica set or a sharded cluster, running in the public cloud or in your private data center. Ops Manager enables fast deployment on any hosting topology.
In addition to initial deployment, Ops Manager enables capacity to be dynamically scaled by adding shards and replica set members to running systems. Other maintenance tasks such as upgrades, building indexes across replica sets or resizing the oplog can all be made with a few clicks and zero downtime.
Ops Manager gives developers, administrators and operations teams visibility into the MongoDB service. Featuring charts, custom dashboards, and automated alerting, Ops Manager tracks 100+ key database and systems health metrics including operations counters, memory and CPU utilization, replication status, open connections, queues and any node status.
The metrics are securely reported to Ops Manager where they are processed, aggregated, alerted and visualized in a browser, letting administrators easily determine the health of MongoDB in real time. Historic performance can be reviewed in order to create operational baselines and to support capacity planning for further scale. Integration with existing monitoring tools is also straightforward via the Ops Manager RESTful API, and with packaged integrations to leading Application Performance Management (APM) platforms such as New Relic. This integration allows MongoDB status to be consolidated and monitored alongside the rest of your application infrastructure, all from a single pane of glass.
Ops Manager provides real time & historic visibility into MongoDB with integration into operational tools
Alerts enable proactive management of MongoDB
A backup and recovery strategy is necessary to protect your mission critical data against catastrophic failure, such as a fire or flood in your data center, or human error, such as unintentional corruption due to mistakes in application code, or accidental deletion of data. With a backup and recovery strategy in place, administrators can restore business operations with minimal data loss and the organization can meet regulatory and compliance requirements.
Ops Manager and Cloud Manager provide continuous incremental backup, point-in-time recovery of replica sets, and consistent snapshots of sharded clusters. Ops Manager creates snapshots of MongoDB data and retains multiple copies based on a user-defined retention policy. You can restore to precisely the moment you need, quickly and safely. Automation-driven restores allow a fully configured cluster to be re-deployed directly from the database snapshots in just a few clicks.
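The general idea of point-in-time recovery, a snapshot plus a replay of later logged operations up to the chosen moment, can be sketched in plain Python. The snapshot format, the operation log entries and the `restore` helper are all assumptions made for illustration and do not describe how Ops Manager stores or applies backups.

```python
def restore(snapshot, oplog, target_ts):
    """Rebuild the data as of target_ts from a snapshot plus later operations."""
    state = {doc_id: dict(doc) for doc_id, doc in snapshot["data"].items()}
    for op in sorted(oplog, key=lambda entry: entry["ts"]):
        # Skip operations already contained in the snapshot or after the target.
        if op["ts"] <= snapshot["ts"] or op["ts"] > target_ts:
            continue
        if op["op"] == "set":
            state[op["id"]] = dict(op["doc"])
        elif op["op"] == "delete":
            state.pop(op["id"], None)
    return state


# Hypothetical snapshot taken at time 10, followed by three logged operations.
snapshot = {"ts": 10, "data": {1: {"name": "a"}}}
oplog = [
    {"ts": 11, "op": "set", "id": 2, "doc": {"name": "b"}},
    {"ts": 12, "op": "delete", "id": 1},
    {"ts": 13, "op": "delete", "id": 2},  # e.g. an accidental deletion
]

# Restore to just before the accidental deletion at time 13.
print(restore(snapshot, oplog, target_ts=12))
```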
How do the MongoDB operational capabilities stack up to relational databases and key-value stores? Take a look at the chart below:
| |MongoDB|Relational|Key-value|
|Self Healing Recovery with Automatic Failover|Yes|Often Requires Additional Clustering Software|No: Manual Failover Often Recommended|
|Separate Caching Layer Required|No|Often|Often|
|Data Center Awareness|Yes|Expensive Add-on|No|
|Continuous Backup & Point in Time Recovery|Yes|Yes|No|
|API Integration with Systems Management Frameworks|Yes|Yes|No|
Integrating MongoDB with external monitoring solutions
The Ops Manager API provides programmatic access to key monitoring data and access to Ops Manager features by external management tools.
In addition to Ops Manager, MongoDB Enterprise can report system information via SNMP traps, supporting centralized data collection and aggregation via external monitoring solutions.
To learn more about operational best practices, download our Operations Guide.
MongoDB Atlas: Database as a Service For MongoDB
MongoDB Atlas provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on-demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.
It’s easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:
Security features to protect access to your data
Built in replication for always-on availability, tolerating complete data center failure
Backups and point in time recovery to protect against data corruption
Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
A choice of cloud providers, regions, and billing options
MongoDB Atlas is versatile. It’s great for everything from a quick Proof of Concept, to test/QA environments, to complete production clusters. If you decide you want to bring operations back under your control, it is easy to move your databases onto your own infrastructure and manage them using MongoDB Ops Manager or MongoDB Cloud Manager. The user experience across MongoDB Atlas, Cloud Manager, and Ops Manager is consistent, ensuring that disruption is minimal if you decide to migrate to your own infrastructure.
MongoDB Atlas is automated, it’s easy, and it’s from the creators of MongoDB. Learn more and take it for a spin.
MongoDB can be 1/10th the cost to build and run, compared to a relational database. The cost advantage is driven by:
MongoDB's increased ease of use and developer flexibility, which reduces the cost of developing and operating an application
MongoDB's ability to scale on commodity server hardware and storage
MongoDB's substantially lower prices for commercial licensing, advanced features and support
Furthermore, MongoDB's technical and cost-related benefits translate to topline advantages as well, such as faster time-to-market and time-to-scale.
To learn more, download our TCO comparison of Oracle and MongoDB.
Want to go deeper into MongoDB's technology? Then download our detailed Architecture Guide.