What is MongoDB?
MongoDB is a NoSQL database designed for how we build and run applications today using modern development techniques, programming models, and computing resources. As a result, it empowers businesses to be more agile and scalable, create new applications, improve customer experience, and accelerate time to market while reducing costs.
How We Build Applications
- New and Complex Data Types. Rich data structures with dynamic attributes, mixed structure, text, media, arrays and other complex types are common in today's applications.
- Modern Programming Languages. Object-oriented programming languages interact with data in structures that are dramatically different from the way data is stored in a relational database.
- Faster Development. Software engineering teams now embrace short, iterative development cycles.
How We Run Applications
- New Scalability for Big Data. Operational and analytical workloads challenge traditional capabilities on one or more dimensions of scale, availability, performance and cost effectiveness.
- Fast, Real-time Performance. Users expect consistent, interactive experiences from applications across many types of interfaces.
- New Computing Environments. The infrastructure requirements for applications can easily exceed the resources of a single computer, and cloud infrastructure now provides massive, elastic, cost-effective computing capacity on a metered cost model.
MongoDB Feature Overview
MongoDB embraces these new realities through key innovations.
- Document Data Model. Data is stored in a structure that maps to objects in modern programming languages and is easy for developers to understand.
- Rich Query Model. MongoDB is fit for a wide variety of applications. It provides rich index and query support, including secondary, geospatial and text search indexes, the Aggregation Framework and native MapReduce.
- Idiomatic Drivers. Developers interact with the database through native libraries that are integrated with their respective environments and code repositories, making MongoDB simple and natural to use.
- Horizontal Scalability. As the data volume and throughput grow, developers can take advantage of commodity hardware and cloud infrastructure to increase the capacity of the MongoDB system.
- High Availability. Multiple copies of data are maintained with native replication. Automatic failover to secondary nodes, racks and data centers makes it possible to achieve enterprise-grade uptime without custom code and complicated tuning.
- In-Memory Performance. Data is read from and written to RAM while also being persisted to disk for durability, providing fast performance and eliminating the need for a separate caching layer.
- Flexibility. From the document data model, to multi-datacenter deployments, to tunable consistency, to operation-level availability options, MongoDB provides tremendous flexibility to development and operations teams. For these reasons it is well suited to a wide variety of applications across many industries.
MongoDB Data Model
DATA AS DOCUMENTS
MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Documents that tend to share a similar structure are organized as collections. It may be helpful to think of a collection as analogous to a table in a relational database, a document as similar to a row, and a field as similar to a column.
For example, consider the data model for a blogging application. In a relational database the data model would comprise multiple tables. To simplify the example, assume there are tables for Categories, Tags, Users, Comments and Articles. In MongoDB the data could be modeled as two collections, one for users and the other for articles. In each article document there might be multiple comments, multiple tags, and multiple categories, each expressed as an embedded array.
MongoDB documents tend to have all data for a given record in a single document, whereas in a relational database information for a given record is usually spread across many tables.
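As a rough illustration of the blogging example, an article document could look like the Python dictionary below. Every field name and value here is a hypothetical choice made for this sketch, not a schema that MongoDB prescribes, and JSON is used only as a readable stand-in for BSON.

```python
import json

# Hypothetical article document for the blogging example. All field names
# are illustrative assumptions; MongoDB does not mandate this layout.
article = {
    "_id": 1,
    "title": "Getting started with documents",
    "author_id": 42,  # points at a document in the users collection
    "categories": ["databases", "tutorials"],
    "tags": ["nosql", "modeling"],
    "comments": [  # embedded array of sub-documents
        {"user_id": 7, "text": "Nice overview."},
        {"user_id": 9, "text": "What about joins?"},
    ],
}

# The comments, tags and categories travel with the article itself, so they
# can be read without consulting separate tables.
comment_authors = [comment["user_id"] for comment in article["comments"]]

# JSON text is used here as a human-readable stand-in for the binary BSON form.
serialized = json.dumps(article)
```

In a relational design the same information would typically be split across the Articles, Comments, Tags and Categories tables and reassembled with joins.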
MongoDB documents can vary in structure. For example, all documents that describe users might contain the user id and the last date they logged into the system, but only some of these documents might contain the user’s identity for one or more third-party applications. Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the system, without updating a central system catalog, and without taking the system offline.
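The following sketch mimics that flexibility with plain Python dictionaries standing in for user documents. The field names (`last_login`, `twitter`, `github`) and the helper `third_party_ids` are assumptions introduced for illustration only.

```python
from datetime import date

# Two hypothetical user documents: both carry an id and a last login date,
# but only the second has a third-party identity.
users = [
    {"_id": 1, "last_login": date(2014, 1, 5)},
    {"_id": 2, "last_login": date(2014, 1, 7), "twitter": "@example_user"},
]

# Adding a new field to one document does not touch any other document and
# needs no schema declaration.
users[0]["github"] = "example-dev"


def third_party_ids(doc):
    """Return whichever optional identity fields this document carries."""
    optional_fields = ("twitter", "github")
    return {name: doc[name] for name in optional_fields if name in doc}
```

Code that reads such documents has to tolerate missing fields, which is why the helper checks for each field before using it.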
MongoDB Query Model
MongoDB supports many types of queries. A query may return a document or a subset of specific fields within the document:
- Key-value queries return results based on any field in the document, often the primary key.
- Range queries return results based on values defined as inequalities (e.g., greater than, less than or equal to, between).
- Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon.
- Text Search queries return results in relevance order based on text arguments using Boolean operators (e.g., AND, OR, NOT).
- Aggregation Framework queries return aggregations of values returned by the query (e.g., count, min, max, average, similar to a SQL GROUP BY statement).
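To make three of these query types concrete, the sketch below evaluates a key-value lookup, a range query and a GROUP BY style aggregation over an in-memory list of dictionaries. It is a conceptual model only: the function names and sample data are hypothetical, and it does not use MongoDB's query language or drivers.

```python
from collections import defaultdict

# Hypothetical product documents used only for this illustration.
products = [
    {"_id": 1, "category": "books", "price": 12.0},
    {"_id": 2, "category": "books", "price": 30.0},
    {"_id": 3, "category": "games", "price": 45.0},
    {"_id": 4, "category": "games", "price": 15.0},
]


def find_by_key(docs, field, value):
    """Key-value query: match documents whose field equals the value."""
    return [d for d in docs if d.get(field) == value]


def find_in_range(docs, field, low, high):
    """Range query: match documents with low <= field <= high."""
    return [d for d in docs if field in d and low <= d[field] <= high]


def average_by(docs, group_field, value_field):
    """Aggregation: average of value_field per group, like SQL GROUP BY."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for d in docs:
        acc = totals[d[group_field]]
        acc[0] += d[value_field]
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}
```

For instance, `find_in_range(products, "price", 10, 20)` selects the two items priced between 10 and 20, and `average_by(products, "category", "price")` computes one average price per category.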
As in most database management systems, indexes are a crucial mechanism for optimizing performance in MongoDB. While indexes can improve the performance of some operations by orders of magnitude, they have associated costs in the form of slower writes, disk usage, and memory usage. MongoDB supports many types of indexes on any field in the document.
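The trade-off can be sketched with a toy collection that keeps one secondary index. This is a simplification under stated assumptions: the class `IndexedCollection` is hypothetical, and the index is a plain hash map rather than the tree structures a real database would use. It still shows why reads get cheaper while every write does extra work and the index consumes extra memory.

```python
from collections import defaultdict


class IndexedCollection:
    """Toy collection with a single secondary index on one field."""

    def __init__(self, indexed_field):
        self.docs = {}                 # _id -> document
        self.indexed_field = indexed_field
        self.index = defaultdict(set)  # field value -> set of _ids

    def insert(self, doc):
        self.docs[doc["_id"]] = doc
        # Extra work on every write: the index must be maintained too.
        if self.indexed_field in doc:
            self.index[doc[self.indexed_field]].add(doc["_id"])

    def find_scan(self, value):
        """Without an index: inspect every document."""
        return sorted(
            doc_id for doc_id, doc in self.docs.items()
            if doc.get(self.indexed_field) == value
        )

    def find_indexed(self, value):
        """With the index: jump straight to the matching ids."""
        return sorted(self.index.get(value, ()))
```

Both lookup methods return the same ids; the indexed path simply avoids touching documents that cannot match.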
MongoDB Data Management
MongoDB provides horizontal scale-out for databases using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.
Because sharding is transparent, the application code for querying MongoDB is the same whether there is one shard or one hundred. Applications issue queries to a query router, which dispatches each query to the appropriate shards.
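The idea of a router hiding the number of shards can be sketched as follows. The `QueryRouter` class is hypothetical, and placing documents by hashing their key is just one possible distribution strategy assumed for this example; it is not a description of MongoDB's internal routing.

```python
import hashlib


class QueryRouter:
    """Toy router that spreads documents over shards by hashing the key."""

    def __init__(self, shard_count):
        # Each shard is modeled as an independent dictionary.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash keeps a given key on the same shard across calls.
        digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def insert(self, key, doc):
        self._shard_for(key)[key] = doc

    def find(self, key):
        return self._shard_for(key).get(key)
```

Calling code uses `insert` and `find` in exactly the same way for `QueryRouter(1)` and `QueryRouter(100)`; only the constructor argument differs.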
MongoDB Consistency & Durability
MongoDB is ACID compliant at the document level. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The ACID guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back, and clients receive a consistent view of the document.
Developers can use MongoDB’s Write Concerns to configure operations to be acknowledged to the application only after they have been flushed to the journal file on disk. This is the same model used by many traditional relational databases to provide durability guarantees. As a distributed system, MongoDB presents additional flexibility in enabling users to achieve their desired durability goals by controlling how write operations are persisted across replicas. You can learn more in the Write Availability section below.
MongoDB uses native replication to maintain multiple copies of data, organized into replica sets. A replica set is a fully self-healing shard that helps prevent database downtime. Replica failover is fully automated, eliminating the need for administrators to intervene manually.
The number of replicas in a MongoDB replica set is configurable, and a larger number of replicas provides increased data durability and protection against database downtime (e.g., in case of multiple machine failures, rack failures, data center failures, or network partitions). Optionally, operations can be configured to write to multiple replicas before returning to the application, thereby providing functionality that is similar to synchronous replication.
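The effect of requiring acknowledgment from several replicas can be modeled with a small simulation. This is a deliberately simplified sketch: the `ReplicaSet` class, the `available` flags and the `majority` helper are hypothetical, and the model ignores primaries, elections and replication lag.

```python
def majority(member_count):
    """Smallest number of members that forms a strict majority."""
    return member_count // 2 + 1


class ReplicaSet:
    """Toy replica set: each member is a dictionary holding its own copy."""

    def __init__(self, member_count):
        self.members = [dict() for _ in range(member_count)]
        self.available = [True] * member_count  # flip to simulate failures

    def write(self, key, value, required_acks):
        """Apply the write on reachable members; report whether enough acked."""
        acks = 0
        for store, is_up in zip(self.members, self.available):
            if is_up:
                store[key] = value
                acks += 1
        return acks >= required_acks
```

With three members and one of them unreachable, a write that requires a majority (two acknowledgments) still succeeds, while a write that requires all three is reported as not satisfied.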
Replica sets also provide operational flexibility by providing a way to upgrade hardware and software without requiring the database to go offline.
IN-MEMORY PERFORMANCE WITH ON-DISK CAPACITY
MongoDB makes extensive use of RAM to speed up database operations. Reading data from memory is measured in nanoseconds, whereas reading data from spinning disk is measured in milliseconds; reading from memory is approximately 100,000 times faster than reading from disk. In MongoDB, all data is read and manipulated through memory-mapped files. Data that is not accessed is not loaded into RAM. Although not all data has to fit in RAM, the deployment team should aim for indexes and all frequently accessed data to fit in RAM.
For example, it may be the case that a fraction of the entire database is most frequently accessed by the application, such as data related to recent events or popular products. If the volume of data that is frequently accessed exceeds the capacity of a single machine, MongoDB can scale horizontally across multiple servers using automatic sharding. Because MongoDB provides in-memory performance, for most applications there is no need for a separate caching layer.
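The arithmetic behind the "approximately 100,000 times faster" figure, and the reason the frequently accessed data should fit in RAM, can be worked through in a few lines. The two latency constants are round numbers assumed for illustration, not measurements, and `average_read_ns` is a hypothetical helper.

```python
# Assumed round-number latencies, chosen only to illustrate the ratio.
RAM_READ_NS = 100            # about 100 nanoseconds per read from memory
DISK_READ_NS = 10_000_000    # about 10 milliseconds per read from spinning disk

# 10,000,000 ns / 100 ns = 100,000
speedup = DISK_READ_NS // RAM_READ_NS


def average_read_ns(ram_hit_ratio):
    """Expected read latency when a fraction of reads is served from RAM."""
    return ram_hit_ratio * RAM_READ_NS + (1 - ram_hit_ratio) * DISK_READ_NS
```

Under these assumptions, even a small share of reads that fall through to disk dominates the average latency, which is why keeping the working set in memory matters so much.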