For those of you who want the news right up front, here it is in 135 words.
MongoDB is the leading document database. It is built on a distributed, scale-out architecture and has become a comprehensive cloud-based platform for managing and delivering data to applications. MongoDB handles transactional, operational, and analytical workloads at scale. If your concerns are time to market, developer productivity, supporting DevOps and agile methodologies, and building stuff that scales without operational gymnastics, MongoDB is the way to go.
PostgreSQL is a rock solid, open source, enterprise-grade SQL database that has been expanding its capabilities for 30 years. Everything you would ever want from a relational database is present in PostgreSQL, which relies on a scale-up architecture. If your concerns are compatibility, serving up thousands of queries from hundreds of tables, taking advantage of existing SQL skills, and pushing SQL to the limit, PostgreSQL will do an awesome job.
As an astute reader should already be able to tell, the real question is not MongoDB vs Postgres, but the best document database versus the best relational database. Both databases are awesome.
If you are looking for a distributed database for modern transactional and analytical applications that are working with rapidly changing, multi-structured data, then MongoDB is the way to go.
If a SQL database fits your needs, then Postgres is a great choice.
The right answer for your needs is based of course on what you are trying to do. Our goal in this article is to help to explain the personality and characteristics of each of these databases so you can better understand whether it meets your needs.
But often at the beginning of a development project, the project leaders often have a good grasp of the use case, but don’t really have clarity about the specific application features their business and users will need. They have to make a bet about the best fit. The rest of this article aims to provide information that helps make a safe bet.
But again, for those who want the story right away, here is a summary of our general guidance:
If you are at the beginning of a development project and are seeking to figure out your needs and data model by using an agile development process, MongoDB will shine because developers can reshape the data on their own, when they need to. MongoDB enables you to manage data of any structure, not just tabular structures defined in advance.
If you are supporting an application you know will have to scale in terms of volume of traffic or size of data (or both) and that needs to be distributed across regions for data locality or data sovereignty, MongoDB’s scale-out architecture will meet those needs automatically.
If you want a multi-cloud database that works the same way in every public cloud, can store customer data in specific geographic regions, and support the latest serverless and mobile development paradigms, MongoDB Atlas is the right choice.
If you are a SQL shop and introducing a new paradigm will cost more than any other benefits mentioned will offset, PostgreSQL is a choice that will likely meet all your needs.
If you want a relational database that will run complex SQL queries and work with lots of existing applications based on a tabular, relational data model, PostgreSQL will do the job.
If you are a creative SQL developer and want to push SQL to the limits by using advanced techniques for indexing, storing and searching numerous structured data types, creating user-defined functions in a variety of languages, and tuning the database to the nth degree, you likely will be able to go further with PostgreSQL than any other RDBMS.
So, now that the impatient have been satisfied, the patient can take a deeper dive into MongoDB, then PostgreSQL, and then a comparison.
MongoDB’s document data model maps naturally to objects in application code, making it simple for developers to learn and use. Documents give you the ability to represent hierarchical relationships to store arrays and other more complex structures easily. JSON documents can store data in fields, as arrays, or even as nested sub-documents. In this way, related information can be stored together for fast query access through the rich and expressive MongoDB query language.
MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document, then the field can be created without affecting all other documents in the collection, without updating a central system catalog, updating an ORM, and without taking the system offline. Optionally, schema validation can be used to enforce data governance controls over each collection.
This flexibility is hugely useful when consolidating information from diverse sources or accommodating variations in documents over time, especially as new application functionality is continuously deployed.
Many of the terms and concepts used in MongoDB's document model are the same or similar to PostgreSQL's tabular model:
|ACID Transactions||ACID Transactions|
|Secondary Index||Secondary Index|
|JOINs, UNIONs||Embedded documents, $lookup & $graphLookup, $unionWith|
|Materialized Views||On-demand Materialized Views|
MongoDB allows you to store data in almost any structure, and each field – even those deeply nested in subdocuments and arrays – can be indexed and efficiently searched.
MongoDB adds elements to the document model and the query engine to handle both geospatial and time series tagging of data. This expands the type of queries and analytics that can be performed on a database.
BSON includes data types not present in JSON data (e.g., datetime, int, long, date, floating point, and decimal128, and byte array) offering type-strict handling for multiple numeric types instead of a universal "number" type.
Schema validation enables you to apply governance and data quality controls to your schema.
One of the most powerful features of relational databases that make writing applications easier is ACID transactions. The details of how ACID transactions are defined and implemented fill many computer science text books. Much of the discussion in the computer science realm is about isolation levels in database transactions). PostgreSQL defaults to the read committed isolation level, and allows users to tune that up to the serializable isolation level.
The important thing to remember is that transactions allow many changes to a database to be made in a group or rolled back in a group.
In a relational database, the data in question would be modeled across separate parent-child tables in a tabular schema. This means that updating all the records at once would require a transaction.
In a sense, document databases have an easier time implementing transactions because they cluster data in a document and writing and reading a document is an atomic operation so it doesn’t need a multi-document transaction. One or more fields may be written in a single operation, including updates to multiple subdocuments and elements of an array. MongoDB guarantees complete isolation as a document is updated. Any errors will trigger the update operation to roll back, reverting the change and ensuring that clients receive a consistent view of the document.
MongoDB also supports database transactions across many documents, so chunks of related changes can be committed or rolled back as a group. With its multi-document transactions capability, MongoDB is one of the few databases to combine the ACID guarantees of traditional relational databases with the speed, flexibility, and power of the document model.
From the programmer perspective, transactions in MongoDB feel just like transactions developers are already familiar with in PostgreSQL. Transactions in MongoDB are multi-statement, with similar syntax (e.g., starttransaction and committransaction) with snapshot isolation,and are therefore easy for anyone with prior transaction experience to add to any application.
The relational database model that PostgreSQL uses relies on storing data in tables and then using Structured Query Language (SQL) for database access.
To make this work, in PostgreSQL and all other SQL databases, the database schema must be created and data relationships established before the database is populated with data. Related information may be stored in separate tables, but associated through the use of Foreign Keys and JOINs. Most changes in schema necessitate a migration procedure that can take the database offline or reduce application performance while it is running.
The strength of SQL is its powerful and widely known query language, with a large ecosystem of tools.
The challenge of using a relational database is the need to define its structure in advance. Changing structure after loading data is often very difficult, requiring multiple teams across development, DBA, and Ops to tightly coordinate changes.
Now in the document database world of MongoDB, the structure of the data doesn’t have to be planned up front in the database and it is much easier to change. Developers can decide what’s needed in the application and change it in the database accordingly.
MongoDB does not use SQL by default. Instead, to work with documents in MongoDB and extract data, MongoDB provides its own query language (MQL) that offers most of the same power and flexibility as SQL. For example, like SQL, MQL allows you to reference data from multiple tables, transform and aggregate that data, and filter for the specific results you need. Unlike SQL, MQL works in a way that is idiomatic for each programming language.
Query performance in MongoDB can be accelerated by creating indexes on fields in documents and subdocuments. MongoDB allows any field of a document, including those deeply nested in arrays and subdocuments, to be indexed and efficiently queried.
The following chart compares the SQL and MongoDB approaches to querying data and shows a few examples of SQL statements and how they map to MongoDB:
Both PostgreSQL and MongoDB have a rich query language. Below are a few examples of SQL statements and how they map to MongoDB. A more comprehensive list of statements can be found in the MongoDB documentation.
Check out these resources for even more comparisons:
The document model also has emergent properties that make development and collaboration much easier and faster.
From an individual developer perspective, MongoDB makes data much like code. The developer can define the structure of a JSON or BSON document, do some development, see how it goes, add new fields at any time and reshape data at will, which is the beauty of the document model. This flexibility avoids the delays and bottlenecks associated with having to ask a DBA to restructure data definition language statements and then recreate and reload a relational database, or having the developer doing such work.
In a document database, a developer or team can own documents or portions of documents and evolve them as needed, without intermediation and complex dependency chains between different teams.
MongoDB was built to scale out. So use cases that require super speedy queries and massive amounts of data or both can be handled by making ever bigger clusters of small machines.
MongoDB is based on a distributed architecture that allows users to scale out across many instances, and is proven to power huge applications, whether measured by users or data sizes. The scale-out strategy relies on using a larger number of smaller and usually inexpensive machines. This strategy can expand to hundreds of machines.
In PostgreSQL, the approach to scaling depends on whether you are talking about writing or reading data. For writes, it is based on a scale-up architecture, in which a single primary machine running PostgreSQL must be made as powerful as possible in order to scale. For reads, it is possible to scale-out PostgreSQL by creating replicas, but each replica must contain a full copy of the database.
The plumbing that makes MongoDB scalable is based on the idea of intelligently partitioning (sharding) data across instances in the cluster. MongoDB does not break documents apart; documents are independent units which makes it easier to distribute them across multiple servers while preserving data locality.
In the fully-managed, global MongoDB Atlas cloud service, it’s easy to distribute data across regions. Certain documents can be tagged so they will always be physically stored in specific countries or geographic regions. Such location-awareness can:
Help comply with laws concerning where data may be legally stored
Each MongoDB shard runs as a replica set: a synchronized cluster of three or more individual servers that continuously replicate data between them, offering redundancy and protection against downtime in the face of a system failure or planned maintenance. Replicas can also be installed across datacenters, offering resiliency against regional outages. Creating and configuring such clusters is made even easier and faster in MongoDB Atlas.
MongoDB has implemented a modern suite of cybersecurity controls and integrations both for its on-premise and cloud versions. This includes powerful security paradigms like client-side field-level encryption, which allows data to be encrypted before it is sent over the network to the database.
PostgreSQL has a full range of security features including many types of encryption. PostgreSQL is available in the cloud on all major cloud providers. While it is all the same database, operational and developer tooling varies by cloud vendor, which makes migrations between different clouds more complex. MongoDB Atlas runs in the same way across all three major cloud providers, simplifying migration and multi-cloud deployment.
As any fundamental technology like a database grows, it is supported by a platform ecosystem of services, integrations, partners, and related products. At the center of the MongoDB platform ecosystem is the database, but it has many layers that provide additional value and solve problems.
MongoDB has seen massive adoption and is the most popular modern database, and based on a Stackoverflow developer survey, the database developers most want to use. Thanks to the efforts of MongoDB engineering and the community, we have built out a complete platform to serve the needs of developers.
PostgreSQL can be run as an installed, self-managed version, or as a database-as-a-service on all of the leading cloud providers. Each of those implementations work the way the cloud provider that created them wants them to work. To get support for PostgreSQL, you have to use a cloud version or go to third parties offering specialized services
MongoDB is available in the following forms:
MongoDB Atlas is a database-as-a-service offering that runs on all of the major cloud platforms (AWS, Microsoft Azure, and Google Cloud Platform).
MongoDB Enterprise is based on MongoDB Community edition with additional features that are only available through the MongoDB Enterprise Advanced subscription. Enterprise Advanced includes comprehensive support for your MongoDB deployment. It also adds enterprise-focused features such as LDAP and Kerberos support, on-disk encryption, auditing, and operational tooling. MongoDB Enterprise can be installed on Linux, Windows, or Mac OS.
In addition, MongoDB supports numerous programming languages. Native, idiomatic drivers are provided for over a dozen languages – and the community has built many more – enabling ad-hoc queries, real-time aggregation and rich indexing to provide powerful programmatic ways to access and analyze data of any structure.
MongoDB has a strong developer community that represents everyone from hobbyists to the most innovative startups to the largest enterprises and government agencies, including a multitude of systems integrators and consultants who provide a wide range of commercial services.
MongoDB Atlas has also been extended through MongoDB Realm to ease app development, through Atlas Search powered by Lucene, and with features that support data lakes built on cloud object storage.
Both PostgreSQL and MongoDB have strong communities of developers and consultants who are ready to help.
At this point in its development, MongoDB offers industry-leading scalability, resiliency, security, and performance: but where is its sweet spot?
MongoDB is adept at handling data structures generated by modern applications and APIs and is ideally positioned to support the agile, rapidly changing development cycle of today’s development practices.
The real question is what your data will be in the end. If data aligns with objects in application code, then it can be easily represented by documents. MongoDB is a good fit during development and in production, especially if you have to scale.
But if you have many incumbent applications based on relational data models and teams seasoned just in SQL, a document database like MongoDB may not be a good fit.
Document databases can do JOINs, but they are done differently from multi-page SQL statements that are sometimes required and often automatically generated by BI tools. That said, MongoDB does have an ODBC connector that allows SQL access, mostly from BI tools.
PostgreSQL, like Linux, is an example of a well-managed open source project. One of the most broadly adopted relational databases, PostgreSQL came out of the POSTGRES project at the University of California at Berkeley starting in 1986 and it has evolved with the times.
PostgreSQL calls itself an open source object-relational database system.
It's a SQL database, that has some strategies for handling indexing, increasing concurrency, and implementing optimizations and performance enhancements including advanced indexing, table partitioning, and other mechanisms.
The object part of PostgreSQL relates to the many extensions that enable it to include other data types such as JSON data objects, key/value stores, and XML.
PostgreSQL’s design principles emphasize SQL and relational tables and allow extensibility.
PostgreSQL offers many ways to improve the efficiency of the database, but at its core it uses a scale-up strategy.
Like MySQL and other open source relational databases, PostgreSQL has been proven in the cauldron of demanding use cases across many industries.
Let’s cover a few of the ways that PostgreSQL excels before looking at the main issue for our comparison: When is a tabular, relational model andSQL the best fit for an application?
PostgreSQL takes a practical, engineering minded approach to pretty much everything. For example, consider this statement about conformance to the latest SQL standard:
“PostgreSQL tries to conform with the SQL standard where such conformance does not contradict traditional features or could lead to poor architectural decisions.”
Good for them. And as they correctly point out:
“As of this writing, no relational database meets full conformance with this standard.”
In the world of SQL, there are best efforts SQL engines that handle a certain set of simple queries well, and more robust SQL engines with query optimizers that handle complex queries and always finish with a correct result. PostgreSQL is a robust SQL engine.
This robustness comes from steady progress over time. One detail that should impress SQL nerds is that it supports “all transaction isolation levels defined in the SQL standard, including serializable.” This is a level of engineering that most commercial databases of long tenure don’t bother with because it is too hard to achieve with adequate performance.
Because PostgreSQL relies on a scale-up strategy to scale writes or data volumes, it must make the most of the computing resources available. PostgreSQL does this through a variety of strategies for indexing and concurrency.
PostgreSQL offers a variety of powerful index types to best match a given query workload. Indexing strategies include B-tree, multicolumn, expressions, and partial, as well as advanced indexing techniques such as GiST, SP-Gist, KNN Gist, GIN, BRIN, covering indexes, and bloom filters.
In addition to a mature query planner and optimizer, PostgreSQL offers performance optimizations including parallelization of read queries, table partitioning, and just-in-time (JIT) compilation of expressions.
The database complies with a wide range of security standards and has numerous features to support reliability, backup, and disaster recovery, usually through 3rd party tooling.
PostgreSQL supports extensibility in numerous ways including stored functions and procedures, access from procedural languages such as PL/PGSQL, Perl, Python, and more, SQL/JSON path expressions, and foreign data wrappers, which connect to other databases or streams using a standard SQL interface.
Many extensions provide additional functionality, including PostGIS, a module for geospatial analysis.
Because PostgreSQL is widely used, you can be pretty sure that most development tools and other systems have been tested with it and are compatible.
The approach PostgreSQL has taken to connecting APIs from languages to its databases has been imitated by many other databases, making it easier to move a program from running on PostgreSQL to another SQL database and vice versa.
As we said at the outset, the question is not “MongoDB vs PostgreSQL?” but “When does it make sense to use a document database vs a relational database?” because each database is the best version of its particular database format.
The upsides of SQL include the vast ecosystem of tools, integrations, and programming languages built to use SQL databases. It is likely that you can easily find help to make your SQL database project in general and PostgreSQL project in particular work. There are also a multitude of deployment options for PostgreSQL.
Giving up on SQL means walking away from a large ecosystem of technology that already uses SQL. That’s easier to do if you are working on a new application, or plan on modernizing an existing one.
Lots of data management and BI tools rely on SQL and programatically generate complex SQL statements to get just the right collection of data from the database. PostgreSQL does very well in such contexts because it is a robust, enterprise-grade implementation that is understood by many developers.
Also, if you have a flat, tabular data model that isn’t going to change very often and doesn’t need to scale-out, relational databases and SQL can be a powerful choice.
But the perceived benefits of SQL have costs that must be considered.
The downside of PostgreSQL compared to MongoDB is that it relies on relational data models that are unfriendly to the data structures developers work with in code, and that must be defined in advance, slowing progress whenever requirements change.
MongoDB supports a rapid, iterative cycle of development so well because of the way that a document database turns data into code under the control of developers. This speed is disrupted by the nature of rigid, tabular data models used in relational databases, which usually must be reshaped by database administrators through an intermediated process, which slows the entire process of development. Such bottlenecks can put a damper on innovation.
When an application goes live, PostgreSQL users must be ready to fight a battle about scalability. PostgreSQL uses a scale-up strategy. This means that at some point, for high performance use cases, you may hit a wall or have to divert resources to finding other ways to scale via caching or denormalizing data or using other strategies.
In MongoDB such techniques are usually not required because scalability is built-in through native sharding, enabling a horizontal scale-out approach. After properly sharding a cluster, you can always add more instances and keep scaling out. MongoDB Atlas has a broad multi-cloud, globally aware platform at the ready, all fully managed for you.
PostgreSQL can support replication but more advanced features such as automatic failover must be supported by third-party products developed independently of the database. Such an approach is more complex and can work slower and less seamlessly than MongoDB’s in-built self-healing capabilities.
The nature of your data and your target use cases are also vitally important. Those with a large ecosystem of SQL skills and tools and numerous existing applications may choose to continue using a relational data model.
But MongoDB has succeeded, especially in the enterprise, because it opens the door to new levels of developer productivity, while static relational tables often introduce roadblocks. If you have data that needs to be delivered at scale, that would benefit from developer control of the schema, or that meets a need you don’t fully understand at the outset, a document database like MongoDB fits the bill.
Both MongoDB and PostgreSQL are excellent databases. We hope this discussion sheds some new light on which will better meet your needs.