Comparing PostgreSQL and MongoDB

Overview

For decades, the relational database management system (RDBMS) served as the foundation for most applications. One of the most broadly adopted relational databases is PostgreSQL (also known as Postgres), which was developed at UC Berkeley by computer scientist Michael Stonebraker in the 1980s, building on, and simplifying, many of the concepts pioneered in the earlier Ingres project (hence the name POSTGRES, i.e. “post-Ingres”). Today it is a widely used open source project with many contributors.

However, over the past decade, many of the assumptions that drove the development of earlier relational databases have changed:

  • Demands for higher developer productivity and faster time to market: traditional rigid relational data models and waterfall development of monolithic applications have given way to agile methodologies, microservices, and DevOps, compressing release cycles from months and years to days and weeks.

  • The need to manage massive increases in new, rapidly changing data types – structured, semi-structured, and polymorphic data generated by new classes of web, mobile, social, and IoT applications.

  • The wholesale shift to distributed systems and cloud computing, enabling developers to exploit on-demand, highly scalable compute and storage infrastructure, with the ability to serve audiences any place they work and play around the globe, while meeting a whole new set of regulatory demands for data sovereignty.

As a result, non-relational or “NoSQL” databases, like MongoDB, have emerged to address the requirements of new applications and to modernize existing workloads. Support for multi-document ACID transactions, scheduled for MongoDB 4.0 in Summer 2018*, will make it even easier for developers to address use cases that are difficult for PostgreSQL today, or will become so in the future. You can try transactions out by signing up for the beta program.

This page provides an overview of PostgreSQL and MongoDB, and the appropriate use cases for each.

What is PostgreSQL?

PostgreSQL is an open-source RDBMS. It has been under development for over 30 years and is maintained by the PostgreSQL Global Development Group, which consists of corporations and open source contributors. A number of companies, including EnterpriseDB, offer support, services, and additional commercial software for the database.

PostgreSQL stores data in tables and uses Structured Query Language (SQL) for database access, in addition to PL/pgSQL, which resembles Oracle's PL/SQL procedural language. In PostgreSQL, you pre-define your database schema based on your requirements and set up rules to govern the relationships between fields in your tables. Related information may be stored in separate tables, but associated through the use of joins. Any change in schema necessitates a migration procedure that can take the database offline or reduce application performance.
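To make the relational approach concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (the table and column names are invented for illustration): the schema is declared up front, related data lives in separate tables, and a JOIN reassembles it.

```python
import sqlite3

# Schema must be declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    user_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL)""")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id  TEXT NOT NULL REFERENCES users(user_id),
    total    REAL NOT NULL)""")

conn.execute("INSERT INTO users VALUES ('bcd001', 'Ada')")
conn.execute("INSERT INTO orders (user_id, total) VALUES ('bcd001', 9.99)")

# Related data is split across tables and reassembled with a JOIN.
row = conn.execute("""SELECT u.name, o.total
                      FROM users u JOIN orders o ON u.user_id = o.user_id""").fetchone()
print(row)  # ('Ada', 9.99)
```

Adding a new column to either table would require an ALTER TABLE migration, which is the schema-change cost described above.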

What is MongoDB?

MongoDB is an open-source, non-relational database developed by MongoDB, Inc. MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Related information is stored together for fast query access through the MongoDB query language. Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the collection, without updating a central system catalog, and without taking the system offline. Optionally, schema validation can be used to enforce data governance controls over each collection.
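A sketch of this flexibility, modeling a collection as a plain list of Python dictionaries (a stand-in for MongoDB documents; the field names are invented):

```python
# Each "document" is self-describing; fields can vary between documents.
users = [
    {"user_id": "bcd001", "age": 45, "status": "A"},
    {"user_id": "bcd002", "age": 32, "status": "A", "interests": ["cycling", "jazz"]},
]

# Adding a new field to one document touches nothing else:
# no ALTER TABLE, no migration, no central catalog update.
users[0]["last_login"] = "2018-05-01"

assert "last_login" not in users[1]  # other documents are unaffected
```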

MongoDB’s document data model maps naturally to objects in application code, making it simple for developers to learn and use. Documents give you the ability to represent hierarchical relationships, storing arrays and other more complex structures easily. Native, idiomatic drivers are provided for 10+ languages – and the community has built dozens more – enabling ad-hoc queries, real-time aggregation and rich indexing to provide powerful programmatic ways to access and analyze data of any structure.

Because documents can bring together related data that would otherwise be modeled across separate parent-child tables in a relational schema, MongoDB’s atomic single-document operations already provide transaction semantics that meet the data integrity needs of the majority of applications. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document.

MongoDB 4.0, scheduled for Summer 2018*, will add support for multi-document transactions, making it the only database to combine the ACID guarantees of traditional relational databases with the speed, flexibility, and power of the document model, and an intelligent distributed systems design that scales out and places data where you need it. Through snapshot isolation, transactions will provide a globally consistent view of data, and enforce all-or-nothing execution to maintain data integrity. Transactions in MongoDB will feel just like the transactions developers are familiar with in PostgreSQL. They will be multi-statement, with similar syntax (e.g. start_transaction and commit_transaction), and therefore easy for anyone with prior transaction experience to add to an application. You can try transactions out by signing up for the beta program.

Unlike PostgreSQL and other relational databases, MongoDB is built on a distributed systems architecture, rather than a monolithic, single node design. As a result, MongoDB offers out-of-the-box scale-out and data localization with automatic sharding, and replica sets to maintain always-on availability.

Learn more from the What is MongoDB page.

Terminology and Concepts

Many concepts in PostgreSQL have close analogs in MongoDB. The table below outlines the common concepts across PostgreSQL and MongoDB.

PostgreSQL           MongoDB
ACID Transactions    ACID Transactions*
Table                Collection
Row                  Document
Column               Field
Secondary Index      Secondary Index
JOINs                Embedded documents, $lookup & $graphLookup
GROUP BY             Aggregation Pipeline

*ACID transactions are scheduled for MongoDB 4.0

Query Language

Both PostgreSQL and MongoDB have a rich query language. Below are a few examples of SQL statements and how they map to MongoDB. A more comprehensive list of statements can be found in the MongoDB documentation.

SQL:
CREATE TABLE users (
    user_id VARCHAR(20) NOT NULL,
    age INTEGER NOT NULL,
    status VARCHAR(10));
MongoDB:
Not required

SQL:
INSERT INTO users (user_id, age, status)
VALUES ('bcd001', 45, 'A');
MongoDB:
db.users.insert(
    { user_id: "bcd001", age: 45, status: "A" }
)

SQL:
SELECT *
FROM users;
MongoDB:
db.users.find()

SQL:
UPDATE users
SET status = 'C'
WHERE age > 25;
MongoDB:
db.users.update(
    { age: { $gt: 25 } },
    { $set: { status: "C" } },
    { multi: true }
)
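The UPDATE/update() pair above can also be expressed directly in application code. Here is a toy Python version over an in-memory list of dictionaries (not the MongoDB driver API, just an illustration of the same predicate-and-set logic):

```python
users = [
    {"user_id": "bcd001", "age": 45, "status": "A"},
    {"user_id": "bcd002", "age": 21, "status": "A"},
]

# Equivalent of: UPDATE users SET status = 'C' WHERE age > 25;
# or: db.users.update({age: {$gt: 25}}, {$set: {status: "C"}}, {multi: true})
for doc in users:
    if doc["age"] > 25:
        doc["status"] = "C"

print([d["status"] for d in users])  # ['C', 'A']
```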

Contrasting MongoDB and Postgres for Building Modern Applications

Postgres was one of the first relational databases to introduce support for a JSON data type. Its implementation of JSON is mature, and in many areas, is more advanced than other relational databases. However, compared to MongoDB:

  • Postgres does not offer any native mechanisms to scale the database beyond a single server, or to provide always-on database availability. Both of these limitations inhibit the ability to deploy modern, highly-scalable and globally distributed applications, or to take advantage of cloud-native architectures.

  • It is not as natural to work with JSON data in Postgres, which can reduce developer productivity and therefore an application's time to market. The non-standard extensions to SQL used to query and manipulate JSON are not supported by most tools.

Monolithic vs. Distributed Architecture

Applications that were once designed to serve a finite audience are now delivered as web and cloud services that must be always-on, accessible from many different devices on any channel, and scaled to millions of users distributed across the globe.

To address these requirements MongoDB is built around a distributed architecture that can be run within and across geographically distributed data centers and cloud regions, providing levels of availability and scalability unmatched by Postgres. As databases grow in terms of data volume and throughput, MongoDB scales elastically with no downtime, and without application changes. And as an application’s performance and availability goals evolve, MongoDB allows users to adapt flexibly, across data centers, with tunable consistency. These features exist because MongoDB was designed to take advantage of modern hardware and software capabilities.

Horizontal Scaling

Attempting to accommodate increasing data volumes and user populations with a database running on a single server means users can rapidly hit a scalability wall, requiring significant application redesign and custom engineering work.

Postgres has no native mechanisms to partition (shard) the database across a cluster of nodes – whether storing relational or JSON data types. To scale-out the database, users are confronted with several options:

  1. Manually shard the database at the application level, which adds significant development complexity and inhibits the ability to elastically expand and contract the cluster as workloads dictate.

  2. Integrate a separate sharding framework for Postgres. Like the HA frameworks discussed below, these sharding layers are developed independently from the database, so the user has the added complexity of integrating and managing multiple distinct pieces of technology to assemble a complete working solution. Most of these sharding frameworks also rely on older, heavily forked versions of the Postgres community code base.

Whatever approach is taken, users will also typically lose key relational capabilities (ACID transactions, referential integrity, and JOINs) as they commit operations across sharded clusters.

MongoDB provides horizontal scale-out for databases on low cost, commodity hardware using a technique called sharding, which is transparent to applications. Sharding allows MongoDB deployments to automatically scale beyond hardware limitations of a single server, without adding complexity to the application. MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases. MongoDB exposes multiple sharding policies that enable developers and administrators to distribute data across a cluster according to query patterns or data sovereignty. As a result, MongoDB can deliver much higher scalability across a diverse set of workloads.
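The routing step at the heart of hashed sharding can be sketched as follows (MongoDB's actual implementation uses chunk ranges managed by config servers; this toy version only shows how a shard key deterministically maps a document to a shard):

```python
import hashlib

NUM_SHARDS = 3

def shard_for(shard_key_value: str) -> int:
    """Hash the shard key to pick a shard, spreading writes evenly."""
    digest = hashlib.md5(shard_key_value.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every router computes the same placement, so writes scale out
# across shards without a per-document central lookup.
placement = {uid: shard_for(uid) for uid in ["bcd001", "bcd002", "bcd003"]}
print(placement)
```

Because the mapping is deterministic, any query that includes the shard key can be routed to a single shard rather than broadcast to the whole cluster.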

MongoDB zone sharding allows precise control over how data is partitioned, and where it is physically stored. This accommodates a range of deployment scenarios – for example, sharding data by geographic region, by hardware configuration, or by application feature. Administrators can continuously refine data placement rules by modifying shard key ranges, and MongoDB will automatically migrate the data to its new zone. Note that MongoDB Atlas supports cross-region clusters, but does not currently offer zone sharding.

The most popular use cases for MongoDB zones include the following:

Geographically Distributed Clusters
MongoDB gives users the ability to create zones in multiple geographic regions. Each Zone is part of the same, single cluster and can be queried globally, but data resides in the correct location based on sovereignty and local access requirements. By associating data to shards based on user location, administrators are able to maintain low latency access.
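A minimal sketch of the zone idea, with a region field on the document routing it to the shards of its zone (the region names and shard names are invented for illustration):

```python
# Zone rules: a shard key range (here, a region value) maps to the
# set of shards physically located in that region.
ZONES = {
    "EU": ["shard-eu-1", "shard-eu-2"],
    "US": ["shard-us-1"],
}

def shards_for(doc: dict) -> list:
    """Route a document to its zone's shards, keeping data in-region."""
    return ZONES[doc["region"]]

doc = {"region": "EU", "user_id": "bcd001"}
print(shards_for(doc))  # ['shard-eu-1', 'shard-eu-2']
```

The cluster remains a single queryable database, but each document's physical placement satisfies the sovereignty and latency rules encoded in the zone map.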

Localized Writes in a Distributed Cluster
Zones provide a solution for continuous availability of insert-only workloads such as the ingestion of sensor data in IoT applications. Zones can be used to create configurations specifically for localized writes in a distributed cluster, ensuring there is always a node available to accept inserts, even during a data center failure. Learn more by reviewing our tutorial on configuring localized writes with MongoDB zones.

Tiered Storage
Different subsets of the database may have different response time requirements, usually based on access frequency and age of the data. For example, IoT applications or social media services handling time-series data will demand users experience the lowest latency when accessing the latest data. However aged data sets that are read less frequently from an “active archive” have relaxed latency SLAs.

Asymmetric hardware configurations within a sharded cluster can be created and managed with zones. By implementing a tiered storage pattern:

  • The most recent data can be located on the highest performance hardware with fast CPUs and SSDs

  • Aged data can be moved onto slower, but less expensive hardware based on conventional, high capacity spinning disks.
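The tiering rule itself is simple; a sketch (with an invented 90-day threshold) of assigning documents to a storage zone by the age of their data:

```python
from datetime import date

def storage_tier(doc_date: date, today: date) -> str:
    """Recent data goes to the SSD zone; older data to the archive zone."""
    age_days = (today - doc_date).days
    return "ssd-zone" if age_days <= 90 else "archive-zone"

today = date(2018, 6, 1)
print(storage_tier(date(2018, 5, 20), today))  # ssd-zone
print(storage_tier(date(2016, 1, 1), today))   # archive-zone
```

In a real deployment this boundary would be expressed as a zone range on a date-based shard key, and MongoDB would migrate chunks between the hardware tiers automatically as the ranges are updated.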

Always-On Availability

Postgres supports replication, but lacks integrated mechanisms for automatic failover and recovery between database replicas. As noted by the Postgres documentation:

  • “PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby database server. Many such tools exist and are well integrated with the operating system facilities required for successful failover, such as IP address migration.”

As a result, users need to layer 3rd party clustering frameworks on top to provide the availability and uptime demanded by modern applications. What are the downsides of this approach?

  • These frameworks are developed independently from the database, so the user has the added complexity of integration and the risk of managing multiple separate pieces of technology and processes, backed by different vendors (or in some cases, individual developers).

  • It means additional complexity in coordinating implementation, testing, and maintenance across multiple teams – developers, DBAs, network administrators and system administrators – each with their own specific areas of responsibility.

  • Failover events can take multiple minutes to complete, during which time Postgres will be unavailable to serve write operations.

  • There are 3rd party multi-master replication solutions for Postgres; however, as noted in the official Postgres documentation, “write performance is often worse than that of a single server. Synchronous multimaster replication is best for mostly read workloads”.

Contrast the above with the replication and always-on capabilities of MongoDB, which maintains up to 50 replicas of the data to maintain database availability and scale read capacity. Replicas can be distributed across regions to provide resilience to entire data center failures, and to provide data locality to support global user bases.

Replica failures are self-healed by the database, requiring no operator intervention, and are typically completed in less than 2 seconds. Retryable writes allow the MongoDB drivers and the database to automatically resubmit writes that could not be serviced during the election of a new primary, significantly reducing the amount of exception handling code the developer needs to write. As a result of MongoDB’s replication and failover design, applications remain unaffected by underlying system or database outages, and planned maintenance events.
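The retryable-write pattern the drivers implement can be sketched as a simple retry loop (a toy illustration, not the driver code; the real drivers additionally attach a transaction identifier so a retried write is applied at most once):

```python
import time

def with_retry(write_fn, retries=1, delay=0.0):
    """Retry a write after a transient failure (e.g. a primary election)."""
    for attempt in range(retries + 1):
        try:
            return write_fn()
        except ConnectionError:
            if attempt == retries:
                raise
            time.sleep(delay)  # wait for the new primary to be elected

# Simulate a write that fails once during failover, then succeeds.
attempts = []
def flaky_write():
    attempts.append(1)
    if len(attempts) == 1:
        raise ConnectionError("not primary")
    return "ok"

result = with_retry(flaky_write)
print(result)  # ok
```

Because the driver absorbs the transient failure, the application code above the driver never sees the election, which is what removes most of the exception-handling burden from the developer.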

Developer Productivity

Working with data as flexible JSON documents, rather than as rigid rows and columns, is proven to help developers move faster. It’s not hard to find teams who have accelerated development cycles by 4x or 5x after moving to MongoDB from relational databases. Why is this?

  • Documents are natural. Documents represent data in the same way that applications do. Unlike the tabular rows and columns of a relational database, data can be structured with arrays and subdocuments – in the same way applications represent data, as lists and members / instance variables respectively. This makes it much simpler and faster for developers to model how data in the application will map to data stored in the database.

  • Documents are flexible. Each document can store data with different attributes from other documents. As an example, consider a product catalog where a document storing details for an item of men’s clothing will store different attributes from a document storing details of a tablet. This is a property commonly called “polymorphism”. With JSON documents, we can add new attributes when we need to, without having to alter a centralized database schema; in a relational database, the equivalent change at best imposes significant performance overhead and at worst causes downtime. The flexibility documents bring allows the developer to much more easily handle the semi-structured and unstructured data generated by modern mobile, web, and IoT applications.

  • Documents make applications fast. With data for an entity stored in a single document, rather than spread across multiple relational tables, the database only needs to read and write to a single place. Having all the data for an object in one place also makes it easier for developers to understand and optimize query performance.

It’s for these reasons that Postgres, and other relational databases, have added support for JSON. However, simply adding a JSON data type does not bring the developer productivity benefits of a document database to Postgres. Why? Because Postgres’ approach can detract from developer productivity, rather than improve it. Consider the following:

  • Proprietary Extensions: Querying and manipulating the contents of a JSON document requires the use of separate Postgres-specific SQL functions to access values, which will not be familiar to most developers. In addition, they are not supported or recognized by 3rd party SQL tools, such as BI platforms, data warehouse connectors, ETL and ESB pipelines, and more.

    • How is MongoDB different: The MongoDB API is widely understood, and adopted by industry standard tools and connectors. Several mega-vendor database companies have even adopted the MongoDB API themselves.

  • Complex Data Handling: When using JSON data, Postgres drivers do not have the capability to properly and precisely convert JSON into a useful native data type used by the application. This includes different types of numeric values (e.g. floating points, 64-bit integers, decimals), timestamps, and dates, or a Map or List in Java, or a Dictionary or List in Python. Instead, developers have to manually convert text-based JSON in their application, losing the ability to have fields that can take on multiple data types in different documents (polymorphism) and making the computation, sorting, and comparison of values difficult and error-prone.

    • How is MongoDB different: Binary Encoded JSON (BSON) used by MongoDB and its drivers supports advanced data types not supported by regular text-based JSON.

  • Legacy Relational Overhead: Even with JSON support, Postgres users are still tied to multiple layers of SQL/relational functionality to interact with JSON data – low level JDBC/ODBC drivers and Object Relational Mappers (ORMs). To experienced Postgres developers, these layers may be familiar, but to many other developers who want to interact with documents and data through APIs that are natural and idiomatic to their programming language, these layers impose high learning overhead. ORMs are also generally recognized as hard to optimize for performance and query efficiency – even for experienced relational developers.

    • How is MongoDB different: MongoDB drivers are implemented in the methods and functions that are idiomatic and natural to the programming languages used by developers.

  • No Data Governance: Postgres offers no native mechanism to validate the schema of JSON inserted or updated in the database, so developers need to add either application or database-side functionality to apply governance controls against the data.

    • How is MongoDB different: Schema validation, based on the JSON Schema IETF standard, allows developers and DBAs to define and enforce a prescribed schema structure for each MongoDB collection.

  • Schema Rigidity: Postgres users still need to define a schema for their regular relational data. If the schema is then modified to accommodate new application requirements, the table is locked for some operations until existing data is copied into the new schema, requiring applications to be quiesced during schema migration.

    • How is MongoDB different: Developers and DBAs can combine the flexibility of a fully dynamic schema with the governance controls needed for some applications across all data stored in the database, not just subsets of it.

  • No Statistics for Query Optimization: Unlike regular data types, Postgres does not maintain statistics on JSONB data types, preventing the query planner from optimizing queries against JSON data.
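The “Complex Data Handling” point above can be seen with Python's standard json module: text-based JSON has no native representation for dates or exact decimals, so the application must convert them by hand, whereas BSON carries such types directly. This demo uses only the standard library, not the MongoDB driver:

```python
import json
from datetime import datetime
from decimal import Decimal

# Plain text JSON cannot carry a datetime: serialization fails outright.
try:
    json.dumps({"created": datetime(2018, 6, 1)})
    serialized = True
except TypeError:
    serialized = False
print(serialized)  # False

# A decimal survives a JSON round-trip only as a float (or a string),
# losing its exact type and forcing manual conversion on read.
doc = json.loads(json.dumps({"price": float(Decimal("19.99"))}))
print(type(doc["price"]).__name__)  # float
```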

To compare the effects of mixing the pseudo-natural language syntax of SQL with JSON structure and code, consider the following example where the developer writes a query to find all documents in which the field "tags" contains array element "qui":

Postgres: Mixing SQL with JSON

SELECT jdoc->'guid', jdoc->'name' FROM api
WHERE jdoc @> '{"tags": ["qui"]}';

MongoDB: Idiomatic JSON

db.api.find({tags:"qui"}, {guid:1, name:1})

Read this blog for a deeper dive, hands-on analysis contrasting developer productivity between Postgres and MongoDB.

How Does MongoDB Compare to AWS Aurora Postgres?

AWS positions its Aurora service as “a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.” In addition to MySQL, the service offers a Postgres engine, claiming 3x higher performance than regular Postgres on RDS. Despite its claims, there are several issues users need to consider:

Lock-in: Aurora Postgres is only available on AWS.

  • The enhancements made by AWS engineers are not released back to the Postgres community, so once users recertify their apps for Aurora, they are restricted to running on AWS only. And if the customer decides to leave Aurora in the future, they need to recertify their apps again.

Not continuously available, not globally distributed: Based on the AWS FAQ, failures can take 30 seconds to resolve, and developers need to add their own exception handling code to retry writes once recovery has been completed. Further limiting availability, there is currently no way to provision Aurora Postgres replicas across AWS regions, so a region failure will bring the entire database down.

Limited Scalability: As with Postgres outside of Aurora, there is no way to scale writes beyond a single Aurora instance.

MongoDB Atlas provides a fully managed database as a service for MongoDB. Like Aurora, storage is automatically scaled by the service as consumption grows. Compared with Aurora, running MongoDB on Atlas offers a number of benefits:

  • You get full compatibility wherever you run MongoDB – there is no platform lock-in. The same MongoDB code base runs all the way from the developer laptop, to on-premise deployments, self-managed cloud instances, and the Atlas service. You can provision Atlas clusters on AWS, Microsoft Azure, and Google Cloud Platform.

  • MongoDB Atlas provides a free tier to enable you to evaluate and test your apps.

  • Failovers are typically completed in less than 2 seconds, and MongoDB drivers will automatically retry writes that have failed due to transient system issues, such as network failures or primary replica elections. Even the free tier includes a 3-node replica set, allowing you to test how your application reacts to failovers.

  • MongoDB Atlas offers auto-sharding to scale database capacity and write traffic beyond a single node.

  • Atlas automates the process of migrating your data from your existing, live on-premise or cloud-based MongoDB database into the Atlas service.

Users Selecting MongoDB over Postgres

As the following examples illustrate, MongoDB’s selection over Postgres is driven by developer productivity, performance, and scalability:

  • Experian Health selected MongoDB over Postgres and other relational databases to power its Universal Identification Manager, a new application the company uses to uniquely identify healthcare customers. Experian Health CTO Mike Ochs explains “It was hands down [MongoDB] because the way the data is organized lends itself very well to the way Mongo manages data….It was way simpler from a development perspective, and performance wise….it is essentially 4x more performant.”1

  • Marketing technology vendor Mintigo leverages MongoDB to power its predictive analytics. They chose MongoDB over PostgreSQL for the flexibility of the document-based model and MongoDB’s ability to scale. “We initially prototyped on an alternative database technology called PostgreSQL. It’s a great relational database but it soon became clear that it would never handle the schema flexibility or scale that we needed,” explains Tal Segalov, CTO and Co-Founder of Mintigo.2

  • The Ansible team at Red Hat selected MongoDB for a log analysis application. “MongoDB performs orders of magnitude better than Postgres on the same, even double, the hardware and has other desirable features (i.e. arbitrary JSON structure querying, horizontal scaling),” says Chris Meyers of Red Hat.3

  • eHarmony was able to accelerate compatibility matching between potential partners 95% faster after migrating from relational databases, including Postgres.4

What are common use cases for MongoDB?

MongoDB is a multi-purpose data platform that is used for a variety of use cases. The most common use cases for MongoDB include Single View, Internet of Things, Mobile, Real-Time Analytics, Personalization, Catalog, and Content Management.

Because documents can bring related data together that would otherwise be modeled across separate parent-child tables in a relational schema, MongoDB’s atomic single-document operations already provide transaction semantics that meet the data integrity needs of the majority of applications. With the addition of multi-document transactions, it will be even easier for MongoDB to address a complete range of use-cases.

When would PostgreSQL be a better fit?

While many organizations have successfully migrated from an RDBMS to MongoDB, you cannot drop MongoDB in as a replacement for legacy applications built around the relational data model and SQL. However, organizations are investing the time to replatform mission-critical, revenue-generating applications from relational databases to MongoDB. For example, Cisco migrated its ecommerce platform from a legacy relational database to MongoDB, reducing latency by 8x, enabling zero-downtime upgrades and faster application development, and creating a cloud-ready platform5.

Want to Learn More? Get the RDBMS to MongoDB Migration Guide

Relational databases are being pushed beyond their original design limits because of the way we build and run applications today, handling exponential growth in data sources and user loads. To address these challenges, many companies – like those mentioned above – have migrated successfully from relational databases to MongoDB. In this whitepaper, you'll learn:

  • Step by step how to migrate from a relational database to MongoDB.

  • The relevant technical considerations, such as differences between the relational and document data models and the implications for schema design.

  • Indexing, queries, application integration and data migration.

*Safe Harbour Statement

The development, release, and timing of any features or functionality described for our products remains at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality.