Migrating from PostgreSQL to MongoDB

Today's web applications rely on massive volumes of structured and unstructured data. That data is complex and constantly changing. To better work with this data, and to gain the scalability and flexibility that modern applications require, you may choose to migrate your data from an older, monolithic relational database such as PostgreSQL, to modern, general-purpose database such as MongoDB.

While the idea of migrating from PostgreSQL to MongoDB may seem daunting, the process can be relatively straightforward, especially if you follow the steps we cover in this article. Once the migration is complete, you'll be able to enjoy MongoDB's many advantages, including its native drivers, flexible schema and query language, built-in autoscaling, and much more.

Table of Contents

MongoDB vs. PostgreSQL

MongoDB is a scalable and flexible NoSQL document database. Rather than storing data in tables, MongoDB stores its data in collections of BSON (Binary JSON) documents. MongoDB can manage structured and unstructured data. That allows you to build your application without needing to first define the schema.

PostgreSQL is a relational database management system (RDBMS). In an RDBMS, data is stored in tables, and the schema of the database must be defined on creation. While it is possible to alter tables after their creation as an application's needs change, this process can be complicated and error-prone.

For more detailed information, check out this article comparing MongoDB and PostgreSQL.

When to use MongoDB versus PostgreSQL

If your data model and schema evolve regularly—such as in an agile environment—MongoDB, with its flexible schema, is an excellent choice. With MongoDB, you can change the structure of documents on the fly without searching through application code to update queries and table references. In addition, MongoDB Atlas (the cloud offering of MongoDB as a DBaaS) lets you quickly start and scale your MongoDB clusters. MongoDB Atlas offers a free tier, which is a great way to experiment and learn.

On the other hand, if you have a stable, relational schema that does not change over time, an RDMBS like PostgreSQL might be sufficient for your application needs. Your application might already be tightly coupled to SQL-based clients and frameworks, such that it might make sense to hold off on a migration, and consider redesigning the application as a long-term solution.

While data models and schema flexibility are a consideration, scalability is also important. You'll find that MongoDB is the best choice for fast querying and seamless scaling. MongoDB scales both vertically and horizontally with relative ease.

Improving performance by increasing the power (CPU, memory, etc) of a database server—called vertical scaling—is possible on both MongoDB and PostgreSQL. However, vertical scaling has physical limitations and can quickly become expensive.

Horizontal scaling—adding new nodes to a database cluster to share the workload—is often a better choice. However, horizontal scaling is very difficult with relational databases, due to the difficulty in spreading out related data across nodes. With MongoDB, horizontal scaling is easier as collections are self-contained and not coupled relationally. Scaling can even be reconfigured online, without downtime, thanks to online resharding.

Preparing to switch from PostgreSQL to MongoDB

While switching from PostgreSQL to MongoDB is not difficult, the process often involves more than just extracting and migrating data. You'll also need to examine the details of the applications that access your database. For example, if you're using an ORM that does not support both relational and document databases, you’ll need to find a new library that can connect to MongoDB.

If you're just diving into document databases for the first time, MongoDB provides excellent documentation on SQL to MongoDB Mappings.

Once you’ve considered any changes needed to your application, the next step is to migrate the data. The migration for some of your tables might be simple. However, you might want to restructure your data to fit better within a MongoDB schema design. In that case, you should become familiar with best practices for MongoDB schema design, including anti-patterns. To keep our following example simple, we'll only consider the migration of data for a one-to-one mapping.

How to move data from PostgreSQL to MongoDB

The process for transferring data from PostgreSQL to MongoDB is clear-cut. Ultimately, the ease of your task depends on the complexity of the PostgreSQL database and the structure of document collections needed in the new MongoDB database.

To migrate data, you’ll extract it from PostgreSQL and then import it to MongoDB using the mongoimport tool. Let’s look at the two different ways to extract the data: returning queries as tab-separated values (TSV) or as JSON.

Using TSV to transfer data

The TSV format is the quick and easy option. However, it only works if the original PostgreSQL schema is relatively simple and you don't need to embed documents in other documents with a one-to-many relationship.

For example, if you need to create a collection of documents that represent customers and plan on embedding each customer's order in these documents, TSV won’t work because it doesn’t support hierarchical data structures. Each row in the TSV will become a document. You can still map values in each row to fields deeper in your documents; you just can’t embed documents.

You could create an address field and create nested state and city fields as in the example below. However, you could not store multiple address entities. Let's look at an example query to see how this works.

Consider the PostgreSQL created table “users.”

postgres=# select * from users;
 userid | first_name | last_name | logins | upgraded | city | state 
--------+------------+-----------+--------+----------+------+-------
      1 | Bob        | Smith     |     10 | t        | NYC  | NY

COPY (SELECT
        userid AS "userId.int32()",
        logins AS "loginCount.int32()",
      first_name AS "firstName.auto()",
      last_name AS "lastName.auto()",
        CASE WHEN upgraded=TRUE THEN 'TRUE' ELSE 'FALSE' END AS "upgradedAccount.boolean()",
        city AS "address.city.auto()",
        state AS "address.state.auto()"
    FROM
        users
) TO '/tmp/users.tsv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER E'\t');

Notice that we renamed each column we are exporting with the AS command. The format of the alias being created is mongoFieldName.type(). For example, we have userId.int32(). When we execute the import, mongoimport will parse this header and create the fields with the correct types. On most of the columns, we use the auto() type and let mongoimport determine the type based on context.

For the upgraded column, which is a boolean, PostgreSQL will return a t for true and f for false. This default value won't be recognized by MongoDB, so we use a CASE statement to set values that will work.

You can also do some limited data nesting using the TSV migration process. The city and state fields are an example.

The final line in the query formats the export as CSV with a header using a tab delimiter, which is what makes the file TSV format. We use this command to import the file into MongoDB Atlas:

mongoimport --uri mongodb+srv://<mongodb_user>:<mongodb_password>@<altas-cluster>.mongodb.net/<DATABASE> 
 --collection users --type tsv --file users.tsv --headerline --columnsHaveTypes

This will result in the following document in the “users” collection:

{
    userId: 1,
    first_name: "Bob",
    last_name: "Smith",
      LoginCount : 10,
      upgraded : true,
    "Address" : {
            "city" : "NYC",
            "state" : "NY" }
}

Using JSON to transfer data

Using JSON for data migration is preferable if your PostgreSQL schema is complex and you want to nest arrays of related records inside of a MongoDB document.

To return the results of a PostgreSQL query as JSON, you will need three functions:

  1. row_to_json: Returns a row as a JSON object with column names as keys and the values from the row
  2. array_to_json: Returns an array of data as a JSON array
  3. array_agg: Accepts a set of values and returns an array where each value in the set becomes an element in the array

Let’s look at an orders table which in our relational schema keeps a record for every product ordered by the user:

 id | userid | product | quantity | price 
----+--------+---------+----------+-------
  1 |      1 | shoes   |        4 | 50.75
  2 |      1 | razer   |       20 |  1.75

Here is an example query using all three functions:

COPY (SELECT row_to_json(results)
FROM (
  SELECT userid,first_name, last_name,
    (
      SELECT array_to_json(array_agg(o))
      FROM (
        SELECT id, product, quantity, price
        FROM orders
        WHERE products.userid = users.userid
      ) o
    ) AS orders
  FROM users
) results) TO '/tmp/orders.json' WITH (FORMAT text, HEADER FALSE);

The query above will create a file orders.json with JSON documents for each user from the users table:

{
    id: 1,
    first_name: "Bob",
    last_name: "Smith",
    "orders" : [
        {
            "id" : 1,
            "product" : "shoes",
            "quantity" : 4,
            "price" : 50.75
        },
        {
            "id" : 2,
            "product" : "razer",
            "quantity" : 20,
            "price" : 1.75
        }
    ]
}

Once you have written the query and saved it, you can use mongoimport to import the file:

mongoimport --uri mongodb+srv://<mongodb_user>:<mongodb_password>@<atlas-cluster>.mognodb.net/<DATABASE> 
--collection orders --jsonArray orders.json

Conclusion

A modern, document-based database such as MongoDB can be a great choice over an RDBMS like PostgreSQL. Migrating from PostgreSQL to MongoDB can be a simple process, as we saw in the steps covered by this article:

  1. Prepare your application for connecting to MongoDB., MongoDB has support for all of the major programming languages as well as many popular frameworks.
  2. Consider the schema changes that would be best for your data, while keeping in mind MongoDB schema best practices and avoiding anti-patterns.
  3. Export the data from your PostgreSQL databases by piping the result of an SQL query into a COPY command, outputting the result either as JSON or TSV.
  4. Restructure the data to fit your MongoDB schema by using mongoimport (or as an alternative: use bulkWrite operations to load the data).

Switching to MongoDB gives you benefits such as a more flexible schema and easier scalability. And while extracting and migrating a PostgreSQL database to MongoDB does take some thought and planning, the process itself is relatively pain-free.

FAQ

Should I replace PostgreSQL with MongoDB?

MongoDB may be an excellent choice over PostgreSQL. MongoDB may be especially suitable if:

  • Your application is managing structured and unstructured data, and you expect the volume to grow.
  • You are working with modern languages that require working with objects that are flexible, requiring a database with rich language support and flexible indexing strategies.
  • Your application needs to scale quickly to adjust to changes in traffic and data volume.

What tools can I use to migrate PostgreSQL data to MongoDB?

Assuming your application is ready to connect to MongoDB and, if necessary, you have restructured your data to fit a MongoDB schema design, these are the steps you will take and the tools you will need:

  • Within the psql client, you will perform SELECT queries of your data which are then piped into a COPY command for exporting the query results, either to a tab-separated file (TSV) or as JSON.
  • If you export to JSON, your queries may need to use some PostgreSQL JSON functions and aggregation functions.
  • After your data has been exported, you would use the mongoimport tool to import your data into MongoDB.
  • As an alternative to mongoimport, you could use MongoDB's bulkWrite operation to load the data.

What are MongoDB’s advantages over PostgreSQL?

The main advantages are in the areas of speed and scalability. Regarding speed:

  • Querying data and their associated data objects is often faster in MongoDB than in SQL, which uses expensive JOIN statements.
  • MongoDB’s flexible schema enables changes to an application's data model or schema to be deployed quickly and flexibly, without the need for migrations or to update query statements in legacy code.

Regarding scalability:

  • MongoDB's distributed, scale-out architecture allows your database to grow as your application grows. This is in contrast to PostgreSQL, which is not natively distributed.
  • MongoDB Atlas supports sharded clusters with multi-node replica sets right out of the box. Shard keys can even be changed in real time without downtime, thanks to online resharding.