Migrating Terabytes of IoT Data from Azure Cosmos DB to MongoDB Atlas

Paresh Saraf, Rajesh Vinayagam, and Krishnakumar Sathyanarayanan

In 2020, a large European energy company began an ambitious plan to replace its traditional metering devices — all 7.6 million of them — with smart meters. That would allow the energy company to monitor gas use remotely and allow customers’ bills to more accurately reflect their energy consumption. At the same time, the company began installing smart components along its production network to monitor operations in real time, manage alarms, use predictive maintenance tools, and find leaks using advanced technologies.

The energy company knew this shift would result in a massive amount of data coming into their systems, and they thought they were ready for it. They understood the complexities of managing and leveraging data from the Internet of Things (IoT), such as the high velocity at which data must be ingested and the need for time-based data aggregations. They rolled out an IoT platform with big data and analytics tools to help them make progress toward their objectives of high-quality, efficient, and safe service.

This article looks at how the company migrated their system to MongoDB Atlas in order to handle the massive influx of data.

Managing data

The energy company was managing 3TB of data on Microsoft’s Azure Cosmos DB, with the remainder housed and managed on a relational database. However, they started facing challenges with Cosmos DB, including a lack of scalability, increasing costs, and poor performance. The costs to maintain the pre-production and production environments were also becoming unsustainable. And the situation wasn’t going to get better: By 2023, the energy company planned to increase the number of IoT devices and sensors by a factor of five, so they knew that Cosmos DB was not a viable solution for the long term.

Migrating to MongoDB Atlas

The energy company decided to migrate to MongoDB Atlas for several reasons. Atlas’ online archive, combined with the ability to create sharded time series collections, makes Atlas an ideal fit for IoT data, as does the flexibility of the document data model. Additionally, because Cosmos DB exposes a MongoDB-compatible API, migrating would have minimal impact on application code and make it easier to move applications over.

The customer chose PeerIslands as its technical partner to help carry out the migration. PeerIslands, a MongoDB partner, is an enterprise-class digital transformation company with an expert, multilingual team and significant experience across multiple technologies and cloud platforms.

PeerIslands has developed solutions for both homogeneous and heterogeneous workload migrations. Among these solutions is a Cosmos to MongoDB tool that helps perform one-time migrations and change data capture while minimizing downtime. The tool is fully GUI-based, and tasks such as infrastructure provisioning, dump and restore, change stream listeners, and processors have all been automated. For change capture, the tool uses the native MongoDB change stream APIs.
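The core of such a change stream listener can be sketched roughly as follows. This is a minimal illustration, not the PeerIslands tool itself: the collection handles, checkpoint handling, and full-document replace model are assumptions about how a listener of this kind is typically built.

```python
def event_to_replace(event):
    """Map a change stream event to an idempotent full-document replace.

    Replaying a replace is harmless, so events can safely be
    re-processed after a listener restart.
    """
    doc = event["fullDocument"]
    return {"filter": {"_id": doc["_id"]},
            "replacement": doc,
            "upsert": True}

def sync_changes(source_coll, target_coll, resume_token=None):
    """Tail the source collection and apply each change to the target.

    watch() is MongoDB's native change stream API;
    full_document="updateLookup" makes update events carry the whole
    document, which the replace model above requires.
    """
    with source_coll.watch(full_document="updateLookup",
                           resume_after=resume_token) as stream:
        for event in stream:
            if event["operationType"] == "delete":
                continue  # delete handling omitted from this sketch
            op = event_to_replace(event)
            target_coll.replace_one(op["filter"], op["replacement"],
                                    upsert=op["upsert"])
            resume_token = stream.resume_token  # persist for safe resume
```

The resume token is what makes the listener restartable: persisting it after each applied event lets a new listener pick up exactly where the previous one stopped, which matters when tens of millions of events are in flight.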

Migration challenges

In working with the energy company to perform the migration, the PeerIslands team faced two particular challenges:

  1. The large volume of data. Initial snapshotting of the data would take about one day.

  2. The application had significant write loads. On average, it was writing about 12,000 messages per second. However, the load was unevenly distributed, with spikes when devices would “wake up” and report their status.

These two factors quickly generated close to 20 million change events in Cosmos DB that had to be synced to MongoDB. Meanwhile, new data was constantly being written into the Cosmos DB source.

Cosmos2Atlas tool

PeerIslands’ Cosmos2Atlas tool uses mongodump and mongorestore for the one-time data migration and the MongoDB Kafka Connector for real-time data synchronization. By using Apache Kafka, the Cosmos2Atlas tool was able to handle the large volume of change stream data and successfully manage the migration.

To address the complexity of the migration, PeerIslands also enhanced the Cosmos2Atlas tool with additional capabilities:

  1. Parallelized the Kafka change stream processing using partitions. The Kafka partitioning strategy was aligned with the target Atlas sharding strategy.

  2. Used ReplaceOneBusinessKeyStrategy as the write model for the Kafka MongoDB sink connector to write into sharded Atlas collections.
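The point of the first enhancement is that all events for a given shard key land on the same Kafka partition, so they are applied to Atlas in order even though partitions are processed in parallel. A minimal sketch of such a partitioner follows; the shard key value and partition count are hypothetical, not details from the actual migration.

```python
import hashlib

NUM_PARTITIONS = 12  # hypothetical; would match the Kafka topic's partition count

def partition_for(shard_key_value):
    """Route every event with the same shard key to the same partition.

    A stable hash (rather than Python's per-process randomized hash())
    is used so the key-to-partition mapping survives process restarts.
    """
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
```

Because the mapping is deterministic, events for one device always flow through one partition, preserving per-device ordering while still spreading the overall load across all partitions.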

By using its in-house Cosmos2Atlas tooling, PeerIslands was able to successfully complete the migration with near-zero downtime.
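For reference, a MongoDB Kafka sink connector using the ReplaceOneBusinessKeyStrategy write model from step 2 might be configured along these lines. The connection string, topic, namespace, and business-key fields below are placeholders, not the customer's actual values.

```json
{
  "name": "atlas-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.sink.MongoSinkConnector",
    "connection.uri": "mongodb+srv://<user>:<password>@<cluster>/",
    "topics": "cosmos.changes",
    "database": "iot",
    "collection": "readings",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.partial.value.projection.type": "AllowList",
    "document.id.strategy.partial.value.projection.list": "deviceId,ts"
  }
}
```

With this strategy, the connector replaces documents matched on the projected business-key fields rather than on `_id`, which is what allows writes to target sharded Atlas collections correctly.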

Improved performance

With the migration complete, the customer has already begun to realize the benefits of MongoDB Atlas for their massive amounts of IoT data. The user interface has become extremely responsive, even for more expensive queries.

Because of the improved performance of the database, the customer is now able to pursue improvements and efficiencies in other areas. With better performance, the company expects consumption of the data to rise and their schema design to evolve. They’re looking to leverage the time series capabilities of MongoDB both to simplify their schema design and to deliver richer IoT functionality. They’re also better equipped to rapidly respond to and fulfill business needs, because the database is no longer a limitation. Importantly, costs have decreased for the production environment, and even more dramatic cost reductions have been seen for the pre-production environment.