Migrating Transactional Data to MongoDB in AWS with the Zaloni Arena Data Governance Platform

Moa Passador

Embarking on a cloud migration is rarely easy, so it is crucial to lay out a plan with clear goals for each step along the way.

As Director of Solutions Engineering at Zaloni, I recently had the opportunity to work with one of our customers to plan and manage its cloud migration project. The project included migrating the customer's transaction application and data infrastructure to the cloud.

What is the goal?

Once the decision is made to migrate to the cloud, the most critical step before beginning is to determine what you’re hoping to achieve.

In this case, the customer was undergoing a digital transformation and faced difficulty scaling its data environment at a rate that matched the daily ingestion of live data through its servers. The primary objective was to improve its scalability by migrating its entire transaction application to the cloud. Zaloni then mapped out the steps needed to move all transactional data from the customer’s on-premise environment to the cloud. The migration of transactional data was the key deliverable for Zaloni, ultimately achieving the goal of moving the entire on-premise SQL Server database to MongoDB in AWS.

What were the challenges?

Once you’ve decided on the goals of a cloud migration, the next step is to make sure everyone is aligned on the challenges you are likely to face throughout the process.

The company needed to migrate 4TB of data from its on-premise relational database, first to an AWS-based data lake and finally to MongoDB Atlas. Moving that volume of data efficiently from a relational database to AWS and MongoDB presented its fair share of challenges.

The first challenge was to recognize the potential data complexities that came with the customer’s global customer base. It was imperative that we stay mindful of the different character encodings (UTF standards) used for data from each country. By being thorough, Zaloni ensured data quality and compliance for the customer across the board. Secondly, the customer’s on-premise environment had a reasonably large SQL Server instance still in use, which made it much harder to move stored historical data to the cloud because new data was arriving every second.
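To make the encoding concern concrete, here is a minimal Python sketch of one way to bring text from mixed sources into a single normalized Unicode form before loading. It is an illustration only, not the code used in the project: the function name `to_normalized_text` and the fallback list in `CANDIDATE_ENCODINGS` are assumptions, and a real migration would choose encodings based on the actual source columns.

```python
import unicodedata

# Hypothetical fallback order (an assumption for this sketch).
# Order matters: permissive encodings should come last.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")


def to_normalized_text(raw: bytes, encodings=CANDIDATE_ENCODINGS) -> str:
    """Decode raw bytes with the first encoding that succeeds,
    then normalize the result to Unicode NFC form."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        # NFC composes characters such as "e" + combining accent into one code point,
        # so visually identical strings compare equal after migration.
        return unicodedata.normalize("NFC", text)
    raise ValueError("could not decode value with any candidate encoding")
```

Normalizing to one form up front means later validation steps can compare source and target strings directly instead of accounting for several equivalent byte representations.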

Data validation was another challenging aspect of this migration project because the size and type of data varied greatly. That variety lengthened the data validation and data quality process when moving all historical and incremental data to the cloud. During the project’s implementation, our team wrote code in several languages, including Python, Shell, Java, and a few others. To make the best use of those tools, we used the Zaloni Arena platform to code, build, and orchestrate the data validation process.
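As a rough illustration of what a validation step like this can look like, the following Python sketch compares a batch of source rows against the migrated documents by key and by content fingerprint. The names `fingerprint` and `validate_batch`, and the `id` key field, are hypothetical and chosen for this example; the article does not describe the actual validation rules that were used.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a stable SHA-256 hash of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, default=str, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_batch(source_rows, target_docs, key="id"):
    """Compare source and target records and report the differences by key."""
    src = {row[key]: fingerprint(row) for row in source_rows}
    tgt = {doc[key]: fingerprint(doc) for doc in target_docs}
    return {
        # present in the source but never arrived in the target
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        # present in the target with no matching source record
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        # present on both sides but with different content
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }
```

A check of this kind can be developed against a small sample first and then run per batch, which keeps validation from becoming a single long step at the end of the migration.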

How did we overcome the challenges?

Our team of data experts ran data quality checks, validated both historical and live data against specific business rules, and improved the process of ingesting and migrating data. With these adjustments, Zaloni established a cloud-based data environment that could operate without human intervention, which allowed the team to shift its focus to the remaining application migration.

The Zaloni Arena platform played an instrumental role in the migration process due to Arena’s automated governance capabilities and overall extensibility. Arena’s data lineage allowed the team to quickly find and fix issues that arose with the data during migration.

To put our impact into perspective: within eight days, Zaloni moved the last six months of the company’s transactional data to an AWS-based data lake. The data lake served as a staging area where close to 70 million documents could be transformed into the required MongoDB format.
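For readers less familiar with what transforming relational data into a MongoDB format involves, here is a small Python sketch that folds flat, joined rows into nested documents. The column names (`transaction_id`, `customer_id`, `customer_name`, `sku`, `quantity`) and the function `rows_to_documents` are invented for illustration and do not reflect the customer’s actual schema.

```python
def rows_to_documents(rows):
    """Group flat relational rows into one nested document per transaction.

    Each input row is assumed to be the result of joining a transaction
    with one of its line items (hypothetical schema for this sketch).
    """
    docs = {}
    for row in rows:
        # Create the parent document the first time a transaction is seen.
        doc = docs.setdefault(
            row["transaction_id"],
            {
                "_id": row["transaction_id"],
                "customer": {"id": row["customer_id"], "name": row["customer_name"]},
                "items": [],
            },
        )
        # Embed each line item inside its parent transaction document.
        doc["items"].append({"sku": row["sku"], "quantity": row["quantity"]})
    return list(docs.values())
```

The resulting list of dictionaries has the shape a MongoDB driver expects for bulk inserts, for example PyMongo’s `insert_many`. Whether to embed line items or keep them in a separate collection is a modeling decision that depends on how the application reads the data.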

What did we learn?

When I reflect on this project, it’s evident that creating a data validation process was critical to our success, and we learned how to streamline that process going forward. A vital part of this is getting access to sample data before production starts. With sample data in hand, teams can begin writing data validation scripts on smaller datasets and see what works best with the data without slowing down the timeline of the migration project.

Another major lesson concerned moving historical and incremental data into Amazon S3. This step was critical to the success of the cloud migration project and was completed using the Zaloni Arena platform. With the platform, the Zaloni team moved all transactional data into the cloud, managed the incoming incremental data, and ran all the necessary data transformations on Amazon EMR, all through Arena’s unified platform.
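One common way to handle incremental data alongside a historical load is a timestamp watermark: each run picks up only the rows changed since the previous run. The Python sketch below shows the idea under stated assumptions. The `modified_at` field and the `select_increment` function are hypothetical, and the article does not say which mechanism Arena used for this.

```python
from datetime import datetime


def select_increment(rows, last_watermark: datetime, ts_field="modified_at"):
    """Return the rows changed after last_watermark, plus the new watermark.

    If no rows are newer, the watermark is returned unchanged.
    """
    fresh = [row for row in rows if row[ts_field] > last_watermark]
    new_watermark = max((row[ts_field] for row in fresh), default=last_watermark)
    return fresh, new_watermark
```

Persisting the returned watermark between runs lets the historical backfill and the ongoing incremental loads proceed independently without copying the same rows twice.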

The Zaloni Arena Platform and MongoDB

With the help of Zaloni’s data platform, Arena, companies can tackle the cloud migration process with ease, as transitioning data from an on-premise environment to MongoDB Atlas serves as the catalyst to modernizing data architectures and achieving digital transformation.

Arena simplifies data migration while giving its users a single-pane-of-glass view of their data environment to improve data observability, data quality, data governance, and data pipeline management. Ultimately, Arena gives data citizens confidence that their data has been validated and cleaned and can be trusted for its accuracy and relevancy. To learn more about how Arena and MongoDB can help you overcome your data challenges, visit https://www.zaloni.com/mongodb/.