Designing MongoDB Schemas for Large SQL Migrations
Unlike relational databases, MongoDB provides significant flexibility in designing schemas. MongoDB schema design should take into consideration both the relationship between the data elements and the data access patterns to evolve the most optimal schema for a workload. Designing the right schema for MongoDB is vital to getting the best cost-performance ratio for our workloads, and a bad schema design can massively impact application performance. Enterprise customers working on SQL migrations typically have hundreds of tables that need to be migrated into MongoDB. The Relational Migrator tool has dramatically simplified schema transformation, data migration, and real-time CDC from SQL databases to MongoDB. Despite that, customers still spend considerable time in MongoDB schema design. When we move beyond a few tables, designing the right MongoDB schema for SQL migrations brings with it several challenges. The relationship between the tables is often not codified at the DB level as Primary or foreign key constraints. Rather, this is enforced at the application level. We would need to spend a significant amount of time with the customer SMEs to surface these constraints. The cardinality of data relationships plays a key role in MongoDB schema design. However, this is challenging to gather in production settings. Data access patterns - Frequently used tables, tables accessed together, index usage cannot be quickly gathered in an automated way, Analyzing the existing SQL footprint in an ad hoc manner can become quite cumbersome, and running sub-optimal brute-force queries on the existing database may cause many issues. All these challenges ultimately result in a lot of manual effort for key customer SMEs during the schema design process. PeerIslands approach PeerIslands has been working with multiple Fortune 100 customers on SQL migrations and has evolved a pre-migration analysis approach for addressing this problem. The pre-migration analysis helps understand your current SQL footprint and access patterns and provides valuable insights into designing the right schema on MongoDB. The pre-migration analysis is fully automated and optimized for performance. We, at PeerIslands, recently used this approach for one of the largest US insurance providers. We rapidly analyzed close to 700+ tables and provided schema design recommendations for key migration initiatives they were working on. Below are some key areas of analysis that are covered by the pre-migration analysis: Data relationships and cardinality MongoDB provides significant flexibility for modeling relationships. SQL DB metadata provides rich data on the relationships between the various data elements. Depending on the complexity of the relationships, we can design them in MongoDB using embedded documents or references between documents. In addition to relationships, the cardinality of the existing data can be gathered from SQL indexes in a performant manner and can help us make a determination of correctly modeling one-to-few, one-to-many, and one-to-zillion relationships on MongoDB. Query analysis SQL DBs store the actual queries that have been executed in the environment. We can also obtain the top queries that have been executed. Analyzing the queries helps us understand which data elements are accessed together, read/write metrics, and more. Understanding this will help us have the frequently accessed data elements together on MongoDB. Query analysis helps us rapidly understand the referential integrity constraints and join patterns that exist in the data. This is helpful in situations where referential integrity is not codified in the DB layer. Table metadata Table and DB metadata can provide us with the overall lay of the land - the number of tables, SQL stored procedures, views, size of tables, and most frequently accessed tables, among others. This can help us get a quick understanding of the complexity of the migration. Indexes Indexes speed up data access and retrieval, and they can have a significant impact on the performance of the database. Understanding the current indexes in the SQL DBs and their usage provides good inputs for index creation on MongoDB. An understanding of the available indexes can also help perform data migration more efficiently. Is it possible to get all this information in a performant and automated way in a production setting? Putting pre-migration analysis into practice with Oracle We have developed playbooks to extract this information from several SQL databases. Here is an example - below is a quick technical preview of how we can get this information from one of the most common databases that we help migrate from - Oracle. Oracle data dictionary The Oracle Data Dictionary refers to a set of read-only tables and views that contain metadata about the database objects, such as tables, columns, indexes, constraints, views, users, and privileges. By querying these tables and views, users can retrieve detailed information about the structure and contents of the database objects. This information can be used to analyze the database schema, optimize queries, and perform other tasks related to database management and maintenance. We can also leverage the Data dictionary to gather the details we discussed. Metrics In Oracle, you can get table metrics such as table size, number of rows, number of blocks, and number of empty blocks using the DBA_TABLES view. Below is an example query to get the table metrics for all tables in a schema: SELECT owner, table_name, num_rows, blocks, empty_blocks, avg_row_len, chain_cnt, avg_space FROM dba_tables WHERE owner = 'ADMIN' ORDER BY NUM_ROWS DESC; Indexes To get all the indexes for a given schema in Oracle, we can use the DBA_INDEXES view. Here's an example query: SELECT owner, index_name, table_name, uniqueness, index_type, status FROM dba_indexes WHERE owner = 'ADMIN' ORDER BY table_name, index_name; Relationships To get the one-to-many relationships of all the tables in Oracle, we can query the data dictionary views ALL_CONSTRAINTS and ALL_CONS_COLUMNS. SELECT uc.table_name, rc.table_name AS referenced_table_name, ucc.column_name, uc.constraint_name, rc.constraint_name AS referenced_constraint_name FROM all_constraints uc JOIN all_cons_columns ucc ON uc.owner = ucc.owner AND uc.constraint_name = ucc.constraint_name JOIN all_constraints rc ON uc.r_owner = rc.owner AND uc.r_constraint_name = rc.constraint_name WHERE uc.constraint_type = 'R' AND uc.owner = 'ADMIN' ORDER BY uc.table_name, uc.constraint_name, ucc.position; Variations of the above query can help in getting one-to-one relationships and many-to-many relationships. Queries list To get a list of queries executed in Oracle, you can query the database’s V$SQL and V$SQLTEXT views. SELECT * FROM v$sql WHERE parsing_schema_name = 'ADMIN' ORDER BY last_active_time DESC; This query will retrieve all SQL statements executed by the specified schema and order them by the last time they were active. Top 10 queries The same v$sql tables can provide the Top 10 queries as well. SELECT sql_text, executions FROM v$sql ORDER BY executions DESC FETCH FIRST 10 ROWS ONLY; All of these queries only use Oracle metadata. The queries are highly performant and do not negatively impact instance performance. A similar approach can be used to gather this information for other SQL DBs like MySQL and MS SQL. Conclusion PeerIslands has been working with multiple customers on SQL migration projects where we handle end-to-end services, including schema design, data migration, app refactoring, SQL procedure conversion, and more. Moreover, we have also been an engineering partner for MongoDB Relational Migrator and have worked closely with the team on multiple initiatives over the last two years. We are working with the Relational Migrator team to fully automate pre-migration analysis for various SQL databases like Oracle, MySQL, MS SQL, Postgres, etc., and surface these insights to users as part of the Relational Migrator tooling. This will help save time for users and also prevent incorrect schema designs. Follow this space for more announcements on this topic. With extensive experience handling multiple SQL engagements in partnership with MongoDB's Professional Services team , a rich repository of tools, and close collaboration with Relational Migrator, PeerIslands is your "Partner" to accelerate your SQL migration journey. Reach out to partners@Mongodb.com to learn more.
Migrating Terabytes of IoT Data From a Legacy NoSQL Database to MongoDB Atlas With MongoDB's Custom Migration Tool
In 2020, a large European energy company began an ambitious plan to replace its traditional metering devices — all 7.6 million of them — with smart meters. That would allow the energy company to monitor gas use remotely and allow customers’ bills to more accurately reflect their energy consumption. At the same time, the company began installing smart components along their production network to monitor operations in real-time, manage alarms, use predictive maintenance tools, and find leaks using advanced technologies. The energy company knew this shift would result in a massive amount of data coming into their systems, and they thought they were ready for it. They understood the complexities of managing and leveraging data from the Internet of Things (IoT), such as the high velocity at which data must be ingested and the need for time-based data aggregations. They rolled out an IoT platform with big data and analytics tools to help them make progress toward their objectives of high-quality, efficient, and safe service. This article examines how the company migrated its system to MongoDB Atlas in order to handle the massive influx of data. Managing data The energy company was managing 3 TB of data on its NoSQL database, with the remainder housed and managed on a relational database. However, it started facing challenges, including a lack of scalability, increasing costs, and poor performance. The costs to maintain the pre-production and production environments were unsustainable, and the situation wasn’t going to get better: By 2023, the energy company planned to increase the number of IoT devices and sensors by a factor of five. They needed a viable solution for the long term. Migrating to MongoDB Atlas The energy company decided to migrate to MongoDB Atlas for several reasons. Atlas’s online archive, combined with the ability to create time-series sharded collections, makes Atlas an ideal fit for IoT data, as does the flexibility of the document data model. Additionally, an API that was compatible with the existing database would minimize the impact on application code and make it easier to migrate applications. The customer chose PeerIslands to be its technical partner and help them with the migration. PeerIslands, a MongoDB partner, is an enterprise-class digital transformation company with an expert, multilingual team with significant experience working across multiple technologies and cloud platforms. PeerIslands has developed solutions for both homogenous and heterogenous workload migrations. Among these solutions is a MongoDB migration tool that helps perform one-time migrations and change data capture while minimizing downtime. The tool is fully GUI-based, and tasks such as infrastructure provisioning, dump and restore, change stream listeners, and processors have all been automated. For change capture, the tool uses the native MongoDB change stream APIs. Migration challenges In working with the energy company to perform the migration, the PeerIslands team faced two particular challenges: The large volume of data. Initial snap-shotting of the data would take about a day. The application had significant write loads. On average, it was writing about 12,000 messages per second. However, the load was unevenly distributed, with spikes when devices would “wake up” and report their status. These two factors quickly generated close to 20 million change events that had to be synced to MongoDB. Meanwhile, new data was constantly being written into the source. Migration tool PeerIslands’ migration tool uses mongodump and mongorestore for one-time data migration and MongoDB Kafka Connector for real-time data synchronization. By using Apache Kafka, the migration tool was able to handle the large amount of change stream data and successfully manage the migration. To address the complexity of the migration, PeerIslands also enhanced the migration tool with additional capabilities: Parallelize the Kafka change stream processing using partitions. The Kafka partitioning strategy was in sync with the target Atlas sharding strategy. Use ReplaceOneBusinessKeyStrategy as the write model for Kafka MongoDB sink connector to write into sharded Atlas collections. By using its in-house tooling, PeerIslands was able to successfully complete the migration with near zero downtime. Improved performance With the migration complete, the customer has already begun to realize the benefits of MongoDB Atlas for their massive amounts of IoT data. The user interface has become extremely responsive, even in front of more expensive queries. Because of the improved performance of the database, the customer is now able to pursue improvements and efficiencies in other areas. With better performance, the company expects consumption of the data to rise and their schema design to evolve. They’re looking to leverage the time-series benefits of MongoDB both to simplify their schema design and deliver richer IoT functionality. They’re also better equipped to rapidly respond to and fulfill business needs, because the database is no longer a limitation. Importantly, costs have decreased for the production environment, and even more dramatic cost reductions have been seen for the pre-production environment. Learn more about the migration tool and MongoDB’s time series capabilities . Interested in your own data migration? Contact us .
PeerIslands Cosmos DB Migrator Tool to MongoDB Atlas on Google Cloud
When you’re in the midst of innovating, the last thing you want to worry about is infrastructure. Whether you’re looking to streamline inventory management or reimagine marketing, you need applications that can scale fast and maintain high availability. That’s where MongoDB Atlas on Google Cloud comes in. With MongoDB Atlas’ general-purpose, document-based database, users can free themselves from the hassle of database management, and give back precious time to developers to focus on innovation. Combine these benefits with Google Cloud’s cloud computing power, high availability, and ability to integrate with tools like BigQuery, Dataflow, Dataproc and more, and it’s hard to find a comparable joint solution. In fact, many current Microsoft Azure Cosmos DB users are now considering making the move to MongoDB. Microsoft’s Cosmos DB only supports single partition transactions, has no schema governance and forces developers to work with five different APIs to deliver full application functionality. Conversely, MongoDB Atlas on Google Cloud supports distributed multi-document ACID transactions, includes schema governance, and offers integrated full-text search, auto-archiving, data lakes, and edge-to-cloud data sync. The following blog illustrates how PeerIslands’ Cosmos DB Migrator tool can help users move from Cosmos DB to MongoDB Atlas on Google Cloud. Why PeerIslands PeerIslands is an enterprise-class digital transformation company composed of a team of polyglots who are comfortable across multiple technologies and cloud platforms. As a services firm, PeerIslands is focused on helping customers with both cloud-native development and application transformation. With best-in-the-industry talent, PeerIslands has been working with the MongoDB team to build a suite of solutions around two key objectives: For a customer evaluating MongoDB, how can we rapidly address common questions? Once a customer has chosen MongoDB, how can we reduce time to value by rapidly migrating workloads to MongoDB? With this in mind, PeerIslands developed a suite of tools around schema generation, understanding MongoDB query performance, as well as helping customers understand code changes required for upgrading MongoDB versions. In terms of workload migrations, PeerIslands developed solutions for both homogenous and heterogenous migrations. The company is also contributing to the open source community with a mobile app for enabling MongoDB admins to manage Atlas on the go. PeerIslands' Cosmos DB migration use case The current approach for migrating data from Cosmos DB to MongoDB is to use MongoDB dump and restore. But there are several problems with this approach. It’s fully manual and CLI-based which creates a poor user experience and requires technical resources even for simple migrations. There’s a lack of change capture capability which requires downtime during the duration of migration. For large Cosmos DB migrations, this causes significant issues. The team is also under pressure to deliver the entire migration in a short period of time. Migrations often get delayed as customers have difficulty identifying the right migration window. The Cosmos to MongoDB tool is a “Live Migrate” like tool that helps perform one-time migrations and change data capture from Cosmos DB (MongoDB model) to MongoDB Atlas and minimizes downtime requirements associated with migrations. The tool is fully GUI-based and nearly everything is automated. All the tasks for infrastructure provisioning, dump & restore, change stream listeners and processors have all been automated with a graphical user interface (GUI). The Cosmos to Mongo migration tool uses native MongoDB tools and the performance is similar to native tools. For change capture, we leverage the native MongoDB change stream APIs. A high level view of the solution is provided in figure 1 below: Figure 1: Solution Map Migration steps: Migration configuration: Provide the name of the migration task, source Cosmos DB details, and target MongoDB details. The tool supports key vault integration as well. Migration infrastructure provisioning: Provide migration infrastructure details required for creating the VM (Virtual Machine) including location, type of VM instance, etc. Migration execution: Allow for automation of the migration once the configuration is complete. The migration is executed in 3 steps: backup, restore and change event processing. As a user, you can initiate the backup process. The change event listener is started in parallel with the backup process and captures all the changes. Once the backup is complete, the user can restore the initial data and then perform change event processing to apply all the changes to MongoDB. Migration validation: The tool also provides facilities for validating the migration. Users can view the total number of documents on both source Cosmos DB collection and target MongoDB collection. They can also compare random documents picked up from Cosmos DB and MongoDB side by side and validate whether the data elements have been loaded correctly. For a more detailed demo and description of events, watch the following video: Migrating to a new database can feel daunting at first, but PeerIslands Cosmos DB migrator makes it easy. Major concerns like delays and downtime are eliminated from the process, helping you run your business smoothly and reap the benefits of MongoDB more quickly. And with PeerIslands suite of tools, you can rapidly address MongoDB-specific questions and accelerate time to value. Reach out today to get started
Make Migrating to MongoDB Atlas on AWS Easy with PeerIslands Modernization Tool Set
As cloud computing becomes commonplace across industries, organizations are rapidly adopting MongoDB Atlas because they know that true modernization is about more than just moving data as-is to the cloud—i.e. taking a “lift and shift” approach. It’s also about remodeling that same data along the way for faster and more iterative development. With MongoDB’s document-based database, developers are empowered to reimagine how they build with flexible schema design that allows them to easily model and remodel data for a wide range of use cases, while still applying governance when needed. MongoDB Atlas maps naturally to modern object-oriented programming languages, making developers' lives much easier. In contrast to the rigidity of SQL databases, MongoDB’s flexible data model means that your database schema can evolve with business requirements. This helps users build applications faster, handle diverse data types and manage applications more efficiently at scale. As a fully-managed service, MongoDB Atlas takes care of database maintenance for you and can also be scaled within and across multiple distributed data centers, providing new levels of availability and scalability previously unachievable with relational databases. The advantages of moving to MongoDB Atlas are clear, but some companies may still feel reluctant to leave behind the legacy relational databases they’re familiar with for unknown territory. This is where PeerIslands comes in. With PeerIslands, you don’t have to go it alone. The following blog introduces PeerIslands’ modernization capabilities, and how you can leverage them to migrate seamlessly to MongoDB Atlas on AWS. Why PeerIslands? PeerIslands is an enterprise-class digital transformation company composed of a team of polyglots who are comfortable across multiple technologies and cloud platforms. As a services firm, PeerIslands is focused on helping customers with both cloud-native development, and applications transformation. With best-in-the-industry talent, the team has helped several Fortune 50 companies bring large-scale transformations to life, and has received recognition from several clients and partners, including MongoDB. With engineers trained and certified in MongoDB, PeerIslands has helped MongoDB’s ISV and retail customers modernize, moving software built for on-prem to SaaS environments more conducive to cloud environments, and was named MongoDB’s Boutique System Integrator Partner of the Year . PeerIslands can swiftly transform and migrate core, legacy, and on-premises applications to the cloud. They develop solutions based on cutting-edge microservices and serverless architecture across public cloud platforms and hybrid PaaS platforms to help users quickly get applications to customers and business users. How PeerIslands can help PeerIslands has been working with MongoDB and AWS to develop tools that address two key objectives for customers: Objective 1: Tools that address common customer questions when evaluating MongoDB MongoDB Test Data Generator: A fully UI-driven tool with an extensive data library for rapidly loading MongoDB with use-case specific, near real-world data at scale MongoDB Performance Testing tool: A performance analyzer where you can create multiple load profiles, run-use case specific MongoDB queries and understand the performance of the queries. With the test data generator and the performance testing tool, customers can get a clear view of the performance of MongoDB for their specific situation even before migrating to MongoDB MongoDB Schema Generator and Data Modeler: SchemaGen tool helps to rapidly generate draft JSON schema from your existing SQL schema. On top of this, you can then perform the data modeling exercise and generate schema to form your MongoDB schema. The schema generator also provides key information about the SQL DB like size, index, and more MongoDB Sizer: MongoDB sizing tool helps you understand the size implications of your schema and calculate Atlas sizing. With the MongoDB sizer, customers can upload their own schema and calculate the various factors that influence the Atlas sizing Codescanner: A tool for scanning your code repositories for deprecated MongoDB APIs. With the code scanner, customers can get a clear view of the application impact for upgrading MongoDB versions Objective 2: Tools that accelerate time to value by rapidly moving workloads to MongoDB COSMOS2Atlas migration: A point-and-click solution that helps COSMOS customers migrate data from COSMOS to MongoDB. This solution provides change capture capability to ease downtime requirements and makes data migration easy and seamless 1Data: A tool for addressing more complex requirements of migrating data from SQL to MongoDB Admin mobile app: A mobile app for admins to track key Atlas KPIs and approve common access requests on the go PeerIslands brings to the table an entire suite of tools for addressing all your MongoDB needs. PeerIslands use-case featuring 1Data tool One of the key requirements of modernization projects is to solve large-scale data migrations from SQL databases. There are a number of tools that are available which simply replicate data from SQL to MongoDB—but, we rarely use the same SQL schema in MongoDB. Schema transformation—however difficult to do at scale—is nonetheless required so that we can make the best use of MongoDB capabilities. Today, the typical approach is to run custom Spark jobs as they are scalable and flexible when it comes to processing schema transformations and loading the data into MongoDB. But when you go beyond migrating one or two tables in a Proof of concept (PoC) setting, the problem becomes much more complex. For instance, writing custom Spark programs for every schema transformation is cumbersome and error-prone. For even simple migrations we will have tens of Spark programs. Any defects that occur during transformation are going to cause significant issues. Also consider the following challenges: How do you extract data out of your SQL database without impacting database performance? How do you handle infrastructure provisioning and scaling? How do you orchestrate the migration? Few master tables can be migrated once but transaction tables may need both one-time migration and a daily incremental migration. How can you do this orchestration at scale? How do you know whether you have not lost data during migration? Last but not the least, once a data is migrated how do you keep it up to date? We will probably end up with a suite of tools to address these issues–SQOOP, Kafka, Spark, some kind of a job orchestration engine, an observability suite, notification workflow and so on. It will quickly become evident that migrating data from SQL to MongoDB without disrupting business could be the most daunting barrier to adopting MongoDB. Unfortunately, current tools invariably fail for complex heterogeneous migration scenarios and developers end up writing a lot of custom code. Realizing this issue, PeerIslands has been working with MongoDB and AWS to develop 1Data. 1Data is a platform that helps enterprises perform migration and real time synchronization of data between SQL databases and MongoDB. 1Data is designed to complement existing AWS services like DMS in migrating data out of SQL. Key features of 1Data: Data is fully GUI based — There is no coding required 1Data provides a single platform for both one-time migration and continuous updates 1Data is consistent across one-time migration and continuous updates. This provides a good anti-corruption layer for continuous updates The tech stack of 1Data is based on Spark, Kafka among others and is highly scalable 1Data is highly modular and has a well defined API layer. 1Data can be easily extended to your needs 1Data automatically handles all the infrastructure required for migration with AWS quick start templates High Level Solution Architecture 1Data capabilities are realized through a decoupled and highly scalable architecture. The data extract, transformation and load part are independent of each other and can easily be customized based on the specific requirements of the customer. The architecture can orchestrate between batch-based initial loads and streaming-based CDC loads. A Spark, Kafka, and Airflow-based tech stack provides excellent scalability for the 1Data platform to handle large data migrations. Figure 1: 1Data High Level solution architecture OneData Portal structures migrations using Endpoints, Tasks and DAGs (Directed Acyclic Graphs) Endpoints define source, intermediate and final data locations and can come in the form of files, databases or queues. Endpoints can also be database extracts in S3 from AWS DMS service. Task definition is the second step in the migration. Tasks act on source point and produce data in either staging or destination end point. There are a number of predefined tasks available:Extract, Transformation, Sink and Validation tasks. You can configure both streaming and batch tasks. Defining the DAGs is the final step before actual migration. DAGs are used to define the sequence in which a user wants to execute the defined tasks. The technology components used in 1Data allows for easily handling very large data migrations. Each of the components has been selected such that they can be deployed across multiple cloud platforms and can be scaled easily. Technology Stack details below: Web Portal: Angular WebAPI: Node Configuration Database: MongoDB Data Transformation & Validation: Spark Data Extraction: Sqoop, Spark, DMS Change Data Capture: Kafka, Debezium Data Sink: Spark Job/Task Orchestrator: Airflow PeerIslands has worked with AWS and MongoDB to create a Quick Start for 1Data. With Quick Start, customers can rapidly instantiate 1Data for their migration requirements. To recap, with 1Data Quick start on AWS, we can Perform heterogeneous schema transformation from SQL and load data into MongoDB Atlas on AWS Weave together continuous data updates, incremental data updates and one-time migration using a combination of batch and streaming jobs Orchestrate the migrations tasks Validate the migration ...And all without writing a single line of code! Demo Looking forward A modern, data architecture can help you unlock your business’ full potential, and gain real-time access to the insights you need, when you need them. MongoDB’s document-based database and flexible schema design help you make smarter decisions, cut costs, and take full advantage of AI/ML capabilities to empower your employees and raise customer satisfaction. The decision to migrate off your legacy systems and onto MongoDB is easy—and now the process is, too. Let PeerIslands help you get there. Our best-in-class teams leverage next-generation technologies, including Artificial Intelligence (AI), Augmented Reality (AR), Blockchain, Internet of Things (IoT), Machine Learning (ML), Mobile, and Virtual Reality (VR). Our expertise spans the modern programming stack, and we follow best practices in distributed, agile, and lean principles as well as test-driven development and DevOp. Additional Resources ISV WMP Program Contact email@example.com for details Atlas Quick Start MongoDB Atlas Starter Package Atlas Migration Guide Atlas Migration Pattern Contact us with any questions around modernization with MongoDB, AWS, and PeerIslands.