Designing MongoDB Schemas for Large SQL Migrations

Tom Hollander, Rajesh Vinayagam, and Krishnakumar Sathyanarayana

Unlike relational databases, MongoDB provides significant flexibility in designing schemas. MongoDB schema design should take into consideration both the relationship between the data elements and the data access patterns to evolve the most optimal schema for a workload. Designing the right schema for MongoDB is vital to getting the best cost-performance ratio for our workloads, and a bad schema design can massively impact application performance.

Enterprise customers working on SQL migrations typically have hundreds of tables that need to be migrated into MongoDB. The Relational Migrator tool has dramatically simplified schema transformation, data migration, and real-time CDC from SQL databases to MongoDB. Despite that, customers still spend considerable time in MongoDB schema design.

When we move beyond a few tables, designing the right MongoDB schema for SQL migrations brings with it several challenges.

  1. The relationship between the tables is often not codified at the DB level as Primary or foreign key constraints. Rather, this is enforced at the application level. We would need to spend a significant amount of time with the customer SMEs to surface these constraints.
  2. The cardinality of data relationships plays a key role in MongoDB schema design. However, this is challenging to gather in production settings.
  3. Data access patterns - Frequently used tables, tables accessed together, index usage cannot be quickly gathered in an automated way,

Analyzing the existing SQL footprint in an ad hoc manner can become quite cumbersome, and running sub-optimal brute-force queries on the existing database may cause many issues. All these challenges ultimately result in a lot of manual effort for key customer SMEs during the schema design process.

PeerIslands approach

PeerIslands has been working with multiple Fortune 100 customers on SQL migrations and has evolved a pre-migration analysis approach for addressing this problem. The pre-migration analysis helps understand your current SQL footprint and access patterns and provides valuable insights into designing the right schema on MongoDB. The pre-migration analysis is fully automated and optimized for performance.

We, at PeerIslands, recently used this approach for one of the largest US insurance providers. We rapidly analyzed close to 700+ tables and provided schema design recommendations for key migration initiatives they were working on.

Below are some key areas of analysis that are covered by the pre-migration analysis:

Data relationships and cardinality

MongoDB provides significant flexibility for modeling relationships. SQL DB metadata provides rich data on the relationships between the various data elements. Depending on the complexity of the relationships, we can design them in MongoDB using embedded documents or references between documents.

In addition to relationships, the cardinality of the existing data can be gathered from SQL indexes in a performant manner and can help us make a determination of correctly modeling one-to-few, one-to-many, and one-to-zillion relationships on MongoDB.

Query analysis

SQL DBs store the actual queries that have been executed in the environment. We can also obtain the top queries that have been executed. Analyzing the queries helps us understand which data elements are accessed together, read/write metrics, and more. Understanding this will help us have the frequently accessed data elements together on MongoDB.

Query analysis helps us rapidly understand the referential integrity constraints and join patterns that exist in the data. This is helpful in situations where referential integrity is not codified in the DB layer.

Table metadata

Table and DB metadata can provide us with the overall lay of the land - the number of tables, SQL stored procedures, views, size of tables, and most frequently accessed tables, among others. This can help us get a quick understanding of the complexity of the migration.

Indexes

Indexes speed up data access and retrieval, and they can have a significant impact on the performance of the database. Understanding the current indexes in the SQL DBs and their usage provides good inputs for index creation on MongoDB. An understanding of the available indexes can also help perform data migration more efficiently.

Is it possible to get all this information in a performant and automated way in a production setting?

Putting pre-migration analysis into practice with Oracle

We have developed playbooks to extract this information from several SQL databases. Here is an example - below is a quick technical preview of how we can get this information from one of the most common databases that we help migrate from - Oracle.

Oracle data dictionary

The Oracle Data Dictionary refers to a set of read-only tables and views that contain metadata about the database objects, such as tables, columns, indexes, constraints, views, users, and privileges.

By querying these tables and views, users can retrieve detailed information about the structure and contents of the database objects. This information can be used to analyze the database schema, optimize queries, and perform other tasks related to database management and maintenance. We can also leverage the Data dictionary to gather the details we discussed.

Metrics

In Oracle, you can get table metrics such as table size, number of rows, number of blocks, and number of empty blocks using the DBA_TABLES view. Below is an example query to get the table metrics for all tables in a schema:

SELECT owner, table_name, num_rows, blocks, empty_blocks, avg_row_len, chain_cnt, avg_space
FROM dba_tables
WHERE owner = 'ADMIN'
ORDER BY NUM_ROWS DESC;

Indexes

To get all the indexes for a given schema in Oracle, we can use the DBA_INDEXES view. Here's an example query:

SELECT owner, index_name, table_name, uniqueness, index_type, status
FROM dba_indexes
WHERE owner = 'ADMIN'
ORDER BY table_name, index_name;

Relationships

To get the one-to-many relationships of all the tables in Oracle, we can query the data dictionary views ALL_CONSTRAINTS and ALL_CONS_COLUMNS.

SELECT uc.table_name, rc.table_name AS referenced_table_name, ucc.column_name, uc.constraint_name,
rc.constraint_name AS referenced_constraint_name
FROM all_constraints uc
JOIN all_cons_columns ucc ON uc.owner = ucc.owner AND uc.constraint_name = ucc.constraint_name
JOIN all_constraints rc ON uc.r_owner = rc.owner AND uc.r_constraint_name = rc.constraint_name
WHERE uc.constraint_type = 'R' AND uc.owner = 'ADMIN'
ORDER BY uc.table_name, uc.constraint_name, ucc.position;

Variations of the above query can help in getting one-to-one relationships and many-to-many relationships.

Queries list

To get a list of queries executed in Oracle, you can query the database’s V$SQL and V$SQLTEXT views.

SELECT *
FROM v$sql
WHERE parsing_schema_name = 'ADMIN'
ORDER BY last_active_time DESC;

This query will retrieve all SQL statements executed by the specified schema and order them by the last time they were active.

Top 10 queries

The same v$sql tables can provide the Top 10 queries as well.

SELECT sql_text, executions
FROM v$sql
ORDER BY executions DESC
FETCH FIRST 10 ROWS ONLY;

All of these queries only use Oracle metadata. The queries are highly performant and do not negatively impact instance performance. A similar approach can be used to gather this information for other SQL DBs like MySQL and MS SQL.

Conclusion

PeerIslands has been working with multiple customers on SQL migration projects where we handle end-to-end services, including schema design, data migration, app refactoring, SQL procedure conversion, and more. Moreover, we have also been an engineering partner for MongoDB Relational Migrator and have worked closely with the team on multiple initiatives over the last two years.

We are working with the Relational Migrator team to fully automate pre-migration analysis for various SQL databases like Oracle, MySQL, MS SQL, Postgres, etc., and surface these insights to users as part of the Relational Migrator tooling. This will help save time for users and also prevent incorrect schema designs. Follow this space for more announcements on this topic.

With extensive experience handling multiple SQL engagements in partnership with MongoDB's Professional Services team, a rich repository of tools, and close collaboration with Relational Migrator, PeerIslands is your "Partner" to accelerate your SQL migration journey. Reach out to partners@Mongodb.com to learn more.