Migrating Terabytes of IoT Data From a Legacy NoSQL Database to MongoDB Atlas With MongoDB's Custom Migration Tool
In 2020, a large European energy company began an ambitious plan to replace its traditional metering devices — all 7.6 million of them — with smart meters. That would allow the energy company to monitor gas use remotely and allow customers’ bills to more accurately reflect their energy consumption. At the same time, the company began installing smart components along their production network to monitor operations in real-time, manage alarms, use predictive maintenance tools, and find leaks using advanced technologies. The energy company knew this shift would result in a massive amount of data coming into their systems, and they thought they were ready for it. They understood the complexities of managing and leveraging data from the Internet of Things (IoT), such as the high velocity at which data must be ingested and the need for time-based data aggregations. They rolled out an IoT platform with big data and analytics tools to help them make progress toward their objectives of high-quality, efficient, and safe service. This article examines how the company migrated its system to MongoDB Atlas in order to handle the massive influx of data. Managing data The energy company was managing 3 TB of data on its NoSQL database, with the remainder housed and managed on a relational database. However, it started facing challenges, including a lack of scalability, increasing costs, and poor performance. The costs to maintain the pre-production and production environments were unsustainable, and the situation wasn’t going to get better: By 2023, the energy company planned to increase the number of IoT devices and sensors by a factor of five. They needed a viable solution for the long term. Migrating to MongoDB Atlas The energy company decided to migrate to MongoDB Atlas for several reasons. Atlas’s online archive, combined with the ability to create time-series sharded collections, makes Atlas an ideal fit for IoT data, as does the flexibility of the document data model. Additionally, an API that was compatible with the existing database would minimize the impact on application code and make it easier to migrate applications. The customer chose PeerIslands to be its technical partner and help them with the migration. PeerIslands, a MongoDB partner, is an enterprise-class digital transformation company with an expert, multilingual team with significant experience working across multiple technologies and cloud platforms. PeerIslands has developed solutions for both homogenous and heterogenous workload migrations. Among these solutions is a MongoDB migration tool that helps perform one-time migrations and change data capture while minimizing downtime. The tool is fully GUI-based, and tasks such as infrastructure provisioning, dump and restore, change stream listeners, and processors have all been automated. For change capture, the tool uses the native MongoDB change stream APIs. Migration challenges In working with the energy company to perform the migration, the PeerIslands team faced two particular challenges: The large volume of data. Initial snap-shotting of the data would take about a day. The application had significant write loads. On average, it was writing about 12,000 messages per second. However, the load was unevenly distributed, with spikes when devices would “wake up” and report their status. These two factors quickly generated close to 20 million change events that had to be synced to MongoDB. Meanwhile, new data was constantly being written into the source. Migration tool PeerIslands’ migration tool uses mongodump and mongorestore for one-time data migration and MongoDB Kafka Connector for real-time data synchronization. By using Apache Kafka, the migration tool was able to handle the large amount of change stream data and successfully manage the migration. To address the complexity of the migration, PeerIslands also enhanced the migration tool with additional capabilities: Parallelize the Kafka change stream processing using partitions. The Kafka partitioning strategy was in sync with the target Atlas sharding strategy. Use ReplaceOneBusinessKeyStrategy as the write model for Kafka MongoDB sink connector to write into sharded Atlas collections. By using its in-house tooling, PeerIslands was able to successfully complete the migration with near zero downtime. Improved performance With the migration complete, the customer has already begun to realize the benefits of MongoDB Atlas for their massive amounts of IoT data. The user interface has become extremely responsive, even in front of more expensive queries. Because of the improved performance of the database, the customer is now able to pursue improvements and efficiencies in other areas. With better performance, the company expects consumption of the data to rise and their schema design to evolve. They’re looking to leverage the time-series benefits of MongoDB both to simplify their schema design and deliver richer IoT functionality. They’re also better equipped to rapidly respond to and fulfill business needs, because the database is no longer a limitation. Importantly, costs have decreased for the production environment, and even more dramatic cost reductions have been seen for the pre-production environment. Learn more about the migration tool and MongoDB’s time series capabilities . Interested in your own data migration? Contact us .
Faster Migrations to MongoDB Atlas on Google Cloud with migVisor by EPAM
As the needs of Google Cloud customers evolve and shift towards new user expectations, more and more customers are choosing MongoDB as an ideal alternative to legacy databases. Together, we’ve partnered with users looking to digitize and grow their businesses (such as Forbes ), or meet increased demand due to COVID (such as our work with Boxed , the online grocer) by scaling up infrastructure and data processing within a condensed time frame. As a fully-managed service within the Google Cloud Marketplace , MongoDB Atlas enables our joint customers to quickly deploy applications on Google Cloud with a unified user experience and an integrated billing model. Migrations to managed cloud database services vary in complexity, but even under the most straightforward circumstances, careful evaluation and planning is required. Customer database environments often leverage database technologies from multiple vendors, across different versions, and can run into thousands of deployments. This makes manual assessment cumbersome and error prone. This is where EPAM Systems , a provider with strategic specialization in database and application modernization solutions, comes in. EPAM’s database migration assessment tool, migVisor , is a first-of-its-kind cloud database migration assessment product that helps companies analyze database workloads, configuration, and structure to generate a visual cloud migration roadmap that identifies potential quick wins as well as challenge areas. migVisor identifies the best migration path for databases using sophisticated scoring logic to rank the complexity of migrating them to a cloud-centric technology stack. Previously applicable only to migrations from RDBMS to cloud-based RDBMS, migVisor is now available for MongoDB to MongoDB Atlas migrations. migVisor helps you: Analyze migration decisions objectively by providing a secure assessment of source and target databases that’s independent of deployed environments Accelerate time to migration by automating the discovery and assessment process, which reduces development cycles from a few weeks to a few days Easily understand tech insights by providing a visual overview of your entire journey, enabling better planning and improving stakeholder visibility Reduce database licensing costs by giving you intelligent insights on the target environment and recommended migration paths Key features of migVisor for MongoDB For several years, migVisor by EPAM has delivered automated assessments that have helped hundreds of customers migrate their relational databases to cloud-based or cloud-native databases. Now, migVisor adds support for the world’s leading modern data platform: MongoDB. As part of the initial release, migVisor will support self-managed MongoDB to MongoDB Atlas migration assessments. We plan to support TCO for MongoDB migrations, application modernization, migration assessment, and relational MongoDB migration assessments in future releases. MongoDB is also a natural fit for Google Cloud’s Open Cloud strategy of providing customers a broad set of fully managed database services, as Google Cloud's own GM and VP of Engineering & Databases, Andi Gutmans, notes: We are always looking for ways to simplify migrations for our customers. Now, with EPAM's database migration assessment tool, migVisor, supporting MongoDB Atlas, our customers can easily complete database assessments—including TCO analyses and migration complexity assessments, and generate comprehensive migration plans. A simplified migration experience combined with our joint Marketplace success enables customers to consolidate their data workloads into the cloud while making the development and procurement process simple—so users can focus more on innovation. How the migVisor assessment works migVisor analyzes source databases (on-prem or in any cloud environment) for migration assessment to a new target. The assessment includes the following steps: The simple-to-use migVisor Metadata Collector (mMC) collects metadata from the source database, including: featureCompatibilityVersion value, journaling status for data bearing nodes, MongoDB storage size used, replica set configuration, and more. Figure 1: mMC GUI Edit Connection Screen On the migVisor Analysis Dashboard you can select the source/target pair (e.g., MongoDB to MongoDB Atlas on Google Cloud). Figure 2: Source and Target Selection In the migVisor console, you can then view the automated assessment output that was created from migVisor’s migration complexity scoring engine, including classification of the migration into high/medium/low complexity and identification of potential migration challenges and incompatibilities. Figure 3: Source Cluster Features Finally, you can also export the assessment output in CSV format for further analysis in your preferred data analysis/reporting tool. Conclusion Together, Google Cloud and MongoDB have successfully worked with many organizations to streamline cloud migrations and modernize their legacy landscape. To build on the foundation of providing our customers with the best-in-class experience, we’ve closely worked with Google Cloud and EPAM Systems to integrate MongoDB Atlas with migVisor. Because of this, customers will now be able to better plan migrations, reduce risk and avoid missteps, identify quick wins for TCO reduction, review migration complexities, and appropriately plan migration phases for the best outcomes. Learn more about how you can deploy, manage, and grow MongoDB on Google Cloud on our partner page . If you’d like guidance and migration advice, please reach out to email@example.com to get in touch with the Google, MongoDB, and EPAM Sales teams.
PeerIslands Cosmos DB Migrator Tool to MongoDB Atlas on Google Cloud
When you’re in the midst of innovating, the last thing you want to worry about is infrastructure. Whether you’re looking to streamline inventory management or reimagine marketing, you need applications that can scale fast and maintain high availability. That’s where MongoDB Atlas on Google Cloud comes in. With MongoDB Atlas’ general-purpose, document-based database, users can free themselves from the hassle of database management, and give back precious time to developers to focus on innovation. Combine these benefits with Google Cloud’s cloud computing power, high availability, and ability to integrate with tools like BigQuery, Dataflow, Dataproc and more, and it’s hard to find a comparable joint solution. In fact, many current Microsoft Azure Cosmos DB users are now considering making the move to MongoDB. Microsoft’s Cosmos DB only supports single partition transactions, has no schema governance and forces developers to work with five different APIs to deliver full application functionality. Conversely, MongoDB Atlas on Google Cloud supports distributed multi-document ACID transactions, includes schema governance, and offers integrated full-text search, auto-archiving, data lakes, and edge-to-cloud data sync. The following blog illustrates how PeerIslands’ Cosmos DB Migrator tool can help users move from Cosmos DB to MongoDB Atlas on Google Cloud. Why PeerIslands PeerIslands is an enterprise-class digital transformation company composed of a team of polyglots who are comfortable across multiple technologies and cloud platforms. As a services firm, PeerIslands is focused on helping customers with both cloud-native development and application transformation. With best-in-the-industry talent, PeerIslands has been working with the MongoDB team to build a suite of solutions around two key objectives: For a customer evaluating MongoDB, how can we rapidly address common questions? Once a customer has chosen MongoDB, how can we reduce time to value by rapidly migrating workloads to MongoDB? With this in mind, PeerIslands developed a suite of tools around schema generation, understanding MongoDB query performance, as well as helping customers understand code changes required for upgrading MongoDB versions. In terms of workload migrations, PeerIslands developed solutions for both homogenous and heterogenous migrations. The company is also contributing to the open source community with a mobile app for enabling MongoDB admins to manage Atlas on the go. PeerIslands' Cosmos DB migration use case The current approach for migrating data from Cosmos DB to MongoDB is to use MongoDB dump and restore. But there are several problems with this approach. It’s fully manual and CLI-based which creates a poor user experience and requires technical resources even for simple migrations. There’s a lack of change capture capability which requires downtime during the duration of migration. For large Cosmos DB migrations, this causes significant issues. The team is also under pressure to deliver the entire migration in a short period of time. Migrations often get delayed as customers have difficulty identifying the right migration window. The Cosmos to MongoDB tool is a “Live Migrate” like tool that helps perform one-time migrations and change data capture from Cosmos DB (MongoDB model) to MongoDB Atlas and minimizes downtime requirements associated with migrations. The tool is fully GUI-based and nearly everything is automated. All the tasks for infrastructure provisioning, dump & restore, change stream listeners and processors have all been automated with a graphical user interface (GUI). The Cosmos to Mongo migration tool uses native MongoDB tools and the performance is similar to native tools. For change capture, we leverage the native MongoDB change stream APIs. A high level view of the solution is provided in figure 1 below: Figure 1: Solution Map Migration steps: Migration configuration: Provide the name of the migration task, source Cosmos DB details, and target MongoDB details. The tool supports key vault integration as well. Migration infrastructure provisioning: Provide migration infrastructure details required for creating the VM (Virtual Machine) including location, type of VM instance, etc. Migration execution: Allow for automation of the migration once the configuration is complete. The migration is executed in 3 steps: backup, restore and change event processing. As a user, you can initiate the backup process. The change event listener is started in parallel with the backup process and captures all the changes. Once the backup is complete, the user can restore the initial data and then perform change event processing to apply all the changes to MongoDB. Migration validation: The tool also provides facilities for validating the migration. Users can view the total number of documents on both source Cosmos DB collection and target MongoDB collection. They can also compare random documents picked up from Cosmos DB and MongoDB side by side and validate whether the data elements have been loaded correctly. For a more detailed demo and description of events, watch the following video: Migrating to a new database can feel daunting at first, but PeerIslands Cosmos DB migrator makes it easy. Major concerns like delays and downtime are eliminated from the process, helping you run your business smoothly and reap the benefits of MongoDB more quickly. And with PeerIslands suite of tools, you can rapidly address MongoDB-specific questions and accelerate time to value. Reach out today to get started
Make Migrating to MongoDB Atlas on AWS Easy with PeerIslands Modernization Tool Set
As cloud computing becomes commonplace across industries, organizations are rapidly adopting MongoDB Atlas because they know that true modernization is about more than just moving data as-is to the cloud—i.e. taking a “lift and shift” approach. It’s also about remodeling that same data along the way for faster and more iterative development. With MongoDB’s document-based database, developers are empowered to reimagine how they build with flexible schema design that allows them to easily model and remodel data for a wide range of use cases, while still applying governance when needed. MongoDB Atlas maps naturally to modern object-oriented programming languages, making developers' lives much easier. In contrast to the rigidity of SQL databases, MongoDB’s flexible data model means that your database schema can evolve with business requirements. This helps users build applications faster, handle diverse data types and manage applications more efficiently at scale. As a fully-managed service, MongoDB Atlas takes care of database maintenance for you and can also be scaled within and across multiple distributed data centers, providing new levels of availability and scalability previously unachievable with relational databases. The advantages of moving to MongoDB Atlas are clear, but some companies may still feel reluctant to leave behind the legacy relational databases they’re familiar with for unknown territory. This is where PeerIslands comes in. With PeerIslands, you don’t have to go it alone. The following blog introduces PeerIslands’ modernization capabilities, and how you can leverage them to migrate seamlessly to MongoDB Atlas on AWS. Why PeerIslands? PeerIslands is an enterprise-class digital transformation company composed of a team of polyglots who are comfortable across multiple technologies and cloud platforms. As a services firm, PeerIslands is focused on helping customers with both cloud-native development, and applications transformation. With best-in-the-industry talent, the team has helped several Fortune 50 companies bring large-scale transformations to life, and has received recognition from several clients and partners, including MongoDB. With engineers trained and certified in MongoDB, PeerIslands has helped MongoDB’s ISV and retail customers modernize, moving software built for on-prem to SaaS environments more conducive to cloud environments, and was named MongoDB’s Boutique System Integrator Partner of the Year . PeerIslands can swiftly transform and migrate core, legacy, and on-premises applications to the cloud. They develop solutions based on cutting-edge microservices and serverless architecture across public cloud platforms and hybrid PaaS platforms to help users quickly get applications to customers and business users. How PeerIslands can help PeerIslands has been working with MongoDB and AWS to develop tools that address two key objectives for customers: Objective 1: Tools that address common customer questions when evaluating MongoDB MongoDB Test Data Generator: A fully UI-driven tool with an extensive data library for rapidly loading MongoDB with use-case specific, near real-world data at scale MongoDB Performance Testing tool: A performance analyzer where you can create multiple load profiles, run-use case specific MongoDB queries and understand the performance of the queries. With the test data generator and the performance testing tool, customers can get a clear view of the performance of MongoDB for their specific situation even before migrating to MongoDB MongoDB Schema Generator and Data Modeler: SchemaGen tool helps to rapidly generate draft JSON schema from your existing SQL schema. On top of this, you can then perform the data modeling exercise and generate schema to form your MongoDB schema. The schema generator also provides key information about the SQL DB like size, index, and more MongoDB Sizer: MongoDB sizing tool helps you understand the size implications of your schema and calculate Atlas sizing. With the MongoDB sizer, customers can upload their own schema and calculate the various factors that influence the Atlas sizing Codescanner: A tool for scanning your code repositories for deprecated MongoDB APIs. With the code scanner, customers can get a clear view of the application impact for upgrading MongoDB versions Objective 2: Tools that accelerate time to value by rapidly moving workloads to MongoDB COSMOS2Atlas migration: A point-and-click solution that helps COSMOS customers migrate data from COSMOS to MongoDB. This solution provides change capture capability to ease downtime requirements and makes data migration easy and seamless 1Data: A tool for addressing more complex requirements of migrating data from SQL to MongoDB Admin mobile app: A mobile app for admins to track key Atlas KPIs and approve common access requests on the go PeerIslands brings to the table an entire suite of tools for addressing all your MongoDB needs. PeerIslands use-case featuring 1Data tool One of the key requirements of modernization projects is to solve large-scale data migrations from SQL databases. There are a number of tools that are available which simply replicate data from SQL to MongoDB—but, we rarely use the same SQL schema in MongoDB. Schema transformation—however difficult to do at scale—is nonetheless required so that we can make the best use of MongoDB capabilities. Today, the typical approach is to run custom Spark jobs as they are scalable and flexible when it comes to processing schema transformations and loading the data into MongoDB. But when you go beyond migrating one or two tables in a Proof of concept (PoC) setting, the problem becomes much more complex. For instance, writing custom Spark programs for every schema transformation is cumbersome and error-prone. For even simple migrations we will have tens of Spark programs. Any defects that occur during transformation are going to cause significant issues. Also consider the following challenges: How do you extract data out of your SQL database without impacting database performance? How do you handle infrastructure provisioning and scaling? How do you orchestrate the migration? Few master tables can be migrated once but transaction tables may need both one-time migration and a daily incremental migration. How can you do this orchestration at scale? How do you know whether you have not lost data during migration? Last but not the least, once a data is migrated how do you keep it up to date? We will probably end up with a suite of tools to address these issues–SQOOP, Kafka, Spark, some kind of a job orchestration engine, an observability suite, notification workflow and so on. It will quickly become evident that migrating data from SQL to MongoDB without disrupting business could be the most daunting barrier to adopting MongoDB. Unfortunately, current tools invariably fail for complex heterogeneous migration scenarios and developers end up writing a lot of custom code. Realizing this issue, PeerIslands has been working with MongoDB and AWS to develop 1Data. 1Data is a platform that helps enterprises perform migration and real time synchronization of data between SQL databases and MongoDB. 1Data is designed to complement existing AWS services like DMS in migrating data out of SQL. Key features of 1Data: Data is fully GUI based — There is no coding required 1Data provides a single platform for both one-time migration and continuous updates 1Data is consistent across one-time migration and continuous updates. This provides a good anti-corruption layer for continuous updates The tech stack of 1Data is based on Spark, Kafka among others and is highly scalable 1Data is highly modular and has a well defined API layer. 1Data can be easily extended to your needs 1Data automatically handles all the infrastructure required for migration with AWS quick start templates High Level Solution Architecture 1Data capabilities are realized through a decoupled and highly scalable architecture. The data extract, transformation and load part are independent of each other and can easily be customized based on the specific requirements of the customer. The architecture can orchestrate between batch-based initial loads and streaming-based CDC loads. A Spark, Kafka, and Airflow-based tech stack provides excellent scalability for the 1Data platform to handle large data migrations. Figure 1: 1Data High Level solution architecture OneData Portal structures migrations using Endpoints, Tasks and DAGs (Directed Acyclic Graphs) Endpoints define source, intermediate and final data locations and can come in the form of files, databases or queues. Endpoints can also be database extracts in S3 from AWS DMS service. Task definition is the second step in the migration. Tasks act on source point and produce data in either staging or destination end point. There are a number of predefined tasks available:Extract, Transformation, Sink and Validation tasks. You can configure both streaming and batch tasks. Defining the DAGs is the final step before actual migration. DAGs are used to define the sequence in which a user wants to execute the defined tasks. The technology components used in 1Data allows for easily handling very large data migrations. Each of the components has been selected such that they can be deployed across multiple cloud platforms and can be scaled easily. Technology Stack details below: Web Portal: Angular WebAPI: Node Configuration Database: MongoDB Data Transformation & Validation: Spark Data Extraction: Sqoop, Spark, DMS Change Data Capture: Kafka, Debezium Data Sink: Spark Job/Task Orchestrator: Airflow PeerIslands has worked with AWS and MongoDB to create a Quick Start for 1Data. With Quick Start, customers can rapidly instantiate 1Data for their migration requirements. To recap, with 1Data Quick start on AWS, we can Perform heterogeneous schema transformation from SQL and load data into MongoDB Atlas on AWS Weave together continuous data updates, incremental data updates and one-time migration using a combination of batch and streaming jobs Orchestrate the migrations tasks Validate the migration ...And all without writing a single line of code! Demo Looking forward A modern, data architecture can help you unlock your business’ full potential, and gain real-time access to the insights you need, when you need them. MongoDB’s document-based database and flexible schema design help you make smarter decisions, cut costs, and take full advantage of AI/ML capabilities to empower your employees and raise customer satisfaction. The decision to migrate off your legacy systems and onto MongoDB is easy—and now the process is, too. Let PeerIslands help you get there. Our best-in-class teams leverage next-generation technologies, including Artificial Intelligence (AI), Augmented Reality (AR), Blockchain, Internet of Things (IoT), Machine Learning (ML), Mobile, and Virtual Reality (VR). Our expertise spans the modern programming stack, and we follow best practices in distributed, agile, and lean principles as well as test-driven development and DevOp. Additional Resources ISV WMP Program Contact firstname.lastname@example.org for details Atlas Quick Start MongoDB Atlas Starter Package Atlas Migration Guide Atlas Migration Pattern Contact us with any questions around modernization with MongoDB, AWS, and PeerIslands.
Build a Single View of Your Customers with MongoDB Atlas and Cogniflare's Customer 360
The key to successful, long-lasting commerce is knowing your customers. If you truly know your customers, then you understand their needs and wants and can identify the right product to deliver to them—at the right time and in the right way. However, for most B2C enterprises, building a single view of the customer poses a major hurdle due to copious amounts of fragmented data. Businesses gather data from their customers in multiple locations, such as ecommerce platforms, CRM, ERP, loyalty programs, payment portals, web apps, mobile apps and more. Each data set can be structured, semi-structured or unstructured, delivered as stream or require batch processing, which makes compiling already fragmented customer data even more complex. This has led some organizations to bespoke solutions, which still only provide a partial view of the customer. Siloed data sets make running operations like customer service, targeted marketing and advanced analytics—such as churn prediction and recommendations—highly challenging. Only with a 360 degree view of the customer can an organization deeply understand their needs, wants and requirements, as well as how to satisfy them. A single view of that 360 data is therefore vital for a lasting relationship. In this blog, we’ll walk through how to build a single view of the customer using MongoDB’s database and Cogniflare’s Calledio Customer 360 tool. We’ll also explore a real-world use case focused on sentiment analysis. Building a single view with Calleido's Customer 360 With a Customer 360 database, organizations can access and analyze various individual interactions and touchpoints to build a holistic view of the customer. This is achieved by acquiring data from a number of disparate sources. However, routing and transforming this data is a complex and time-consuming process. Many of the existing Big Data tools often aren’t compatible with cloud environments. These challenges inspired Cogniflare to create Calleido . Figure 1: Calleido Customer 360 Use Case Architecture Calleido is a data processing platform built on top of battle-tested open source tools such as Apache NiFi. Calleido comes with over 300 processors to move structured and unstructured data from and to anywhere. It facilitates batch and real-time updates, and handles simple data transformations. Critically, Calleido seamlessly integrates with Google Cloud and offers one-click deployment. It uses Google Kubernetes Engine to scale up and down based on the demand, and provides an intuitive and slick low-code development environment. Figure 2: Calleido Data Pipeline to Copy Customers From PostgreSQL to MongoDB A real-world use case: Sentiment analysis of customer emails To demonstrate the power of Cogniflare’s Calleido , MongoDB Atlas , and the Customer 360 view, consider the use case of conducting a sentiment analysis on customer emails. To streamline the build of a Customer 360 database, the team at Cogniflare created flow templates for implementing data pipelines in seconds. In the upcoming sections, we’ll walk through some of the most common data movement patterns for this Customer 360 use case and showcase a sample dashboard. Figure 3: Sample Customer Dashboard The flow commences with a processor pulling IMAP messages from an email server (ConsumeIMAP). Each new email that arrives into the chosen inbox (e.g. customer service), triggers an event. Next, the process extracts email headers to determine topline details about the email content (ExtractEmailHeaders). Using the sender's email, Calleido identifies the customer (UpdateAttribute) and extracts the full email body by executing a script (ExecuteScript). Now, with all the data collected, a message payload is prepared and published through Google Cloud Platform (GCP) Pub/Sub (Kafka can also be used) for consumption by downstream flows and other services. Figure 4: Translating Emails to Cloud PubSub Messages The GCP Pub/Sub messages from the previous flow are then consumed (ConsumeGCPPubSub). This is where the power of MongoDB Atlas integration comes in as we verify each sender in the MongoDB database (GetMongo). If a customer exists in our system, we pass the email data to the next flow. Other emails are ignored. Figure 5: Validating Customer Email with MongoDB and Calleido Analysis of the email body copy is then conducted. For this flow, we use a processor to prepare a request body, which is then sent to Google Cloud Natural Language AI to assess the tone and sentiment of the message. The results from the Language Processing API then go straight to MongoDB Atlas so they can be pulled through into the dashboard. Figure 6: Making Cloud AutoML Call with Calleido End result in the dashboard: The Customer 360 database can be used in internal back-office systems to supplement and inform customer support. With a single view, it’s quicker and more effective to troubleshoot issues, handle returns and resolve complaints. Leveraging information from previous client conversations ensures each customer is given the most appropriate and effective response. These data sets can then be fed into analytics systems to generate learnings and optimizations, such as associating negative sentiment with churn rate. How MongoDB's document database helps In the example above, Calleido takes care of copying and routing data from the business source system into MongoDB Atlas, the operational data store (ODS). Thanks to MongoDB’s flexible data structure, we can transfer data in its original format, and subsequently implement necessary schema transformations in an iterative manner. There is no need to run complex schema migrations. This allows for the quick delivery of a single view database. Figures 7 & 8: Calleido Data Pipelines to Copy Products and Orders From PostgreSQL to MongoDB Atlas Calleido allows us to make this transition in just a few simple steps. The tool runs a custom SQL query (ExecuteSQL) that will join all the required data from outer tables and compile the results in order to parallelize the processing. The data arrives in Avro format, then Calleido converts it into JSON (ConvertAvroToJSON) and transforms it to the schema designed for MongoDB (JoltTransformJSON). End result in the Customer 360 dashboard: MongoDB Atlas is the market-leading choice for the Customer 360 database. Here are the core reasons for its world-class standard: MongoDB can efficiently handle non-standardized schema coming from legacy systems and efficiently store any custom attributes. Data models can include all the related data as nested documents. Unlike SQL databases, MongoDB avoids complicated join queries, which are difficult to write and not performant. MongoDB is rapid. The current view of a customer can be served in milliseconds without the need to introduce a caching layer. The MongoDB flexible schema model enables agility with an iterative approach. In the initial extraction, the data can be copied nearly exactly as its original shape. This drastically reduces latency. In subsequent phases, the schema can be standardized and the quality of the data can be improved without complex SQL migrations. MongoDB can store dozens of terabytes of data across multiple data centers and easily scale horizontally. Data can be shared across multiple regions to help navigate compliance requirements. Separate analytics nodes can be set up to avoid impacting performance of production systems. MongoDB has a proven record of acting as a single view database, with legacy and large organizations up and running with prototypes in two weeks and into production within a business quarter. MongoDB Atlas can autoscale out of the box, reducing costs and handling traffic peaks. The data can be encrypted both in transit and at rest, helping to accomplish compliance with security and privacy standards, including GDPR, HIPAA, PCI-DSS, and FERPA. Upselling the customer: Product recommendations Upselling customers is a key part of modern business, but the secret to doing it successfully is that it’s less about selling and more about educating. It’s about using data to identify where the customer is in the customer journey, what they may need, and which product or service can meet that need. Using a customer's purchase history, Calleido can help prepare product recommendations by routing data to the appropriate tools such as BigQuery ML. These recommendations can then be promoted through the call center and marketing teams for both online or mobile app recommendations. There are two flows to achieve this: preparing training data and generating recommendations: Preparing training data First, appropriate data from PostgreSQL to BigQuery is transferred using the ExecuteSQL processor. The data pipeline is scheduled to execute periodically. In the next step, appropriate data is fetched from PostgreSQL, dividing it to 1K row chunks with the ExecuteSQLRecord processor. These files are then passed to the next processor which uses load balancing enabled to utilize all available nodes. All that data then gets inserted to a BigQuery table using the PutBigQueryStreaming processor. Figure 9: Copying Data from PostgreSQL to BigQuery with Calleido Generating product recommendations Next, we move on to generating product recommendations. First, you must purchase Big Query capacity slots, which offer the most affordable way to take advantage of BigQuery ML features. Here, Calleido invokes an SQL procedure with the ExecuteSQL processor, then ensures that the requested BigQuery capacity is ready to use. The next processor (ExecuteSQL) executes an SQL query responsible for creating and training the Matrix Factorization ML model using the data copied by the first flow. Next in the queue, Calleido uses the ExecuteSQL processor to query our trained model to acquire all the predictions and store them in a dedicated BigQuery table. Finally, the Wait processor waits for both capacity slots to be removed, as they are no longer required. Figure 10 & 11: Generating Product Recommendations with Calleido Then, we remove old recommendations through the power of two processors. First, the ReplaceText processor updates the content of incoming flow files, setting the query body. This is then used by the DeleteMongo processor to perform the removal action. Figure 12: Remove Old Recommendations The whole flow ends with copying Recommendations to MongoDB. The ExecuteSQL processor fetches and aggregates the top 10 recommendations per user, all in chunks of 1k rows. Then, the following two processors (ConvertAvroToJSON and ExecuteScript) prepare data to be inserted into the MongoDB collection, by the PutMongoRecord processor. Figure 13: Copy Recommendations to MongoDB End result in the Customer 360 dashboard (the data used here in this example is autogenerated): Benefits of Calleido's 360 customer database on MongoDB Atlas Once the data is available in a centralized operational data store like MongoDB, Calleido can be used to sync it with an analytics data store such as Google BigQuery. Thanks to the Customer 360 database, internal stakeholders can then use the data to: Improve customer satisfaction through segmentation and targeted marketing Accurately and easily access compliance audits Build demand planning forecasts and analyses of market trends Reward customer loyalty and reduce churn Ultimately, a single view of the customer enables organizations to deliver the right message to prospective buyers, funneling those at the brand awareness stage into the conversion stage and ensures retention and post sales mechanics are working effectively. Historically, a 360 view of the customer was a complex and fragmented process, but with Cogniflare’s Calleido and MongoDB Atlas, a Customer 360 database has become the most powerful and cost efficient data management stack that an organization can harness.
Simplifying Data Migrations From Legacy SQL to MongoDB Atlas with Studio 3T and Hackolade
Migrating data from SQL relational databases to MongoDB Atlas may initially seem to be a straightforward process. Export the data from the relational database, import the tables into MongoDB Atlas, and then start writing queries for it. But the deeper you look, the more it can start to feel like an overwhelming task. Decades of irrelevant indexes, rare relationships and forgotten fields that need to be migrated all make for a more complicated process. Not to mention, converting your old schemas to work well with the document-based world of MongoDB can take even longer. Making this process easier and more achievable was one of Studio 3T’s main goals when they created their SQL to MongoDB migration tools. In fact, Studio 3T's SQL migration doesn't just make this process easier; it also makes it reliably repeatable. This means you can tune your migration to produce the perfect documents in an ideal schema. There's no big bang cut over; you can proceed at your own pace. Delivering the ability to perfect your new schema is also why we've integrated with Hackolade's schema design and data modeling tools. In this article, we look at how the data modernization process works with a focus on the SQL migration powers of Studio 3T and the data modelling technology of Hackolade. So the problem was, how do I migrate the data that’s been collected over the last 10 years from MySQL over to MongoDB? And that’s when I found Studio 3T. I could not imagine trying to do it myself by hand... If it hadn’t been for Studio 3T, I probably would have just left it all in the SQL database. Rand Nix, IT Director, Wakefield Inspection Services Why MongoDB Atlas Building on MongoDB’s intuitive data model and query API, MongoDB Atlas gives engineering organizations the versatility they need to build sophisticated applications that can adapt to changing customer demands and market trends. Not only is it the only multi-cloud document database available, it also delivers the most advanced security and data distribution capabilities of any fully managed service. Developers can get started in minutes and leverage intelligent automation to maintain performance at scale as applications evolve over time. Mapping the move The crucial component to a SQL migration is mapping the SQL tables and columns to JSON documents and their name/value fields. While the rigid grid of relational tables has a familiar feeling, a document in MongoDB can be any shape and, ideally, it should be the shape that makes sense for your business logic. Doing this by hand can prove extremely time-consuming for developers. Which situations call for reshaping your data into documents and which should you construct new documents for? These are important questions which require time and expertise to answer. Or, you can simply use Studio 3T's SQL Migration tool which is a powerful, configurable import process. It allows you to create your MongoDB documents with data retrieved from multiple relational tables. Using foreign keys, it can capture the relationships between records in document-centric form. "When we started out, we built it on SQL Server. All was good to start, but by the time we got up to fifty customers and more, each with some thousands of their staff logging into the system concurrently, we ran into scaling issues. "Now we’re using a hosted, paid-for subscription on MongoDB Atlas. They do all the management and the sharding and all that stuff. And when we switch to regional provisioning, they can handle all that too, which is a huge relief. "We all use Studio 3T constantly, it’s used as extensively as Visual Studio. We use it for all sorts of things because Studio 3T makes it really easy to navigate around the data." Dan Cummings, Development Mgr. Terryberry Inc. For example, a relational database may have a table of customers and a table of orders by each customer. With the SQL Migration tool, you can automatically create a collection of customer documents, containing an array of all the orders that customer has placed. This exploits the power of the document model by keeping related data local. You can also preview the resulting JSON documents before running a full import so you are sure you're getting what you want. Figure 1. Studio 3T’s SQL to MongoDB Migration In Figure 1 (see above), we can see Studio 3T's SQL Migration pulling in multiple SQL tables, embedding rental records into the customer records, and previewing the JSON output. For more radical restructuring, SQL queries can also provide the source of the data for import. This allows for restructuring of the data by, for example, region or category. Hackolade integrated As you can see, Studio 3T is fully capable of some complex migrations through its tooling, SQL Migration, Import and Reschema. However, if you are of the mindset that you should design your migration first, then this is where Hackolade comes in. Hackolade is a "polyglot data" modelling application designed for a world where models are implemented in everything from graphs to APIs to storage formats. Figure 2. Entity Relationships Mapped in Hackolade Using Hackolade, you can import the schema with relationships from an SQL database and then visually reassemble the SQL fields to compose your MongoDB document schema. Watch this demo video for more details. Once you've built your new models, you can use Studio 3T to put them into practice by importing a Hackolade schema mapping into a SQL Migration configuration. Figure 3. Importing Hackolade Models into Studio 3T This helps eliminate the need to repeatedly query and import from the SQL database as you fine-tune your migration. Instead, you can construct and document the migration within Hackolade, review with your team, and then implement with confidence. Once the data is in MongoDB Atlas, support from Studio 3T continues. Studio 3T's Reschema tools allow you to restructure your schema without re-importing, working entirely on the MongoDB Atlas data. This can be particularly useful when blending data from existing document databases with freshly imported, formerly relational data. The idea that migrations have to be team-breaking chores no longer holds. It is eminently possible, with tools such as Studio 3T's SQL Migration and Hackolade, to not only perform more complex migrations easily, but to be able to design them up front and put them into practice as often as is needed. With both tools working in integrated harmony, the future of migration has arrived.
How DataSwitch And MongoDB Atlas Can Help Modernize Your Legacy Workloads
Data modernization is here to stay, and DataSwitch and MongoDB are leading the way forward. Research strongly indicates that the future of the Database Management System (DBMS) market is in the cloud, and the ideal way to shift from an outdated, legacy DBMS to a modern, cloud-friendly data warehouse is through data modernization. There are a few key factors driving this shift. Increasingly, companies need to store and manage unstructured data in a cloud-enabled system, as opposed to a legacy DBMS which is only designed for structured data. Moreover, the amount of data generated by a business is increasing at a rate of 55% to 65% every year and the majority of it is unstructured. A modernized database that can improve data quality and availability provides tremendous benefits in performance, scalability, and cost optimization. It also provides a foundation for improving business value through informed decision-making. Additionally, cloud-enabled databases support greater agility so you can upgrade current applications and build new ones faster to meet customer demand. Gartner predicts that by 2022, 75% of all databases will be on the cloud – either by direct deployment or through data migration and modernization. But research shows that over 40% of migration projects fail. This is due to challenges such as: Inadequate knowledge of legacy applications and their data design Complexity of code and design from different legacy applications Lack of automation tools for transforming from legacy data processing to cloud-friendly data and processes It is essential to harness a strategic approach and choose the right partner for your data modernization journey. We’re here to help you do just that. Why MongoDB? MongoDB is built for modern application developers and for the cloud era. As a general purpose, document-based, distributed database, it facilitates high productivity and can handle huge volumes of data. The document database stores data in JSON-like documents and is built on a scale-out architecture that is optimal for any kind of developer who builds scalable applications through agile methodologies. Ultimately, MongoDB fosters business agility, scalability and innovation. Key MongoDB advantages include: Rich JSON Documents Powerful query language Multi-cloud data distribution Security of sensitive data Quick storage and retrieval of data Capacity for huge volumes of data and traffic Design supports greater developer productivity Extremely reliable for mission-critical workloads Architected for optimal performance and efficiency Key advantages of MongoDB Atlas , MongoDB’s hosted database as a service, include: Multi-cloud data distribution Secure for sensitive data Designed for developer productivity Reliable for mission critical workloads Built for optimal performance Managed for operational efficiency To be clear, JSON documents are the most productive way to work with data as they support nested objects and arrays as values. They also support schemas that are flexible and dynamic. MongoDB’s powerful query language enables sorting and filtering of any field, regardless of how nested it is in a document. Moreover, it provides support for aggregations as well as modern use cases including graph search, geo-based search and text search. Queries are in JSON and are easy to compose. MongoDB provides support for joins in queries. MongoDB supports two types of relationships with the ability to reference and embed. It has all the power of a relational database and much, much more. Companies of all sizes can use MongoDB as it successfully operates on a large and mature platform ecosystem. Developers enjoy a great user experience with the ability to provision MongoDB Atlas clusters and commence coding instantly. A global community of developers and consultants makes it easy to get the help you need, if and when you need it. In addition, MongoDB supports all major languages and provides enterprise-grade support. Why DataSwitch as a partner for MongoDB? Automated schema re-design, data migration & code conversion DataSwitch is a trusted partner for cost-effective, accelerated solutions for digital data transformation, migration and modernization through a modern database platform. Our no-code and low-code solutions along with cloud data expertise and unique, automated schema generation accelerates time to market. We provide end-to-end data, schema and process migration with automated replatforming and refactoring, thereby delivering: 50% faster time to market 60% reduction in total cost of delivery Assured quality with built-in best practices, guidelines and accuracy Data modernization: How “DataSwitch Migrate” helps you migrate from RDBMS to MongoDB DataSwitch Migrate (“DS Migrate”) is a no-code and low-code toolkit that leverages advanced automation to provide intuitive, predictive and self-serviceable schema redesign from a traditional RDBMS model to MongoDB’s Document Model with built-in best practices. Based on data volume, performance, and criticality, DS Migrate automatically recommends the appropriate ETTL (Extract, Transfer, Transform & Load) data migration process. DataSwitch delivers data engineering solutions and transformations in half the timeframe of the existing typical data modernization solutions. Consider these key areas: Schema redesign – construct a new framework for data management. DS Migrate provides automated data migration and transformation based on your redesigned schema, as well as no-touch code conversion from legacy data scripts to MongoDB Atlas APIs. Users can simply drag and drop the schema for redesign and the platform converts it to a document-based JSON structure by applying MongoDB modeling best practices. The platform then automatically migrates data to the new, re-designed JSON structure. It also converts the legacy database script for MongoDB. This automated, user-friendly data migration is faster than anything you’ve ever seen. Here’s a look at how the schema designer works. Refactoring – change the data structure to match the new schema. DS Migrate handles this through auto code generation for migrating the data. This is far beyond a mere lift and shift. DataSwitch takes care of refactoring and replatforming (moving from the legacy platform to MongoDB) automatically. It is a game-changing unique capability to perform all these tasks within a single platform. Security – mask and tokenize data while moving the data from on-premise to the cloud. As the data is moving to a potentially public cloud, you must keep it secure. DataSwitch’s tool has the capability to configure and apply security measures automatically while migrating the data. Data Quality – ensure that data is clean, complete, trustworthy, consistent. DataSwitch allows you to configure your own quality rules and automatically apply them during data migration. In summary: first, the DataSwitch tool automatically extracts the data from an existing database, like Oracle. It then exports the data and stores it locally before zipping and transferring it to the cloud. Next, DataSwitch transforms the data by altering the data structure to match the re-designed schema, and applying data security measures during the transform step. Lastly, DS Migrate loads the data and processes it into MongoDB in its entirety. Process Conversion Process conversion, where scripts and process logic are migrated from legacy DBMS to a modern DBMS, is made easier thanks to a high degree of automation. Minimal coding and manual intervention are required and the journey is accelerated. It involves: DML – Data Manipulation Language CRUD – typical application functionality (Create, Read, Update & Delete) Converting to the equivalent of MongoDB Atlas API Degree of automation DataSwitch provides during Migration Schema Migration Activities DS Automation Capabilities Application Data Usage Analysis 70% 3NF to NoSQL Schema Recommendation 60% Schema Re-Design Self Services 50% Predictive Data Mapping 60% Process Migration Activities DS Automation Capabilities CRUD based SQL conversion (Oracle, MySQL, SQLServer, Teradata, DB2) to MongoDB API 70% Data Migration Activities DS Automation Capabilities Migration Script Creation 90% Historical Data Migration 90% 2 Catch Load 90% DataSwitch Legacy Modernization as a Service (LMaas): Our consulting expertise combined with the DS Migrate tool allows us to harness the power of the cloud for data transformation of RDBMS legacy data systems to MongoDB. Our solution delivers legacy transformation in half the time frame through pay-per-usage. Key strengths include: ● Data Architecture Consulting ● Data Modernization Assessment and Migration Strategy ● Specialized Modernization Services DS Migrate Architecture Diagram Contact us to learn more.
Accelerate Data Modernization with Infosys Data Model Converter
Are you in the process of migrating applications from a relational database to MongoDB? If so, you’re likely trying to best understand and decide how your enterprise data needs to be modeled. Our previous blog discussed how Infosys Data Services Suite can help enterprises move data seamlessly from legacy relational databases to MongoDB. But moving data is only one part of the puzzle. The more significant step is choosing the target data model, or schema design, a process that usually requires several hours of highly skilled talent. That’s why we created this follow-up blog to help you get started. Rethinking Schema Design Ultimately, schema design can be the difference between an inefficient, disorganized database and a strategic one that empowers the entire company. Schema design in MongoDB requires a change in perspective for data architects, developers, and database administrators. They have to: Rethink the legacy relational data model. This model flattens data into rigid two-dimensional tabular structures of rows and columns. The new data model is a rich and dynamic one with embedded sub-documents and arrays Rethink how the data platform works. In relational databases, it is extremely difficult to change the data platform as the application evolves. However, in MongoDB, the apps and APIs come first and the data platform dynamically accommodates the data Getting Schema Design Right Begin the schema design process by considering the application’s requirements. You’ll want to model the data in a way that leverages the flexibility of the document model. In schema migrations, it may seem easy at first to simply mirror the flat schema of the relational database in the document model. However, this negates the advantages enabled by the rich and embedded data structures of the document model. For example, data that belongs to a parent-child relationship in two RDBMS tables can be collapsed (embedded) into a single document in MongoDB. The application data access patterns should also drive schema design with a specific focus on: The read/write ratio of database operations and whether it is more important to optimize the performance of one operation over another The types of queries and updates performed by the databases The lifecycle of the data and growth rate of documents Simplifying Schema Design with Infosys Data Model Converter Infosys has developed a solution called Infosys Data Model Convertor that processes source relational schema and the above-mentioned signals as inputs and automatically provides target MongoDB schema suggestions. Infosys Data Model Converter is available as part of Infosys Modernization Suite which accelerates enterprises’ modernization journey. Each schema suggestion is accompanied by a detailed analysis report. The data modeler can use this as a starting point and iterate over the schema to arrive at the final MongoDB schema. The Infosys Data Model Converter reduces 50-60% of the effort typically spent in schema design. Key Features Boosts productivity by augmenting the migration of RDBMS to NoSQL database Saves time by automatically extracting schema, query and data patterns from an existing RDBMS Comprehensively analyzes the RDBMS entity relations, data and read-and-write patterns Applies a rich set of rules and generates a fully compliant NoSQL target state data model Offers flexibility by externalizing the rules for organization-specific customizations Connects and deploys the model to the target NoSQL platform with sample data Discover more ways in which Infosys can help you unlock value from modernization. Contact us for any modernization questions.
Optimize Data Modeling and Schema Design with Hackolade and MongoDB
Development teams are constantly searching for new ways to quickly enhance applications and satisfy the rapid progression of customer needs. The dynamic schema evolution in MongoDB enables such a reality through the power and flexibility of storing data in a JSON document format instead of in relational tables. Notably, developers love the flexibility and schema-less nature of the JSON document format. But as application complexity and scale increases in an enterprise environment, this flexibility must be skillfully organized to harness the power of the solution, maximize developer productivity, and lower total cost of ownership. For large enterprises and government agencies, the key is to leverage the benefits of modern applications running on MongoDB Atlas while also ensuring proper data management and governance. This is where a data modeling tool designed specifically for MongoDB will greatly help. Enter Hackolade. For decades, Entity-Relationship Diagrams (ERDs) have been used to visually represent the data structures of relational databases. But ERDs were originally designed for flat structures only. Hackolade , a MongoDB certified technology partner , has enhanced ERD capabilities to accommodate the representation of JSON hierarchical structures with nested objects and arrays. Hackolade is pioneering data modeling and schema design for NoSQL databases and REST APIs. Why it Matters A data model is an abstraction describing and documenting an organization’s information system. It is a collection of Entity-Relationship diagrams, descriptions, constraints, and metadata representing data structures: Hackolade data model for MongoDB A schema, on the other hand, is a “consumable” scope contract describing the layout or structure of a file, a transaction, or a database. It is an authoritative source for producers and consumers to agree on the structure being exchanged or accessed. While data models are useful for humans to understand structure, schemas are the technical artifact necessary for systems to interact. Hackolade provides both, allowing MongoDB customers to easily visualize the data model, intuitively create and enforce schema with MongoDB’s JSON Schema Validator, and iteratively change the schema as the applications evolve. Automatically-generated JSON Schema Validator Customer Benefits Increase data agility with forward-engineering An ERD provides an easy-to-understand picture of your data. As a communication tool, it helps facilitate dialog between application stakeholders like business analysts, designers, architects, developers, and DBAs. With an ERD, you can evaluate different “what if” scenarios, identify the ideal way to denormalize data, and leverage the benefits of MongoDB Atlas technology. Simply apply a query-driven design of the schema after analyzing the access patterns of the application. You can then visualize and evaluate the impacts without writing a line of code—obviously, this is a more productive approach than coding first, then realizing that much needs to be rewritten to accommodate everyone’s needs. The Hackolade software generates several artifacts such as: collection creation with validator script requiring no knowledge of JSON Schema syntax, sample JSON documents, Mongoose schemas, documentation in HTML, Markdown or PDF, plotter output of ERD pictures, document and index sizing estimates, and more. The process is easily integrated into a Jenkins CI/CD pipeline by invoking a flexible Command-Line Interface. Ensure data quality and compliance through schema reverse-engineering Deriving a data model from an existing MongoDB instance is not as easy as fetching a DDL from a relational database. Schemas must be inferred from a representative sample of documents in each collection. Hackolade has perfected its schema inference algorithms to accommodate the flexibility and polymorphism of JSON hierarchical structures. The derived models become a trusted source to feed data dictionaries and data governance suites. Reverse-engineering helps ensure data quality and compliance, with the use of an automated Command-Line Interface process. Facilitate application modernization with the denormalization of legacy data structures Hackolade can import a variety of structures from relational DDLs, logical data models in XSD format, JSON documents and schemas, and Excel templates. To leverage the benefits of MongoDB, these structures should evolve to embed information where applicable and avoid slow JOINs. This should not be done blindly, but based on a proper analysis of the application access patterns in the context of data volume estimates and relationship cardinality. Hackolade provides a handy feature to quickly evolve a relational data model towards a denormalized schema, thereby leveraging the benefits of MongoDB’s document model and facilitating modernization. The process easily hooks into the forward-engineering process described above, generating pictures, scripts, and documentation. Implement continuous evolution and data management The lifecycle of modernized applications does not stop after the initial data migration step. Applications must be successfully operated, and will continue to evolve, resulting in likely schema changes. Hackolade is designed to facilitate agile development approaches and the full lifecycle of modern software. It provides the necessary tooling to design and manage data models and schemas for successful application modernization on MongoDB Atlas. Learn how to maximize developer productivity and lower total cost of ownership using data modeling with Hackolade , and the MongoDB University data modeling advanced course . Download the joint solution brief: MongoDB and Hackolade: Visual Data Modeling for MongoDB Schemas .
Capgemini Solutions that help customers modernize applications to MongoDB
Companies across every industry vertical continue to face the challenge of how to effectively migrate and quickly access massive amounts of enterprise data—all while keeping system performance up to par throughout the obstacle-ridden process. The complexities involved with the ubiquitous, traditional Relational Database Management Systems (RDBMS) are many. RDBMS systems can often inhibit performance, falter under heavy volumes and slow down deployment. With MongoDB’s document-based, distributed database, however, performance and volume issues are easily addressed. But when it comes to speeding up time to market? The right auxiliary tools are still needed. Capgemini, a MongoDB partner and global leader in digital transformation, provides the final piece of the puzzle with a new tool rooted in automated intelligence. In this blog, we’ll explore three key Capgemini Solutions that help customers modernize to MongoDB. Tools that expedite time to market Migration from legacy system to MongoDB New development using MongoDB as a backend database Whether your company is developing a new database or migrating from legacy systems to MongoDB, Capgemini’s new Database Convert & Compare (DCC) tool can help. Below, we’ll detail how DCC works, then walk through a few recent, client examples and the immense benefits reaped. Tool: Database Convert & Compare (DCC) A powerful tool developed by the Capgemini team, DCC optimizes activities like database migration, data comparison, validation and much more. The tool can perform data transformations with specific customization based on the source and target database in the scope. When migrating from RDBMS to MongoDB, DCC achieves 70% automation and 30% manual retrofit on a database level. How does DCC work? In context of RDBMS to NoSQL migration, DCC performs the migration in 3 stages. 1) Assessment: Source database schema assessment – DCC extracts source schema information and performs an assessment to generate detailed inventory of data objects such as tables, views, stored procedures and indexes. It also generates a detailed report on data volume from each table which helps in assessing estimated data migration time from source to target Apply analytics to prepare recommendation for target database structure—The target structure varies based on various parameters, such as: Table relationships (one to many, many to many, one to one) Indexes applied on table for performance requirements Column data type 2) Schema Migration Customize tool to apply recommendation from step 1.2 hence generating the script for target database Target schema script preparation – DCC will generate a complete database schema script except for a few object types such as stored procedure, views etc. Produce detailed report of schema migration, inclusive of objects that couldn’t be migrated Manual intervention is required to apply business logic implementation of source database, stored procedures and views to target environment application 3) Data Migration Column mapping – assessment report generates inventory of source database table fields as well as post recommended schema structure; the report also provides recommended field mapping from source to target based on adopted recommendation and DCC customization Post migration data validation script – DCC generates a data validation script after data migration is complete which takes field mapping into consideration from the related assessment and recommendation reports Data migration script for execution – DCC allows for the setup and configuration of different scripts for data migration, such as: One-time data migration from source to target Daily batch run to sync up source and target database data Intermittent data validation during the process of data migrationIf there are any discrepancies found in validation, the job will stop and generate a report with potential root cause of issue in data migration) Standalone data comparison – DCC allows for seclusion of data validation between source and target database. In this case, DCC will generate source database table inventory details and extract target database collection inventory details. Minimal manual intervention is required to perform the field mapping and set the configuration in the tool for data migration execution. Other configuration features such as one time migrations or daily batch migrations can be configured as well. The Capgemini team has successfully implemented and deployed the DCC tool for various banking customers for RDBMS to NoSQL end-to-end migration including for application retrofit and rewiring using other capable tools such as CAP360 Case study 1: Migration from Mainframe to MongoDB for a Large European Investment Bank A large banking client encountered significant challenges in terms of growth and scale-up, low resilience and increased risks, and certainly increasing costs associated with the advent of mobile banking and a related significant increase in volume. To help the client evolve more quickly, Capgemini built an Operational Data Platform to offload expensive mainframe operations, as well as store and process customer transactions for business operations, analysis and reporting. The Challenge: Inefficient and slow to meet customer demand for new digital banking services due to heavy reliance on legacy infrastructure and apps Continued growth in traffic and the launch of new digital services led to increased cost of operations and decreased performance Mainframe was the single point of failure for many applications. Outages resulted in poor customer service, brand erosion, and regulatory concerns The Approach: An analysis of digital channels revealed that 92% of traffic was generated by 25 interaction types, with 85% of these being read-only. To offload these operations from the mainframe, an operational data lake (ODL) was created. MongoDB-based ODL was updated in near real-time via change data capture and messaging queue to power existing apps, new digital services and other APIs. Outcome and Benefits: Accelerated time to market for new digital services, including personalization Improved stand-in capability to support resiliency during planned and unplanned mainframe outages Reduced number of read-only transactions to mainframes (MIPS cost), freeing up resources for additional growth Saved the customer over 80% in year-on-year post migration costs. The new MongoDB database was seamlessly able to handle 25mn+ transactions per day as well as able to handle data volume of over 30 months of history with ~13b transactions held in 114m documents Case study 2: Migration of Large-scale Database from Legacy to MongoDB for US-based Insurance Customer A US-based insurance client faced disparate data spread across 100+ systems, making data aggregation a cumbersome process. The client wanted to access the many data points around a single customer without hindering performance of the entire system. The Challenge: Reconciling different data schemas from multiple systems into a single schema is problematic and, in many cases, impossible. When adding new data sources, it is difficult to iterate on the schema quickly. Providing access to the data within the ‘Single View’ requires ad hoc queries as well as multi-layer indexing and aggregation which becomes complicated for relational databases to provide. Lack of personalization and ability to provide context-based experiences in real time results in lost business opportunities. Approach: In order to assist customer service reps in real-time, we built “The Wall,” a single view application that pulls disparate data from legacy systems for analytics. Additionally, we designed a flexible data model to aggregate disparate data into a single data store. MongoDB’s expressive query language and secondary indexes can reach any field in real time, making data access faster and easier. Our approach was designed based on 4 key foundations: Document Model – Rich and flexible data store. A single document can store up to 16 MB of data. With 20+ data types meant flexibility in terms of managing data Versatility – Variety of structured and non-structured data models defined Analytics – Strong data aggregator framework to aggregate data related to single customer Workload Isolation – Parallel run for operational and analytical workload on same cluster Outcome and Benefits: Our largest insurance customer was able to attain the single view of the customer within 90 days timespan. A different insurance customer achieved 360 degree view of 13 million customers on MongoDB Enterprise Advanced. And yet another esteemed healthcare customer was able to achieve as much as a 300% reduction in processing times and increased processing throughput with 50% less hardware. Ready to accelerate your digital transformation? Capgemini and MongoDB can help you re-envision your data and advance your business processes so you can focus on innovation. Reach out today to get started. Download the Modernization Guide
Legacy Modernization with MongoDB and Confluent
In many organizations, crucial enterprise data is locked in dozens or hundreds of silos that may be, controlled by different teams, and stuck in systems that aren’t able to serve new workloads or access patterns. This is a blocker for innovation and insight ultimately hampering the business. For example, imagine building a new mobile app for your customers that enables them to view their account data in a single view. Designing the app could require months of time to simply navigate the internal processes necessary to gain access to the legacy systems and even more time to figure out how to integrate them. An Operational Data Layer, or ODL, can offer a “best of both worlds” approach, providing the benefits of modernization without the risk of a full rip and replace. Legacy systems are left intact – at least at first – meaning that existing applications can continue to work as usual without interruption. New or improved data consumers will access the ODL rather than the legacy data stores, protecting those stores from new workloads that may strain their capacity and expose single points of failure. At the same time, building an ODL offers a chance to redesign the application’s data model, allowing for new development and features that aren’t possible with the rigid tabular structure of existing relational systems. With an ODL, it’s possible to combine data from multiple legacy sources into a single repository where new applications, such as a customer single view or artificial intelligence processes, can access the entire corpus of data. Existing workloads can gradually shift to the ODL, delivering value at each step. Eventually, the ODL can be promoted to a system of record and legacy systems can be decommissioned. Read our blog covering DaaS with MongoDB and Confluent to learn more. There’s also a push today for applications and databases to be entirely cloud-based, but the reality is that current business applications are often too complex to be migrated easily or completely. Instead, many businesses are opting to move application data between on-premises and cloud deployments in an effort to leverage the full advantage of public cloud computing without having to undertake a complete, massive data lift-and-shift. Confluent can be used for both one-time and real-time data synchronization between legacy data sources and modern data platforms like MongoDB, whose fully managed global cloud database service, MongoDB Atlas , is supported across AWS, Google Cloud, and Azure. Confluent Platform can be self-managed in your own data center while Confluent Cloud can be used on the public clouds. Whether leaving your application on-premise is a personal choice or a corporate mandate, there are many good reasons to integrate with MongoDB Atlas. Bring your data closer to your users in more than 70 regions with Atlas’s global clusters Address your most intense workloads with one-click, automated sharding for scale out and zero-downtime scale up Quickly provision TBs of database storage, all on high performance SSDs with dedicated I/O bandwidth Natively query and analyze data across AWS S3 and MongoDB Atlas with MongoDB Atlas Data Lake Perform full-text search queries with MongoDB Atlas Search Build native mobile applications that seamlessly synchronize data with MongoDB Realm Create powerful visualizations and dashboards of your MongoDB data with MongoDB Charts Off-load older data to cost effective storage with MongoDB Atlas Online Archive In this video we will show one time migration and Real time continuous data synchronization from a Relational System to MongoDB Atlas using Confluent Platform and the MongoDB Connector for Apache Kafka . Also we will be talking about different ways to store and consume the data within MongoDB Atlas. Git repository for the demo is here . Learn more about the MongoDB and Confluent partnership here and download the joint Reference Architecture here . Click here to learn more about modernizing to MongoDB.
Part 1: The Modernization Journey with Exafluence and MongoDB
Welcome to the first in a series of conversations between Exafluence and MongoDB about how our partnership can use open source tools and the application of data, artificial intelligence/machine learning and neuro-linguistic programming to power your business’s digital transformation. In this installment, MongoDB Senior Partner Solutions Architect Paresh Saraf and Director for WW Partner Presales Prasad Pillalamarri sit down with Exafluence CEO Ravikiran Dharmavaram and exf Insights Co-Founder Richard Robins to discuss how to start the journey to build resilient, agile, and quick-to-market applications. From Prasad Pillalamari: I first met Richard Robins, MD & Co-Founder of exf Insights at Exafluence back in June 2016 at a MongoDB world event. Their approach towards building data-driven applications was fascinating for me. Since then Exafluence has grown by leaps and bounds in the System Integration space and MongoDB has outperformed its peers in the database market. So Paresh and I decided to interview Richard to deep-dive into their perspective on Modernization with MongoDB. Prasad & Paresh: We first met the Exafluence team in 2016. Since then, MongoDB has created the Atlas cloud data platform that now supports multi-cloud clusters and Exafluence has executed multiple projects on mainframe and legacy modernization. Could you share your perspective on the growth aspects and synergies of both companies from a modernization point of view? Richard Robins: Paresh and Prasad, I’m delighted to share our views with you. We’ve always focused on what happens after you successfully offload read traffic from mainframes and legacy RDBMS to the cloud. That’s digital transformation and legacy app modernization. Early on, Exafluence made a bet that if the development community embraces something we should, too. That’s how we locked in on MongoDB when we formed our company. Having earned our stripes in the legacy data world, we knew that getting clients to MongoDB would mean mining the often poorly documented IP contained in the legacy code. That code is often where long-retired subject matter expert (SME) knowledge resides. To capture it, we built tools to scan COBOL/DB2 and stored procedures to reverse engineer the current state. This helps us move clients to a modern cloud native application, and it's an effective way to merge, migrate, and retire the legacy data stores all of our clients contend with. Once we’d mined the IP with those tools we needed to provide forward-engineered transformation rules to reach the new MongoDB Atlas endpoint. Using a metadata driven approach, we built a rules catalog that included a full audit and REST API to keep data governance programs and catalogs up to date as an additional benefit of our modernization efforts. We’ve curated these tools as exf Insights , and we bring them to each modernization project. Essentially, we applied NLP, ML, and AI to data transformation to improve modernization analysts’ efficiency, and added a low-to-no code transformation rule builder, complete with version control and rollback capabilities. All this has resulted in our clients getting world-class, resilient capabilities at a lower cost in less time. We’re delighted to say that our modernization projects have been successful by following simple tenets — to embrace what the development community embraces and to offer as much help as possible — embodied in the accelerator tools we’ve built. That’s why we are so confident we'll continue our rapid growth. P&P: How do you think re-architecting legacy applications with MongoDB as the core data layer will add value to your business? RR: We believe that MongoDB Atlas will continue to be the developers go-to document database, and that we’ll see our business grow 200-300% over the next three years. With MongoDB Atlas and Realm we can provide clients with resilient, agile applications that scale, are easily upgraded, and are able to run on any cloud as well as the popular mobile iOS and Android devices. Digital transformation is key to remaining competitive and being agile going forward. With MongoDB Atlas, we can give our clients the same capabilities we all take for granted on our mobile apps: they’re resilient, easy to upgrade, usually real-time, scale via Kubernetes clusters, and can be rolled back quickly if necessary. Most importantly, they save our clients money and can be automatically deployed. P&P: At a high level, how will Exafluence help customers take this journey? RR: We’re unusual as a services firm in that we spend 20% of gross revenue on R&D, so our platform and approach are proven. Thus, relatively small teams for our healthcare, financial services, and industrial 4.0 clients can leverage our approach, platform, and tools to deliver advanced analytical systems that combine structured and unstructured data across multiple domains. We built our exf Insights accelerator platform using MongoDB and designed it for interoperability, too. On projects we often encounter legacy ETL and messaging tools. To show how easy it is, we recently integrated exf Insights with SAP HANA and the SAP Data Intelligence platform. Further, we can publish JSON code blocks and provide Python code for integration into ETL platforms like Informatica and Talend. Our approach is to reverse engineer by mining IP from legacy data estates and then forward engineer the target data estate, using these steps and tools: Reverse Engineer Extract stored procedures, business logic, and technical data from the legacy estate and load it into our platform. Use our AI/ML/NLP algorithms to analyse business transformation logic and metadata, with outliers identified for cleansing. Provide DB scans to assess legacy data quality to cleanse and correct outliers, and provide tools to compare DB level data reconciliations. Forward Engineer To produce a clean set of metadata and business transformation logic, and baseline with version control, we: Extract, transform, and load metadata to the target state. Score metadata via NLP and ML to recommend matches to the Analyst who accepts/rejects or overrides recommendations. Analysts can then add additional transformations which are catalogued. Deploy and load cleansed data to the target state platform so any transformations and gold copies may be built. Automate Data Governance via Rest API, Code Block generation (Python/JSON) to provide enterprise catalogs with the latest transforms. P&P: What are your keys to a successful transformation journey? RR: Over the past several years we’ve identified these elements and observations: Subject matter experts and technologists must work together to provide new solutions. There’s a shortage of skilled technologists able to write, deploy, and securely manage next generation solutions. Using accelerators and transferring skills are vital to mitigating the skills shortage. Existing IP that’s buried in legacy applications must be understood and mined in order for a modernization program to succeed. A data-driven approach that combines reverse and forward engineering speeds migration and also provides new data governance and data science catalog capabilities. The building, caring, and feeding of new, open source-enabled applications is markedly different from the way monolithic legacy applications were built. The document model enables analytics and interoperability. Cybersecurity and data consumption patterns must be articulated and be part of the process, not afterthoughts. Even with aggressive transformation plans, new technology must co-exist with legacy applications for some time; progress works best if it’s not a big bang. Success requires business and technology to learn new ways to provide, acquire, and build agile solutions. P&P: Can you talk about solutions you have which will accelerate the modernization journey for the customers? RR: exf Insights helps our clients visualize what’s possible with extensive, pre-built, modular solutions for health care, financial services, and industrial 4.0. They show the power of MongoDB Atlas and also the power of speed layers using Spark and Confluent Kafka. These solutions are readily adaptable to client requirements and reduce the risk and time required to provide secure, production-ready applications. Source data loading. Analyze and integrate raw structured and unstructured data, including support for reference and transactional data. Metadata scan. Match data using AI/NLP, scoring results and providing side-by-side comparison. Source alignment. Use ML to check underlying data and score results for analysts, and leverage that learning to accelerate future changes. Codeless transformation. Empower data SMEs to build the logic with a multiple-sources-to-target approach and transform rules which support code value lookups and complex Boolean logic. Includes versioned gold copies of any data type (e.g., reference, transaction, client, product, etc.). Deployment. Deploy for scheduled or event-driven repeatability and dynamically populate Snowflake or other repositories. Generates code blocks that are usable in your estate or REST API. We used the same 5-step workflow data scientists use when we enabled business analysts to accelerate the retirement of internal data stores to build and deploy the COVID-19 self-checking app in three weeks, including active directory integration and downloadable apps. We will be offering a Realm COVID-19 screening app on web, Android, and IOS to the entire MongoDB Atlas community in addition to our own clients. The accelerator integrates key data governance tools, including exf Insights repository management of all sources and targets with versioned lineage; as-built transformation rules for internal and client implementations; and a business glossary integrated into metadata repositories. P&P: Usually one of the key challenges for businesses is data being locked in silos. RR: We couldn’t agree more. Our data modernization projects routinely integrate with source transactional systems that were never built to work together. We provide scanning tools to understand disparate data as well as ways to ingest, align, and stitch them together. Using health care as an example, exf Insights provides a comprehensive analytical capability, able to integrate data from hospitals, claims, pharmaceutical companies, patients, and providers. Some of this is NonSQL, such as radiological images; for pharma companies we provide capabilities to support clinical research organizations (CROs) via a follow-the-molecule approach. Of course, we also have to work with and subscribe to Centers for Medicare & Medicaid Services (CMS) guidelines. Our data migration focuses on collecting the IP behind the data and making the source, logic, and any transformations rules available to our clients. In financial services, it’s critical to understand source and targets. No matter how data is accessed (federated or direct store), with Spark and Kafka we can talk to just about any data repository. P&P: Once we discover the data to be migrated, we need to model the data according to MongoDB’s data model paradigm. That requires multiple transformations before data is loaded to MongoDB. Can you explain more about how your accelerators help here? RR: By understanding data consumption and then looking at existing data structures, we seek to simplify and then apply the capabilities of MongoDB’s document model. It’s not unlike what a data architect would do in the relational world, but with MongoDB Atlas it’s easier. We ourselves use MongoDB for our exf Insights platform to align, transform, and make data ready for consumption in new applications. We’re able to provide full rules lineage and audit trail, and even support rollback. For the real-time speed layer we use Spark and Kafka as well. This data-driven modernization approach also turns data governance into an active consumer of the rules catalog, so exf Insights works well for regulated industries. P&P: It’s great that we have data migrated now. Consider a scenario where it’s a mainframe application and we have lots of COBOL code in there. It has to be moved to a new programming language like Python, with a change in the data access layer to point to MongoDB. Do you have accelerators which can facilitate the application migration? If so, how? RR: Yes, we do have accelerators that understand the COBOL syntax to create JSON and ultimately Java, which speeds modernization. We also found we had to reverse engineer stored procedures as part of our client engagements for Exadata migration. P&P: Once we migrate the data from legacy databases to MongoDB, validation is the key step. As this is a heterogeneous migration it can be challenging. How can Exafluence add value here? RR: We’ve built custom accelerators that migrate data from the RDBMS world to MongoDB, and offer data comparisons as clients go from development to testing to production, documenting all data transformations along the way. P&P: Now that we’ve talked about all your tools which can help in the modernization journey, can you tell us about how you already helped your customers to achieve this? RR: Certainly. We’ve already outlined how we’ve created solution starters for modernization, with sample solutions as accelerators. But that’s not enough; our key tenet for successful modernization projects is pairing SMEs and developers. That’s what enables our joint client and Exafluence teams to understand the business, key regulations, and technical standards. Our data-driven focus lets us understand the data regardless of industry vertical. We’ve successfully used exf Insights now in financial services, healthcare, and industry 4.0. Whether it’s understanding the nuances of financial instruments and data sources for reference and transactional data, or Medical Device IoT sensors in healthcare, or shop floor IoT and PLC data for predictive analytics and digital twin modeling, a data-driven approach reduces modernization risks and costs. Below are some of the possibilities this data-driven approach has delivered for our healthcare clients using MongoDB Atlas. By aggregating provider, membership, claims, pharma, and EHR clinical data, we offer robust reporting that: Transforms health care data from its raw form into actionable insights that improve member care quality, health outcomes, and satisfaction Provides FHIR support Surfaces trends and patterns in claims, membership, and provider data Lets users access, visualize, and analyze data from different sources Tracks provider performance and identifies operational inefficiencies P&P: Thank you, Richard! Keep an eye out for upcoming conversations in our series with Exafluence, where we'll be talking about agility in infrastructure and data as well as interoperability. MongoDB and Modernization To learn more about MongoDB's overall Modernization strategy, read here .