Paresh Saraf

Overcome the Biggest Obstacle to True Customer 360 with MongoDB Atlas and Dataworkz

The more you know about your customers, the better you can attract them and increase their lifetime value to your business. The ability of modern systems to collect, store, and leverage customer data across disparate systems has raised the bar for what it means to understand your customer. To the winner goes the spoils, and the spoils go to the companies that navigate the last mile in the journey to attain true Customer 360: they leave no customer data behind. They hunt down every relevant source of customer data in their reach and turn it into valuable insight – without breaking a sweat.

How customer acquisition costs and customer lifetime value impact business success

Two calculations are critical to modeling how to grow your business:

- Customer Acquisition Cost (CAC): the amount of money a company spends to get a new customer
- Customer Lifetime Value (CLV): the net profit attributed to the customer relationship for the duration of the time they use a product

Note that “customer” is front and center in both terms. Capturing the data behind these two calculations correctly, continuously, and on a timely basis across the whole customer journey – from first touch to renewals – and making sense of it is essential to understanding how to grow and retain your customers. These calculations must be performed on many customers and their interactions over a span of years. Analytical and AI models that calculate CAC and CLV need to be fed up-to-date, contextual, reliable data to identify relevant patterns. Historical customer data guides future CLV and influences decisions on how to optimize investments in sales, marketing, and customer service to positively impact the customer’s future purchase and product adoption behaviors. Similarly, customer data analysis guides decisions on how to optimize sales and marketing investments to dial in CAC for the fastest and most profitable conversion of prospects to customers.

How successful a company is at leveraging CAC and CLV calculations depends on its ability to link different data sources and formats, and to integrate all of the internal and external systems required for modeling. Until now, companies have struggled to accomplish this because data joining technologies have fallen short for one or more of these reasons: cost, complexity, and scalability.

Forrester declared the activation of unstructured “dark” data a 2022 megatrend. According to Forrester: "The question is no longer about collecting enough data — it’s about ensuring that the data is usable and liberated from company silos to create new streams of value. In 2022, being “data driven” must become more than just a slogan — the next wave of business transformation will hinge on activating “dark” data (data that organizations collect but do not effectively use) to drive differentiated experiences for both customers and employees."

Creating that holistic 360° view of your customer is a business imperative for every organization – but creating a unified view, or golden record, of your customers is hard and expensive, especially if you want to capture and make sense of structured and unstructured data, including unstructured “dark” data.
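As a concrete illustration of these two metrics, here is a minimal Python sketch that computes CAC and a simplified CLV from hypothetical spend and order figures. The numbers, field names, and the simple revenue-times-margin-times-lifespan CLV formulation are assumptions for illustration, not a prescription from Dataworkz or MongoDB.

```python
# Hypothetical illustration: computing CAC and a simplified CLV.
# All numbers are made up for the example.

marketing_and_sales_spend = 250_000.00   # total acquisition spend for the period
new_customers_acquired = 500

# CAC: what it costs, on average, to win one new customer
cac = marketing_and_sales_spend / new_customers_acquired

# One common simplified CLV formulation:
# average annual revenue per customer * gross margin * average lifespan in years
avg_annual_revenue_per_customer = 1_200.00
gross_margin = 0.60
avg_customer_lifespan_years = 4

clv = avg_annual_revenue_per_customer * gross_margin * avg_customer_lifespan_years

print(f"CAC: ${cac:,.2f}")                 # -> CAC: $500.00
print(f"CLV: ${clv:,.2f}")                 # -> CLV: $2,880.00
print(f"CLV:CAC ratio: {clv / cac:.1f}")   # a ratio well above 1 indicates profitable acquisition
```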
Why creating a customer 360 is a technical challenge

Customer 360 data resides in your ecommerce platforms, Customer Relationship Management (CRM), Sales Force Automation (SFA), ERP, customer service systems, partner relationship management programs, loyalty programs, payment portals, web apps, mobile apps, and more – across business units and functional departments. Cross-object reporting in a single application (like Salesforce) can be a challenge, but cross-application reporting (SFDC + Zoom + Zendesk + Parquet files in a data lake) presents an even bigger set of challenges, with no common identifiers in the data collected from the various applications. The data not only comes from a variety of sources but also arrives in different formats, so significant time and data engineering effort is required to combine it into the elusive golden record. To compound the challenge, different teams require different views of the same data, which demands different flavors of customer 360.

The power to harness customer data where it belongs: Business users

The traditional approach to creating a customer 360 takes the view that data from disparate sources like prospecting (Outreach.io), CRM (Salesforce), marketing automation (Marketo), web meetings (Zoom), and support (Zendesk) needs to be combined, converted into a star or snowflake schema, and deposited in an enterprise data warehouse. The alternative is to buy a Customer Data Platform (CDP) that makes yet another copy of your data in a black box with no visibility into how the data is processed. Keep in mind, this approach requires months of back and forth with the IT organization, resulting in cost and time overruns that lead to costly, stale customer 360 data that is no longer relevant.

Dataworkz gives anyone from marketing ops to sales leaders the ability to assemble a complete, reliable customer 360 in minutes or hours. Unlike a legacy CDP, Dataworkz eliminates data security concerns as a barrier because it does the job without storing customer data. Users configure their sources and destinations, then Dataworkz acts as the “personal data engineer,” bringing together data, processing, and ML with a visual, no-code interface for business users – with the option to use SQL. A visual, easy-to-use, no-code approach to gaining a seamless 360° view of your entire customer journey is the most efficient way to obtain the contextual, timely customer insights necessary to grow your business.

No more kinks in your Customer 360 data: Dataworkz and MongoDB Atlas

Getting the best of all breeds is the mantra of smart, future-proof solutions. Capitalizing on Dataworkz and MongoDB Atlas together is a “no brainer.”

Connecting Dataworkz and Atlas

Add the Dataworkz cluster IP to the IP access list in the “Network Access” section of the Atlas console. Then navigate to the “Databases” section of the Dataworkz configuration and specify the Atlas cluster IP address, username, and password.

Here are three ways this partnership can help:

- Get out of your RDBMS: Combining data from different sources, in different formats, with changing schemas is exceptionally hard, if not impossible, with a rigid relational model. MongoDB Atlas uses a JSON data model with a flexible schema, which makes it ideal for building a customer 360.
- Work with a flexible, adaptive construct: Agility is a key tenet for organizations constantly adapting to their customers’ ever-changing needs.
With dynamic schemas, you no longer need to worry about changes in source data breaking downstream applications.

- In minutes, not months: Dataworkz and MongoDB Atlas are SaaS services with usage-based pricing. You can get started without an upfront investment or hiring a team of data engineers and scientists, and get to value in a matter of minutes rather than weeks or months.

Customer case study: Gunderson Dettmer

Gunderson Dettmer is the preeminent international law firm with an exclusive focus on the innovation economy. In order to serve their clients, they need to leverage the latest information about corporate innovators anywhere in the world, from a wide variety of disparate data sources. To accomplish this, they had to transform and integrate multiple data sources into a coherent, evolving data warehouse and attempt to match financial, legal, and personnel information across “un-mappable” data sets without any common identifiers or keys. The process had to be scalable, reliable, flexible, and continuous, since new information and new data sources are added daily. Gunderson needed to compare attributes for venture-backed companies from multiple internal and external datasets, but there were no globally unique identifiers available to join the datasets in most cases, and there were no off-the-shelf, end-to-end solutions to solve the problem.

"Working with Dataworkz, we developed an automated, scalable fuzzy matching pipeline that feeds likely matches to our internal infrastructure for manual review and approval, ultimately flowing into our MongoDB data warehouse."
– Maria Concordia, Data Operations and Analytics Architect

The objective

Gunderson's data warehouse and analytics team was tasked with creating a brand new process for unlocking data insights by weaving together previously un-mappable data sets and serving them up to the organization to create an evolving, holistic view of the venture capital universe. The resulting data warehouse would be leveraged by downstream users, including data analysts, upper management, marketing, and other firm stakeholders.

The challenge with no single source of truth

Gunderson Dettmer needed to compare attributes for companies and venture capital investors across multiple internal and external datasets, but there were no globally unique identifiers available to join the datasets in most cases, and there were no off-the-shelf, end-to-end solutions to solve the problem. Data sources included Gunderson’s on-premises proprietary data held in a SQL-based relational database; CSV data dumps from Pitchbook, Crunchbase, and various PDF files; and other potential sources, as available. Although the data sets overlapped, there were no common keys or identifiers across them.

How they used the power of MongoDB Atlas + Dataworkz

Before the data warehouse team at Gunderson got involved, the de facto approach was to create interim data sets for the entity matching process by extracting data from the relational database, performing multiple Excel spreadsheet lookups to enrich it, and then manually matching the exported data to the external data sources. This process typically took 3-4 weeks to create a unified data set that could be used for analytics. The process was static and needed to be repeated (complete with manual record matching) whenever any of the data sources were updated.

Not thrilled with this process, the data warehouse team created Python scripts using the Requests and FuzzyWuzzy modules to call a REST API endpoint, programmatically joining the datasets and then applying fuzzy logic for matching.
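A minimal sketch of this kind of fuzzy matching is shown below. It uses FuzzyWuzzy's token-based scorer to pair company names from two lists; the names, threshold, and data structures are illustrative assumptions, not Gunderson's actual pipeline.

```python
# Illustrative fuzzy matching of company names across two datasets
# (not Gunderson's production pipeline; names and threshold are made up).
from fuzzywuzzy import fuzz, process  # pip install fuzzywuzzy[speedup]

internal_companies = ["Acme Robotics Inc.", "Blue River Bio", "Nimbus AI"]
external_companies = ["ACME Robotics", "BlueRiver Biosciences",
                      "Nimbus Artificial Intelligence", "Quark Energy"]

MATCH_THRESHOLD = 80  # scores range 0-100; tune per dataset

for name in internal_companies:
    # Find the best candidate in the external dataset using a token-based scorer,
    # which tolerates word reordering and minor spelling differences.
    best_match, score = process.extractOne(name, external_companies,
                                           scorer=fuzz.token_sort_ratio)
    if score >= MATCH_THRESHOLD:
        print(f"likely match: {name!r} -> {best_match!r} (score {score})")
    else:
        print(f"needs manual review: {name!r} (best candidate {best_match!r}, score {score})")
```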
Unfortunately, the REST API calls and fuzzy-matched datasets created this way were also one-time-only artifacts and could not be leveraged to automate future analysis or ongoing data integration. This is where MongoDB Atlas was introduced, to turn the one-time-only dataset into a JSON collection that could be reused. It required transforming the resulting datasets into rich, nested JSON objects that could be searched and queried using MongoDB Atlas. While this reduced the data set preparation time to about one week, there was still a large human review process that required subject matter experts (SMEs) for difficult matches that fell outside the simplistic fuzzy match parameters.

For the final step in this project, they replaced the Python scripts for transforming raw data into the JSON data model with Dataworkz to ingest data into MongoDB Atlas. The Dataworkz platform provided crucial features that vastly simplified the process of weaving the disparate “un-mappable” data sets together into a coherent whole. Using Dataworkz allowed Gunderson’s team to eliminate their Python scripts and enabled them to:

- Discover, introspect, and manipulate Parquet, BSON, JSON, CSV, structured, and unstructured data sources by bringing them all onto the same playing field for transformation
- Gain a clearer understanding of the possible challenges to creating easy-to-match collections
- Automatically perform JSON format transformations without code
- Describe and execute no-code, multi-stage pattern matching
- Automate their ingestion, transformation, and data storage pipeline in a day (rather than the months it took to write the original Python scripts)
- Provide continuous, automatic data integration within minutes, with improved accuracy, lower human intervention, and built-in data monitoring and governance to guard against data stream contamination in the future

Don't struggle any more for the metrics you need

Whether you are a Chief Revenue Officer, CMO, head of a regional team, a customer service leader, or a data analyst or scientist, you no longer need to spend extended time and dollars to gain the insights that should be available at your fingertips. Timely, reliable CAC and CLV metrics that remain correct no matter what changes occur in your processes, definitions, calculations, or systems can be yours – when you need them – across departments, product lines, and business units. Dataworkz marks the entry of a new class of data platform that delivers the insights you need – accessible across any data, from any source, combined for any use, when you need it.

April 11, 2023

How to Seamlessly Use MongoDB Atlas and Databricks Lakehouse Together

In a previous post, we talked briefly about using MongoDB and Databricks together. In this post, we'll cover the different ways to integrate these systems, and why.

Modern business demands expedited decision-making, highly personalized customer experiences, and increased productivity. Analytical solutions need to evolve constantly to meet these changing needs, but legacy systems struggle to consolidate the data necessary to serve them. They silo data across multiple databases and data warehouses. They also slow turnaround due to high maintenance overhead and scaling issues. This performance hit becomes a significant bottleneck as the data grows into terabytes and petabytes.

To overcome these challenges, enterprises need a solution that can easily handle high transaction volumes, paired with a scalable data warehouse (increasingly known as a "lakehouse") that performs both traditional business intelligence (BI) and advanced analytics such as serving machine learning (ML) models. In our previous blog post, “Start your journey-operationalize AI enhanced real-time applications: mongodb-databricks,” we discussed how MongoDB Atlas and the Databricks Lakehouse Platform can complement each other in this context. In this blog post, we will dive deep into the various ways to integrate MongoDB Atlas and Databricks for a complete solution to manage and analyze data to meet the needs of modern business.

Integration architecture

Databricks Delta Lake is a reliable and secure storage layer for structured and unstructured data that enables efficient batch and streaming operations in the Databricks Lakehouse. It is the foundation of a scalable lakehouse solution for complex analysis. Data from MongoDB Atlas can be moved to Delta Lake in batch or real time and aggregated with historical data and other data sources to run long-running analytics and complex machine learning pipelines, yielding valuable insights. These insights can be moved back to MongoDB Atlas so they reach the right audience at the right time to be acted on. The data from MongoDB Atlas can be moved to Delta Lake in the following ways:

- One-time data load
- Real-time data synchronization

One-time data load

1. Using the Spark Connector

The MongoDB Connector for Apache Spark allows you to use MongoDB as a data source for Apache Spark. You can use the connector to read data from MongoDB and write it to Databricks using the Spark API. To make it even easier, MongoDB and Databricks recently announced a Databricks Notebooks integration, which gives you an even easier and more intuitive interface for writing complex transformation jobs.

- Log in to the Databricks cluster and click New > Data.
- Click on MongoDB, which is available under the Native Integrations tab. This loads a PySpark notebook that provides a top-level introduction to using Spark with MongoDB.
- Follow the instructions in the notebook to learn how to load data from MongoDB into Databricks Delta Lake using Spark.
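For reference, a one-time load with the Spark Connector boils down to a few lines of PySpark. The sketch below is illustrative only: it assumes the MongoDB Spark Connector 10.x is attached to the Databricks cluster, and the connection string, namespace, and Delta table name are placeholders.

```python
# Minimal PySpark sketch: one-time copy of a MongoDB collection into a Delta table.
# Assumes the MongoDB Spark Connector 10.x on a Databricks cluster, where `spark`
# is the SparkSession provided by the notebook environment.
MONGODB_URI = "mongodb+srv://<user>:<password>@cluster0.example.mongodb.net"  # placeholder

# Read the source collection from MongoDB Atlas
df = (spark.read.format("mongodb")
      .option("connection.uri", MONGODB_URI)
      .option("database", "sales")          # placeholder database
      .option("collection", "orders")       # placeholder collection
      .load())

# Land it in Delta Lake for downstream analytics
(df.write.format("delta")
   .mode("overwrite")
   .saveAsTable("bronze.orders"))           # placeholder Delta table
```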
2. Using the $out operator and object storage

This approach involves using the $out stage in the MongoDB aggregation pipeline to perform a one-time data load into object storage. Once the data is in object storage, it can be configured as the underlying storage for a Delta Lake. To make this work, you need to set up a Federated Database Instance and use MongoDB Atlas Data Federation's $out to S3 to copy your MongoDB data and land it in an S3 bucket.

1. Navigate to "Data Federation" on the left-hand side of your Atlas dashboard and click "Create Federated Database Instance" or "Configure a New Federated Database Instance."

2. Connect your S3 bucket to your Federated Database Instance. This is where the MongoDB data will be written. The setup wizard should guide you through this quickly, but you will need access to your AWS credentials.

3. Select an AWS IAM role for Atlas. If you created a role that Atlas is already authorized to read from and write to your S3 bucket, select that role. If you are authorizing Atlas for an existing role or are creating a new role, refer to the documentation for how to do this.

4. Enter the S3 bucket information: enter the name of your S3 bucket and choose Read and write so that documents can be written to your S3 bucket.

5. Assign an access policy to your AWS IAM role. Follow the steps in the Atlas user interface to assign the policy. Your role policy for read-only or read and write access should look similar to the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        <role arn>
      ]
    }
  ]
}

6. Define the path structure for your files in the S3 bucket and click Next. You have now successfully configured the S3 bucket with Atlas Data Federation.

7. Connect to your MongoDB instance using the MongoDB shell. This command prompts you to enter the password.

mongosh "mongodb+srv://server.example.mongodb.net" --username username

8. Specify the database and collection that you want to export data from using the following commands, replacing db_name and collection_name with actual values, and verify that the data exists.

use db_name;
db.collection_name.find()

9. Use the $out operator to export the data to an S3 bucket.

db.[collection_name].aggregate([{$out: "s3://[bucket_name]/[folder_name]"}])

Make sure to replace [collection_name], [bucket_name], and [folder_name] with the appropriate values for your S3 bucket and desired destination folder. Note: The $out operator will overwrite any existing data in the specified S3 location, so use a unique destination folder or bucket to avoid unintended data loss.

Real-time data synchronization

Real-time data synchronization needs to happen immediately following the one-time load process. This can be achieved in multiple ways, as shown below.

1. Using Apache Kafka and Delta Live Tables

Streaming data from MongoDB to Databricks using Kafka and a Delta Live Tables pipeline is a powerful way to process large amounts of data in real time. This approach leverages Apache Kafka, a distributed event streaming platform, to receive data from MongoDB and forward it to Databricks in real time. The data can then be processed using Delta Live Tables (DLT), which makes it easy to build and manage reliable batch and streaming data pipelines that deliver high-quality data on the Databricks Lakehouse Platform.

- Download and install the MongoDB source connector plugin in your Kafka cluster from here.
- Update the following in the mongodb-source-connector.properties connector configuration file:
  CONNECTION-STRING - MongoDB cluster connection string
  DB-NAME - database name
  COLLECTION-NAME - collection name
  Note: These configurations can be modified based on the use case. Refer to this documentation for more details.
- Deploy the connector configuration file in your Kafka cluster. This will enable real-time data synchronization from MongoDB to the Kafka topic.
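The DLT notebook referenced in the next steps is available from the linked repository. Purely as an illustration of the shape of such a pipeline (this is not the actual notebook), a Delta Live Tables table that ingests MongoDB change events from Kafka might look roughly like the following, with the topic, broker, and credential values as placeholders that mirror the variables described below.

```python
# Illustrative Delta Live Tables pipeline reading MongoDB change events from Kafka.
# Not the notebook from the repository; placeholders mirror TOPIC, KAFKA_BROKER,
# API_KEY, and SECRET described in the steps. `spark` and `dlt` are provided by
# the DLT runtime.
import dlt
from pyspark.sql.functions import col

TOPIC = "db.collection"                    # placeholder Kafka topic
KAFKA_BROKER = "broker.example.com:9092"   # placeholder bootstrap server
API_KEY = "<api-key>"                      # placeholder credentials
SECRET = "<secret>"

@dlt.table(name="mongodb_raw_events", comment="Raw MongoDB change events from Kafka")
def mongodb_raw_events():
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", KAFKA_BROKER)
            .option("subscribe", TOPIC)
            .option("kafka.security.protocol", "SASL_SSL")
            .option("kafka.sasl.mechanism", "PLAIN")
            .option("kafka.sasl.jaas.config",
                    "org.apache.kafka.common.security.plain.PlainLoginModule required "
                    f'username="{API_KEY}" password="{SECRET}";')
            .option("startingOffsets", "earliest")
            .load()
            # Keep only the event payload as JSON text; downstream tables would parse it
            .select(col("value").cast("string").alias("event_json")))
```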
Next, set up the DLT pipeline in Databricks:

- Log in to the Databricks cluster and click New > Notebook.
- In the create notebook dialog, enter a name, select Python as the default language, and choose the Databricks cluster. Then click Create.
- Obtain the IPython notebook for the DLT pipeline from here.
- Go to File > Import and navigate to the notebook you downloaded in the previous step. Click Import to add the data streaming notebook to your workspace.
- Update the following variables in the notebook and save:
  TOPIC - Kafka topic name (i.e., DB.COLLECTION name)
  KAFKA_BROKER - Kafka bootstrap server details
  API_KEY - Kafka server API key
  SECRET - Kafka server secret
- Now navigate to the sidebar and select the Workflows option. Within Workflows, choose the Delta Live Tables tab and select Create Pipeline.
- Give your pipeline a name and select Advanced for the product edition. Choose Continuous for the pipeline mode.
- Set the cluster_policy to none and select the notebook you created under Notebook Libraries.
- Optionally, enter a storage location for the output data from the pipeline. If you leave the Storage location field blank, the system will use the default location. You can leave the settings in the Compute section at their default values.
- Click the Create button to create the pipeline, then run the pipeline to stream the data from Kafka to the Delta Live Table. Refer to this documentation to learn more about Delta Live Tables.

2. Using Spark Structured Streaming

MongoDB has released a version of the MongoDB Connector for Apache Spark that leverages the new Spark Data Sources API V2 with support for Spark Structured Streaming. The connector enables real-time micro-batch processing of data, letting you synchronize data from MongoDB to Databricks using Spark Streaming. This allows you to process data as it is generated, with the help of MongoDB's change data capture (CDC) feature to track all changes. By utilizing Spark Streaming, you can make timely and informed decisions based on the most up-to-date information available in Delta Lake. More details about the streaming functionality can be found here.

- Log in to the Databricks cluster and click New > Notebook.
- In the create notebook dialog, enter a name, select Python as the default language, and choose the Databricks cluster. Then click Create.
- Obtain the Spark streaming IPython notebook from here.
- Go to File > Import and navigate to the notebook you downloaded in the previous step. Click Import to add the data streaming notebook to your workspace.
- Follow the instructions in the notebook to learn how to stream data from MongoDB to Databricks Delta Lake using the Spark Connector for MongoDB.
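As a rough illustration of what the streaming notebook does (a sketch, not the notebook itself), the 10.x connector can read a MongoDB change stream directly and write it to Delta. The URI, namespace, Delta table, and checkpoint path below are placeholders.

```python
# Illustrative Spark Structured Streaming job: MongoDB change stream -> Delta table.
# Assumes the MongoDB Spark Connector 10.x on Databricks; all names are placeholders.
MONGODB_URI = "mongodb+srv://<user>:<password>@cluster0.example.mongodb.net"

change_stream_df = (spark.readStream.format("mongodb")
    .option("connection.uri", MONGODB_URI)
    .option("database", "sales")
    .option("collection", "orders")
    # Emit the full document for inserts/updates rather than the raw change event
    .option("change.stream.publish.full.document.only", "true")
    .load())

(change_stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/orders")  # placeholder checkpoint path
    .toTable("bronze.orders_stream"))                          # placeholder Delta table
```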
3. Using Apache Kafka and object storage

Apache Kafka can be utilized as a buffer between MongoDB and Databricks. When new data is added to the MongoDB database, it is sent to the message queue using the MongoDB Source Connector for Apache Kafka. This data is then pushed to object storage using sink connectors, such as the Amazon S3 Sink connector. The data can then be transferred to Databricks Delta Lake using the Auto Loader option, which allows for incremental data ingestion. This approach is highly scalable and fault-tolerant, as Kafka can process large volumes of data and recover from failures.

- Download and install the MongoDB source and AWS S3 sink connector plugins in your Kafka cluster:
  https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
  https://www.confluent.io/hub/confluentinc/kafka-connect-s3
- Update the following in the mongodb-source-connector.properties connector configuration file:
  CONNECTION-STRING - MongoDB cluster connection string
  DB-NAME - database name
  COLLECTION-NAME - collection name
- Update the following in the s3-sink-connector.properties connector configuration file:
  TOPIC-NAME - Kafka topic name (i.e., DB.COLLECTION name)
  S3-REGION - AWS S3 region name
  S3-BUCKET-NAME - AWS S3 bucket name where you wish to push the data
- Deploy the connector configuration files in your Kafka cluster. This will enable real-time data synchronization from MongoDB to the AWS S3 bucket.
  Note: The above connector pushes data to the S3 bucket at a regular time interval. These configurations can be modified based on the use case. Refer to the following documentation for more details: MongoDB Source Configuration and AWS S3 Sink Configuration.
- Load the data from the S3 bucket into Databricks Delta Lake using the Databricks Auto Loader feature. Refer to this documentation for more details.

In conclusion, the integration between MongoDB Atlas and the Databricks Lakehouse Platform offers businesses a complete solution for data management and analysis. The integration architecture between these two platforms is flexible and scalable, ensuring data accuracy and consistency, and all the data you need for analytics lands in one place in the lakehouse. Whether through a one-time data load or real-time data synchronization, the combination of MongoDB Atlas as an operational data store (ODS) and the Databricks Lakehouse as an enterprise data warehouse/lake (EDL) provides an ideal solution for modern enterprises looking to harness the value of their data. So, if you're struggling with the challenges of siloed data, slow decision-making, and outdated development processes, the integration of MongoDB Atlas and Databricks Lakehouse may be the solution you need to take your business to the next level. Please reach out to partners@mongodb.com for any questions.

February 27, 2023

Build Analytics-Driven Apps with MongoDB Atlas and the Microsoft Intelligent Data Platform

Customers increasingly expect engaging applications informed by real-time operational analytics, yet meeting these expectations can be difficult. MongoDB Atlas is a popular operational data platform that makes it straightforward to manage critical business data at scale. For some applications, however, enterprises may also want to apply insights gleaned from data warehouse, business intelligence (BI), and related solutions, and many enterprises depend on the Microsoft Intelligent Data Platform to apply analytics and governance solutions to operational data stores. MongoDB and Microsoft have partnered to make it simple to use the Microsoft Intelligent Data Platform to glean and apply comprehensive analytical insights to data stored in MongoDB. This article details how enterprises can successfully use MongoDB with the Microsoft Intelligent Data Platform to build more engaging, analytics-driven applications.

Microsoft Intelligent Data Platform + MongoDB

MongoDB Atlas provides a unified interface for developers to build distributed, serverless, and mobile applications, with support for diverse workload types including operational, real-time analytics, and search. With the ability to model graph, geospatial, tabular, document, time series, and other forms of data, developers don’t have to reach for multiple niche databases, which would otherwise result in highly complex, polyglot architectures. The Microsoft Intelligent Data Platform offers a single platform for databases, analytics, and data governance by integrating Microsoft’s database, analytics, and data governance products. In addition to all Azure database services, the Microsoft Intelligent Data Platform includes Azure Synapse Analytics for data warehousing and analytics, Power BI for BI reporting, and Microsoft Purview for enterprise data governance requirements.

Although customers have always been able to apply the Microsoft Intelligent Data Platform services to MongoDB data, doing so hasn't always been as simple as it could be. Through this new integration, customers gain a seamless way to run analytics and data warehousing operations on the operational data they store in MongoDB Atlas. Customers can also more easily use Microsoft Purview to manage and run data governance policies against their most critical MongoDB data, thereby ensuring compliance and security. Finally, through Power BI, customers are empowered to easily query and extract insights from MongoDB data using powerful built-in and custom visualizations. Let’s dive deep into each of these integrations.

Operationalize insights with MongoDB Atlas and Azure Synapse Analytics

MongoDB Atlas is an operational data platform that can handle multiple workload types, including transactional, search, and operational analytics, and cater to multiple application types, including distributed, serverless, and mobile. For data warehousing workloads, long-running analytics, and AI/ML, it complements Azure Synapse Analytics very well. MongoDB Atlas can easily be integrated as a source or sink resource in Azure Synapse Analytics. This connector is useful to:

- Fetch all of the MongoDB Atlas historical data into Synapse
- Retrieve incremental data for a period, based on filter criteria, in batch mode to run SQL-based or Spark-based analytics

The sink connector allows you to store the analytics results back in MongoDB, which can then power the applications built on top of it.
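One way to picture this round trip is with a short Spark sketch. This is not the Synapse pipeline connector itself, just a hedged illustration runnable in a Synapse Spark pool on which the MongoDB Spark Connector has been installed; the URI, namespaces, and column names are placeholders.

```python
# Hedged sketch: round trip between MongoDB Atlas and a Synapse Spark pool.
# Not the Synapse pipeline connector; assumes the MongoDB Spark Connector on the
# Spark pool (where `spark` is the provided SparkSession). All names are placeholders.
MONGODB_URI = "mongodb+srv://<user>:<password>@cluster0.example.mongodb.net"

# Source direction: pull operational data out of Atlas for warehouse-style analytics
orders = (spark.read.format("mongodb")
          .option("connection.uri", MONGODB_URI)
          .option("database", "retail")
          .option("collection", "orders")
          .load())

# Example long-running analysis: lifetime revenue per customer
revenue = (orders.groupBy("customer_id")
           .sum("amount")
           .withColumnRenamed("sum(amount)", "lifetime_revenue"))

# Sink direction: write the insight back to Atlas so applications can act on it
(revenue.write.format("mongodb")
 .option("connection.uri", MONGODB_URI)
 .option("database", "retail")
 .option("collection", "customer_insights")
 .mode("overwrite")
 .save())
```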
Many enterprises require real-time analytics – for example, in fraud detection, anomaly detection for IoT devices, predicting stock depletion, and machinery maintenance – where a delay in getting insights could cause serious repercussions. MongoDB and Microsoft have worked together on a best-practice architecture for these scenarios, which can be found in this article.

Figure 1: Schematic showing integration of MongoDB with Azure Synapse Analytics.

Business intelligence reporting and visualization with Power BI

Together, MongoDB Atlas and Microsoft Power BI offer a sophisticated real-time data platform, providing customers with the ability to present specialized operational and analytical query engines on the same data sets. Information on connecting from Power BI Desktop to MongoDB is available in the official documentation. MongoDB is also excited to announce the forthcoming MongoDB Atlas Power BI Connector, which will expose the richness of JSON document data to Power BI (see Figure 2). This MongoDB Atlas Power BI Connector allows users to unlock access to their Atlas cloud data.

Figure 2: Schematic showing integration of MongoDB and Microsoft Power BI.

Beyond providing mere access to MongoDB Atlas data, this connector will provide a SQL interface that lets you interact with semi-structured JSON data in a relational way, ensuring you can take full advantage of Power BI's rich business intelligence capabilities. Importantly, support is planned for two connectivity modes through the connector: import and direct. The new MongoDB Atlas Power BI Connector will be available in the first half of 2023.

Conclusion

Together with the Microsoft Intelligent Data Platform offerings, MongoDB Atlas can help operationalize the insights derived from customers’ data spread across siloed legacy databases and help build modern applications with ease. With MongoDB Atlas on Microsoft Azure, developers receive access to the most comprehensive, secure, scalable, cloud-based developer data platform in the market. Now, with the availability of Atlas on the Azure Marketplace, it’s never been easier for users to start building with Atlas while streamlining procurement and billing processes. Get started today through the MongoDB Atlas on Azure Marketplace listing.

January 10, 2023

Migrate to MongoDB Atlas on AWS with Relational Migrator

Competitive advantage is directly tied to how well companies are able to build software around their most important asset: data. Rigid relational schemas often require downtime and significant application code updates in order to make even simple modifications, such as adding a new data attribute. In MongoDB, entities are modeled as documents that map naturally to the same objects that developers are used to working with in their programming languages. Additionally, legacy relational databases were not built to scale horizontally. Sharding data to handle large data volumes and ensure lower latency is typically a significant manual process that requires custom application logic to query across multiple shards and aggregate results. MongoDB Atlas is an effective solution for solving such problems, and MongoDB Relational Migrator streamlines the process of moving to MongoDB from a relational database.

MongoDB Atlas on Amazon Web Services (AWS)

MongoDB Atlas removes the need for a complex object-relational mapping layer and allows developers to build and release new features more quickly. MongoDB Atlas is built to be distributed and to handle sharding transparently to the developer. With Atlas, no application code changes are necessary when an application needs to scale out from a 10MB to a 500TB dataset. MongoDB Atlas is well integrated into the AWS environment, and the document-based database works seamlessly with AWS products. To learn more about common integration and project requirements, refer to Managed MongoDB on AWS.

Migrate to Atlas on AWS with Relational Migrator

Some customers have successfully migrated their relational workloads to MongoDB Atlas on AWS. One example is Cox Automotive, whose system was hitting limitations on its relational database. The company migrated to Atlas on AWS and leveraged capabilities like Atlas Data Lake (powered by Amazon Simple Storage Service (Amazon S3)) and Atlas App Services. Read our customer case study to learn more about how Cox Automotive uses MongoDB Atlas.

At the same time, other companies have struggled with how to approach this challenge. When considering such a migration, it’s important to think carefully about data modeling. Although it’s possible to naively move a relational schema into MongoDB without any changes, this approach won’t deliver many of MongoDB’s benefits. A better practice is to design a new and better MongoDB schema that’s more denormalized, and potentially to take the opportunity to revise the architecture of the application as well. To make this process easier, we’re developing MongoDB Relational Migrator. Relational Migrator streamlines the process of moving to MongoDB from a relational database and is compatible with Oracle, Microsoft SQL Server, MySQL, and PostgreSQL. MongoDB Relational Migrator connects to a relational database to analyze its existing schema, then helps architects design and map to a new MongoDB schema.

Migration support

When you’re ready, Relational Migrator will perform the data migration from the source RDBMS to MongoDB. Migration can be a one-shot migration if you’re prepared for a hard cutover. And soon, we will also support continuous sync in case you need to leave the source system running and continue pushing changes into MongoDB. With Relational Migrator, you can map your relational schema – or just a piece of it, if needed – to a new MongoDB schema. Relational Migrator helps the design and mapping process with common MongoDB schema design patterns built in.
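To make the denormalization point concrete, here is a hedged sketch (not Relational Migrator output) of how two normalized tables, customers and orders, might be reshaped into a single embedded MongoDB document using plain Python and PyMongo. The table names, field names, and connection string are illustrative assumptions.

```python
# Illustrative reshaping of normalized relational rows into an embedded document.
# Not Relational Migrator output; names and values are made up for the example.
from pymongo import MongoClient

# Rows as they might come out of relational "customers" and "orders" tables
customer_row = {"customer_id": 42, "name": "Ada Lopez", "email": "ada@example.com"}
order_rows = [
    {"order_id": 1001, "customer_id": 42, "total": 120.50, "status": "shipped"},
    {"order_id": 1002, "customer_id": 42, "total": 35.00, "status": "pending"},
]

# Denormalized document: orders embedded in the customer, matching how the
# application reads them together
customer_doc = {
    "_id": customer_row["customer_id"],
    "name": customer_row["name"],
    "email": customer_row["email"],
    "orders": [
        {"order_id": o["order_id"], "total": o["total"], "status": o["status"]}
        for o in order_rows
    ],
}

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")  # placeholder URI
client["shop"]["customers"].replace_one({"_id": customer_doc["_id"]}, customer_doc, upsert=True)
```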
Based on this schema mapping, you can move data into a target MongoDB cluster. Relational Migrator will support both snapshot (one-time) and continuous data migration. Get more details on Relational Migrator on the product page or in the deep dive presentation from MongoDB World. Get started with MongoDB Atlas in the AWS Marketplace today.

November 3, 2022

Migrating Terabytes of IoT Data From a Legacy NoSQL Database to MongoDB Atlas With MongoDB's Custom Migration Tool

In 2020, a large European energy company began an ambitious plan to replace its traditional metering devices – all 7.6 million of them – with smart meters. That would allow the energy company to monitor gas use remotely and allow customers’ bills to more accurately reflect their energy consumption. At the same time, the company began installing smart components along their production network to monitor operations in real time, manage alarms, use predictive maintenance tools, and find leaks using advanced technologies.

The energy company knew this shift would result in a massive amount of data coming into their systems, and they thought they were ready for it. They understood the complexities of managing and leveraging data from the Internet of Things (IoT), such as the high velocity at which data must be ingested and the need for time-based data aggregations. They rolled out an IoT platform with big data and analytics tools to help them make progress toward their objectives of high-quality, efficient, and safe service. This article examines how the company migrated its system to MongoDB Atlas in order to handle the massive influx of data.

Managing data

The energy company was managing 3 TB of data in its NoSQL database, with the remainder housed and managed in a relational database. However, it started facing challenges, including a lack of scalability, increasing costs, and poor performance. The costs to maintain the pre-production and production environments were unsustainable, and the situation wasn’t going to get better: By 2023, the energy company planned to increase the number of IoT devices and sensors by a factor of five. They needed a viable solution for the long term.

Migrating to MongoDB Atlas

The energy company decided to migrate to MongoDB Atlas for several reasons. Atlas’s online archive, combined with the ability to create time series sharded collections, makes Atlas an ideal fit for IoT data, as does the flexibility of the document data model. Additionally, an API that was compatible with the existing database would minimize the impact on application code and make it easier to migrate applications.

The customer chose PeerIslands to be its technical partner and help with the migration. PeerIslands, a MongoDB partner, is an enterprise-class digital transformation company with an expert, multilingual team with significant experience working across multiple technologies and cloud platforms. PeerIslands has developed solutions for both homogenous and heterogenous workload migrations. Among these solutions is a MongoDB migration tool that helps perform one-time migrations and change data capture while minimizing downtime. The tool is fully GUI-based, and tasks such as infrastructure provisioning, dump and restore, change stream listeners, and processors have all been automated. For change capture, the tool uses the native MongoDB change stream APIs.

Migration challenges

In working with the energy company to perform the migration, the PeerIslands team faced two particular challenges:

- The large volume of data. Initial snapshotting of the data would take about a day.
- The application had significant write loads. On average, it was writing about 12,000 messages per second. However, the load was unevenly distributed, with spikes when devices would “wake up” and report their status.

These two factors quickly generated close to 20 million change events that had to be synced to MongoDB. Meanwhile, new data was constantly being written into the source.
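The native change stream APIs mentioned above are exposed directly by the MongoDB drivers. As a hedged illustration (not PeerIslands' tool itself), a PyMongo consumer of change events for a collection looks roughly like the following; the URI and namespace are placeholders.

```python
# Illustrative change stream consumer using PyMongo (not PeerIslands' tool).
# Connection string, database, and collection names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@source.example.mongodb.net")
collection = client["metering"]["readings"]

# full_document="updateLookup" returns the complete document for update events,
# which is convenient when replaying changes into a target cluster.
with collection.watch(full_document="updateLookup") as stream:
    for change in stream:
        op = change["operationType"]        # insert, update, replace, delete, ...
        doc = change.get("fullDocument")    # present for inserts and looked-up updates
        print(op, change["documentKey"], doc is not None)
        # A migration tool would transform and apply each event to the target cluster here.
```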
Migration tool

PeerIslands’ migration tool uses mongodump and mongorestore for the one-time data migration and the MongoDB Kafka Connector for real-time data synchronization. By using Apache Kafka, the migration tool was able to handle the large amount of change stream data and successfully manage the migration. To address the complexity of the migration, PeerIslands also enhanced the migration tool with additional capabilities:

- Parallelize the Kafka change stream processing using partitions. The Kafka partitioning strategy was kept in sync with the target Atlas sharding strategy.
- Use ReplaceOneBusinessKeyStrategy as the write model for the Kafka MongoDB sink connector to write into sharded Atlas collections.

By using its in-house tooling, PeerIslands was able to successfully complete the migration with near-zero downtime.

Improved performance

With the migration complete, the customer has already begun to realize the benefits of MongoDB Atlas for their massive amounts of IoT data. The user interface has become extremely responsive, even for more expensive queries. Because of the improved performance of the database, the customer is now able to pursue improvements and efficiencies in other areas. With better performance, the company expects consumption of the data to rise and their schema design to evolve. They’re looking to leverage the time series benefits of MongoDB both to simplify their schema design and to deliver richer IoT functionality. They’re also better equipped to rapidly respond to and fulfill business needs, because the database is no longer a limitation. Importantly, costs have decreased for the production environment, and even more dramatic cost reductions have been seen for the pre-production environment.

Learn more about the migration tool and MongoDB’s time series capabilities. Interested in your own data migration? Contact us.

July 11, 2022

Faster Migrations to MongoDB Atlas on Google Cloud with migVisor by EPAM

As the needs of Google Cloud customers evolve and shift toward new user expectations, more and more customers are choosing MongoDB as an ideal alternative to legacy databases. Together, we’ve partnered with users looking to digitize and grow their businesses (such as Forbes), or meet increased demand due to COVID (such as our work with Boxed, the online grocer), by scaling up infrastructure and data processing within a condensed time frame. As a fully managed service within the Google Cloud Marketplace, MongoDB Atlas enables our joint customers to quickly deploy applications on Google Cloud with a unified user experience and an integrated billing model.

Migrations to managed cloud database services vary in complexity, but even under the most straightforward circumstances, careful evaluation and planning are required. Customer database environments often leverage database technologies from multiple vendors, across different versions, and can run into thousands of deployments. This makes manual assessment cumbersome and error prone. This is where EPAM Systems, a provider with strategic specialization in database and application modernization solutions, comes in. EPAM’s database migration assessment tool, migVisor, is a first-of-its-kind cloud database migration assessment product that helps companies analyze database workloads, configuration, and structure to generate a visual cloud migration roadmap that identifies potential quick wins as well as challenge areas. migVisor identifies the best migration path for databases using sophisticated scoring logic to rank the complexity of migrating them to a cloud-centric technology stack. Previously applicable only to migrations from RDBMS to cloud-based RDBMS, migVisor is now available for MongoDB to MongoDB Atlas migrations.

migVisor helps you:

- Analyze migration decisions objectively by providing a secure assessment of source and target databases that’s independent of deployed environments
- Accelerate time to migration by automating the discovery and assessment process, which reduces development cycles from a few weeks to a few days
- Easily understand tech insights by providing a visual overview of your entire journey, enabling better planning and improving stakeholder visibility
- Reduce database licensing costs by giving you intelligent insights on the target environment and recommended migration paths

Key features of migVisor for MongoDB

For several years, migVisor by EPAM has delivered automated assessments that have helped hundreds of customers migrate their relational databases to cloud-based or cloud-native databases. Now, migVisor adds support for the world’s leading modern data platform: MongoDB. As part of the initial release, migVisor will support self-managed MongoDB to MongoDB Atlas migration assessments. We plan to support TCO analysis for MongoDB migrations, application modernization migration assessments, and relational-to-MongoDB migration assessments in future releases.

MongoDB is also a natural fit for Google Cloud’s Open Cloud strategy of providing customers a broad set of fully managed database services, as Google Cloud's own GM and VP of Engineering & Databases, Andi Gutmans, notes:

"We are always looking for ways to simplify migrations for our customers. Now, with EPAM's database migration assessment tool, migVisor, supporting MongoDB Atlas, our customers can easily complete database assessments – including TCO analyses and migration complexity assessments – and generate comprehensive migration plans. A simplified migration experience combined with our joint Marketplace success enables customers to consolidate their data workloads into the cloud while making the development and procurement process simple – so users can focus more on innovation."
How the migVisor assessment works

migVisor analyzes source databases (on-premises or in any cloud environment) for migration assessment to a new target. The assessment includes the following steps:

1. The simple-to-use migVisor Metadata Collector (mMC) collects metadata from the source database, including the featureCompatibilityVersion value, journaling status for data-bearing nodes, MongoDB storage size used, replica set configuration, and more.
Figure 1: mMC GUI Edit Connection Screen

2. On the migVisor Analysis Dashboard, you select the source/target pair (e.g., MongoDB to MongoDB Atlas on Google Cloud).
Figure 2: Source and Target Selection

3. In the migVisor console, you can then view the automated assessment output created by migVisor’s migration complexity scoring engine, including classification of the migration into high/medium/low complexity and identification of potential migration challenges and incompatibilities.
Figure 3: Source Cluster Features

4. Finally, you can export the assessment output in CSV format for further analysis in your preferred data analysis or reporting tool.

Conclusion

Together, Google Cloud and MongoDB have successfully worked with many organizations to streamline cloud migrations and modernize their legacy landscapes. To build on this foundation of providing our customers with a best-in-class experience, we’ve worked closely with Google Cloud and EPAM Systems to integrate MongoDB Atlas with migVisor. As a result, customers will now be able to better plan migrations, reduce risk and avoid missteps, identify quick wins for TCO reduction, review migration complexities, and appropriately plan migration phases for the best outcomes.

Learn more about how you can deploy, manage, and grow MongoDB on Google Cloud on our partner page. If you’d like guidance and migration advice, please reach out to mdb-gcp-marketplace@mongodb.com to get in touch with the Google, MongoDB, and EPAM sales teams.
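If you want to spot-check by hand some of the source attributes the mMC gathers (step 1 above), the following hedged PyMongo sketch queries a few of them directly. It is an illustration only, not part of migVisor, and the connection string is a placeholder.

```python
# Hedged illustration: manually inspecting a few of the attributes an assessment
# tool like mMC collects. Not part of migVisor; connection string is a placeholder.
from pymongo import MongoClient

client = MongoClient("mongodb://<user>:<password>@source-host:27017")

# Feature compatibility version of the source deployment
fcv = client.admin.command({"getParameter": 1, "featureCompatibilityVersion": 1})
print("featureCompatibilityVersion:", fcv["featureCompatibilityVersion"])

# Replica set configuration (topology, member count, etc.)
rs_config = client.admin.command("replSetGetConfig")["config"]
print("replica set members:", len(rs_config["members"]))

# Storage size used per database
for db_name in client.list_database_names():
    stats = client[db_name].command("dbStats")
    print(db_name, "storageSize bytes:", stats["storageSize"])
```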

January 21, 2022

PeerIslands Cosmos DB Migrator Tool to MongoDB Atlas on Google Cloud

When you’re in the midst of innovating, the last thing you want to worry about is infrastructure. Whether you’re looking to streamline inventory management or reimagine marketing, you need applications that can scale fast and maintain high availability. That’s where MongoDB Atlas on Google Cloud comes in. With MongoDB Atlas’ general-purpose, document-based database, users can free themselves from the hassle of database management and give precious time back to developers to focus on innovation. Combine these benefits with Google Cloud’s computing power, high availability, and ability to integrate with tools like BigQuery, Dataflow, Dataproc, and more, and it’s hard to find a comparable joint solution.

In fact, many current Microsoft Azure Cosmos DB users are now considering making the move to MongoDB. Microsoft’s Cosmos DB only supports single-partition transactions, has no schema governance, and forces developers to work with five different APIs to deliver full application functionality. Conversely, MongoDB Atlas on Google Cloud supports distributed multi-document ACID transactions, includes schema governance, and offers integrated full-text search, auto-archiving, data lakes, and edge-to-cloud data sync. The following blog illustrates how PeerIslands’ Cosmos DB Migrator tool can help users move from Cosmos DB to MongoDB Atlas on Google Cloud.

Why PeerIslands

PeerIslands is an enterprise-class digital transformation company composed of a team of polyglots who are comfortable across multiple technologies and cloud platforms. As a services firm, PeerIslands is focused on helping customers with both cloud-native development and application transformation. With best-in-the-industry talent, PeerIslands has been working with the MongoDB team to build a suite of solutions around two key objectives:

- For a customer evaluating MongoDB, how can we rapidly address common questions?
- Once a customer has chosen MongoDB, how can we reduce time to value by rapidly migrating workloads to MongoDB?

With this in mind, PeerIslands developed a suite of tools around schema generation, understanding MongoDB query performance, and helping customers understand the code changes required for upgrading MongoDB versions. In terms of workload migrations, PeerIslands developed solutions for both homogenous and heterogenous migrations. The company is also contributing to the open source community with a mobile app that enables MongoDB admins to manage Atlas on the go.

PeerIslands' Cosmos DB migration use case

The current approach for migrating data from Cosmos DB to MongoDB is to use MongoDB dump and restore, but there are several problems with this approach. It’s fully manual and CLI-based, which creates a poor user experience and requires technical resources even for simple migrations. There’s a lack of change capture capability, which requires downtime for the duration of the migration; for large Cosmos DB migrations, this causes significant issues. The team is also under pressure to deliver the entire migration in a short period of time, and migrations often get delayed as customers have difficulty identifying the right migration window.

The Cosmos to MongoDB tool is a “Live Migrate”-like tool that helps perform one-time migrations and change data capture from Cosmos DB (MongoDB model) to MongoDB Atlas and minimizes the downtime requirements associated with migrations. The tool is fully GUI-based and nearly everything is automated.
All of the tasks for infrastructure provisioning, dump and restore, and change stream listeners and processors have been automated behind a graphical user interface (GUI). The Cosmos to MongoDB migration tool uses native MongoDB tools, and its performance is similar to those native tools. For change capture, it leverages the native MongoDB change stream APIs. A high-level view of the solution is provided in Figure 1 below:

Figure 1: Solution Map

Migration steps:

- Migration configuration: Provide the name of the migration task, source Cosmos DB details, and target MongoDB details. The tool supports key vault integration as well.
- Migration infrastructure provisioning: Provide the migration infrastructure details required for creating the VM (virtual machine), including location, type of VM instance, etc.
- Migration execution: Automate the migration once the configuration is complete. The migration is executed in three steps: backup, restore, and change event processing. As a user, you initiate the backup process. The change event listener is started in parallel with the backup process and captures all the changes. Once the backup is complete, you can restore the initial data and then perform change event processing to apply all the changes to MongoDB.
- Migration validation: The tool also provides facilities for validating the migration. Users can view the total number of documents in both the source Cosmos DB collection and the target MongoDB collection. They can also compare random documents picked from Cosmos DB and MongoDB side by side and validate whether the data elements have been loaded correctly.

For a more detailed demo and description of events, watch the following video:

Migrating to a new database can feel daunting at first, but the PeerIslands Cosmos DB Migrator makes it easy. Major concerns like delays and downtime are eliminated from the process, helping you run your business smoothly and reap the benefits of MongoDB more quickly. And with PeerIslands’ suite of tools, you can rapidly address MongoDB-specific questions and accelerate time to value. Reach out today to get started.
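The kind of validation described in the migration steps above can also be approximated by hand. The following hedged PyMongo sketch compares document counts and a random sample between a source and target collection; the connection strings and namespaces are placeholders, and this is not the PeerIslands tool itself.

```python
# Hedged illustration of post-migration validation: compare counts and a random
# document between source (Cosmos DB, MongoDB API) and target (Atlas).
# Not the PeerIslands tool; URIs and namespaces are placeholders.
from pymongo import MongoClient

source = MongoClient("mongodb://<cosmos-account>:<key>@<cosmos-host>:10255/?ssl=true")
target = MongoClient("mongodb+srv://<user>:<password>@target.example.mongodb.net")

src_coll = source["appdb"]["orders"]
tgt_coll = target["appdb"]["orders"]

# 1. Document counts should match once all change events have been applied
print("source count:", src_coll.count_documents({}))
print("target count:", tgt_coll.count_documents({}))

# 2. Spot-check: pick a random source document and compare it with the target copy
sample = next(src_coll.aggregate([{"$sample": {"size": 1}}]), None)
if sample is not None:
    migrated = tgt_coll.find_one({"_id": sample["_id"]})
    print("documents identical:", migrated == sample)
```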

December 9, 2021

Make Migrating to MongoDB Atlas on AWS Easy with PeerIslands Modernization Tool Set

As cloud computing becomes commonplace across industries, organizations are rapidly adopting MongoDB Atlas because they know that true modernization is about more than just moving data as-is to the cloud – i.e., taking a “lift and shift” approach. It’s also about remodeling that same data along the way for faster and more iterative development. With MongoDB’s document-based database, developers are empowered to reimagine how they build, with flexible schema design that allows them to easily model and remodel data for a wide range of use cases while still applying governance when needed.

MongoDB Atlas maps naturally to modern object-oriented programming languages, making developers' lives much easier. In contrast to the rigidity of SQL databases, MongoDB’s flexible data model means that your database schema can evolve with business requirements. This helps users build applications faster, handle diverse data types, and manage applications more efficiently at scale. As a fully managed service, MongoDB Atlas takes care of database maintenance for you and can also be scaled within and across multiple distributed data centers, providing levels of availability and scalability previously unachievable with relational databases.

The advantages of moving to MongoDB Atlas are clear, but some companies may still feel reluctant to leave behind the legacy relational databases they’re familiar with for unknown territory. This is where PeerIslands comes in. With PeerIslands, you don’t have to go it alone. The following blog introduces PeerIslands’ modernization capabilities and how you can leverage them to migrate seamlessly to MongoDB Atlas on AWS.

Why PeerIslands?

PeerIslands is an enterprise-class digital transformation company composed of a team of polyglots who are comfortable across multiple technologies and cloud platforms. As a services firm, PeerIslands is focused on helping customers with both cloud-native development and application transformation. With best-in-the-industry talent, the team has helped several Fortune 50 companies bring large-scale transformations to life and has received recognition from several clients and partners, including MongoDB. With engineers trained and certified in MongoDB, PeerIslands has helped MongoDB’s ISV and retail customers modernize – moving software built for on-prem into SaaS delivery models better suited to cloud environments – and was named MongoDB’s Boutique System Integrator Partner of the Year. PeerIslands can swiftly transform and migrate core, legacy, and on-premises applications to the cloud. They develop solutions based on cutting-edge microservices and serverless architectures across public cloud platforms and hybrid PaaS platforms to help users quickly get applications to customers and business users.

How PeerIslands can help

PeerIslands has been working with MongoDB and AWS to develop tools that address two key objectives for customers.

Objective 1: Tools that address common customer questions when evaluating MongoDB

- MongoDB Test Data Generator: A fully UI-driven tool with an extensive data library for rapidly loading MongoDB with use-case-specific, near real-world data at scale.
- MongoDB Performance Testing Tool: A performance analyzer where you can create multiple load profiles, run use-case-specific MongoDB queries, and understand the performance of those queries.
With the test data generator and the performance testing tool, customers can get a clear view of the performance of MongoDB for their specific situation even before migrating to MongoDB.

- MongoDB Schema Generator and Data Modeler: The SchemaGen tool helps you rapidly generate a draft JSON schema from your existing SQL schema. On top of this, you can then perform the data modeling exercise and generate the schema that becomes your MongoDB schema. The schema generator also provides key information about the SQL database, such as size, indexes, and more.
- MongoDB Sizer: The MongoDB sizing tool helps you understand the size implications of your schema and calculate Atlas sizing. With the MongoDB Sizer, customers can upload their own schema and calculate the various factors that influence Atlas sizing.
- Code Scanner: A tool for scanning your code repositories for deprecated MongoDB APIs. With the code scanner, customers can get a clear view of the application impact of upgrading MongoDB versions.

Objective 2: Tools that accelerate time to value by rapidly moving workloads to MongoDB

- Cosmos2Atlas migration: A point-and-click solution that helps Cosmos DB customers migrate data from Cosmos DB to MongoDB. This solution provides change capture capability to ease downtime requirements and makes data migration easy and seamless.
- 1Data: A tool for addressing the more complex requirements of migrating data from SQL to MongoDB.
- Admin mobile app: A mobile app for admins to track key Atlas KPIs and approve common access requests on the go.

PeerIslands brings to the table an entire suite of tools for addressing all your MongoDB needs.

PeerIslands use case featuring the 1Data tool

One of the key requirements of modernization projects is to solve large-scale data migrations from SQL databases. There are a number of tools available that simply replicate data from SQL to MongoDB – but we rarely use the same SQL schema in MongoDB. Schema transformation – however difficult to do at scale – is nonetheless required so that we can make the best use of MongoDB's capabilities. Today, the typical approach is to run custom Spark jobs, as they are scalable and flexible when it comes to processing schema transformations and loading the data into MongoDB. But when you go beyond migrating one or two tables in a proof-of-concept (PoC) setting, the problem becomes much more complex. Writing custom Spark programs for every schema transformation is cumbersome and error-prone; even simple migrations can require tens of Spark programs, and any defects that occur during transformation will cause significant issues. Also consider the following challenges:

- How do you extract data out of your SQL database without impacting database performance?
- How do you handle infrastructure provisioning and scaling?
- How do you orchestrate the migration? A few master tables can be migrated once, but transaction tables may need both a one-time migration and a daily incremental migration. How can you do this orchestration at scale?
- How do you know whether you have lost data during migration?
- Last but not least, once the data is migrated, how do you keep it up to date?

We would probably end up with a suite of tools to address these issues – Sqoop, Kafka, Spark, some kind of job orchestration engine, an observability suite, a notification workflow, and so on. It quickly becomes evident that migrating data from SQL to MongoDB without disrupting the business could be the most daunting barrier to adopting MongoDB.
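To make the "custom Spark job" point concrete, here is a hedged sketch of what one such hand-written transformation might look like: read two tables over JDBC, nest the child rows into parent documents, and write to MongoDB with the Spark Connector. The table names, columns, and URIs are assumptions for illustration, and every new table pairing needs another job like this.

```python
# Illustrative hand-written PySpark migration job (the kind of custom code 1Data
# is designed to replace). Table/column names and URIs are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import collect_list, struct

spark = SparkSession.builder.appName("sql-to-mongodb-customers").getOrCreate()

jdbc_opts = {
    "url": "jdbc:postgresql://sql-host:5432/shop",   # placeholder source database
    "user": "<user>",
    "password": "<password>",
}

customers = spark.read.format("jdbc").options(dbtable="customers", **jdbc_opts).load()
orders = spark.read.format("jdbc").options(dbtable="orders", **jdbc_opts).load()

# Reshape: embed each customer's orders as an array instead of a separate table
orders_by_customer = (orders
    .groupBy("customer_id")
    .agg(collect_list(struct("order_id", "total", "status")).alias("orders")))

customer_docs = customers.join(orders_by_customer, on="customer_id", how="left")

(customer_docs.write.format("mongodb")
    .option("connection.uri", "mongodb+srv://<user>:<password>@target.example.mongodb.net")
    .option("database", "shop")
    .option("collection", "customers")
    .mode("overwrite")
    .save())
```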
Unfortunately, current tools invariably fall short for complex, heterogeneous migration scenarios, and developers end up writing a lot of custom code. Recognizing this, PeerIslands has been working with MongoDB and AWS to develop 1Data. 1Data is a platform that helps enterprises perform migration and real-time synchronization of data between SQL databases and MongoDB. 1Data is designed to complement existing AWS services, such as DMS, in migrating data out of SQL.

Key features of 1Data:
1Data is fully GUI-based — no coding is required.
1Data provides a single platform for both one-time migration and continuous updates.
1Data behaves consistently across one-time migration and continuous updates, which provides a good anti-corruption layer for continuous updates.
The 1Data tech stack is based on Spark and Kafka, among others, and is highly scalable.
1Data is highly modular, has a well-defined API layer, and can easily be extended to your needs.
1Data automatically provisions all the infrastructure required for migration using AWS Quick Start templates.

High-Level Solution Architecture
1Data's capabilities are realized through a decoupled and highly scalable architecture. The data extract, transformation, and load stages are independent of each other and can easily be customized based on the specific requirements of the customer. The architecture can orchestrate between batch-based initial loads and streaming-based CDC loads. A Spark-, Kafka-, and Airflow-based tech stack gives the 1Data platform the scalability to handle large data migrations.

Figure 1: 1Data high-level solution architecture

The OneData Portal structures migrations using Endpoints, Tasks, and DAGs (Directed Acyclic Graphs). Endpoints define source, intermediate, and final data locations, and can be files, databases, or queues. Endpoints can also be database extracts landed in S3 by the AWS DMS service. Task definition is the second step in the migration: tasks act on a source endpoint and produce data in either a staging or a destination endpoint. A number of predefined tasks are available: Extract, Transformation, Sink, and Validation tasks. You can configure both streaming and batch tasks. Defining the DAGs is the final step before the actual migration: DAGs define the sequence in which the defined tasks are executed.

The technology components used in 1Data allow it to handle very large data migrations easily. Each component has been selected so that it can be deployed across multiple cloud platforms and scaled easily. Technology stack details:
Web Portal: Angular
Web API: Node
Configuration Database: MongoDB
Data Transformation & Validation: Spark
Data Extraction: Sqoop, Spark, DMS
Change Data Capture: Kafka, Debezium
Data Sink: Spark
Job/Task Orchestrator: Airflow

PeerIslands has worked with AWS and MongoDB to create a Quick Start for 1Data. With the Quick Start, customers can rapidly instantiate 1Data for their migration requirements. To recap, with the 1Data Quick Start on AWS, we can:
Perform heterogeneous schema transformation from SQL and load data into MongoDB Atlas on AWS
Weave together continuous data updates, incremental data updates, and one-time migration using a combination of batch and streaming jobs
Orchestrate the migration tasks
Validate the migration
...And all without writing a single line of code!
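The OneData Portal assembles Endpoints, Tasks, and DAGs visually, but it can help to picture what that orchestration corresponds to on the underlying Airflow and Spark stack. The sketch below is a conceptual, hand-written equivalent, not 1Data output; the task names, script paths, and schedule are assumptions for illustration only.

```python
# A conceptual Airflow DAG mirroring an Extract -> Transform -> Sink -> Validate
# task sequence of the kind the OneData Portal lets you assemble visually.
# Paths, task names, and the schedule are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="customers_one_time_migration",
    start_date=datetime(2021, 10, 1),
    schedule_interval=None,  # one-time migration; a CDC DAG would run continuously
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_from_sql",
        # e.g. a Sqoop/DMS-style extract of the source tables into S3 staging
        bash_command="spark-submit jobs/extract_customers.py s3://staging/customers/",
    )
    transform = BashOperator(
        task_id="transform_to_documents",
        # reshape relational rows into customer documents with embedded orders
        bash_command="spark-submit jobs/transform_customers.py s3://staging/customers/",
    )
    sink = BashOperator(
        task_id="sink_to_mongodb_atlas",
        bash_command="spark-submit jobs/sink_customers.py",
    )
    validate = BashOperator(
        task_id="validate_counts",
        # compare source row counts with target document counts
        bash_command="spark-submit jobs/validate_customers.py",
    )

    extract >> transform >> sink >> validate
```

With 1Data, this kind of DAG is generated and managed for you from the portal configuration rather than written by hand.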
Demo

Looking forward
A modern data architecture can help you unlock your business's full potential and gain real-time access to the insights you need, when you need them. MongoDB's document-based database and flexible schema design help you make smarter decisions, cut costs, and take full advantage of AI/ML capabilities to empower your employees and raise customer satisfaction. The decision to migrate off your legacy systems and onto MongoDB is easy—and now the process is, too. Let PeerIslands help you get there. Our best-in-class teams leverage next-generation technologies, including Artificial Intelligence (AI), Augmented Reality (AR), Blockchain, Internet of Things (IoT), Machine Learning (ML), Mobile, and Virtual Reality (VR). Our expertise spans the modern programming stack, and we follow best practices in distributed, agile, and lean principles as well as test-driven development and DevOps.

Additional Resources
ISV WMP Program: Contact aws-isv-workload-migration@amazon.com for details
Atlas Quick Start
MongoDB Atlas Starter Package
Atlas Migration Guide
Atlas Migration Pattern
Contact us with any questions around modernization with MongoDB, AWS, and PeerIslands.

October 28, 2021

Build a Single View of Your Customers with MongoDB Atlas and Cogniflare's Customer 360

The key to successful, long-lasting commerce is knowing your customers. If you truly know your customers, then you understand their needs and wants and can identify the right product to deliver to them—at the right time and in the right way. However, for most B2C enterprises, building a single view of the customer poses a major hurdle due to copious amounts of fragmented data. Businesses gather data from their customers in multiple locations, such as ecommerce platforms, CRM, ERP, loyalty programs, payment portals, web apps, mobile apps, and more. Each data set can be structured, semi-structured, or unstructured, delivered as a stream or requiring batch processing, which makes compiling already fragmented customer data even more complex. This has led some organizations to build bespoke solutions, which still only provide a partial view of the customer. Siloed data sets make running operations like customer service, targeted marketing, and advanced analytics—such as churn prediction and recommendations—highly challenging. Only with a 360-degree view of the customer can an organization deeply understand their needs, wants, and requirements, as well as how to satisfy them. A single view of that 360 data is therefore vital for a lasting relationship. In this blog, we’ll walk through how to build a single view of the customer using MongoDB’s database and Cogniflare’s Calleido Customer 360 tool. We’ll also explore a real-world use case focused on sentiment analysis.

Building a single view with Calleido's Customer 360
With a Customer 360 database, organizations can access and analyze various individual interactions and touchpoints to build a holistic view of the customer. This is achieved by acquiring data from a number of disparate sources. However, routing and transforming this data is a complex and time-consuming process, and many existing big data tools aren't compatible with cloud environments. These challenges inspired Cogniflare to create Calleido.

Figure 1: Calleido Customer 360 Use Case Architecture

Calleido is a data processing platform built on top of battle-tested open source tools such as Apache NiFi. Calleido comes with over 300 processors to move structured and unstructured data from and to anywhere. It facilitates batch and real-time updates and handles simple data transformations. Critically, Calleido seamlessly integrates with Google Cloud and offers one-click deployment. It uses Google Kubernetes Engine to scale up and down based on demand, and provides an intuitive, slick, low-code development environment.

Figure 2: Calleido Data Pipeline to Copy Customers From PostgreSQL to MongoDB

A real-world use case: Sentiment analysis of customer emails
To demonstrate the power of Cogniflare's Calleido, MongoDB Atlas, and the Customer 360 view, consider the use case of conducting sentiment analysis on customer emails. To streamline the build of a Customer 360 database, the team at Cogniflare created flow templates for implementing data pipelines in seconds. In the upcoming sections, we'll walk through some of the most common data movement patterns for this Customer 360 use case and showcase a sample dashboard.

Figure 3: Sample Customer Dashboard

The flow commences with a processor pulling IMAP messages from an email server (ConsumeIMAP). Each new email that arrives in the chosen inbox (e.g., customer service) triggers an event. Next, the process extracts email headers to determine topline details about the email content (ExtractEmailHeaders).
Using the sender's email address, Calleido identifies the customer (UpdateAttribute) and extracts the full email body by executing a script (ExecuteScript). Now, with all the data collected, a message payload is prepared and published through Google Cloud Platform (GCP) Pub/Sub (Kafka can also be used) for consumption by downstream flows and other services.

Figure 4: Translating Emails to Cloud Pub/Sub Messages

The GCP Pub/Sub messages from the previous flow are then consumed (ConsumeGCPPubSub). This is where the power of the MongoDB Atlas integration comes in, as we verify each sender against the MongoDB database (GetMongo). If a customer exists in our system, we pass the email data to the next flow; other emails are ignored.

Figure 5: Validating Customer Email with MongoDB and Calleido

Analysis of the email body copy is then conducted. For this flow, we use a processor to prepare a request body, which is then sent to Google Cloud Natural Language AI to assess the tone and sentiment of the message. The results from the Natural Language API then go straight to MongoDB Atlas so they can be pulled through into the dashboard (see the illustrative sketch at the end of this post).

Figure 6: Making Cloud AutoML Call with Calleido

End result in the dashboard: The Customer 360 database can be used in internal back-office systems to supplement and inform customer support. With a single view, it's quicker and more effective to troubleshoot issues, handle returns, and resolve complaints. Leveraging information from previous client conversations ensures each customer is given the most appropriate and effective response. These data sets can then be fed into analytics systems to generate learnings and optimizations, such as associating negative sentiment with churn rate.

How MongoDB's document database helps
In the example above, Calleido takes care of copying and routing data from the business source system into MongoDB Atlas, the operational data store (ODS). Thanks to MongoDB's flexible data structure, we can transfer data in its original format and subsequently implement the necessary schema transformations in an iterative manner. There is no need to run complex schema migrations. This allows for the quick delivery of a single view database.

Figures 7 & 8: Calleido Data Pipelines to Copy Products and Orders From PostgreSQL to MongoDB Atlas

Calleido allows us to make this transition in just a few simple steps. The tool runs a custom SQL query (ExecuteSQL) that joins all the required data from the source tables and compiles the results in order to parallelize the processing. The data arrives in Avro format; Calleido then converts it into JSON (ConvertAvroToJSON) and transforms it to the schema designed for MongoDB (JoltTransformJSON).

End result in the Customer 360 dashboard:

MongoDB Atlas is the market-leading choice for the Customer 360 database. Here are the core reasons for its world-class standard: MongoDB can efficiently handle non-standardized schemas coming from legacy systems and efficiently store any custom attributes. Data models can include all the related data as nested documents. Unlike SQL databases, MongoDB avoids complicated join queries, which are difficult to write and not performant. MongoDB is fast: the current view of a customer can be served in milliseconds without the need to introduce a caching layer. The MongoDB flexible schema model enables agility with an iterative approach. In the initial extraction, the data can be copied nearly exactly in its original shape. This drastically reduces latency.
In subsequent phases, the schema can be standardized and the quality of the data improved without complex SQL migrations. MongoDB can store dozens of terabytes of data across multiple data centers and easily scale horizontally. Data can be sharded across multiple regions to help navigate compliance requirements. Separate analytics nodes can be set up to avoid impacting the performance of production systems. MongoDB has a proven record of acting as a single view database, with large and legacy organizations up and running with prototypes in two weeks and in production within a business quarter. MongoDB Atlas can autoscale out of the box, reducing costs and handling traffic peaks. Data can be encrypted both in transit and at rest, helping to achieve compliance with security and privacy standards, including GDPR, HIPAA, PCI DSS, and FERPA.

Upselling the customer: Product recommendations
Upselling customers is a key part of modern business, but the secret to doing it successfully is that it's less about selling and more about educating. It's about using data to identify where the customer is in the customer journey, what they may need, and which product or service can meet that need. Using a customer's purchase history, Calleido can help prepare product recommendations by routing data to the appropriate tools, such as BigQuery ML. These recommendations can then be promoted through the call center and marketing teams for both online and mobile app experiences. There are two flows to achieve this: preparing the training data and generating the recommendations.

Preparing training data
First, the appropriate data is transferred from PostgreSQL to BigQuery using the ExecuteSQL processor, and the data pipeline is scheduled to execute periodically. In the next step, the data is fetched from PostgreSQL in 1K-row chunks with the ExecuteSQLRecord processor. These files are then passed to the next processor with load balancing enabled to utilize all available nodes. All of that data then gets inserted into a BigQuery table using the PutBigQueryStreaming processor.

Figure 9: Copying Data from PostgreSQL to BigQuery with Calleido

Generating product recommendations
Next, we move on to generating product recommendations. First, you must purchase BigQuery capacity slots, which offer the most affordable way to take advantage of BigQuery ML features. Here, Calleido invokes an SQL procedure with the ExecuteSQL processor, then ensures that the requested BigQuery capacity is ready to use. The next processor (ExecuteSQL) executes an SQL query responsible for creating and training the Matrix Factorization ML model using the data copied by the first flow. Next in the queue, Calleido uses the ExecuteSQL processor to query the trained model, acquire all the predictions, and store them in a dedicated BigQuery table. Finally, the Wait processor waits for the capacity slots to be removed, as they are no longer required.

Figures 10 & 11: Generating Product Recommendations with Calleido
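For intuition, here is a minimal sketch of the kind of SQL such an ExecuteSQL processor might submit, wrapped in a Python BigQuery client call. The dataset, table, and column names are illustrative placeholders, not Calleido's actual configuration, and reserved BigQuery ML capacity is assumed to already be in place.

```python
# Illustrative only: trains a BigQuery ML matrix factorization model on purchase
# history and materializes recommendations. Dataset/table/column names are
# placeholders; matrix factorization requires reserved (flex) slots.
from google.cloud import bigquery

client = bigquery.Client()

train_model = """
CREATE OR REPLACE MODEL `shop.purchase_recommender`
OPTIONS (
  model_type = 'matrix_factorization',
  feedback_type = 'implicit',        -- purchase counts rather than explicit ratings
  user_col = 'customer_id',
  item_col = 'product_id',
  rating_col = 'purchase_count'
) AS
SELECT customer_id, product_id, COUNT(*) AS purchase_count
FROM `shop.order_lines`
GROUP BY customer_id, product_id
"""
client.query(train_model).result()  # blocks until training finishes

# Store every customer's predictions in a dedicated table for the export flow.
materialize = """
CREATE OR REPLACE TABLE `shop.recommendations` AS
SELECT customer_id, product_id, predicted_purchase_count_confidence AS score
FROM ML.RECOMMEND(MODEL `shop.purchase_recommender`)
"""
client.query(materialize).result()
```

In Calleido, statements like these are issued by ExecuteSQL processors rather than hand-written Python, with the Wait processor handling the slot reservation lifecycle.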
Then, we remove the old recommendations using two processors. First, the ReplaceText processor updates the content of incoming flow files, setting the query body. This is then used by the DeleteMongo processor to perform the removal action.

Figure 12: Remove Old Recommendations

The whole flow ends with copying the recommendations to MongoDB. The ExecuteSQL processor fetches and aggregates the top 10 recommendations per user, in 1K-row chunks. Then, the following two processors (ConvertAvroToJSON and ExecuteScript) prepare the data to be inserted into the MongoDB collection by the PutMongoRecord processor.

Figure 13: Copy Recommendations to MongoDB

End result in the Customer 360 dashboard (the data in this example is autogenerated):

Benefits of Calleido's 360 customer database on MongoDB Atlas
Once the data is available in a centralized operational data store like MongoDB, Calleido can be used to sync it with an analytics data store such as Google BigQuery. Thanks to the Customer 360 database, internal stakeholders can then use the data to:
Improve customer satisfaction through segmentation and targeted marketing
Accurately and easily access compliance audits
Build demand planning forecasts and analyses of market trends
Reward customer loyalty and reduce churn
Ultimately, a single view of the customer enables organizations to deliver the right message to prospective buyers, funneling those at the brand-awareness stage into the conversion stage, and ensures that retention and post-sales mechanics are working effectively. Historically, building a 360-degree view of the customer was a complex and fragmented process, but with Cogniflare's Calleido and MongoDB Atlas, a Customer 360 database has become the most powerful and cost-efficient data management stack an organization can harness.
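To make the earlier sentiment-analysis step concrete, here is a minimal hand-coded sketch of the equivalent logic: score an email body with the Cloud Natural Language API and embed the result in the customer's single-view document in Atlas. The database, collection, field, and connection names are illustrative assumptions; Calleido performs this work with its NiFi-based processors rather than custom code.

```python
# Illustrative sketch of the sentiment step from the flow above, written by hand.
# Names and the connection string are placeholders, not Calleido configuration.
from google.cloud import language_v1
from pymongo import MongoClient

mongo = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
customers = mongo["customer360"]["customers"]
nl_client = language_v1.LanguageServiceClient()


def record_email_sentiment(sender: str, subject: str, body: str) -> None:
    # Only process emails from known customers (the GetMongo validation step).
    customer = customers.find_one({"emails": sender})
    if customer is None:
        return

    document = language_v1.Document(
        content=body, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    sentiment = nl_client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment

    # Embed the interaction in the customer's single-view document.
    customers.update_one(
        {"_id": customer["_id"]},
        {
            "$push": {
                "interactions": {
                    "channel": "email",
                    "subject": subject,
                    "sentiment_score": sentiment.score,  # -1.0 (negative) to 1.0 (positive)
                    "sentiment_magnitude": sentiment.magnitude,
                }
            }
        },
    )
```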

October 14, 2021

Simplifying Data Migrations From Legacy SQL to MongoDB Atlas with Studio 3T and Hackolade

Migrating data from SQL relational databases to MongoDB Atlas may initially seem to be a straightforward process: export the data from the relational database, import the tables into MongoDB Atlas, and then start writing queries for it. But the deeper you look, the more it can start to feel like an overwhelming task. Decades of irrelevant indexes, rare relationships, and forgotten fields that need to be migrated all make for a more complicated process. Not to mention, converting your old schemas to work well with the document-based world of MongoDB can take even longer. Making this process easier and more achievable was one of Studio 3T's main goals when they created their SQL to MongoDB migration tools. In fact, Studio 3T's SQL migration doesn't just make this process easier; it also makes it reliably repeatable. This means you can tune your migration to produce the perfect documents in an ideal schema. There's no big-bang cut-over; you can proceed at your own pace. Delivering the ability to perfect your new schema is also why we've integrated with Hackolade's schema design and data modeling tools. In this article, we look at how the data modernization process works, with a focus on the SQL migration powers of Studio 3T and the data modeling technology of Hackolade.

"So the problem was, how do I migrate the data that's been collected over the last 10 years from MySQL over to MongoDB? And that's when I found Studio 3T. I could not imagine trying to do it myself by hand... If it hadn't been for Studio 3T, I probably would have just left it all in the SQL database."
Rand Nix, IT Director, Wakefield Inspection Services

Why MongoDB Atlas
Building on MongoDB's intuitive data model and query API, MongoDB Atlas gives engineering organizations the versatility they need to build sophisticated applications that can adapt to changing customer demands and market trends. Not only is it the only multi-cloud document database available, it also delivers the most advanced security and data distribution capabilities of any fully managed service. Developers can get started in minutes and leverage intelligent automation to maintain performance at scale as applications evolve over time.

Mapping the move
The crucial component of a SQL migration is mapping the SQL tables and columns to JSON documents and their name/value fields. While the rigid grid of relational tables has a familiar feel, a document in MongoDB can be any shape and, ideally, it should be the shape that makes sense for your business logic. Doing this by hand can prove extremely time-consuming for developers. Which situations call for reshaping your data into documents, and for which should you construct new documents? These are important questions that require time and expertise to answer. Or you can simply use Studio 3T's SQL Migration tool, a powerful, configurable import process. It allows you to create your MongoDB documents with data retrieved from multiple relational tables. Using foreign keys, it can capture the relationships between records in document-centric form.

"When we started out, we built it on SQL Server. All was good to start, but by the time we got up to fifty customers and more, each with some thousands of their staff logging into the system concurrently, we ran into scaling issues.
"Now we're using a hosted, paid-for subscription on MongoDB Atlas. They do all the management and the sharding and all that stuff. And when we switch to regional provisioning, they can handle all that too, which is a huge relief.
"We all use Studio 3T constantly, it’s used as extensively as Visual Studio. We use it for all sorts of things because Studio 3T makes it really easy to navigate around the data." Dan Cummings, Development Mgr. Terryberry Inc. For example, a relational database may have a table of customers and a table of orders by each customer. With the SQL Migration tool, you can automatically create a collection of customer documents, containing an array of all the orders that customer has placed. This exploits the power of the document model by keeping related data local. You can also preview the resulting JSON documents before running a full import so you are sure you're getting what you want. Figure 1. Studio 3T’s SQL to MongoDB Migration In Figure 1 (see above), we can see Studio 3T's SQL Migration pulling in multiple SQL tables, embedding rental records into the customer records, and previewing the JSON output. For more radical restructuring, SQL queries can also provide the source of the data for import. This allows for restructuring of the data by, for example, region or category. Hackolade integrated As you can see, Studio 3T is fully capable of some complex migrations through its tooling, SQL Migration, Import and Reschema. However, if you are of the mindset that you should design your migration first, then this is where Hackolade comes in. Hackolade is a "polyglot data" modelling application designed for a world where models are implemented in everything from graphs to APIs to storage formats. Figure 2. Entity Relationships Mapped in Hackolade Using Hackolade, you can import the schema with relationships from an SQL database and then visually reassemble the SQL fields to compose your MongoDB document schema. Watch this demo video for more details. Once you've built your new models, you can use Studio 3T to put them into practice by importing a Hackolade schema mapping into a SQL Migration configuration. Figure 3. Importing Hackolade Models into Studio 3T This helps eliminate the need to repeatedly query and import from the SQL database as you fine-tune your migration. Instead, you can construct and document the migration within Hackolade, review with your team, and then implement with confidence. Once the data is in MongoDB Atlas, support from Studio 3T continues. Studio 3T's Reschema tools allow you to restructure your schema without re-importing, working entirely on the MongoDB Atlas data. This can be particularly useful when blending data from existing document databases with freshly imported, formerly relational data. The idea that migrations have to be team-breaking chores no longer holds. It is eminently possible, with tools such as Studio 3T's SQL Migration and Hackolade, to not only perform more complex migrations easily, but to be able to design them up front and put them into practice as often as is needed. With both tools working in integrated harmony, the future of migration has arrived.

August 24, 2021

How DataSwitch And MongoDB Atlas Can Help Modernize Your Legacy Workloads

Data modernization is here to stay, and DataSwitch and MongoDB are leading the way forward. Research strongly indicates that the future of the Database Management System (DBMS) market is in the cloud, and the ideal way to shift from an outdated, legacy DBMS to a modern, cloud-friendly data platform is through data modernization. There are a few key factors driving this shift. Increasingly, companies need to store and manage unstructured data in a cloud-enabled system, as opposed to a legacy DBMS, which is designed only for structured data. Moreover, the amount of data generated by a business is increasing at a rate of 55% to 65% every year, and the majority of it is unstructured. A modernized database that can improve data quality and availability provides tremendous benefits in performance, scalability, and cost optimization. It also provides a foundation for improving business value through informed decision-making. Additionally, cloud-enabled databases support greater agility, so you can upgrade current applications and build new ones faster to meet customer demand. Gartner predicts that by 2022, 75% of all databases will be on the cloud – either by direct deployment or through data migration and modernization. But research shows that over 40% of migration projects fail. This is due to challenges such as:
Inadequate knowledge of legacy applications and their data design
Complexity of code and design from different legacy applications
Lack of automation tools for transforming legacy data processing into cloud-friendly data and processes
It is essential to take a strategic approach and choose the right partner for your data modernization journey. We're here to help you do just that.

Why MongoDB?
MongoDB is built for modern application developers and for the cloud era. As a general-purpose, document-based, distributed database, it facilitates high productivity and can handle huge volumes of data. The document database stores data in JSON-like documents and is built on a scale-out architecture that suits any developer building scalable applications through agile methodologies. Ultimately, MongoDB fosters business agility, scalability, and innovation.

Key MongoDB advantages include:
Rich JSON documents
Powerful query language
Multi-cloud data distribution
Security for sensitive data
Quick storage and retrieval of data
Capacity for huge volumes of data and traffic
A design that supports greater developer productivity
Extreme reliability for mission-critical workloads
An architecture built for optimal performance and efficiency

Key advantages of MongoDB Atlas, MongoDB's hosted database as a service, include:
Multi-cloud data distribution
Security for sensitive data
Designed for developer productivity
Reliable for mission-critical workloads
Built for optimal performance
Managed for operational efficiency

To be clear, JSON documents are the most productive way to work with data, as they support nested objects and arrays as values, and they support schemas that are flexible and dynamic. MongoDB's powerful query language enables sorting and filtering on any field, regardless of how deeply it is nested in a document. It also supports aggregations as well as modern use cases such as graph search, geo-based search, and text search. Queries are themselves expressed in JSON and are easy to compose, and MongoDB supports joins in queries. MongoDB supports two types of relationships, with the ability to reference and to embed. It has all the power of a relational database and much, much more.
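As a quick illustration of that last point, here is a minimal sketch of the two relationship styles MongoDB supports: embedding related data inside a document, and referencing it from another collection, joined at query time with $lookup. The collection and field names and the connection string are hypothetical, not part of DataSwitch or MongoDB documentation.

```python
# Minimal sketch of embedding vs. referencing in MongoDB. Collection and field
# names are hypothetical; the connection string is a placeholder.
from pymongo import MongoClient

db = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")["shop"]

# 1) Embedding: related data lives inside the parent document.
db.customers.insert_one(
    {
        "name": "Asha Rao",
        "addresses": [{"type": "home", "city": "Pune"}],   # nested array
        "orders": [{"order_id": 1001, "total": 59.90}],    # embedded children
    }
)

# 2) Referencing: children live in their own collection and point at the parent.
customer_id = db.customers.insert_one({"name": "Luis Ortega"}).inserted_id
db.orders.insert_one({"customer_id": customer_id, "order_id": 1002, "total": 19.50})

# A $lookup aggregation joins referenced documents when a combined view is needed.
combined = db.customers.aggregate(
    [
        {"$match": {"_id": customer_id}},
        {
            "$lookup": {
                "from": "orders",
                "localField": "_id",
                "foreignField": "customer_id",
                "as": "orders",
            }
        },
    ]
)
print(list(combined))
```

Whether to embed or reference is the central schema-design decision when moving off a relational model, which is exactly the decision the tooling described below aims to automate.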
Companies of all sizes can use MongoDB, as it operates successfully on a large and mature platform ecosystem. Developers enjoy a great user experience, with the ability to provision MongoDB Atlas clusters and start coding instantly. A global community of developers and consultants makes it easy to get the help you need, if and when you need it. In addition, MongoDB supports all major languages and provides enterprise-grade support.

Why DataSwitch as a partner for MongoDB? Automated schema re-design, data migration, and code conversion
DataSwitch is a trusted partner for cost-effective, accelerated solutions for digital data transformation, migration, and modernization through a modern database platform. Our no-code and low-code solutions, along with cloud data expertise and unique, automated schema generation, accelerate time to market. We provide end-to-end data, schema, and process migration with automated replatforming and refactoring, thereby delivering:
50% faster time to market
60% reduction in total cost of delivery
Assured quality with built-in best practices, guidelines, and accuracy

Data modernization: How "DataSwitch Migrate" helps you migrate from RDBMS to MongoDB
DataSwitch Migrate ("DS Migrate") is a no-code and low-code toolkit that leverages advanced automation to provide intuitive, predictive, and self-serviceable schema redesign from a traditional RDBMS model to MongoDB's document model, with built-in best practices. Based on data volume, performance, and criticality, DS Migrate automatically recommends the appropriate ETTL (Extract, Transfer, Transform & Load) data migration process. DataSwitch delivers data engineering solutions and transformations in half the time frame of typical existing data modernization solutions. Consider these key areas:

Schema redesign – construct a new framework for data management. DS Migrate provides automated data migration and transformation based on your redesigned schema, as well as no-touch code conversion from legacy data scripts to MongoDB Atlas APIs. Users simply drag and drop the schema for redesign, and the platform converts it to a document-based JSON structure by applying MongoDB modeling best practices. The platform then automatically migrates data to the new, redesigned JSON structure. It also converts the legacy database scripts for MongoDB. This automated, user-friendly data migration is faster than anything you've ever seen. Here's a look at how the schema designer works.

Refactoring – change the data structure to match the new schema. DS Migrate handles this through auto-generated code for migrating the data. This is far beyond a mere lift and shift: DataSwitch takes care of refactoring and replatforming (moving from the legacy platform to MongoDB) automatically. The ability to perform all of these tasks within a single platform is a unique, game-changing capability.

Security – mask and tokenize data while moving it from on-premises to the cloud. As the data is moving to a potentially public cloud, you must keep it secure. DataSwitch's tool can configure and apply security measures automatically while migrating the data.

Data quality – ensure that data is clean, complete, trustworthy, and consistent. DataSwitch allows you to configure your own quality rules and automatically applies them during data migration.

In summary: first, the DataSwitch tool automatically extracts the data from an existing database, such as Oracle. It then exports the data and stores it locally before zipping and transferring it to the cloud.
Next, DataSwitch transforms the data by altering the data structure to match the redesigned schema and applying data security measures during the transform step. Lastly, DS Migrate loads the data and processes it into MongoDB in its entirety.

Process Conversion
Process conversion, where scripts and process logic are migrated from a legacy DBMS to a modern DBMS, is made easier thanks to a high degree of automation. Minimal coding and manual intervention are required, and the journey is accelerated. It involves:
DML – Data Manipulation Language
CRUD – typical application functionality (Create, Read, Update & Delete)
Converting both to the equivalent MongoDB Atlas API calls (a small illustration appears at the end of this post)

Degree of automation DataSwitch provides during migration:

Schema Migration Activities (DS Automation Capability):
Application Data Usage Analysis: 70%
3NF to NoSQL Schema Recommendation: 60%
Schema Re-Design Self Services: 50%
Predictive Data Mapping: 60%

Process Migration Activities (DS Automation Capability):
CRUD-based SQL conversion (Oracle, MySQL, SQL Server, Teradata, DB2) to MongoDB API: 70%

Data Migration Activities (DS Automation Capability):
Migration Script Creation: 90%
Historical Data Migration: 90%
Catch-up Load: 90%

DataSwitch Legacy Modernization as a Service (LMaaS): Our consulting expertise, combined with the DS Migrate tool, allows us to harness the power of the cloud to transform legacy RDBMS data systems to MongoDB. Our solution delivers legacy transformation in half the time frame through pay-per-usage. Key strengths include:
● Data Architecture Consulting
● Data Modernization Assessment and Migration Strategy
● Specialized Modernization Services

DS Migrate Architecture Diagram

Contact us to learn more.
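To give a flavor of what CRUD-based SQL conversion to the MongoDB API means in practice, here is a minimal, hand-written sketch of a few SQL statements and their MongoDB equivalents expressed with PyMongo. The table and field names are hypothetical, and the mapping shown is an illustration, not DS Migrate's generated output.

```python
# Illustrative CRUD mapping from SQL to the MongoDB API (PyMongo). Names are
# hypothetical; tools like DS Migrate generate conversions of this kind.
from pymongo import MongoClient

orders = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")["sales"]["orders"]

# INSERT INTO orders (order_id, customer_id, total) VALUES (1001, 42, 59.90)
orders.insert_one({"order_id": 1001, "customer_id": 42, "total": 59.90})

# SELECT * FROM orders WHERE customer_id = 42 ORDER BY total DESC
for doc in orders.find({"customer_id": 42}).sort("total", -1):
    print(doc)

# UPDATE orders SET status = 'shipped' WHERE order_id = 1001
orders.update_one({"order_id": 1001}, {"$set": {"status": "shipped"}})

# DELETE FROM orders WHERE order_id = 1001
orders.delete_one({"order_id": 1001})
```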

May 13, 2021

Accelerate Data Modernization with Infosys Data Model Converter

Are you in the process of migrating applications from a relational database to MongoDB? If so, you're likely trying to understand and decide how your enterprise data should be modeled. Our previous blog discussed how Infosys Data Services Suite can help enterprises move data seamlessly from legacy relational databases to MongoDB. But moving data is only one part of the puzzle. The more significant step is choosing the target data model, or schema design, a process that usually requires several hours of highly skilled talent. That's why we created this follow-up blog to help you get started.

Rethinking Schema Design
Ultimately, schema design can be the difference between an inefficient, disorganized database and a strategic one that empowers the entire company. Schema design in MongoDB requires a change in perspective for data architects, developers, and database administrators. They have to:
Rethink the legacy relational data model. That model flattens data into rigid, two-dimensional tabular structures of rows and columns. The new data model is rich and dynamic, with embedded sub-documents and arrays.
Rethink how the data platform works. In relational databases, it is extremely difficult to change the data platform as the application evolves. In MongoDB, however, the apps and APIs come first and the data platform dynamically accommodates the data.

Getting Schema Design Right
Begin the schema design process by considering the application's requirements. You'll want to model the data in a way that leverages the flexibility of the document model. In schema migrations, it may seem easy at first to simply mirror the flat schema of the relational database in the document model. However, this negates the advantages enabled by the rich, embedded data structures of the document model. For example, data that belongs to a parent-child relationship in two RDBMS tables can be collapsed (embedded) into a single document in MongoDB. The application's data access patterns should also drive schema design, with a specific focus on:
The read/write ratio of database operations, and whether it is more important to optimize the performance of one operation over another
The types of queries and updates performed by the database
The lifecycle of the data and the growth rate of documents

Simplifying Schema Design with Infosys Data Model Converter
Infosys has developed a solution called Infosys Data Model Converter that takes the source relational schema and the above-mentioned signals as inputs and automatically provides target MongoDB schema suggestions. Infosys Data Model Converter is available as part of Infosys Modernization Suite, which accelerates enterprises' modernization journeys. Each schema suggestion is accompanied by a detailed analysis report. The data modeler can use this as a starting point and iterate over the schema to arrive at the final MongoDB schema. The Infosys Data Model Converter reduces the effort typically spent on schema design by 50-60%.
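For a sense of what a finalized target schema can look like on the MongoDB side, here is a minimal sketch that creates a collection reflecting the parent-child collapse described above and governs its shape with a $jsonSchema validator. The collection and field names are hypothetical and the validator is hand-written for illustration, not Infosys Data Model Converter output.

```python
# Hand-written illustration: a customer collection whose embedded orders follow
# the parent-child collapse described above, governed by a $jsonSchema validator.
from pymongo import MongoClient

db = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")["modernized"]

db.create_collection(
    "customers",
    validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["customer_id", "name"],
            "properties": {
                "customer_id": {"bsonType": "int"},
                "name": {"bsonType": "string"},
                "orders": {  # former child table, now an embedded array
                    "bsonType": "array",
                    "items": {
                        "bsonType": "object",
                        "required": ["order_id", "total"],
                        "properties": {
                            "order_id": {"bsonType": "int"},
                            "total": {"bsonType": ["double", "decimal"]},
                        },
                    },
                },
            },
        }
    },
)
```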
Key Features
Boosts productivity by augmenting the migration from RDBMS to a NoSQL database
Saves time by automatically extracting schema, query, and data patterns from an existing RDBMS
Comprehensively analyzes the RDBMS entity relations, data, and read-and-write patterns
Applies a rich set of rules and generates a fully compliant NoSQL target-state data model
Offers flexibility by externalizing the rules for organization-specific customizations
Connects and deploys the model to the target NoSQL platform with sample data

Discover more ways in which Infosys can help you unlock value from modernization. Contact us for any modernization questions.

April 15, 2021