Unlock the Value of Data in MongoDB Atlas with the Intelligent Analytics of Microsoft Fabric
To win in this competitive digital economy, enterprises are striving to create smarter, more intelligent apps. These apps provide a superior customer experience and can derive insights and predictions in real time.
Smarter apps depend on large volumes of data combined with AI and analytics. MongoDB Atlas stores valuable operational data and has capabilities to support operational analytics and AI-based applications. This blog details how MongoDB Atlas integrates with Microsoft Fabric to run large-scale AI/ML, varied analytics, and BI reports across the enterprise data estate, bringing teams together on a single, AI-powered platform. Customers can use MongoDB Atlas with Microsoft Fabric as the foundation for building smart and intelligent applications.
MongoDB was showcased as a key partner at Microsoft Ignite, highlighting the collaboration on seamless integrations and joint solutions whose complementary capabilities address diverse use cases.
During the first keynote at Ignite, Satya Nadella, Chairman and Chief Executive Officer of Microsoft, announced that Microsoft Fabric is now generally available for purchase. He also described the strategic plan to enable MongoDB Atlas mirroring in Microsoft Fabric, which will let customers access their Atlas data in OneLake.
MongoDB Atlas’ flexible data model, versatile query engine, integration with LLM frameworks, and built-in Vector Search, analytical nodes, aggregation framework, Atlas Data Lake, Atlas Data Federation, Charts, etc. enable operational analytics and application-driven intelligence at the source of the data itself. However, the analytics and AI needs of an enterprise span its whole data estate and require combining multiple data sources and running multiple types of analytics, such as big data, Spark, SQL, or KQL-based ones, at large scale. Enterprises therefore bring data from sources like MongoDB Atlas into one uniform format in OneLake in Microsoft Fabric. This lets them run batch Spark analytics and AI/ML at petabyte scale and use data warehousing, big data analytics, and real-time analytics across the Delta tables populated from disparate sources.
Let's discuss the integration mechanisms between MongoDB Atlas and Microsoft Fabric for both batch and real-time scenarios.
MongoDB Atlas serves as the operational data layer (ODL) of many enterprise applications. Atlas as an ODL stores data from internal applications, customer-facing services, and third-party APIs from multiple channels. By using Microsoft Fabric pipelines, you can combine MongoDB Atlas data with relational data from other traditional applications and unstructured data from sources like logs, clickstreams, etc.
There are multiple options to bring MongoDB Atlas data into Microsoft Fabric in batch mode, either as a one-time load or in micro-batches that run at a specified frequency. In this section, we will discuss the out-of-the-box (OOTB) approaches that can be applied to fetch data from MongoDB to Microsoft Fabric in batch mode.
The MongoDB Atlas SQL connector is a Microsoft-certified connector that can be accessed from the “Dataflow Gen2” feature of “Data Factory” in Microsoft Fabric.
Selecting Dataflow Gen2 takes us to the familiar Power Query interface of Microsoft Power BI. To bring data from MongoDB Atlas collections, search for the MongoDB Atlas SQL connector from the “Get Data” option on the menu.
By providing the connection URI and authentication details, we can set up a connection to MongoDB Atlas. Note that the connection string is not a regular cluster connection string but that of Atlas SQL or a federated database. Learn more about connecting using Atlas SQL, or set up an Atlas federated database and get its connection string. Also note that the connector needs a Gateway set up to communicate from Fabric and schedule refreshes. Get more details on Gateway setup.
Once data is retrieved from MongoDB Atlas into Power Query, the magic of Power Query can be used to transform the data, including flattening object data into separate columns, unwinding array data into separate rows, or changing data types. These are typically required when converting MongoDB data in JSON format to the relational format in Power BI. Additionally, the blank query option can be used for a quick query execution. Below is a sample query to start with:
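To make the flattening and unwinding steps described above concrete, here is a minimal sketch in plain Python. It is not the Power Query sample query and does not use Power Query at all; it only illustrates what the two transformations do to a JSON-style document. The function names `flatten` and `unwind` and the sample `order` document are hypothetical, chosen for illustration.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested objects into dotted column names; arrays are left untouched."""
    row: dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so that {"customer": {"city": ...}} becomes "customer.city".
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def unwind(row: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """Emit one row per element of the array stored under `field`.

    Rows whose field is missing, empty, or not an array are kept unchanged
    (an assumption made here so that no records are silently dropped).
    """
    values = row.get(field)
    if not isinstance(values, list) or not values:
        return [row]
    return [{**row, field: value} for value in values]


# Hypothetical sample document, shaped like a typical MongoDB record.
order = {
    "_id": 1,
    "customer": {"name": "Ada", "city": "Oslo"},
    "items": ["pen", "ink"],
}

rows = unwind(flatten(order), "items")
for r in rows:
    print(r)
```

Each resulting row has only scalar columns (`_id`, `customer.name`, `customer.city`, `items`), which is the shape a relational model in Power BI expects.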
The announcement at Microsoft Ignite of the Data Pipeline connector being released for MongoDB Atlas in Microsoft Fabric is definitely good news for MongoDB customers. The connector provides a quick experience similar to that of the MongoDB connector in Data Factory and Synapse Pipelines.
The connector is accessed from the “Data Pipelines” feature from “Data Factory” in Fabric. Choose the “Copy data” activity to use the MongoDB connector to get data from MongoDB or to push data to MongoDB. To get data from MongoDB, add MongoDB in Source. Select the MongoDB connector and create a linked service by providing the connection string and the database to connect to in MongoDB Atlas.
We can run the pipeline by setting up Lakehouse in Microsoft Fabric as the Sink to receive the data from MongoDB Atlas.
Thus, the connector provides an easy mechanism to copy data between MongoDB Atlas and Microsoft Fabric. Similar to the MongoDB connector in Microsoft Azure Data Factory, it is available for self-hosted MongoDB instances, including on-premises ones, as well as for MongoDB Atlas. As this is the first release of the connector, it has a few limitations when compared to the MongoDB connector in Azure Data Factory or Synapse Pipelines.
As mentioned earlier, Satya Nadella, CEO of Microsoft, announced the strategic plan to enable MongoDB Atlas mirroring in Microsoft Fabric. This will provide the easiest and fastest approach to replicate data from MongoDB Atlas to OneLake in Microsoft Fabric. However, it will also offer the fewest options for transforming data during replication. This feature is expected to become available in calendar year 2024.
Bringing data using the PULL mechanism of batch mode serves multiple use cases, including near real-time ones. However, in certain cases, we may need a real-time sync mechanism that replicates data changes from MongoDB Atlas into Microsoft Fabric as they happen. This has to be a Change Data Capture (CDC) method, a PUSH mechanism from MongoDB Atlas to Microsoft Fabric. In this section, we will detail two such custom approaches that can be leveraged for these CDC-based, real-time sync use cases.
Real-time sync from MongoDB Atlas to Microsoft Fabric can be achieved by using MongoDB Atlas Triggers to capture the change events in a MongoDB collection and using an Atlas function to trigger an Azure function. The Azure function can write directly to the Lakehouse in Microsoft Fabric or to ADLS Gen2 storage using ADLS Gen2 APIs. ADLS Gen2 storage accounts can be referenced in Microsoft Fabric using shortcuts, eliminating the need for an ETL process to move data from ADLS Gen2 to OneLake. Data in Microsoft Fabric can be accessed using the existing ADLS Gen2 APIs, with some changes and constraints that are described in the Microsoft Fabric documentation.
MongoDB’s Spark connector v10.1 provides streaming capabilities that allow structured streaming of changes from MongoDB or to MongoDB in both continuous and micro-batch modes. Using the connector, we need only a few lines of code to read a stream of changes from a MongoDB collection and write that stream to the Lakehouse in Microsoft Fabric, or to ADLS Gen2 storage that can be referenced in Microsoft Fabric using shortcuts. MongoDB Atlas can be set up as a source for structured streaming by referring to the MongoDB documentation. Refer to the Microsoft Fabric documentation on setting up Lakehouse as Sink for structured streaming.
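A sketch of what that streaming code might look like in PySpark is shown below, in micro-batch mode. It assumes a notebook where a `SparkSession` is already available and the MongoDB Spark connector 10.x is installed. The helper names, the ten-second trigger interval, and the schema, table, and checkpoint values passed in are assumptions for illustration; consult the MongoDB and Microsoft Fabric documentation for the settings that apply to your environment.

```python
from typing import Any


def mongo_stream_options(uri: str, database: str, collection: str) -> dict[str, str]:
    """Read options for the MongoDB Spark connector (10.x option names)."""
    return {
        "spark.mongodb.connection.uri": uri,
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
        # Publish only the changed document instead of the full change event.
        "spark.mongodb.change.stream.publish.full.document.only": "true",
    }


def start_sync(
    spark: Any,
    options: dict[str, str],
    schema_ddl: str,
    table: str,
    checkpoint: str,
):
    """Stream changes from MongoDB into a Delta table in micro-batches.

    `spark` is an existing SparkSession; `schema_ddl` is a DDL string such as
    "_id STRING, total DOUBLE" describing the documents being read.
    """
    changes = (
        spark.readStream.format("mongodb")
        .options(**options)
        .schema(schema_ddl)
        .load()
    )
    return (
        changes.writeStream.format("delta")
        .outputMode("append")
        # The checkpoint lets the stream resume where it left off after a restart.
        .option("checkpointLocation", checkpoint)
        .trigger(processingTime="10 seconds")
        .toTable(table)
    )
```

A call might look like `start_sync(spark, mongo_stream_options(uri, "sales", "orders"), "_id STRING, total DOUBLE", "orders_changes", "Files/checkpoints/orders")`, where every argument value is a placeholder.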
Watch a full demo of MongoDB Atlas integration with Microsoft Fabric:
AI is set to reshape businesses and provide innovative solutions across industries. Smart applications can be built with the right database solution combined with the right analytics and AI solution. These applications can thrive, delivering accurate, context-aware, dynamic, data-driven user experiences that meet the growing demands of today's fast-paced digital landscape. MongoDB Atlas data and its operational analytics, combined with the entire enterprise data estate in Microsoft Fabric to run multiple kinds of analytics and reporting at large scale, form a powerful combination for building such smart applications.