Near Real-time Analytics Powered by Mirroring in Microsoft Fabric for MongoDB Atlas

Diana Annie Jenosh • 4 min read • Published Nov 19, 2024 • Updated Dec 10, 2024
MongoDB is excited to present a mirroring solution for customers who want to bring operational data from MongoDB Atlas into Microsoft Fabric for big data analytics, AI, and BI, and to combine it with the rest of the enterprise data estate. Mirroring in Fabric brings data from operational data stores into the uniform data layer of OneLake in Fabric. Open Mirroring is built on the open Delta Lake table format, which is what allows data ISVs to extend it. In addition, Database Mirroring provides public APIs for data ISVs to integrate with the open mirroring approach. Once mirroring is enabled for a MongoDB Atlas collection, the corresponding table in OneLake is kept in sync with changes in the source collection, so analytics, AI, and BI workloads can run on near real-time data. To learn more about Open Mirroring capabilities, see Introducing Open Mirroring in Microsoft Fabric.

MongoDB Atlas in Fabric ecosystem

Today, there are multiple ways to bring data from MongoDB Atlas into OneLake in Microsoft Fabric. Data pipelines and Dataflow Gen2 move data from MongoDB Atlas to OneLake in batches or micro-batches. The Data Pipeline connector for MongoDB Atlas works through the Copy data assistant, which lets users choose MongoDB Atlas as a source and OneLake as a target to push operational data to OneLake. It also supports MongoDB Atlas as a target, so analytics results and enriched data can be written back to MongoDB Atlas for persistence. Dataflow Gen2 is an Atlas SQL-based connector that lets you pull data from MongoDB Atlas into OneLake, apply transformations such as filtering and flattening, and run Power Query to create rich Power BI visualizations. Additionally, the MongoDB Spark connector can be used in Fabric Spark notebooks to read data from and write data to MongoDB Atlas in both batch and streaming mode. The Data Pipeline Copy activity, Dataflow Gen2, and Spark notebooks can all be orchestrated by Data Pipeline to build enterprise workflows with MongoDB as a key source and destination.
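As a rough illustration of the Spark notebook path, the sketch below builds the read options for the MongoDB Spark connector. It assumes connector version 10.x, where the data source is named mongodb and the options are connection.uri, database, and collection. The URI, database, and collection values are placeholders, the helper names are hypothetical, and spark stands for the session that a Fabric notebook provides.

```python
def mongo_read_options(uri: str, database: str, collection: str) -> dict:
    """Options for the MongoDB Spark connector (10.x option names assumed)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def read_collection(spark, uri: str, database: str, collection: str):
    """Batch-read one Atlas collection into a Spark DataFrame."""
    options = mongo_read_options(uri, database, collection)
    return spark.read.format("mongodb").options(**options).load()


# Placeholder values; substitute your own cluster, database, and collection.
opts = mongo_read_options(
    "mongodb+srv://<user>:<password>@<cluster-host>/",
    "sales",
    "orders",
)
```

In a notebook, read_collection(spark, ...) would return a DataFrame that can then be written to a OneLake table; that write step is left out here.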

Mirroring MongoDB Atlas to Fabric OneLake

Joint customers gain value from Mirroring because it opens up Spark-based analytics, SQL-based warehousing, SynapseML-based AI/ML predictions, and KQL-based real-time intelligence on current, up-to-date data from MongoDB Atlas.
The insert, update, and delete events occurring on the source MongoDB Atlas collection are mirrored to the target OneLake table in near real-time. Conversion to the Parquet format, data type conversions, and schema changes are all handled by mirroring. Mirroring generates a SQL analytics endpoint that enables read-only T-SQL analytics on the mirrored tables. It also generates a default semantic model for building Microsoft Power BI reports and dashboards from the mirrored OneLake tables.
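To make the "kept in sync" behavior concrete, here is a minimal sketch that replays insert, update, and delete events against a table keyed by document _id. The event shape loosely follows MongoDB change stream events (operationType, documentKey, fullDocument), and the in-memory dictionary is only a stand-in for the mirrored table; it is not how Fabric stores or merges data.

```python
def apply_change(table: dict, event: dict) -> None:
    """Apply one change event to an in-memory table keyed by document _id."""
    key = event["documentKey"]["_id"]
    op = event["operationType"]
    if op in ("insert", "replace", "update"):
        # For updates, this assumes the event carries the full document.
        table[key] = event["fullDocument"]
    elif op == "delete":
        table.pop(key, None)


table = {}
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "qty": 5}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "qty": 7}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "qty": 6}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]
for event in events:
    apply_change(table, event)

print(table)  # {1: {'_id': 1, 'qty': 6}}
```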
The following diagram depicts the mirroring integration architecture.
MongoDB Atlas mirroring in Fabric

MongoDB Atlas Mirroring implementation

Our implementation of mirroring uses the mirroring open extensibility platform in Fabric and is intended for customers who have been waiting for this functionality since it was pre-announced at Microsoft Ignite two years earlier. The mirroring open extensibility platform, announced at Microsoft Ignite 2024, provides a set of APIs for creating a MirrorDB. Once a Landing Zone is created within the MirrorDB in OneLake, every Parquet file pushed to the Landing Zone is replicated into the corresponding MirrorDB table.
Microsoft has published a set of APIs and formats for any third-party developer, partner, customer, or ISV to push change data to the Landing Zone.
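The sketch below shows one way change events could be shaped for the Landing Zone. It is not the code from the MongoDB solution. The __rowMarker__ column and its values, the zero-padded file names, and the keyColumns entry in _metadata.json are assumptions taken from Microsoft's open mirroring landing-zone documentation and should be checked there. All function names are hypothetical, and writing the rows out as Parquet (for example with a library such as pyarrow) is omitted.

```python
import json

# Row-marker values assumed from Microsoft's open mirroring landing-zone
# documentation; confirm them there before relying on this mapping.
ROW_MARKER = {"insert": 0, "update": 1, "replace": 1, "delete": 2}


def to_landing_row(event: dict) -> dict:
    """Turn one change event into a flat row with a __rowMarker__ column."""
    row = dict(event.get("fullDocument") or {})
    # Non-tabular key types (for example ObjectId) are stored as strings.
    row["_id"] = str(event["documentKey"]["_id"])
    row["__rowMarker__"] = ROW_MARKER[event["operationType"]]
    return row


def next_file_name(sequence: int) -> str:
    """Zero-padded, increasing Parquet file name (assumed 20 digits)."""
    return f"{sequence:020d}.parquet"


def table_metadata(key_columns: list) -> str:
    """Contents of a table's _metadata.json, naming its key columns."""
    return json.dumps({"keyColumns": key_columns})


row = to_landing_row({"operationType": "delete", "documentKey": {"_id": 42}})
```

A delete needs only the key and the marker, so row here contains just the _id and __rowMarker__ fields.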
The solution requires a one-time execution of a simple Python script that invokes the Microsoft Fabric mirroring APIs from the mirroring open extensibility platform to create the MirrorDB for MongoDB Atlas.
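For orientation only, the following sketch builds (but does not send) an HTTP request of the kind such a script might issue. The endpoint path, the mirroring.json definition, and the GenericMirror source type are assumptions based on the public Fabric REST API for mirrored databases, not a copy of the script in the repo. The function name, workspace ID, display name, and token are hypothetical placeholders.

```python
import base64
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_create_request(workspace_id: str, display_name: str, token: str):
    """Build a POST request that would create an open-mirroring database."""
    # Definition assumed from the Fabric mirrored-database item docs.
    mirroring = {
        "properties": {
            "source": {"type": "GenericMirror", "typeProperties": {}},
            "target": {
                "type": "MountedRelationalDatabase",
                "typeProperties": {"format": "Delta"},
            },
        }
    }
    payload = base64.b64encode(json.dumps(mirroring).encode("utf-8"))
    body = {
        "displayName": display_name,
        "definition": {
            "parts": [
                {
                    "path": "mirroring.json",
                    "payload": payload.decode("ascii"),
                    "payloadType": "InlineBase64",
                }
            ]
        },
    }
    return urllib.request.Request(
        url=f"{FABRIC_API}/workspaces/{workspace_id}/mirroredDatabases",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_request("<workspace-id>", "MongoMirror", "<token>")
```

Sending the request with urllib.request.urlopen(request) requires a valid Microsoft Entra access token for the Fabric API; the sketch stops short of that step.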
Once the Landing Zone in the MirrorDB is created, click the Deploy to Azure button at the bottom of the provided GitHub repo page, as shown below.
MongoDB Fabric Replication
This takes you to an ARM template configuration screen in your Azure tenant. Once you enter the required details and select Create, the template creates an App Service, deploys the Python app, and starts the mirroring application. The mirroring application runs in two phases: Initial Sync, a one-time ingestion of historical data into OneLake, and Listening, which then keeps replicating changes from the source MongoDB Atlas collections to the target tables in the MirrorDB.
The solution can also be deployed in a VM in your Azure tenant or on any bare metal server. If it is deployed in a VM, the VM's Azure VNet can be peered with MongoDB Atlas, or connected to it through a private link, for secure communication.
The following diagram depicts how the App Service running the Python script enables mirroring.
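The two phases can be sketched as follows. This is an illustrative outline, not the deployed application: collection is expected to behave like a pymongo Collection (find and watch are real pymongo methods), while run_mirroring, write_rows, and max_events are hypothetical names introduced here.

```python
def run_mirroring(collection, write_rows, max_events=None):
    """Initial sync followed by change listening for one collection.

    `collection` should behave like a pymongo Collection; `write_rows`
    is a hypothetical callback that lands rows in the Landing Zone.
    """
    # Open the change stream first so that changes made while the
    # initial sync is running are not missed.
    with collection.watch(full_document="updateLookup") as stream:
        # Initial Sync: one-time copy of the existing documents.
        # A real implementation would write these in batches.
        write_rows("initial_sync", list(collection.find()))

        # Listening: forward each change event as it arrives.
        seen = 0
        for event in stream:
            write_rows("change", [event])
            seen += 1
            if max_events is not None and seen >= max_events:
                break
```

Opening the stream before the copy means a document changed during the initial sync may be delivered twice, so the target should treat writes as keyed upserts.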
MongoDB Atlas mirroring Solution
Refer to the short video for a walkthrough of the MongoDB Atlas to OneLake mirroring solution.

MongoDB Atlas mirroring use cases

Mirroring benefits use cases across multiple industries where near real-time analytics are required. The mirrored MongoDB database in OneLake comes with a SQL analytics endpoint, which simplifies data warehousing by supporting T-SQL and the visual query editor. The endpoint can be used to combine the mirrored data with data from other lakehouses and warehouses in Fabric for a holistic analysis of the enterprise data estate. The semantic model helps build Power BI reports and dashboards on the mirrored data. These reports and outputs support responsive action in near real-time, especially in highly regulated industries.
The mirrored data can feed near real-time, ML-based analytics such as credit scoring and fraud detection in the financial services industry, dynamic pricing and inventory management in the retail industry, and predictive maintenance in the manufacturing industry.

Conclusion

Mirroring for MongoDB Atlas enables you to move your data from MongoDB Atlas collections to OneLake without any complex setup or ETL process. Because the data is synchronized in near real-time into the Delta format, the mirrored data can be used by a variety of Fabric tools to gather accurate and meaningful insights.
Get started with mirroring for your MongoDB Atlas data.