- Use cases: Analytics, Gen AI, Interoperability
- Industries: Healthcare
- Products and tools: MongoDB Atlas, Atlas Search, Atlas Data Federation, Atlas Charts
- Partners: Microsoft
MongoDB Atlas and Microsoft AI technologies converge in an innovative healthcare solution called "Leafy Hospital," showcasing how cutting-edge technology can transform breast cancer diagnosis and patient care. This integrated system leverages MongoDB's flexible data platform for unifying operational, metadata, and AI data, while incorporating Microsoft's advanced capabilities including Azure OpenAI, Microsoft Fabric, and Power BI to create a comprehensive healthcare analytics and diagnostic solution. The solution demonstrates three key technological approaches:
This approach enables healthcare providers to streamline diagnostic processes, automate clinical documentation, and make data-driven decisions while ensuring secure handling of sensitive patient information.
The reference architecture illustrates how the Leafy Hospital solution integrates various components across three main technological areas:
The data flow begins with medical images stored in Azure Blob Storage, which are then processed through the various layers:
This architecture ensures a seamless flow of information from raw medical data to actionable insights while maintaining security and performance throughout the system.
The Leafy Hospital demo showcases the integration of MongoDB Atlas with Microsoft's AI and analytics services through several key components:
The solution's data architecture supports both operational and analytical workloads efficiently. MongoDB Atlas serves as the operational datastore for real-time AI applications, while Microsoft OneLake handles analytics for long-term trend analysis. This dual architecture enables:
Predictive AI is critical in healthcare as it aids in accurate diagnosis, relying on predictions from large datasets compared with manual analysis, which is likely to bring in manual errors. Microsoft Fabric Data Science presents a robust platform to train and experiment with ML Models and manage MLOps cycles.
Two models were trained and used to:
BI-RADS prediction: BI-RADS is an industry standard mechanism to describe mammogram findings and is classified in seven categories with a score of possibility of a malignant cancer increasing with the score value from 0 to 6. VGG16 is a deep convolutional neural network (CNN) model. It is trained on mammogram images from the dataset on Kaggle, which were grouped in folders as per their BI-RADS. Image analysis needs deep neural network models and the best model needs to be selected based on training on actual datasets running into multiple epochs.
Fabric Data Science is used to train the models, run experiments, and manage the multiple versions. Multiple experiments were run with the two algorithms VGG16 and EfficientNetV2L, and the easy comparison of the multiple ML parameters and metrics for each version helps in the selection process of the final model. The images for training are directly uploaded to the Lakehouse in OneLake from the user's local machine using the UI itself. Additionally, the images stored in Azure blob storage can be easily referenced in the notebook downloading them from the blob URL using wget/curl, referencing using shortcuts, or even using a data pipeline. The image metadata and final prediction are stored in MongoDB Atlas.
Biopsy classification: For the use case of binary classification of the cancer as malignant or benign, classification or regression models can be used. Random forest classifier model is trained on a dataset from Kaggle, with nine input parameters such as clump thickness, uniformity of cell size and shape, bare nuclei, mitoses, etc. Based on the values of these parameters the model is able to predict if the cancer is malignant or benign. In production use cases, more parameters can be added and the model can be trained from their values to be able to predict with more accuracy. Random forest model gave an accuracy of more than 97% and thus was ideal for this use case. The training dataset is fetched from MongoDB Atlas and prediction output is updated back to MongoDB, thanks to the MongoDB Spark Connector.
Fabric Data Science makes the training and managing the end-to-end ML lifecycle easy and intuitive. Fabric Data Science manages the lifecycle by auto logging related parameters for each experiment and model using the de-facto data science standards of MLflow.
Vector search capabilities form the foundation of the solution's intelligent querying system, implemented in three key stages:
The chatbot implementation leverages retrieval augmented generation (RAG) architecture with three distinct data contexts:
The solution leverages two complementary visualization platforms for comprehensive analytics: MongoDB Atlas Charts provides native, real-time operational dashboards directly connected to MongoDB data. It enables immediate insights into critical healthcare metrics through intuitive visualizations without requiring data transformations or additional tools. The operational dashboard (Figure 5) demonstrates key metrics including patient numbers, appointment status, and clinic distribution.
Together, these platforms provide a complete analytics solution that handles both immediate operational needs and long-term analytical requirements.
The solution demonstrates how MongoDB Atlas serves as a unified platform that handles operational data, vector search capabilities, and analytics requirements while seamlessly integrating with Microsoft's AI and visualization tools. This architecture enables healthcare providers to leverage both real-time operational insights and long-term analytical capabilities within a single, coherent system.
For a detailed, step-by-step guide on implementing this solution, including code samples and specific configuration instructions, visit our GitHub repository.
Unified data platform: MongoDB Atlas serves as a central repository that effectively unifies operational data, metadata, and AI data, enabling seamless integration between different components of the healthcare system.
AI integration capabilities: The architecture demonstrates how different types of AI (Predictive, Generative, and Analytics) can be effectively integrated into a single healthcare solution using Microsoft's AI services and MongoDB Atlas.
Workflow automation: The solution shows how AI can automate critical healthcare workflows, from diagnostic predictions to report generation and intelligent querying through chatbots, reducing manual effort and potential errors.
Scalable analytics: The combination of MongoDB Atlas with Microsoft Fabric and Power BI enables both real-time operational analytics and long-term trend analysis, providing comprehensive insights for healthcare decision-making.
Secure healthcare architecture: The solution exemplifies how to build a modern healthcare system that maintains data security and privacy while enabling advanced AI capabilities and data analytics.
Create this demo for yourself by following the instructions in this solution’s repository.
Discover how AI transforms healthcare with MongoDB Atlas—from clinical decisions to drug discovery.
Learn how MongoDB’s developer data platform supports a wide range of use cases in the healthcare industry.
Discover how leading industries are transforming with AI and MongoDB Atlas.