ANNOUNCEMENTVoyage AI joins MongoDB to power more accurate and trustworthy AI applications on Atlas. Learn more >
NEWSLearn why MongoDB was named a leader in the 2024 Gartner® Magic Quadrant™ Read the report >
NEWMongoDB 8.0: Experience unmatched speed and performance. Check it out >

SOLUTIONS

AI-Powered Healthcare with MongoDB & Microsoft

MongoDB and Microsoft deliver AI-powered solutions for breast cancer care, unifying data and enabling predictive modeling, intelligent chatbots, and analytics.
Start FreeView the demo
An image of an AI healthcare processing
Solution Overview

MongoDB Atlas and Microsoft AI technologies converge in an innovative healthcare solution called "Leafy Hospital," showcasing how cutting-edge technology can transform breast cancer diagnosis and patient care. This integrated system leverages MongoDB's flexible data platform for unifying operational, metadata, and AI data, while incorporating Microsoft's advanced capabilities including Azure OpenAI, Microsoft Fabric, and Power BI to create a comprehensive healthcare analytics and diagnostic solution. The solution demonstrates three key technological approaches:

  • Predictive AI for early detection using deep learning models to analyze mammograms and predict BI-RADS scores
  • Generative AI for workflow automation, featuring vector search capabilities and RAG-based chatbots for intelligent information retrieval
  • Advanced analytics that combines real-time operational insights with long-term trend analysis through Power BI integration

This approach enables healthcare providers to streamline diagnostic processes, automate clinical documentation, and make data-driven decisions while ensuring secure handling of sensitive patient information.

This is an image

Reference Architectures

The reference architecture illustrates how the Leafy Hospital solution integrates various components across three main technological areas:

  1. Predictive AI layer (bottom yellow box):
    • Fabric Data Science processes mammogram images and clinical data
    • Handles BI-RADS (breast imaging-reporting and data system) scoring and biopsy type analysis
    • Determines malignant or benign classification
    • Receives images from Azure Blob Storage
    • Outputs operational data to MongoDB Atlas
  2. Generative AI layer (middle purple box):
    • Azure AI Studio integrates with MongoDB Atlas on Azure
    • Enables automated report generation for clinical documentation
    • Features a chatbot for question-answering capabilities
    • Processes operational and vector data from MongoDB Atlas
    • Facilitates natural language interactions with the system
  3. Advanced analytics layer (middle green box):
    • Combines Fabric Power BI and Fabric OneLake
    • Generates reports and dashboards from processed data
    • Integrates with MongoDB Atlas for data visualization
    • Provides comprehensive analytics capabilities

The data flow begins with medical images stored in Azure Blob Storage, which are then processed through the various layers:

  • Images and operational data flow through Fabric Data Science for AI processing
  • Results are stored in MongoDB Atlas, which serves as the central operational database
  • Azure AI Studio handles generative AI tasks using the stored data
  • Finally, Fabric Power BI and OneLake enable advanced analytics and visualization
Leafy Hospital Solution Architecture
Figure 1: Leafy Hospital Solution Architecture

This architecture ensures a seamless flow of information from raw medical data to actionable insights while maintaining security and performance throughout the system.

Building the solution

The Leafy Hospital demo showcases the integration of MongoDB Atlas with Microsoft's AI and analytics services through several key components:

Data architecture and flow

The solution's data architecture supports both operational and analytical workloads efficiently. MongoDB Atlas serves as the operational datastore for real-time AI applications, while Microsoft OneLake handles analytics for long-term trend analysis. This dual architecture enables:

  • Real-time processing of patient data and medical imaging
  • Seamless integration between operational and analytical systems
  • Efficient data flow from transactional to analytical processing
  • Support for both millisecond-response operational queries and complex analytical workloads
Real-Time to Analytics Data Pipeline
Figure 2: Real-Time to Analytics Data Pipeline
Predictive AI for early detection

Predictive AI is critical in healthcare as it aids in accurate diagnosis, relying on predictions from large datasets compared with manual analysis, which is likely to bring in manual errors. Microsoft Fabric Data Science presents a robust platform to train and experiment with ML Models and manage MLOps cycles.

Two models were trained and used to:

  1. BI-RADS prediction: BI-RADS is an industry standard mechanism to describe mammogram findings and is classified in seven categories with a score of possibility of a malignant cancer increasing with the score value from 0 to 6. VGG16 is a deep convolutional neural network (CNN) model. It is trained on mammogram images from the dataset on Kaggle, which were grouped in folders as per their BI-RADS. Image analysis needs deep neural network models and the best model needs to be selected based on training on actual datasets running into multiple epochs.

    Fabric Data Science is used to train the models, run experiments, and manage the multiple versions. Multiple experiments were run with the two algorithms VGG16 and EfficientNetV2L, and the easy comparison of the multiple ML parameters and metrics for each version helps in the selection process of the final model. The images for training are directly uploaded to the Lakehouse in OneLake from the user's local machine using the UI itself. Additionally, the images stored in Azure blob storage can be easily referenced in the notebook downloading them from the blob URL using wget/curl, referencing using shortcuts, or even using a data pipeline. The image metadata and final prediction are stored in MongoDB Atlas.

  2. Biopsy classification: For the use case of binary classification of the cancer as malignant or benign, classification or regression models can be used. Random forest classifier model is trained on a dataset from Kaggle, with nine input parameters such as clump thickness, uniformity of cell size and shape, bare nuclei, mitoses, etc. Based on the values of these parameters the model is able to predict if the cancer is malignant or benign. In production use cases, more parameters can be added and the model can be trained from their values to be able to predict with more accuracy. Random forest model gave an accuracy of more than 97% and thus was ideal for this use case. The training dataset is fetched from MongoDB Atlas and prediction output is updated back to MongoDB, thanks to the MongoDB Spark Connector.

    Fabric Data Science makes the training and managing the end-to-end ML lifecycle easy and intuitive. Fabric Data Science manages the lifecycle by auto logging related parameters for each experiment and model using the de-facto data science standards of MLflow.

Vector search implementation

Vector search capabilities form the foundation of the solution's intelligent querying system, implemented in three key stages:

  1. Data preparation:
    • Clinical notes are processed using Azure OpenAI's text-embedding-ada-002 model
    • Data is converted into vector embeddings for high-dimensional space representation
    • Vector embeddings are stored in MongoDB Atlas with optimized search indexes
  2. Query processing:
    • Natural language queries are converted to vector representations
    • Semantic understanding enables complex medical queries
    • Query vectors are matched against stored embeddings
  3. Document retrieval:
    • Atlas Search executes similarity-based searches
    • Returns relevant medical records based on semantic matching
    • Enables intuitive access to patient information
Vector Search Implementation Process Flow
Figure 3: Vector Search Implementation Process Flow
RAG-based chatbot architecture

The chatbot implementation leverages retrieval augmented generation (RAG) architecture with three distinct data contexts:

  1. Patient information retrieval:
    • Executes queries to fetch current patient details
    • Retrieves structured patient data from MongoDB collections
    • Provides immediate access to critical patient information
  2. Historical data processing:
    • Accesses 10-year patient history from MongoDB Atlas
    • Decodes and summarizes historical data through Azure OpenAI LLM
    • Implements thought chaining for context-aware responses
  3. Medical knowledge integration:
    • Utilizes vectorized medical documentation
    • Performs real-time vector searches based on query context
    • Integrates relevant medical literature and case studies
Blueprint for the Chatbot architecture
Figure 4: Blueprint for the Chatbot architecture
Analytics and visualization

The solution leverages two complementary visualization platforms for comprehensive analytics: MongoDB Atlas Charts provides native, real-time operational dashboards directly connected to MongoDB data. It enables immediate insights into critical healthcare metrics through intuitive visualizations without requiring data transformations or additional tools. The operational dashboard (Figure 5) demonstrates key metrics including patient numbers, appointment status, and clinic distribution.

Atlas Charts
Figure 5: Atlas Charts
Power BI integration extends the analytics capabilities by enabling enterprise-wide data analysis and advanced visualizations. Through the MongoDB Atlas connector, healthcare data can be combined with other enterprise sources in Microsoft OneLake. The geographical visualization dashboard (Figure 6) showcases this integration, displaying patient distribution and enabling sophisticated analytical capabilities.
PowerBI integration with MongoDB Atlas
Figure 6: PowerBI integration with MongoDB Atlas

Together, these platforms provide a complete analytics solution that handles both immediate operational needs and long-term analytical requirements.

The solution demonstrates how MongoDB Atlas serves as a unified platform that handles operational data, vector search capabilities, and analytics requirements while seamlessly integrating with Microsoft's AI and visualization tools. This architecture enables healthcare providers to leverage both real-time operational insights and long-term analytical capabilities within a single, coherent system.

For a detailed, step-by-step guide on implementing this solution, including code samples and specific configuration instructions, visit our GitHub repository.

Key Learnings
  • Unified data platform: MongoDB Atlas serves as a central repository that effectively unifies operational data, metadata, and AI data, enabling seamless integration between different components of the healthcare system.

  • AI integration capabilities: The architecture demonstrates how different types of AI (Predictive, Generative, and Analytics) can be effectively integrated into a single healthcare solution using Microsoft's AI services and MongoDB Atlas.

  • Workflow automation: The solution shows how AI can automate critical healthcare workflows, from diagnostic predictions to report generation and intelligent querying through chatbots, reducing manual effort and potential errors.

  • Scalable analytics: The combination of MongoDB Atlas with Microsoft Fabric and Power BI enables both real-time operational analytics and long-term trend analysis, providing comprehensive insights for healthcare decision-making.

  • Secure healthcare architecture: The solution exemplifies how to build a modern healthcare system that maintains data security and privacy while enabling advanced AI capabilities and data analytics.

Technologies and Products Used
MongoDB developer data platform:
Partner technologies:
Authors
  • Francesc Mateu, MongoDB
  • Diana Annie Jenosh, MongoDB
  • Sebastian Rojas Arbulu, MongoDB
Related resources
industry_enterprise

GitHub Repository: Leafy-Hospital

Create this demo for yourself by following the instructions in this solution’s repository.

general_action_read

AI-Powered Innovation in Healthcare and Life Sciences

Discover how AI transforms healthcare with MongoDB Atlas—from clinical decisions to drug discovery.

industry_healthcare

MongoDB for Healthcare

Learn how MongoDB’s developer data platform supports a wide range of use cases in the healthcare industry.

general_content_ebook

E-book: Innovate with AI

Discover how leading industries are transforming with AI and MongoDB Atlas.

Get started with Atlas today

Get started in seconds. Our free clusters come with 512 MB of storage so you can experiment with sample data and get familiar with our platform.
Try FreeContact sales
Illustration of hands typing on a laptop in the foreground and a superimposed desktop window and coffee cup in the background.