MongoDB and Microsoft deliver AI-powered solutions for breast cancer care, unifying data and enabling predictive modeling, intelligent chatbots, and analytics.
Use cases: Analytics, Gen AI, Interoperability
Industries: Healthcare
Products and tools: MongoDB Atlas, MongoDB Atlas Search, MongoDB Atlas Vector Search, MongoDB Atlas Data Federation, MongoDB Atlas Charts
Partners: Microsoft
Solution Overview
This solution, called "Leafy Hospital," integrates MongoDB Atlas and Microsoft AI to improve breast cancer diagnoses and patient care. This system leverages MongoDB's flexible data platform to unify operational metadata and AI data, and combines it with Microsoft products such as Azure OpenAI, Microsoft Fabric, and Power BI to create a comprehensive healthcare analytics and diagnostics solution. The solution demonstrates three key technological approaches:
Predictive AI for early detection: Use deep learning models to analyze mammograms and predict breast imaging-reporting and data system (BI-RADS) scores.
Generative AI for workflow automation: Use Vector Search capabilities and chatbots powered by Retrieval-Augmented Generation (RAG) for intelligent information retrieval.
Advanced analytics: Combine real-time operational insights with long-term trend analysis through Power BI integration.
This solution enables healthcare providers to streamline diagnostic processes, automate clinical documentation, and make data-driven decisions while ensuring secure handling of sensitive patient information.
Reference Architectures
The following diagram illustrates how the Leafy Hospital Solution integrates various components:
Figure 1. Leafy Hospital solution architecture
This solution integrates components across three main technological areas:
Predictive AI layer (bottom yellow box)
Processes mammogram images and clinical data.
Handles BI-RADS scoring and biopsy type analysis.
Determines malignant or benign classification.
Receives images from Azure Blob Storage.
Outputs operational data to MongoDB Atlas.
Generative AI layer (middle purple box)
Integrates with MongoDB Atlas using Azure AI Studio.
Enables automated report generation for clinical documentation.
Features a chatbot for question-answering capabilities.
Processes operational and vector data from Atlas.
Facilitates natural language interactions with the system.
Advanced analytics layer (middle green box)
Combines Fabric Power BI and Fabric OneLake.
Generates reports and dashboards from processed data.
Integrates with MongoDB Atlas for data visualization.
Provides comprehensive analytics capabilities.
Medical images that are first stored in Azure Blob Storage are then processed through the various layers:
Images and operational data flow through Fabric Data Science for AI processing.
Results are stored in MongoDB Atlas, which serves as the central operational database.
Azure AI Studio handles generative AI tasks using the stored data.
Finally, Fabric Power BI and OneLake enable advanced analytics and visualization.
This architecture ensures a seamless flow of information from raw medical data to actionable insights while maintaining security and performance throughout the system.
Build the Solution
The Leafy Hospital demo showcases the integration of MongoDB Atlas with Microsoft's AI and analytics services through several key components, which are described in the following sections.
For a detailed, step-by-step guide on implementing this solution, including code samples and specific configuration instructions, visit this solution's GitHub repository.
Data Architecture and Flow
In this solution, MongoDB Atlas serves as the operational datastore for real-time AI applications, while Microsoft OneLake handles analytics for long-term trend analysis. This architecture enables the following features:
Real-time processing of patient data and medical imaging.
Integration between operational and analytical systems.
Efficient data flow from transactional to analytical processing.
Support for both millisecond-response operational queries and complex analytical workloads.
Figure 2: Real-time to analytics data pipeline
Predictive AI for Early Detection
Predictive AI can be used in healthcare to generate an accurate diagnosis from large datasets. Microsoft Fabric Data Science presents a robust platform to train and experiment with ML Models and manage MLOps cycles. This solution uses models for the following purposes:
BI-RADS prediction
BI-RADS is an industry-standard mechanism to analyze mammogram findings. Healthcare specialists use BI-RADS to describe results from breast-imaging tests with a number from 0 to 6, with the possibility of malignancy increasing with the score.
This solution uses the VGG16 deep convolutional neural network (CNN) to predict BI-RADS scores from images. The model is trained on mammogram images from a Kaggle dataset. Each images is grouped into folders that correspond to their BI-RADS.
Fabric Data Science analyzes the performance of multiple models for this task and selects the best one. It trains the models, runs experiments, and manages multiple versions. The training images are directly uploaded to the Lakehouse in OneLake from the user's local machine by using the Lakehouse UI. Additionally, you can easily reference the images stored in Azure Blob Storage by using the
wget
orcurl
commands, using shortcuts, or by using a data pipeline. The solution stores the image metadata and final predictions in MongoDB Atlas.Biopsy classification
Classification or regression models can be used to classify tumors as malignant or benign. The random forest classifier model is trained on a Kaggle dataset that includes input parameters such as clump thickness, uniformity of cell size and shape, bare nuclei, and mitoses. The model can then predict whether a tumor is malignant or benign. In production, you can add more parameters to the dataset and train the model on these values to make more accurate predictions. During solution development, the random forest model had an accuracy rate of over 97%. The solution fetches the training dataset from MongoDB Atlas, and the prediction output is updated in MongoDB by using the MongoDB Spark Connector.
Fabric Data Science makes training and managing your models easy by automatically logging related parameters for each model and experiment.
Vector Search Implementation
This solution's intelligent query system relies on Vector Search, as shown in the following diagram.
Figure 3. Vector Search implementation process flow
Data preparation:
Azure OpenAI's
text-embedding-ada-002
model processes clinical notes.Data is converted into vector embeddings for high-dimensional space representation.
Vector embeddings are stored in MongoDB Atlas with optimized search indexes.
Query processing:
Natural language queries are converted into vector representations.
Semantic understanding enables complex medical queries.
Query vectors are matched against stored embeddings.
Document retrieval:
Returns relevant medical records based on semantic matching.
Enables intuitive access to patient information.
Atlas Vector Search executes similarity-based searches.
RAG-based Chatbot Architecture
The chatbot implementation leverages RAG architecture in the following contexts, as shown in the following diagram:
Figure 4. Blueprint for the chatbot architecture
Patient information retrieval:
Executes queries to fetch current patient details.
Retrieves structured patient data from MongoDB collections.
Provides immediate access to critical patient information.
Historical data processing:
Accesses 10-year patient history from MongoDB Atlas.
Decodes and summarizes historical data through Azure OpenAI LLM.
Implements thought chaining for context-aware responses.
Medical knowledge integration:
Uses vectorized medical documentation.
Performs real-time vector searches based on the query's context.
Integrates relevant medical literature and case studies.
Analytics and Visualization
This solution uses the following two visualization platforms for analytics.
First, MongoDB Atlas Charts provides native, real-time operational dashboards directly connected to MongoDB data. It enables immediate insights into critical healthcare metrics through intuitive visualizations without requiring data transformations or additional tools. The operational dashboard (Figure 5) demonstrates key metrics including patient numbers, appointment status, and clinic distribution.
Figure 5. Operational dashboard with Atlas Charts
Second, Power BI integration extends the analytics capabilities by enabling enterprise-wide data analysis and advanced visualizations. Through the MongoDB Atlas Connector, healthcare data can be combined with other enterprise sources in Microsoft OneLake. The geographical visualization dashboard (Figure 6) showcases this integration, displaying patient distribution and enabling sophisticated analytical capabilities.
Figure 6. PowerBI integration with MongoDB Atlas
Together, these platforms provide a complete analytics solution that handles both immediate operational needs and long-term analytical requirements.
This solution demonstrates how MongoDB Atlas can handle operational data, Vector Search capabilities, and analytics requirements while seamlessly integrating with Microsoft's AI and visualization tools. This architecture enables healthcare providers to leverage real-time operational insights and long-term analytical capabilities within a single system.
Key Learnings
Unified data platform: MongoDB Atlas serves as a central repository that unifies operational data, metadata, and AI data, enabling seamless integration between different components of the healthcare system.
AI integration capabilities: The architecture demonstrates how different types of AI, such as predictive, generative, and analytical, can be effectively integrated into a single healthcare solution using Microsoft's AI services and MongoDB Atlas.
Workflow automation: The solution shows how AI can automate critical healthcare workflows, from diagnostic predictions to report generation. It also enables intelligent querying through chatbots, reducing manual effort and potential errors.
Scalable analytics: The combination of MongoDB Atlas with Microsoft Fabric and Power BI enables both real-time operational analytics and long-term trend analysis, providing comprehensive insights for healthcare decision-making.
Secure healthcare architecture: The solution exemplifies how to build a modern healthcare system that maintains data security and privacy while enabling advanced AI capabilities and data analytics.
Authors
Francesc Mateu, MongoDB
Diana Annie Jenosh, MongoDB
Sebastian Rojas Arbulu, MongoDB