Use cases: Artificial Intelligence
Industries: Healthcare
Products: MongoDB Atlas
Solution Overview
Healthcare teams face a documentation crisis that threatens the quality of patient care. In specialized units like oncology, patients often arrive after years of care elsewhere, and have a long trail of scanned reports, PDFs, pathology narratives, imaging summaries, and lab results. These documents are often scattered across disconnected digital systems and still stored physically in paper copies, leaving each patient’s data fragmented and difficult to navigate.
To discuss individual patient cases, multidisciplinary teams (MDTs) meet weekly or monthly. Before each meeting, coordinators, nurses, or residents must reconstruct the patient’s story, extract key facts, resolve contradictions, normalize terminology, and build a coherent timeline. As a result, these clinicians spend many hours manually extracting critical data from unstructured medical documents just to assemble a patient’s medical history.
This manual extraction slows down clinical decision making, introduces human error, and creates inconsistencies that can impact care across patient records. The result is delayed care, inefficient operations, and growing burnout among healthcare professionals who chose medicine to care for people, not to manage documents.
The solution described here automates that preparation by extracting specialty-specific entities using configurable templates, and by storing each extracted fact with provenance so clinicians can validate it quickly. MongoDB provides a flexible, auditable system of record for templates, extracted entities, and their links back to source documents.
The Solution: AI Powered Document Processing
The AI Powered Medical Report Generator transforms this operational challenge into a strategic advantage. Advanced artificial intelligence processes medical documents quickly, extracting specialized entities from many document formats, including PDFs, images, and clinical texts.
MongoDB's flexible document structure stores and organizes complex patient data extracted by the AI while maintaining relationships between medical entities. The system automatically generates complete MDT reports, reducing preparation time from hours to minutes while improving accuracy and consistency.
MongoDB also stores template versions and reference links back to source documents, so every fact in the MDT draft can be validated.
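To make the provenance idea concrete, the following minimal sketch shows one way an extracted fact could be shaped before it is stored. The function name, every field name, and the sample values are assumptions made for illustration; they are not taken from the demo's actual schema.

```python
from datetime import datetime, timezone


def build_entity_document(patient_id, entity, value, source, template):
    """Return a dict shaped like a document that could be stored in MongoDB.

    All field names are hypothetical; adapt them to your own schema.
    """
    return {
        "patient_id": patient_id,
        "entity": entity,
        "value": value,
        # Which template (and version) produced this fact.
        "template": {"id": template["id"], "version": template["version"]},
        # Where in the source material the fact was found.
        "provenance": {
            "source_document_id": source["document_id"],
            "page": source["page"],
            "snippet": source["snippet"],
        },
        "extracted_at": datetime.now(timezone.utc).isoformat(),
    }


# Made-up sample values, for illustration only.
example = build_entity_document(
    patient_id="patient-001",
    entity="Diagnosis Date",
    value="2021-03-14",
    source={"document_id": "doc-042", "page": 3, "snippet": "Biopsy confirmed on 14 March 2021"},
    template={"id": "oncology-mdt", "version": 2},
)
print(example["provenance"])
```

Keeping the template version and the source reference on each fact means a reviewer can jump from a line in the MDT draft to the passage it came from without a separate lookup table.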
Customizable MDTs through Entity Templates
Clinicians control exactly what information the system extracts through flexible template definitions. Medical teams define entity types, processing rules, and aggregation strategies that match their specific case requirements. For instance, while oncology teams configure templates to extract tumor markers and staging information, cardiac teams can focus on different diagnostic criteria, all within the same service.
While this demo targets MDT reports, the template-driven workflow can support any report type by updating the entity templates and formatting rules.
Each template specifies source document priorities, filtering rules, entity descriptions, extraction instructions, and output formatting to ensure MDT reports align with clinically preferred outcomes. This customization avoids generic outputs and delivers reports designed for each medical specialty's unique requirements.
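The template contents listed above can be pictured as a small data structure. The sketch below is one possible representation, not the demo's real model: the class names, field names, the allowed processing values, and the sample oncology template are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityDefinition:
    name: str
    description: str
    instructions: str
    processing: str = "match"  # assumed values: "match" or "aggregate"
    source_filters: List[str] = field(default_factory=list)  # preferred document types


@dataclass
class ReportTemplate:
    name: str
    version: int
    source_priorities: List[str]
    entities: List[EntityDefinition]
    output_format: str = "chronological"

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means the template looks usable."""
        problems = []
        names = [e.name for e in self.entities]
        if len(names) != len(set(names)):
            problems.append("duplicate entity names")
        for e in self.entities:
            if e.processing not in ("match", "aggregate"):
                problems.append(f"{e.name}: unknown processing type {e.processing!r}")
        return problems


oncology = ReportTemplate(
    name="oncology-mdt",
    version=1,
    source_priorities=["pathology_report", "imaging_summary"],
    entities=[
        EntityDefinition("Diagnosis Date", "Date of biopsy diagnosis",
                         "Locate the biopsy diagnosis date.",
                         source_filters=["pathology_report"]),
        EntityDefinition("Treatment History", "All treatments received",
                         "Collect every treatment mentioned.", processing="aggregate"),
    ],
)
print(oncology.validate())
```

A cardiology team would define a second `ReportTemplate` with different entities while reusing the same classes.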
Data Aggregation and Security
Medical teams can upload documents through an intuitive interface, while AI processes extract patient demographics, diagnoses, molecular abnormalities, and treatment histories in real time. However, to ensure data privacy and security, in this demo the upload functionality is disabled and processing runs exclusively on preset sample data.
The platform aggregates information across multiple sources, applies filtering rules, and produces structured reports that clinical teams can use to review their patients’ medical histories. MongoDB's scalable architecture handles hundreds of simultaneous document processing requests while maintaining data integrity and security standards essential for healthcare environments.
This design delivers meaningful operational impact. The platform helps reduce bottlenecks in MDT workflows, supports faster treatment planning, and brings patient information together in an accessible way. In practice, this can help teams handle growing documentation demands, optimize resources, and spend more time on patient care.
Reference Architectures
Clinical teams begin their journey by selecting an existing patient from the demo's predefined dataset, which redirects them to the patient document overview page where all associated medical records are displayed in an organized interface. For security reasons, the demo restricts the addition of new documents or patients, but these functionalities can be activated by following configuration steps during the system setup.
From the patient overview page, clinicians can review existing medical documents already associated with that patient record. At this stage, documents exist as raw files stored in MongoDB without any processing or entity extraction. In the full production system, teams would upload new documents through intuitive drag-and-drop functionality, accepting any format including PDFs, medical images, text files, and XML documents.
Figure 1. Patient's documents overview
Before processing documents, clinicians can customize their extraction requirements by accessing Template Configuration through a menu located in the top right section of the interface. This configuration panel allows medical teams to define exactly which entity types the AI tools should extract, along with the processing rules and output formats to apply. The system comes with a comprehensive library of medical entities including patient demographics, diagnoses, treatments, and molecular markers. Teams can select or create only the entities relevant to their specialties. For example, oncology teams can configure different extraction parameters than cardiology or traumatology teams, ensuring the system adapts to specific requirements.
Figure 2. Template definition process
With templates already configured, clinicians can manually trigger document processing for each patient document bundle they want to analyze. This action initiates the entity extraction phase where documents enter the Document Processor, which uses OCR to extract text from images and scanned materials. The system normalizes formatting inconsistencies and prepares content for intelligent analysis through an LLM provider, such as AWS Bedrock for this implementation. The AI extracts entities according to the pre-configured template specifications, and these extracted entities are then stored in MongoDB linked to their source documents.
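The processing phase described above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the demo's Document Processor: `normalize_text` and `process_document` are hypothetical names, and the OCR step, the LLM call, and the MongoDB write are passed in as plain callables so the sketch stays independent of any specific library.

```python
import re


def normalize_text(raw):
    """Re-join words hyphenated across line breaks and collapse stray whitespace."""
    text = re.sub(r"-\n(?=\w)", "", raw)    # "mark-\ners" -> "markers"
    text = re.sub(r"[ \t]+", " ", text)     # runs of spaces or tabs -> one space
    text = re.sub(r"\n{3,}", "\n\n", text)  # cap consecutive blank lines
    return text.strip()


def process_document(document, ocr, extract_entities, store):
    """Run one document through OCR, normalization, extraction, and storage.

    `ocr`, `extract_entities`, and `store` are injected callables standing in for
    the OCR engine, the LLM provider, and the database write respectively.
    Returns the number of entities stored.
    """
    text = document.get("text") or ocr(document["content"])
    cleaned = normalize_text(text)
    entities = extract_entities(cleaned)
    for entity in entities:
        # Link every stored entity back to the document it came from.
        store({**entity, "source_document_id": document["_id"]})
    return len(entities)
```

Injecting the three external steps keeps the flow testable with stubs and makes it straightforward to swap the LLM provider later.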
When the documents have been processed and the entities extracted, clinicians can generate MDT reports. The MDT Report Generator operates as a separate function on the reports section of the interface. Rather than performing new entity extraction, the generator aggregates and organizes the entities that were previously extracted and stored during the document processing phase. The system combines entities from multiple processed documents for the selected patient, applies template formatting rules, and organizes data chronologically.
Figure 3. MDT report viewer
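The aggregation step of the report generator (combining already stored entities and ordering them chronologically) could look roughly like the sketch below. The function name and the `entity`, `value`, and `event_date` fields are assumptions, and dates are assumed to be ISO 8601 strings so that plain string sorting matches date order.

```python
from collections import defaultdict


def build_report_sections(entities):
    """Group stored entities by entity name and sort each group by event date."""
    sections = defaultdict(list)
    for item in entities:
        sections[item["entity"]].append(item)
    # ISO dates (YYYY-MM-DD) sort correctly as strings.
    return {
        name: sorted(items, key=lambda item: item["event_date"])
        for name, items in sorted(sections.items())
    }


# Made-up entities from two different processed documents.
stored_entities = [
    {"entity": "Treatment", "value": "Surgery", "event_date": "2022-01-10", "source_document_id": "doc-2"},
    {"entity": "Diagnosis", "value": "Carcinoma", "event_date": "2021-03-14", "source_document_id": "doc-1"},
    {"entity": "Treatment", "value": "Chemotherapy", "event_date": "2021-06-01", "source_document_id": "doc-1"},
]
report = build_report_sections(stored_entities)
print([item["value"] for item in report["Treatment"]])
```

No new extraction happens here; the function only reorganizes what the processing phase already stored.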
Once MDT reports are generated, clinical teams can validate AI performance with provided Ground Truth data for comparison in the Observability section. This evaluation functionality allows medical professionals to upload verified entity extractions as reference standards and compare them to the AI generated MDT reports. This evaluation process helps teams understand the AI's performance characteristics and refine their template configurations for optimal results in their specific clinical context.
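As an illustration of comparing extractions with ground truth, the sketch below computes precision, recall, and F1 over (entity, value) pairs. The text does not state which metrics the Observability section reports, so the choice of metrics, the exact-match comparison, and the function name are all assumptions.

```python
def evaluate_extraction(extracted, ground_truth):
    """Compare extracted (entity, value) pairs against verified reference pairs."""

    def normalize(pairs):
        # Case and surrounding whitespace are ignored when matching.
        return {(name.strip().lower(), str(value).strip().lower()) for name, value in pairs}

    predicted = normalize(extracted)
    expected = normalize(ground_truth)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: one of two extracted facts matches the reference.
scores = evaluate_extraction(
    extracted=[("Diagnosis", "Breast carcinoma"), ("Stage", "II")],
    ground_truth=[("diagnosis", "breast carcinoma"), ("Stage", "III")],
)
print(scores)
```

Exact matching is deliberately strict; free-text entities would likely need a more tolerant comparison.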
Throughout this entire workflow, MongoDB maintains clear separation between raw documents, processed entities, and generated reports while enabling fast retrieval of each component.
Finally, the journey concludes with a comprehensive understanding of both the clinical insights extracted from patient documents and the performance metrics of the AI system itself. This approach ensures that teams not only receive valuable MDT reports but also gain confidence in the reliability and accuracy of the AI processing behind this MDT generation.
Figure 4. AI Powered Medical Report Generator Architecture Diagram
The Role of AI in Clinical Entity Extraction
The AI-driven entity extraction process represents the core intelligence that transforms unstructured medical documents into actionable clinical data. Advanced LLMs operate through cloud-based AI services as the primary extraction engine, following a sophisticated multi-step process, described in Figure 5, that combines template-driven instructions with domain-specific understanding.
Figure 5. AI Workflow Diagram
The extraction process begins when processed documents reach the Entity Extractor component, where the LLM receives specific instructions through configurable templates. As shown in Figure 5, these templates function as detailed prompt engineering that directs the LLM's attention toward relevant medical concepts while applying specialty-specific processing logic.
Each entity definition specified in a template provides the LLM with precise extraction parameters. For example, the Diagnosis Date entity instructs the model to locate biopsy diagnosis dates while distinguishing them from symptom onset or treatment initiation dates. The LLM receives context about document types, expected formats, and clinical significance, and applies that context and its constraints when processing the available documents to ensure accurate identification.
The template system implements three distinct processing types that guide the LLM’s extraction behavior:
Match processing: This method directs the model to extract the first reliable instance of definitive entities like patient demographics.
Aggregate processing: When selected, it instructs the LLM to collect all mentions of accumulating information like medication histories, then applies intelligent deduplication logic.
Source-filtered processing: If enabled, it guides the model to prioritize specific document types when extracting particular entities.
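In the solution, the three processing types are instructions given to the LLM. To make their differences concrete, the sketch below expresses the same semantics as plain Python over a hypothetical list of candidate mentions. The function name, the mention fields, and the fallback to all mentions when no preferred source matches are assumptions.

```python
def resolve_entity(mentions, processing, preferred_sources=None):
    """Reduce candidate mentions to a final value according to the processing type."""
    # Source-filtered: prefer mentions from the given document types when any exist.
    if preferred_sources:
        filtered = [m for m in mentions if m["source_type"] in preferred_sources]
        if filtered:
            mentions = filtered

    if processing == "match":
        # Match: keep the first reliable instance only.
        return mentions[0]["value"] if mentions else None

    if processing == "aggregate":
        # Aggregate: collect every mention, dropping case-insensitive duplicates.
        seen = set()
        unique = []
        for m in mentions:
            key = m["value"].strip().lower()
            if key not in seen:
                seen.add(key)
                unique.append(m["value"])
        return unique

    raise ValueError(f"Unknown processing type: {processing}")


# Made-up mentions for illustration.
mentions = [
    {"value": "Tamoxifen", "source_type": "discharge_summary"},
    {"value": "tamoxifen ", "source_type": "clinic_note"},
    {"value": "Letrozole", "source_type": "pathology_report"},
]
print(resolve_entity(mentions, "aggregate"))
print(resolve_entity(mentions, "match", preferred_sources=["pathology_report"]))
```

Here source filtering is modeled as an optional modifier that narrows the candidates before match or aggregate logic runs, which is one reading of "if enabled" in the list above.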
After the Template Definition step is finished and a new template is selected, the “LLM prompting” phase begins (see Figure 2). The platform constructs advanced queries that extract medical entities. Each query follows a standardized structure beginning with a concise system prompt:
You are an expert in extracting relevant data from documents. You will have to …
This system-level instruction establishes the LLM's role and expertise context before presenting the specific extraction task. The final query encapsulates document content and entity specifications within structured XML tags. The document section contains the processed medical text, while the entities section provides detailed extraction instructions for each target entity. This approach ensures the LLM receives clear and structured directives.
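The query structure described above can be sketched as follows. The tag names `document`, `entities`, and `entity`, the function name, and the input shape are assumptions; the system prompt is kept truncated exactly as quoted above because its full text is not given here.

```python
from xml.sax.saxutils import escape, quoteattr

# Truncated exactly as in the text above; the full prompt is not reproduced here.
SYSTEM_PROMPT = "You are an expert in extracting relevant data from documents. You will have to …"


def build_extraction_query(document_text, entities):
    """Wrap document text and entity instructions in XML-style tags.

    `entities` is a list of dicts with hypothetical keys "name" and "instructions".
    """
    entity_lines = [
        f"  <entity name={quoteattr(e['name'])}>{escape(e['instructions'])}</entity>"
        for e in entities
    ]
    return "\n".join(
        ["<document>", escape(document_text), "</document>", "<entities>", *entity_lines, "</entities>"]
    )


query = build_extraction_query(
    "Hb < 10 g/dL on admission.",
    [{"name": "Diagnosis Date", "instructions": "Locate the biopsy diagnosis date."}],
)
print(query)
```

Escaping the document text keeps characters such as `<` in lab values from being confused with the tags that delimit each section.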
The extracted entities generated by the model become structured data that enables intelligent clinical synthesis for the creation of the end product, the MDT Report. The model's contextual understanding preserves clinical relationships between extracted concepts, allowing the platform to construct structured outputs from previously unstructured medical information.
This LLM-powered approach demonstrates how advanced language models can serve as effective tools in the clinical environment when properly guided through domain-specific templates and source-aware processing logic.
As a result, this solution allows medical teams to transform the traditional challenge of extracting meaning from thousands of clinical documents into an automated process that augments clinical expertise.
Build the Solution
For detailed setup instructions, follow the steps outlined in the
README of the Industry Solutions GitHub repository.
This repository hosts the backend and the frontend for the AI Powered
Medical Report Generator demo.
To reproduce this demo in your own environment, follow these steps:
Enable AWS Bedrock access
Set up your credentials for AWS Bedrock access:
Configure your AWS CLI or create IAM roles with Bedrock permissions for Claude 3 Haiku model access. If you work with SSO login, you can run the following command to configure this connection:
aws sso login
Follow the instructions to log in. Close the browser when you finish, then continue with the next steps.
Configure the environment variables
Run the ./generate_env.sh script to generate a template .env file.
Update the generated .env file with your specific configuration:
MONGODB_URI=mongodb+srv://<user>:<password>@<cluster-url>/?retryWrites=true&w=majority
MONGODB_DB=your_database_name
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_access_key (if used)
AWS_SECRET_ACCESS_KEY=your_secret_key (if used)
SECRET_KEY=your_secure_secret_key
LLM_PROVIDER=bedrock
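Before starting the services, it can help to confirm that the required variables are actually set. The following check is a hypothetical helper, not part of the repository; it assumes the variables above have already been exported into the process environment and treats the two AWS key variables as optional, in line with the "(if used)" notes.

```python
import os

# Variables listed above without an "(if used)" note are treated as required.
REQUIRED_SETTINGS = ["MONGODB_URI", "MONGODB_DB", "AWS_REGION", "SECRET_KEY", "LLM_PROVIDER"]


def missing_settings(env):
    """Return the names of required settings that are absent or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```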
Deploy both services with the Makefile
In two different terminals, execute the following commands to start the services independently:
Terminal 1: Backend
make backend
Terminal 2: Frontend
make frontend
After you complete these steps, you can use the fully functional AI Powered Medical Report Generator demo. You'll find all the resources at the following URLs:
Frontend: http://localhost:3000
API docs: http://localhost:8000/docs
Health check: http://localhost:8000/health
Key Learnings
Automate data extraction and reduce manual errors: Relying on clinicians to manually extract information from PDFs, scans, and free-text reports is inefficient and error-prone. As patient histories grow in size and complexity, a manual approach introduces inconsistencies across records and slows clinical decisions.
Augment clinical teams using AI: The value of AI in this context is not autonomous decision-making, but quick, consistent extraction of structured medical entities from unstructured data. By processing clinical data from multiple documents and creating insightful reports, this AI-driven solution helps clinical teams manage documentation challenges without altering the decision-making workflow.
Customize medical entity extraction with templates: Different medical specialties have different data requirements. Oncology, cardiology, and traumatology teams do not look for the same signals in patient documents. This template-driven entity extraction helps AI systems adjust to different clinical situations, ensuring outputs are relevant, usable, and aligned with each specialty’s needs.
Preserve provenance and auditability: MongoDB helps retain the full extraction context, including template versions and provenance, enabling clinicians to trace each extracted fact back to its source.
Authors
Patricia Renart Carnicero, MongoDB
Francesc Mateu Amengual, MongoDB
Sakshi Gark, MongoDB