
Xlrt: Optimizing Financial Document Processing through Agentic AI and Knowledge Graphs

This document outlines the architecture of Xlrt, a solution designed to optimize financial document analysis and complex workflows by using agentic AI and a knowledge graph powered by MongoDB.

Xlrt transforms financial decision-making by eliminating manual bottlenecks and AI hallucinations to generate an accurate, complete view of customer financial health. The system uses intelligent agents that process data, execute actions, and adapt within financial ecosystems. The resulting decisions typically involve optimizing loan approval processes, tailoring product recommendations, and managing risks such as default and fraud.

Traditional financial AI approaches often encounter bottlenecks that limit business impact and trust, especially when processing complex, quantitative data. Common challenges include:

  • Manual Workflow Bottlenecks: Processes such as creating credit memos require intensive manual compilation, analysis, and iterative reviews, leading to delays and errors.

  • Lack of Contextual Grounding: AI models often lack the enterprise-specific financial context required for precise, numerical accuracy. This leads to outputs that may be factually correct but are not actionable.

  • Accuracy and Reliability Risks (Hallucinations): Even with structured reasoning, Large Language Models (LLMs) can process context incorrectly or make logical errors, producing results that are not logically sound or accurate.

Xlrt overcomes these challenges by using the following methods:

  • Graph Retrieval-Augmented Generation (Graph RAG): Uses a financial ontology, a graph structure representing key financial line items and their relationships, to selectively retrieve relevant financial knowledge and numerical data.

  • Role-Specific Agents and Chain-of-Thought (CoT) Reasoning: To automate end-to-end workflows.

  • Scoring-Based Feedback: To augment reasoning, assess accuracy, and iteratively refine CoT prompts until the results meet accuracy standards.

Knowledge agents are intelligent systems designed to navigate, analyze, and infer insights from complex datasets.

In financial contexts, understanding complex relationships in data is essential. Xlrt integrates Knowledge Agents with Graph RAG to achieve this goal.

The financial ontology is a knowledge graph at the core of the Xlrt approach. It acts as the blueprint that defines the rules and constraints for how financial entities relate.
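An ontology of this kind can be represented as a set of allowed node types and permitted relationships between them. The following sketch is illustrative only; the node and edge names are assumptions, not Xlrt's actual schema.

```python
# Hypothetical financial ontology: allowed node types and the edge
# types permitted between them. Names are illustrative assumptions.
ONTOLOGY = {
    "node_types": {"Revenue", "Expenses", "NetProfit"},
    "edge_types": {
        # (source type, relationship, target type)
        ("Revenue", "affects", "NetProfit"),
        ("Expenses", "affects", "NetProfit"),
    },
}

def is_valid_edge(source_type: str, relation: str, target_type: str) -> bool:
    """Check a candidate edge against the ontology's constraints."""
    return (source_type, relation, target_type) in ONTOLOGY["edge_types"]
```

A validator like this is what lets the graph layer reject relationships that defy the blueprint, such as an edge pointing from Net Profit back to Revenue.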

The graph store, powered by MongoDB Atlas or MongoDB Enterprise Advanced, is the persistent database layer that stores the graph structure and its associated financial data.


Figure 1. Xlrt’s Knowledge Agent + Graph RAG System Powered by MongoDB.

The underlying graph store contains two consolidated data components:

  • Domain-specific ontology structure: The conceptual blueprint of allowed nodes and edge types.

  • Yearly financial data: The specific instances of the graph (nodes and edges) extracted from client documents. This data, sourced from documents such as annual reports and bank statements, populates the nodes and edges with numerical values for each reporting period. This continuous population over time enables the system to analyze historical trends and financial evolution.

The graph itself consists of two element types:

  • Nodes: Financial line items, such as Revenue, Expenses, and Net Profit.

  • Edges: Causal or structural relationships, such as Revenue affecting Net Profit. These edges define the semantic relationship between two nodes.
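Because MongoDB stores both element types as flexible documents, a node and an edge can live in the same collection. The document shapes below are a sketch under assumed field names, not Xlrt's actual schema.

```python
# Illustrative document shapes for a node and an edge stored together
# in one MongoDB collection. Field names are assumptions for this sketch.
node_doc = {
    "_id": "revenue",
    "type": "node",
    "label": "Revenue",
    # Yearly financial data populates values per reporting period,
    # enabling analysis of historical trends.
    "values": {"2022": 1_200_000, "2023": 1_450_000},
}

edge_doc = {
    "type": "edge",
    "relation": "affects",   # semantic relationship between two nodes
    "source": "revenue",
    "target": "net_profit",
}
```

Keeping per-period values on the node document is one way the graph can accumulate data over time without schema migrations.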

Graph RAG combines knowledge graphs (graph theory) with AI retrieval and generation techniques. Xlrt uses Graph RAG to ground LLMs by retrieving relevant knowledge and numerical information from the graph store. This grounding ensures outputs are contextualized, factual, and actionable.

Graph RAG enables the system to:

  • Analyze Causal Dependencies: The system traces cause-and-effect relationships (edges) and identifies how a change in one financial line item might influence others.

  • Identify Illogical Correlations: The system examines relationships between nodes to detect inconsistencies or correlations that defy financial logic. This verification ensures data integrity.

  • Retrieve Context for Any Line Item: The system queries the graph, extracts relevant nodes and edges, and provides a contextual snapshot of the data surrounding a line item. This snapshot clarifies how individual components interact within the financial structure.

This approach provides contextual awareness and better decision-making by revealing the structure and interconnectivity of financial datasets.

LangChain connects the LLMs directly to MongoDB. The MongoDBGraphStore component facilitates this connection and manages the data flow between the language model and the database. This integration transforms unstructured financial data into actionable, interconnected insights without the need for a dedicated graph database engine.

The system relies on the flexible architecture of MongoDB to serve as the foundation for the knowledge graph:

  • Unified Operational and Graph Data: Unlike traditional approaches that separate graph data from operational data, MongoDB stores both the domain-specific ontology and the specific instances of financial data (nodes and edges) in the same flexible document format. This enables the system to continuously populate the graph with new data from annual reports or bank statements without rigid schema migrations.

  • Efficient Graph Traversal: MongoDB executes graph traversal and querying by using the $graphLookup aggregation stage. This process enables the swift retrieval of relevant interconnected financial knowledge directly alongside operational data.
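A traversal with $graphLookup might look like the following sketch. The collection name and the source/target field names are assumptions for illustration, not Xlrt's actual schema.

```python
# Sketch of a $graphLookup pipeline that walks outward from one line item
# to its connected items. Field names ("label", "source", "target") and
# the FINANCIALS collection name are illustrative assumptions.
def build_traversal_pipeline(start_label: str, max_depth: int = 3) -> list:
    """Build an aggregation pipeline that finds items connected to start_label."""
    return [
        {"$match": {"label": start_label}},
        {
            "$graphLookup": {
                "from": "FINANCIALS",         # self-join on the same collection
                "startWith": "$_id",          # begin at the matched node
                "connectFromField": "target", # follow edges outward
                "connectToField": "source",
                "as": "connected_items",
                "maxDepth": max_depth,
            }
        },
    ]

pipeline = build_traversal_pipeline("Revenue")
# collection.aggregate(pipeline) would return the Revenue node together
# with line items reachable within max_depth hops.
```

Running the pipeline against a live cluster requires a pymongo collection handle; the function only constructs the stages.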

Although MongoDB provides the engine, the MongoDBGraphStore component in LangChain acts as the orchestrator. This component streamlines the implementation of Graph RAG through two key functions:

  • Abstraction & Retrieval: MongoDBGraphStore abstracts raw database aggregations, simplifying the retrieval of graph data. The component automatically formats the retrieved knowledge graph into context-rich prompts, optimizing the data for agentic reasoning without requiring manual query construction.

  • Dynamic Graph Creation: To populate the graph, the component uses a dynamic "Extract-and-Load" workflow:

    • Entity Extraction: An LLM-based entity extraction model (initialized within the component) parses client-uploaded financial statements. It converts unstructured data into structured graph entities and relationships by extracting named entities and their connections.

    • Configuration: Custom prompts and instructions guide the extraction process. These prompts, which are configurable through the entity_prompt parameter, ensure the model maps data to the correct financial context.

    • Graph population: Using the add_documents() method, the model automatically extracts and upserts these entities and relationships into the MongoDB collection. This creates a dynamic knowledge graph that evolves instantly as new documents are processed.

Although retrieving the correct context through Graph RAG is critical, ensuring the reasoning applied to that data is accurate is equally important. Xlrt enhances standard Chain of Thought (CoT) prompting by introducing a scoring loop that iteratively refines the model's output.

Even with structured reasoning, large language models (LLMs) can sometimes misunderstand context or make erroneous logical jumps, leading to hallucinations. To mitigate this risk, Xlrt uses a dual-model architecture:

  • Performer LLM: Generates the initial response based on the financial data.

  • Prompt Augmentation LLM: Evaluates the Performer's output and adjusts the prompt if the quality is insufficient.

The system evaluates every response against three key metrics:

  • Contextual Consistency: Does the response align with the specific financial context provided?

  • Factual Accuracy: Does the output adhere to known facts and data rules?

  • Logical Soundness: Are the intermediate reasoning steps connected and valid?

If a response scores below a certain threshold, such as 60 percent, the Prompt Augmentation LLM analyzes the error and generates a refined prompt, for example by explicitly instructing the Performer LLM to "verify the percentage change compared to the previous quarter." This cycle repeats until the response meets accuracy standards, ensuring high-reliability outputs for critical tasks such as creating credit memos.
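The control flow of this scoring loop can be sketched as follows. The stub functions stand in for the Performer and Prompt Augmentation LLMs and the scorer; all names and heuristics here are assumptions for illustration.

```python
# Minimal sketch of the dual-model scoring loop. The stubs below are
# placeholders for real LLM calls and a real metric-based scorer.
THRESHOLD = 0.6  # e.g. the 60 percent accuracy bar

def score(response: str) -> float:
    """Stub scorer: a real one would rate contextual consistency,
    factual accuracy, and logical soundness."""
    return 0.9 if "verified" in response else 0.4

def performer_llm(prompt: str) -> str:
    """Stub Performer: generates the initial response."""
    return "verified analysis" if "verify" in prompt.lower() else "rough analysis"

def augment_prompt(prompt: str, response: str) -> str:
    """Stub Prompt Augmentation LLM: a real model would analyze the error."""
    return prompt + " Verify the percentage change against the source data."

def refine(prompt: str, max_rounds: int = 3) -> str:
    """Iterate generate -> score -> augment until the response passes."""
    response = performer_llm(prompt)
    for _ in range(max_rounds):
        if score(response) >= THRESHOLD:
            return response
        prompt = augment_prompt(prompt, response)
        response = performer_llm(prompt)
    return response
```

The loop terminates either when the score clears the threshold or after a bounded number of rounds, which keeps a persistently failing prompt from cycling forever.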

To ensure the output is relevant, the system uses user feedback in two ways to refine Chain-of-Thought (CoT) generation:

  • Role-Based Adaptation: Instead of correcting only errors, the system uses user feedback to tailor CoT prompts to the user's specific context.

  • Dynamic Augmentation: A dedicated LLM analyzes feedback to adjust the prompt, for example focusing on compliance for an auditor or business impact for an executive.
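In its simplest form, role-based adaptation maps a user's role to a reasoning focus appended to the CoT prompt. The roles and focus strings below are hypothetical, not Xlrt's actual configuration.

```python
# Hypothetical role-to-focus mapping for tailoring CoT prompts.
# A production system would derive this from accumulated user feedback.
ROLE_FOCUS = {
    "auditor": "Emphasize compliance checks and cite the supporting line items.",
    "executive": "Emphasize business impact and summarize key trends first.",
}

def adapt_prompt(base_prompt: str, role: str) -> str:
    """Append the role-specific reasoning focus to the base CoT prompt."""
    focus = ROLE_FOCUS.get(role, "")
    return f"{base_prompt} {focus}".strip()
```

Unknown roles fall back to the unmodified base prompt, so adaptation degrades gracefully rather than failing.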

Xlrt uses its Graph RAG architecture to transform time-consuming financial document workflows. It uses agentic AI grounded in the financial knowledge graph to power three product offerings:

  • Justifi™: Provides instant analysis of financial statements, such as 10-Ks, annual reports, and management-prepared financials. It also provides normalized analysis for data providers and smart summaries.

  • Contractus™: Provides automatic analysis of commercial contracts, infers commercial terms to project cash flows, and facilitates organizational contract management.

  • Facturas™: Enables template-free parsing of invoices and automated data evaluation, ensuring a superior validation flow for straight-through processing.

The automation of these complex workflows offers the following benefits:

  • Accuracy: Domain-tuned agents, grounded in the factual knowledge graph, ensure precise and consistent financial insights.

  • Cost Reduction: The system reduces dependency on manual labor while maintaining high-quality, auditable results.

  • Efficiency: Automated end-to-end workflows drastically reduce manual effort and enable accelerated data processing and faster decision-making.

The following steps illustrate the key implementation logic that Xlrt uses to integrate LangChain and its MongoDBGraphStore component to build a Graph RAG system for intelligent document processing. This walkthrough uses MongoDB Atlas, although MongoDB Enterprise Advanced is also an option. Xlrt uses Ollama to run the LLM:

1

Install the necessary libraries for MongoDB interaction, LangChain integration, and Ollama, which sends requests to LLMs. The architecture is LLM-agnostic, allowing you to plug in any LLM. To complete this tutorial, you need an Atlas cluster running MongoDB version 7.0.2 or later.

pip install --quiet --upgrade pymongo langchain_mongodb langchain_ollama
2
MONGODB_URI = "<connection-string>"
DB_NAME = "financial_kg_db" # MongoDB database to store the knowledge graph
COLLECTION_NAME = "FINANCIALS" # MongoDB collection to store the knowledge graph

Replace <connection-string> with the connection string for your Atlas cluster.

The <connection-string> is defined using the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
3
from langchain_mongodb.graphrag.graph import MongoDBGraphStore

graph_store = MongoDBGraphStore.from_connection_string(
    connection_string=MONGODB_URI,
    database_name=DB_NAME,
    collection_name=COLLECTION_NAME,
    entity_extraction_model=chat_model,  # LLM of your choice
)
4
# Add documents to extract entities and relationships from and to the graph store
graph_store.add_documents(documents)
5
# Query the knowledge graph to get related entities and context for prompts
query = "What are the key financial metrics for Q4?"
context = graph_store.query(
    query,
    max_hops=3,
)
6

After the relevant data is retrieved, the context from Graph RAG is combined with the user query to augment the prompt for the LLM, using LangChain and the selected model served by Ollama.

Python Example for Context Construction:

from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate

# Set up Ollama as the LLM with your model of choice
llm = ChatOllama(model="<model of your choice>")

# Define a prompt template
template = """
You are an AI financial analyst. Analyze the following data and provide insights: {context}
User Query: {query}
"""
prompt = PromptTemplate(
    input_variables=["context", "query"],
    template=template,
)

chain = prompt | llm
response = chain.invoke({"context": context, "query": query})
print("Generated Insights:\n", response)

Financial organizations can integrate Xlrt with MongoDB Atlas or MongoDB Enterprise Advanced to use Graph RAG systems for advanced intelligent document processing and automated workflows.

This combination transforms unstructured financial data into actionable insights. The MongoDB-backed financial ontology improves efficiency, accuracy, and strategic decision-making.

  • Grounding LLMs with the MongoDB Graph RAG Architecture: By using a MongoDB Graph RAG architecture, LLMs are grounded in a dynamic financial ontology. This approach uses the $graphLookup aggregation stage to traverse interconnected relationships within a unified knowledge base, ensuring precise and context-aware retrieval.

  • Powering Complex Agents with Scoring-Based Reasoning: Beyond simple retrieval, the architecture supports advanced feedback loops. By validating Chain-of-Thought reasoning against verified financial facts retrieved from MongoDB, the system ensures accuracy before finalizing a response. This iterative scoring prevents hallucinations, ensuring that every insight is logically sound and consistent with your financial ontology.

  • Turn Technical Capabilities into Business Value: By using MongoDB to unify unstructured documents and structured knowledge graphs, organizations can transform manual bottlenecks, such as credit analysis, into automated, intelligent workflows. This architectural shift reduces operational overhead and minimizes dependency on manual processes.
