

Fraud prevention and AML using Vector Search and OpenAI

Combine real-time analytics with semantic search by integrating MongoDB Atlas Vector Search with OpenAI-generated embeddings to detect fraud that traditional methods miss.
Solution Overview

Fraud and anti-money laundering (AML) are major concerns for both businesses and consumers, impacting financial services institutions across commercial banking and capital markets. The volume and complexity of transactions across all areas of banking make it easier for cyber criminals to hide fraud and commit money laundering. Traditional methods of tackling these issues, including rule-based systems and machine learning methods, are limited by the feature engineering overhead needed to keep the models relevant. As a result, updates arrive late and detection logic becomes outdated.

Vector search can potentially improve fraud detection and AML efforts by addressing these limitations, representing the next step in the evolution of machine learning for combating fraud. Any organization that is already benefiting from real-time analytics will find that this breakthrough in anomaly detection takes fraud detection and AML accuracy to the next level.

This solution looks at how the convergence of real-time analytics and vector search, using MongoDB Atlas Vector Search, enables organizations to uncover deeply hidden insights before fraud occurs. It leverages real-time data feeds and continuous monitoring to detect emerging threats and adapt to evolving risk landscapes.

Revolutionizing Fraud Detection with Atlas Vector Search

Building the solution and reference architectures
How Atlas Vector Search can help

A vector database like MongoDB Atlas makes it easier to find similarities and relationships between different types of data. Rather than using a standalone or bolt-on vector database, MongoDB’s developer data platform empowers users to store their operational data, metadata, and vector embeddings on MongoDB Atlas and seamlessly use Atlas Vector Search to index and retrieve data for gen AI applications.

The combination of real-time analytics and vector search offers a powerful synergy that enables organizations to discover insights that are otherwise elusive with traditional methods. MongoDB facilitates this through Atlas Vector Search integrated with OpenAI-generated embeddings as illustrated in Figure 1 below.

Figure 1: Atlas Vector Search in action for Fraud and AML detection
How are vector embeddings for fraud detection and AML created?

In this solution, the fraud embedding is based on a combination of text, transactions, and counterparty data. The AML embedding is created based on transactions, relationships between counterparties, and their risk profiles.

The choice of data sources, including the use of unstructured data, and the creation of one or more vector embeddings can be configured to meet your specific needs. This solution uses OpenAI to generate vector embeddings, but you can use your preferred embedding model as well.
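As a minimal sketch of keeping the embedding model swappable, the snippet below hides the model behind a plain callable. The `Embedder` type, the helper names, and the model name `text-embedding-3-small` are assumptions made for illustration; the demo repository may use a different model and structure. The OpenAI-backed variant assumes the `openai` Python package (v1 or later) is installed and an API key is available in `OPENAI_API_KEY`.

```python
from typing import Callable, List

# Any function that maps text to a vector can act as the embedding model.
Embedder = Callable[[str], List[float]]


def openai_embedder(model: str = "text-embedding-3-small") -> Embedder:
    """Return an embedder backed by the OpenAI Python SDK (assumed installed)."""
    # Imported lazily so that other embedders do not need the OpenAI dependency.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed(text: str) -> List[float]:
        response = client.embeddings.create(model=model, input=text)
        return response.data[0].embedding

    return embed


def toy_embedder(text: str) -> List[float]:
    """Offline stand-in for a real model; NOT semantically meaningful."""
    return [float(len(text)), float(text.count(" "))]


def embed_narrative(text: str, embedder: Embedder) -> List[float]:
    """Embed one piece of text with whichever model the caller supplies."""
    vector = embedder(text)
    if not vector:
        raise ValueError("embedder returned an empty vector")
    return vector


# Offline usage with the stand-in embedder.
vector = embed_narrative("a b", toy_embedder)
```

Swapping models then means passing a different callable, for example `embed_narrative(text, openai_embedder())`. Note that vectors from different models generally have different dimensions, so a vector index would need to match the chosen model.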

Historical vector embeddings

The demo database is pre-populated with synthetically generated test data for both fraud and AML embeddings. In real-world scenarios, you can generate embeddings by encoding historical transaction data and customer profiles as vectors.

One intuitive approach is to unite the data in a data store such as MongoDB. This provides the flexibility necessary to capture a wide variety of data types and vectorize the relevant data fields.

In this demo, we enable meaningful semantic search using vector embeddings to simulate how human analysts or investigators would evaluate the transaction(s) or suspicious cases in question.

Typically, a human analyst or investigator would first collect the relevant data (sourced from the internet or internally), synthesize the relevant structured data (date, transaction type, amount) and unstructured or semi-structured data (transaction description, job or business description of the person or company, relationship with the recipient, etc.), and create a written report to provide context for the transactions being examined.

An anti-fraud/AML application that leverages vector search can perform a similar action by consolidating the quantitative and qualitative data into a textual narrative (which can also be supplemented with LLMs). The constructed narrative is then used to generate the vector embedding, which in turn drives a semantic search (via vector search) for similar transactions or cases.
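The narrative construction described above could look roughly like the following sketch. The `Transaction` fields and the sentence template are hypothetical and chosen only to illustrate merging structured and unstructured data into one passage; the demo's actual schema and wording may differ.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields for illustration only.
    date: str
    tx_type: str
    amount: float
    currency: str
    description: str
    sender_profile: str
    recipient_relationship: str


def build_narrative(tx: Transaction) -> str:
    """Merge structured and unstructured fields into one text passage."""
    return (
        f"On {tx.date}, a {tx.tx_type} of {tx.amount:.2f} {tx.currency} was made. "
        f"Description: {tx.description}. "
        f"Sender: {tx.sender_profile}. "
        f"Relationship with recipient: {tx.recipient_relationship}."
    )


example = Transaction(
    date="2024-05-01",
    tx_type="wire transfer",
    amount=9800.0,
    currency="USD",
    description="invoice payment",
    sender_profile="freelance designer",
    recipient_relationship="first-time payee",
)
narrative = build_narrative(example)
```

The resulting string is what would be passed to the embedding model, so that similar cases end up close together in vector space.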

Fraud detection and AML workflow

As shown in Figure 1, the aggregated fraud and AML text for an incoming transaction is first converted into embeddings by an embedding model (OpenAI). Atlas Vector Search then retrieves previous transactions with similar characteristics, and the transaction is assessed based on the percentage of those similar transactions that were flagged for suspicious activity.
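One way to picture this workflow is a `$vectorSearch` aggregation stage followed by a simple ratio over the returned neighbours. The sketch below only builds the pipeline and computes the ratio; the index name `fraud_vector_index`, the field names `embedding` and `flagged`, and the candidate oversampling factor are assumptions, not values taken from the demo.

```python
from typing import Any, Dict, List


def build_similarity_pipeline(query_vector: List[float], k: int = 10) -> List[Dict[str, Any]]:
    """Aggregation pipeline returning the k most similar historical transactions."""
    return [
        {
            "$vectorSearch": {
                "index": "fraud_vector_index",  # hypothetical index name
                "path": "embedding",            # hypothetical vector field
                "queryVector": query_vector,
                "numCandidates": k * 10,        # oversample candidates before taking the top k
                "limit": k,
            }
        },
        {
            "$project": {
                "_id": 0,
                "flagged": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def flagged_share(neighbours: List[Dict[str, Any]]) -> float:
    """Fraction of similar past transactions that were flagged as suspicious."""
    if not neighbours:
        return 0.0
    return sum(1 for n in neighbours if n.get("flagged")) / len(neighbours)


pipeline = build_similarity_pipeline([0.1, 0.2], k=5)
share = flagged_share(
    [{"flagged": True}, {"flagged": False}, {"flagged": True}, {"flagged": False}]
)
```

With PyMongo, the pipeline would be run as `collection.aggregate(pipeline)` against the collection that holds the historical embeddings, and the resulting documents passed to `flagged_share`.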

Outcome of transaction processing

If flagged as fraudulent/suspicious: The transaction request is declined.

If not flagged: The transaction is completed successfully, and a confirmation message is shown.

For rejected transactions, users can contact case management services with the transaction reference number for details. No action is needed for successful transactions.
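The outcome logic can be sketched as a small decision function. The 0.5 threshold, the `Outcome` type, and the reference-number format are assumptions for illustration; the source does not state how the demo sets its cut-off or generates references.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Outcome:
    approved: bool
    reference: str
    message: str


def decide(share_flagged: float, threshold: float = 0.5) -> Outcome:
    """Decline when the share of flagged similar transactions reaches the threshold."""
    reference = uuid.uuid4().hex[:12]  # hypothetical reference-number format
    if share_flagged >= threshold:
        return Outcome(
            approved=False,
            reference=reference,
            message=(
                "Transaction declined. Contact case management "
                f"with reference {reference}."
            ),
        )
    return Outcome(
        approved=True,
        reference=reference,
        message="Transaction completed successfully.",
    )


declined = decide(0.8)
accepted = decide(0.1)
```

In practice the threshold would be tuned against the institution's tolerance for false positives and false negatives.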

Combining Atlas Vector Search for fraud detection and AML

With the use of Atlas Vector Search and the relevant embeddings, organizations can:

  1. Improve fraud detection accuracy: Atlas Vector Search captures complex, high-dimensional patterns that rule-based and ML models often overlook, leading to more precise fraud detection. By analyzing the full context of transactions, vector search can also better uncover subtle fraud signals, improving the detection of sophisticated schemes that simpler models might miss.

  2. Detect new fraud schemes faster: With real-time anomaly detection, Atlas Vector Search can help identify novel fraud or money laundering tactics more quickly, reducing the risk of emerging threats without the need for constant model retraining.

  3. Scale and adapt effortlessly: MongoDB’s multi-model operational data store allows organizations to leverage structured and especially unstructured data such as text and images in a single operational and AI data store for fraud detection and AML, revealing hidden patterns that traditional systems can't process, all without adding multiple niche data stores and vector stores. With a highly scalable architecture combined with the option to deploy dedicated search node(s) for workload isolation, MongoDB helps organizations scale effortlessly with growing datasets and adapt dynamically to new fraud or money laundering patterns, providing a more flexible and future-proof anti-financial crime framework.

Why MongoDB for AML and fraud prevention

Fraud detection and AML require a holistic platform approach as they involve diverse data sets that are constantly evolving. Customers choose MongoDB because it is a unified data platform that eliminates the need for niche technologies, such as a dedicated vector database. MongoDB’s document data model incorporates any kind of data–any structure (structured, semi-structured, and unstructured), any format, any source–no matter how often it changes, allowing you to create a holistic picture of customers to better predict transaction anomalies in real-time.

By incorporating Atlas Vector Search, institutions can:

  • build intelligent applications powered by semantic search and generative AI over any type of data.
  • store vector embeddings right next to source data and metadata–vectors inserted or updated in the database are automatically synchronized to the vector index.
  • optimize resource consumption, improve performance, and enhance availability with Search Nodes.
  • remove operational heavy lifting with the battle-tested, fully managed MongoDB Atlas developer data platform.
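To illustrate storing embeddings next to source data, the sketch below shows a possible document shape and a matching Atlas Vector Search index definition. The field names, the `flagged` filter field, and the 1536-dimension default (the output size of some OpenAI embedding models) are assumptions; they must be adapted to the actual schema and embedding model.

```python
from typing import Any, Dict, List


def vector_index_definition(path: str = "embedding", dimensions: int = 1536) -> Dict[str, Any]:
    """Atlas Vector Search index definition for one vector field plus one filter field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,  # must match the embedding model's output size
                "similarity": "cosine",
            },
            {"type": "filter", "path": "flagged"},  # hypothetical pre-filter field
        ]
    }


def transaction_document(narrative: str, vector: List[float], flagged: bool) -> Dict[str, Any]:
    """Keep source text, metadata, and embedding together in a single document."""
    return {"narrative": narrative, "flagged": flagged, "embedding": vector}


definition = vector_index_definition(dimensions=3)
document = transaction_document("wire transfer to first-time payee", [0.1, 0.2, 0.3], True)
```

In recent PyMongo versions, a definition like this can be wrapped in `SearchIndexModel(definition=..., name=..., type="vectorSearch")` and passed to `collection.create_search_index(...)`. Documents inserted or updated afterwards are then picked up by the index.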

Given the broad and evolving nature of fraud and AML, these areas typically require multiple methods and a multi-modal approach. As such, a unified risk data platform offers several advantages for organizations aiming to build effective solutions.

Figure 2: High level architecture of a fraud detection/AML system

This GitHub repository presents a demo where a customer accesses a bank's website to perform transactions. It focuses on the clearing stage of the transaction, where the bank goes through a series of verifications to combat fraud and uphold sanctions and AML laws. The demo includes an API that can flag sanctioned customers and also apply an innovative process that constructs a textual narrative from quantitative and qualitative data to flag both AML and fraudulent transactions. These processes include AI embeddings as well as MongoDB functionalities such as full-text and vector search, Atlas App Services, and others.

Key Learnings
  • This solution uses OpenAI for vector embedding. You will need an OpenAI API key, which is not included in the GitHub repository, to run this demo application. You can also change to another embedding model, but this may require some code changes.
Technologies and products used

MongoDB developer data platform:

Other key technologies:

  • OpenAI
Related Resources

GitHub Repository: Fraud Prevention and Anti Money Laundering with MongoDB

Create this solution by following the instructions and associated models in the repository.


Fraud Prevention and AML with MongoDB

Analyze and detect fraud in real time and satisfy AML requirements.


Innovate With AI: The Future Enterprise

Explore the top use cases across the six core industries that are infused with MongoDB Atlas AI capabilities.

Get started with Atlas

Get started in seconds. Our free clusters come with 512 MB of storage so you can experiment with sample data and get familiar with our platform.