Real-Time Card Fraud Solution Accelerator with MongoDB and Databricks

Shiv Pullepu, Ashwin Gangadhar, Rajesh Vinayagam
Published Mar 08, 2023 • Updated Jul 11, 2023
Card fraud is a significant problem and a source of anxiety for both consumers and businesses. Serious as it is, it can be prevented. Financial institutions have various processes and technical solutions in place to detect and prevent card fraud, such as monitoring transactions for suspicious activity, implementing know-your-customer (KYC) procedures, and combining controls based on static rules or machine learning models. These all help, but they are not without their own challenges.
Financial institutions with legacy fraud prevention systems can find themselves fighting against their own data infrastructure. These challenges can include:
  • Incomplete data: Legacy systems may not have access to all relevant data sources, leading to a lack of visibility into fraud patterns and behaviors.
  • Latency: Fraud prevention systems need to execute fast enough to be deployed as part of a real-time payment approval process. Legacy systems often lack this capability.
  • Difficulty to change: Legacy systems have been designed to work within specific parameters, and changing them to meet new requirements is often difficult and time-consuming.
  • Weak security: Legacy systems may have outdated security protocols that leave organizations vulnerable to cyber attacks.
  • Operational overheads due to technical sprawl: Existing architectures often pose operational challenges due to diverse technologies that have been deployed to support the different access patterns required by fraud models and ML training. This technical sprawl in the environment requires significant resources to maintain and update.
  • High operation costs: Legacy systems can be costly to operate, requiring significant resources to maintain and update.
  • No collaboration between application and data science teams: Technical boundaries between the operational platform and the data science platform are stopping application developers and data science teams from working collaboratively, leading to longer time to market and higher overheads.
These data issues can be detrimental to a financial institution trying to keep up with customer expectations, user experience, and fraud. As technology advances rapidly, so does card fraud, which is becoming increasingly sophisticated. Detecting and preventing it effectively therefore requires real-time solutions. So, how can financial institutions meet these demands? Fraud detection analytics should shift left, into the application itself.
What does this look like in practice? Application-driven analytics for fraud detection addresses the challenges listed above.

Solution overview

To break down what this looks like, we will demonstrate how easy it is to build an ML-based fraud solution using MongoDB and Databricks. The functional and nonfunctional features of this proposed solution include:
  • Data completeness: To address the challenge of incomplete data, the system will be integrated with external data sources to ensure complete and accurate data is available for analysis.
  • Real-time processing: The system will be designed to process data in real time, enabling the timely detection of fraudulent activities.
  • AI/ML modeling and model use: Organizations can leverage AI/ML to enhance their fraud prevention capabilities. AI/ML algorithms can quickly identify and flag potential fraud patterns and behaviors.
  • Real-time monitoring: Organizations should aim to monitor the application in real time, so that data is processed and analyzed as it arrives.
  • Model observability: Organizations should aim to improve observability in their systems to ensure that they have full visibility into fraud patterns and behaviors.
  • Flexibility and scalability: The system will be designed with flexibility and scalability in mind, allowing for easy changes to be made to accommodate changing business needs and regulatory requirements.
  • Security: The system will be designed with robust security measures to protect against potential security breaches, including encryption, access control, and audit trails.
  • Ease of operation: The system will be designed with ease of operation in mind, reducing operational headaches and enabling the fraud prevention team to focus on their core responsibilities.
  • Application development and data science team collaboration: Organizations should aim to enable collaboration between application development and data science teams to ensure that the goals and objectives are aligned, and cooperation is optimized.
  • End-to-end CI/CD pipeline support: Organizations should aim to have end-to-end CI/CD pipeline support to ensure that their systems are up-to-date and secure.

Solution components

The functional features listed above can be implemented by a few architectural components. These include:
  1. Data sourcing
    1. Producer apps: The producer mobile app simulates the generation of live transactions.
    2. Legacy data source: The SQL external data source is used for customer demographics.
    3. Training data: Historical transaction data needed for model training is sourced from cloud object storage, either Amazon S3 or Microsoft Azure Blob Storage.
  2. MongoDB Atlas: Serves as the Operational Data Store (ODS) for card transactions and processes transactions in real time. The solution uses the MongoDB aggregation framework in Atlas to perform in-app analytics, processing transactions against pre-configured rules, and communicates with Databricks for advanced AI/ML-based fraud detection via a native Spark connector.
  3. Databricks: Hosts the AI/ML platform that complements MongoDB Atlas in-app analytics. The fraud detection algorithm used in this example is a notebook inspired by Databricks' fraud framework. MLflow manages the MLOps lifecycle of the model, and the trained model is exposed as a REST endpoint.
Now, let’s break down these architectural components one by one.
Figure 1: MongoDB for event-driven and shift-left analytics architecture
1. Data sourcing
The first step in implementing a comprehensive fraud detection solution is aggregating data from all relevant data sources. As shown in Figure 1 above, an event-driven federated architecture is used to collect and process data from real-time sources such as producer apps, batch sources such as legacy SQL databases, and historical training data sets from offline storage. This approach brings together transaction summaries, customer demographics, merchant information, and other relevant sources, ensuring data completeness.
Additionally, the proposed event-driven architecture provides the following benefits:
  • Real-time transaction data unification, which allows card transaction event data (transaction amount, location, time of the transaction, payment gateway information, payment device information, and so on) to be collected in real time.
  • Re-training of monitoring models based on live event activity, to combat fraud as it happens.
For the demonstration, the producer application is a Python script that generates live transaction information at a configurable rate (transactions per second).
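As a rough illustration of such a producer, the sketch below emits synthetic transactions at a configurable rate. It is not the demo's actual script: the field names, value ranges, and the `produce` and `generate_transaction` helpers are all assumptions made for this example. The `sink` argument is any callable that accepts a document; with PyMongo, a collection's `insert_one` method could be passed in.

```python
import random
import time
from datetime import datetime, timezone
from typing import Callable, Dict
from uuid import uuid4


def generate_transaction(rng: random.Random) -> Dict:
    """Build one synthetic card transaction (all field names are hypothetical)."""
    return {
        "transaction_id": str(uuid4()),
        "cc_num": "".join(str(rng.randint(0, 9)) for _ in range(16)),
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "merchant": rng.choice(["grocery", "fuel", "travel", "online_retail"]),
        "location": {
            "lat": round(rng.uniform(-90.0, 90.0), 4),
            "long": round(rng.uniform(-180.0, 180.0), 4),
        },
        "timestamp": datetime.now(timezone.utc),
    }


def produce(
    sink: Callable[[Dict], object],
    rate_per_sec: float,
    total: int,
    seed=None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send `total` transactions to `sink`, pausing to approximate `rate_per_sec`."""
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    rng = random.Random(seed)
    interval = 1.0 / rate_per_sec
    for _ in range(total):
        sink(generate_transaction(rng))
        sleep(interval)
    return total


# Example (assumes a PyMongo collection named `transactions` already exists):
# produce(transactions.insert_one, rate_per_sec=5, total=100)
```

Passing the sink and the sleep function as parameters keeps the generator independent of any particular database driver and makes the rate easy to change or stub out.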
Figure 2: Transaction collection sample document
2. MongoDB for event-driven, shift-left analytics architecture
MongoDB Atlas is a managed data platform with several features that make it a strong choice as the datastore for card fraud transaction classification. It supports flexible data models that handle various types of data, scales to meet demand, provides advanced security features to support compliance with regulatory requirements, processes data in real time for fast and accurate fraud detection, and can be deployed in cloud regions close to customers to comply with local data privacy regulations.
The MongoDB Spark Streaming Connector integrates Apache Spark and MongoDB. Apache Spark, hosted by Databricks, allows the processing and analysis of large amounts of data in real time. The Spark Connector translates MongoDB data into Spark DataFrames and supports real-time Spark streaming.
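A streaming read through the connector might look roughly like the sketch below. It assumes version 10.x of the MongoDB Spark Connector (whose data source name is `mongodb`), an existing `SparkSession` with the connector installed, and a caller-supplied schema. The helper names, database name, and collection name are hypothetical and not taken from the demo.

```python
from typing import Dict


def mongo_stream_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Build reader options using the connector's 10.x option names."""
    return {
        "spark.mongodb.connection.uri": uri,
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
        # Emit only the changed document rather than the whole change event.
        "spark.mongodb.change.stream.publish.full.document.only": "true",
    }


def read_transaction_stream(spark, schema, options: Dict[str, str]):
    """Return a streaming DataFrame backed by a MongoDB change stream.

    `spark` is assumed to be a SparkSession with the MongoDB Spark Connector
    on its classpath; `schema` is a pyspark StructType describing a transaction.
    """
    return (
        spark.readStream
        .format("mongodb")
        .options(**options)
        .schema(schema)
        .load()
    )


# Example (hypothetical names):
# opts = mongo_stream_options("<connection-string>", "fraud", "transactions")
# stream_df = read_transaction_stream(spark, txn_schema, opts)
```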
Figure 3: MongoDB for event-driven and shift-left analytics architecture
The App Services features offered by MongoDB allow for real-time processing of data through change streams and triggers. Because MongoDB Atlas can store and process various types of data and also offers streaming and trigger functionality, it is well suited for use in an event-driven architecture.
In the demo, we used both the rich connector ecosystem of MongoDB and App Services to process transactions in real time. The App Services trigger function invokes a REST service call to an AI/ML model hosted through the Databricks MLflow framework.
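App Services trigger functions are written in JavaScript, so the following Python sketch is only an analog of the REST call described above, not the demo's trigger code. It assumes the model is served behind an MLflow-style scoring endpoint that accepts a JSON body with a `dataframe_records` key and a bearer token; the exact payload shape depends on the MLflow and Databricks serving version in use. The endpoint URL, token, and feature names are placeholders.

```python
import json
import urllib.request
from typing import Dict, List


def build_scoring_request(
    endpoint_url: str, token: str, features: List[Dict]
) -> urllib.request.Request:
    """Prepare a POST request carrying one or more feature records."""
    body = json.dumps({"dataframe_records": features}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def score(endpoint_url: str, token: str, features: List[Dict], timeout: float = 5.0):
    """Send the features to the model endpoint and return the decoded JSON reply."""
    request = build_scoring_request(endpoint_url, token, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (placeholder values):
# score("https://<workspace-host>/<model-endpoint-path>", "<token>", [{"amount": 42.0}])
```

Separating request construction from the network call makes the payload easy to inspect without contacting the endpoint.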
Figure 4: The processed and “features of transaction” MongoDB sample document
Figure 5: Processed transaction sample document
Note: A combined view of the collections, as mentioned earlier, can be visually represented using MongoDB Charts to help better understand and observe the changing trends of fraudulent transactions. For advanced reporting purposes, materialized views can help.
The example solution manages rules-based fraud prevention by storing user-defined payment limits and information in a user settings collection, as shown below. This includes maximum dollar limits per transaction, the number of transactions allowed per day, and other user-related details. By filtering transactions based on these rules before invoking expensive AI/ML models, the overall cost of fraud prevention is reduced.
fraud detection
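The rule check described above can be sketched as follows. This is plain Python for illustration rather than the aggregation pipeline the solution uses, and the `UserSettings` fields and the `check_rules` helper are hypothetical names standing in for whatever the user settings collection actually stores.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UserSettings:
    """Per-user limits (hypothetical field names)."""
    max_amount_per_txn: float
    max_txns_per_day: int


def check_rules(amount: float, txns_today: int, settings: UserSettings) -> Tuple[bool, str]:
    """Return (passed, reason). Only transactions that pass go on to the ML model."""
    if amount > settings.max_amount_per_txn:
        return False, "amount exceeds per-transaction limit"
    if txns_today >= settings.max_txns_per_day:
        return False, "daily transaction count limit reached"
    return True, "passed rules; forward to ML model"


# Example: a user limited to 1000.00 per transaction and 3 transactions per day.
limits = UserSettings(max_amount_per_txn=1000.0, max_txns_per_day=3)
passed, reason = check_rules(amount=1500.0, txns_today=0, settings=limits)
# passed is False here because the amount is above the per-transaction limit.
```

Because these checks are cheap comparisons, running them first means the model endpoint is only called for transactions that the static rules cannot already reject.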
3. Databricks as an AI/ML ops platform
Databricks is a powerful AI/ML platform for developing models that identify fraudulent transactions. One of its key features is support for real-time analytics, which, as discussed above, modern fraud detection systems depend on.
Databricks includes MLflow, a tool for managing the end-to-end machine learning lifecycle. MLflow allows users to track experiments, reproduce results, and deploy models at scale, making it easier to manage complex machine learning workflows. MLflow also offers model observability, which allows for easy tracking of model performance and debugging. This includes access to model metrics, logs, and other relevant data, which can be used to identify issues and improve the accuracy of the model over time. These features help in the design of modern fraud detection systems using AI/ML.
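To make the experiment tracking concrete, here is a minimal sketch that computes two metrics for a fraud classifier and records them in a run. The `tracker` argument is expected to be the `mlflow` module, whose `start_run`, `log_params`, and `log_metrics` functions are used; it is passed in as a parameter only so the sketch has no hard dependency. The parameter names, run name, and choice of metrics are assumptions, not details from the demo notebooks.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision and recall for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


def log_training_run(tracker, params: Dict, metrics: Dict[str, float],
                     run_name: str = "fraud-model") -> None:
    """Record hyperparameters and metrics in one tracked run.

    `tracker` is assumed to be the `mlflow` module (import mlflow).
    """
    with tracker.start_run(run_name=run_name):
        tracker.log_params(params)
        tracker.log_metrics(metrics)


# Example (requires mlflow to be installed):
# import mlflow
# metrics = classification_metrics(y_true, y_pred)
# log_training_run(mlflow, {"max_depth": 4}, metrics)
```

Logging the same metrics on every training run is what makes it possible to compare runs over time and spot a model whose recall on fraudulent transactions is degrading.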

Demo artifacts and high-level description

The workflows that process transactions and build the models for validating their authenticity run on the Databricks AI/ML platform. There are two main workflows:
1: The Streaming workflow runs continuously in the background and consumes incoming transactions in real time using the MongoDB Spark streaming connector. Every transaction first undergoes data preparation and feature extraction; the transformed features are then streamed back to the MongoDB collection with the help of the Spark streaming connector.
Figure 7: Streaming workflow
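The feature extraction step of the streaming workflow can be pictured with a small per-transaction function like the one below. The features shown (log-scaled amount, hour of day, weekend flag) and the input field names are illustrative assumptions, not the features the demo notebooks compute, and in the actual workflow this kind of logic would run as Spark DataFrame transformations rather than plain Python.

```python
import math
from datetime import datetime
from typing import Dict


def extract_features(txn: Dict) -> Dict:
    """Derive model inputs from one raw transaction (hypothetical fields)."""
    ts: datetime = txn["timestamp"]
    return {
        "transaction_id": txn["transaction_id"],
        # log1p compresses large amounts and maps an amount of 0 to 0.
        "amount_log": math.log1p(txn["amount"]),
        "hour_of_day": ts.hour,
        # Monday is 0, so 5 and 6 are Saturday and Sunday.
        "is_weekend": ts.weekday() >= 5,
    }


# Example input (hypothetical values):
raw = {
    "transaction_id": "t-1",
    "amount": 0.0,
    "timestamp": datetime(2023, 3, 11, 14, 30),
}
features = extract_features(raw)
```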
2: The Training workflow is a scheduled process that performs three main tasks (notebooks), as shown below. This workflow can be triggered either manually or through Git CI/CD (webhooks).
Figure 8: Training workflow stages
A step-by-step breakdown of how the example solution works can be accessed at this GitHub repository, and an end-to-end solution demo is available.


Modernizing legacy fraud prevention systems using MongoDB and Databricks can provide many benefits, such as improved detection accuracy, increased flexibility and scalability, enhanced security, reduced operational headaches, reduced cost of operation, early pilots and quick iteration, and enhanced customer experience.
Modernizing legacy fraud prevention systems is essential to handling the challenges posed by modern fraud schemes. By incorporating advanced technologies such as MongoDB and Databricks, organizations can improve their fraud prevention capabilities, protect sensitive data, and reduce operational headaches. With the solution proposed, organizations can take a step forward in their fraud prevention journey to achieve their goals.
Learn more about how MongoDB can modernize your fraud prevention system, and contact the MongoDB team.
