In this solution, you'll learn how to build an ML-based fraud detection solution using MongoDB and Databricks. The solution's key features include data completeness through integration with external sources, real-time processing for timely fraud detection, AI/ML modeling to identify potential fraud patterns, real-time monitoring for instant analysis, model observability for full visibility into fraud behaviors, flexibility, scalability, and robust security measures. The system is designed to be easy to operate and to foster collaboration between application development and data science teams. It also supports end-to-end CI/CD pipelines to keep systems up to date and secure.
The ML-based fraud solution is suitable for industries where real-time processing, AI/ML modeling, model observability, flexibility, and collaboration between teams are essential. The system ensures up-to-date and secure operations through end-to-end CI/CD pipelines. Relevant industries include:
As you can see from the domain diagram, there are three entities when working with credit card transactions: the transaction itself, the merchant, and the payer involved in the transaction. Since all three are important and accessed together in our fraud detection application, we use the extended reference pattern and include fields about the transaction, merchant, and payer in a single document.
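As a rough illustration of the extended reference pattern described above, the sketch below builds one transaction document that embeds the merchant and payer fields read alongside it. All field names (`merchant_id`, `home_city`, and so on) and the helper `build_transaction_document` are assumptions made for this example, not the schema used in the demo.

```python
def build_transaction_document(txn, merchant, payer):
    """Return a single document that embeds selected merchant and payer fields."""
    return {
        "transaction_id": txn["id"],
        "amount": txn["amount"],
        "timestamp": txn["timestamp"],
        # Extended reference: copy only the fields that are read together with
        # the transaction, and keep the id so the full record can be looked up.
        "merchant": {
            "merchant_id": merchant["id"],
            "name": merchant["name"],
            "category": merchant["category"],
        },
        "payer": {
            "payer_id": payer["id"],
            "name": payer["name"],
            "home_city": payer["home_city"],
        },
    }


# Hypothetical sample records; the merchant address is deliberately not embedded.
sample_merchant = {
    "id": "m-001",
    "name": "Example Grocer",
    "category": "grocery",
    "address": "1 Example Street",
}
sample_payer = {"id": "p-042", "name": "Alex Doe", "home_city": "Austin"}
sample_txn = {"id": "t-1001", "amount": 54.20, "timestamp": "2023-01-15T10:30:00Z"}

sample_document = build_transaction_document(sample_txn, sample_merchant, sample_payer)
```

Because the merchant and payer details sit inside the transaction document, the fraud check can read everything it needs with one query instead of three.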
The functional features listed above can be implemented by a few architectural components. These include:
Let’s break down these architectural components one by one.
The first step in implementing a comprehensive fraud detection solution is aggregating data from all relevant data sources. As shown in Figure 1 above, an event-driven federated architecture collects and processes data from real-time sources such as producer apps, batch sources such as legacy SQL databases, and historical training data sets in offline storage. This approach draws on transaction summaries, customer demographics, merchant information, and other relevant sources, ensuring data completeness.
Additionally, the proposed event-driven architecture provides the following benefits:
The producer application for the demonstration is a Python script that generates live transaction information at a predefined, configurable rate (transactions per second).
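The demo's actual script is not reproduced here. The following is only a minimal sketch of a rate-limited producer, assuming a hypothetical transaction shape and a `sink` callback standing in for whatever writes the event to the datastore or message bus.

```python
import random
import time
import uuid
from datetime import datetime, timezone


def generate_transaction(rng):
    """Create one synthetic transaction (field names are illustrative)."""
    return {
        "transaction_id": str(uuid.uuid4()),
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "merchant_id": f"m-{rng.randint(1, 50):03d}",
        "payer_id": f"p-{rng.randint(1, 1000):04d}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def produce(rate_per_sec, count, sink, rng=None, sleep=time.sleep):
    """Send `count` transactions to `sink` at roughly `rate_per_sec` per second."""
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    rng = rng or random.Random()
    interval = 1.0 / rate_per_sec
    for _ in range(count):
        sink(generate_transaction(rng))
        sleep(interval)  # simple pacing; ignores the time spent inside sink()


if __name__ == "__main__":
    # Print three transactions at about 10 per second.
    produce(rate_per_sec=10, count=3, sink=print)
```

Passing `sleep` as a parameter keeps the pacing logic easy to test without waiting in real time.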
MongoDB Atlas is a managed developer data platform with several features that make it a strong choice as the datastore for card fraud transaction classification. It handles various types of data, scales to meet demand, provides advanced security features to support compliance with regulatory requirements, processes data in real time for fast and accurate fraud detection, and can store data closer to customers to comply with local data privacy regulations.
The MongoDB Spark Connector integrates Apache Spark and MongoDB. Apache Spark, hosted by Databricks, allows the processing and analysis of large amounts of data in real time. The Spark Connector translates MongoDB data into Spark DataFrames and supports real-time Spark streaming.
The App Services offered by MongoDB allow for real-time processing of data through change streams and triggers. Because MongoDB Atlas can store and process various types of data and also offers streaming capabilities and trigger functionality, it is well suited for use in an event-driven architecture.
This solution uses a rich connector ecosystem to process transactions in real time. The App Services trigger function makes a REST service call to an AI/ML model hosted through the Databricks MLflow framework.
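The trigger itself runs inside App Services, so the Python sketch below only illustrates what the body of such a REST call might look like. It assumes the model endpoint accepts the MLflow `dataframe_records` JSON input format; the feature list and the helper `build_scoring_request` are hypothetical, and the endpoint URL and authentication are left out because the source does not give them.

```python
import json

# Hypothetical feature names; the real model's input schema is not given here.
FEATURE_NAMES = ["amount", "merchant_category", "payer_home_city"]


def build_scoring_request(transaction, feature_names=FEATURE_NAMES):
    """Serialize one transaction as a JSON body with a single input record."""
    record = {name: transaction[name] for name in feature_names}
    # One row per transaction, keyed by column name.
    return json.dumps({"dataframe_records": [record]})


sample_body = build_scoring_request(
    {
        "transaction_id": "t-1001",
        "amount": 54.2,
        "merchant_category": "grocery",
        "payer_home_city": "Austin",
    }
)
```

Only the model features are sent; identifiers such as `transaction_id` stay in the document and can be used to write the prediction back.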
The example solution manages rules-based fraud prevention by storing user-defined payment limits and information in a user settings collection, as shown. This includes maximum dollar limits per transaction, the number of transactions allowed per day, and other user-related details. By filtering transactions based on these rules before invoking expensive AI/ML models, the overall cost of fraud prevention is reduced.
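To make the pre-filtering idea concrete, here is a small sketch of a rules check that runs before any model call. The settings keys (`max_amount_per_transaction`, `max_transactions_per_day`) and the function name are assumptions for illustration; the demo's user settings collection may use different names.

```python
def passes_user_rules(transaction, user_settings, transactions_today):
    """Return True if the transaction is within the user's configured limits.

    transactions_today is the number of transactions the user has already
    made today, not counting this one.
    """
    if transaction["amount"] > user_settings["max_amount_per_transaction"]:
        return False
    if transactions_today >= user_settings["max_transactions_per_day"]:
        return False
    return True


# Hypothetical settings document for one user.
example_settings = {
    "user_id": "p-042",
    "max_amount_per_transaction": 1000.0,
    "max_transactions_per_day": 10,
}
```

A transaction that fails this check can be rejected immediately, so the model endpoint is called only for transactions that the rules allow.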
Databricks is a powerful AI/ML platform for developing models that identify fraudulent transactions. One of its key features is support for real-time analytics, which, as discussed above, is central to modern fraud detection systems.
Databricks includes MLflow, a powerful tool for managing the end-to-end machine learning lifecycle. MLflow allows users to track experiments, reproduce results, and deploy models at scale, making it easier to manage complex machine learning workflows. MLflow also offers model observability, which allows for easy tracking of model performance and debugging. This includes access to model metrics, logs, and other relevant data, which can be used to identify issues and improve the accuracy of the model over time. Together, these features help in the design of modern fraud detection systems using AI/ML.
The proposed solution's functional and nonfunctional features include:
Create this solution yourself with the associated sample data, functions, and code.
Watch the creation of a credit fraud detection solution and view it in a real-time demo.
Learn how MongoDB’s developer data platform supports a wide range of use cases in the financial services industry.
Analyze and detect fraud in real time and satisfy Know Your Customer (KYC) requirements.