Real-Time Card Fraud Solution Accelerator with MongoDB and Databricks
Card fraud is a significant problem, and a real fear, for both consumers and businesses. Serious as it is, it can be prevented. Financial institutions have various processes and technical solutions in place to detect and prevent card fraud, such as monitoring transactions for suspicious activity, implementing know-your-customer (KYC) procedures, and combining controls based on static rules or machine learning models. These can all help, but they are not without their own challenges.
Financial institutions with legacy fraud prevention systems can find themselves fighting against their own data infrastructure. These challenges can include:
- Incomplete data: Legacy systems may not have access to all relevant data sources, leading to a lack of visibility into fraud patterns and behaviors.
- Latency: Fraud prevention systems need to execute fast enough to be deployed as part of a real-time payment approval process. Legacy systems often lack this capability.
- Difficulty to change: Legacy systems have been designed to work within specific parameters, and changing them to meet new requirements is often difficult and time-consuming.
- Weak security: Legacy systems may have outdated security protocols that leave organizations vulnerable to cyber attacks.
- Operational overheads due to technical sprawl: Existing architectures often pose operational challenges due to diverse technologies that have been deployed to support the different access patterns required by fraud models and ML training. This technical sprawl in the environment requires significant resources to maintain and update.
- High operation costs: Legacy systems can be costly to operate, requiring significant resources to maintain and update.
- No collaboration between application and data science teams: Technical boundaries between the operational platform and the data science platform are stopping application developers and data science teams from working collaboratively, leading to longer time to market and higher overheads.
These data issues can be detrimental to a financial institution trying to keep up with customer expectations, user experience, and fraud. As technology advances, card fraud grows more sophisticated with it, which makes real-time detection and prevention a necessity rather than an option. So, how can financial institutions today meet these demands? Fraud detection analytics should shift left, into the application itself.
What does this look like in practice? Application-driven analytics for fraud detection is the solution for the very real challenges financial institutions face today, as mentioned above.
To break down what this looks like, we will demonstrate how easy it is to build an ML-based fraud solution using MongoDB and Databricks. The functional and nonfunctional features of this proposed solution include:
- Data completeness: To address the challenge of incomplete data, the system will be integrated with external data sources to ensure complete and accurate data is available for analysis.
- Real-time processing: The system will be designed to process data in real time, enabling the timely detection of fraudulent activities.
- AI/ML modeling and model use: Organizations can leverage AI/ML to enhance their fraud prevention capabilities. AI/ML algorithms can quickly identify and flag potential fraud patterns and behaviors.
- Real-time monitoring: Organizations should aim to monitor the application in real time, so that data is processed and analyzed as it arrives.
- Model observability: Organizations should aim to improve observability in their systems to ensure that they have full visibility into fraud patterns and behaviors.
- Ease of operation: The system will be designed with ease of operation in mind, reducing operational headaches and enabling the fraud prevention team to focus on their core responsibilities.
- Application development and data science team collaboration: Organizations should aim to enable collaboration between application development and data science teams to ensure that the goals and objectives are aligned, and cooperation is optimized.
The functional features listed above can be implemented by a few architectural components. These include:
- Data sourcing
  - Producer apps: The producer mobile app simulates the generation of live transactions.
  - Legacy data source: The SQL external data source is used for customer demographics.
  - Training data: Historical transaction data needed for model training is sourced from cloud object storage (Amazon S3 or Microsoft Azure Blob Storage).
- MongoDB Atlas: Serves as the Operational Data Store (ODS) for card transactions and processes transactions in real time. The solution leverages the aggregation framework to perform in-app analytics, processing transactions against pre-configured rules, and communicates with Databricks for advanced AI/ML-based fraud detection via a native Spark connector.
Now, let’s break down these architectural components in greater detail below, one by one.
Figure 1: MongoDB for event-driven and shift-left analytics architecture
The first step in implementing a comprehensive fraud detection solution is aggregating data from all relevant data sources. As shown in Figure 1 above, an event-driven federated architecture is used to collect and process data from real-time sources such as producer apps, batch legacy systems data sources such as SQL databases, and historical training data sets from offline storage. This approach enables data sourcing from various facets such as transaction summary, customer demography, merchant information, and other relevant sources, ensuring data completeness.
Additionally, the proposed event-driven architecture provides the following benefits:
- Real-time transaction data unification, which allows card transaction event data (transaction amount, location, time of the transaction, payment gateway information, payment device information, and so on) to be collected in real time.
- Re-training of monitoring models based on live event activity, to combat fraud as it happens.
For demonstration purposes, the producer application is a Python script that generates live transaction information at a predefined, configurable rate (transactions per second).
Figure 2: Transaction collection sample document
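To make the producer idea concrete, here is a minimal sketch of a rate-limited transaction generator. It is not the script used in the demonstration: the field names (`transaction_id`, `cc_num`, `amount`, `merchant`, `timestamp`), the merchant categories, and the function names are all assumptions chosen for illustration.

```python
import random
import time
import uuid
from datetime import datetime, timezone
from typing import Dict, Iterator, Optional


def make_transaction(rng: random.Random) -> Dict[str, object]:
    """Build one synthetic card transaction (all field names are hypothetical)."""
    return {
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "cc_num": str(rng.randrange(10**15, 10**16)),  # 16-digit fake card number
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "merchant": rng.choice(["grocery", "fuel", "travel", "online"]),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def produce(
    rate_per_sec: float,
    limit: Optional[int] = None,
    seed: Optional[int] = None,
) -> Iterator[Dict[str, object]]:
    """Yield transactions at roughly `rate_per_sec`; stop after `limit` if given."""
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    rng = random.Random(seed)
    interval = 1.0 / rate_per_sec
    sent = 0
    while limit is None or sent < limit:
        yield make_transaction(rng)
        sent += 1
        time.sleep(interval)  # simple pacing; ignores time spent building the document


if __name__ == "__main__":
    for txn in produce(rate_per_sec=5, limit=3, seed=42):
        print(txn)  # a real producer would write to the database instead of printing
```

In a real deployment, each generated document would be written to the transactions collection, for example with PyMongo's `insert_one`, rather than printed.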
MongoDB Atlas is a managed data platform with several features that make it a strong choice as the datastore for card fraud transaction classification. It handles various types of data, scales to meet demand, provides advanced security features to support compliance with regulatory requirements, processes data in real time for fast and accurate fraud detection, and can store data closer to customers to comply with local data privacy regulations.
Figure 3: MongoDB for event-driven and shift-left analytics architecture
Figure 4: The processed and “features of transaction” MongoDB sample document
Figure 5: Processed transaction sample document
The example solution manages rules-based fraud prevention by storing user-defined payment limits and information in a user settings collection, as shown below. This includes maximum dollar limits per transaction, the number of transactions allowed per day, and other user-related details. By filtering transactions based on these rules before invoking expensive AI/ML models, the overall cost of fraud prevention is reduced.
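The rule check described here can be sketched in a few lines. This is an illustration only, assuming a user settings record with two hypothetical fields (`max_amount_per_txn` and `max_txns_per_day`); the solution itself applies its rules inside MongoDB, and the actual field names are not given in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserSettings:
    # Hypothetical fields mirroring the limits described in the text.
    max_amount_per_txn: float
    max_txns_per_day: int


def rule_violations(amount: float, txns_today: int, settings: UserSettings) -> List[str]:
    """Return the names of violated rules; an empty list means the transaction passes.

    `txns_today` is the number of transactions already recorded today,
    not counting the one being evaluated.
    """
    violations = []
    if amount > settings.max_amount_per_txn:
        violations.append("amount_limit")
    if txns_today + 1 > settings.max_txns_per_day:
        violations.append("daily_count_limit")
    return violations


if __name__ == "__main__":
    settings = UserSettings(max_amount_per_txn=1000.0, max_txns_per_day=10)
    print(rule_violations(amount=1500.0, txns_today=3, settings=settings))
```

Only transactions that come back with an empty list would need to be passed on to the ML model, which is where the cost saving comes from.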
Databricks is a powerful AI/ML platform for developing models that identify fraudulent transactions. One of its key features is support for real-time analytics, which, as discussed above, is essential to modern fraud detection systems.
Databricks includes MLflow, a powerful tool for managing the end-to-end machine learning lifecycle. MLflow allows users to track experiments, reproduce results, and deploy models at scale, making it easier to manage complex machine learning workflows. MLflow also offers model observability, which allows for easy tracking of model performance and debugging. This includes access to model metrics, logs, and other relevant data, which can be used to identify issues and improve the accuracy of the model over time. These features support the design of modern fraud detection systems using AI/ML.
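As a rough illustration of the tracking described above, the sketch below computes two metrics commonly watched in fraud models and logs them with MLflow's tracking API (`start_run`, `log_params`, `log_metrics`). The helper names, the choice of precision and recall, and the parameter values are assumptions for the example, not details of the solution's notebooks.

```python
from typing import Dict


def fraud_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute precision and recall from confusion-matrix counts.

    tp: fraudulent transactions correctly flagged
    fp: legitimate transactions wrongly flagged
    fn: fraudulent transactions missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


def log_run(params: Dict[str, object], metrics: Dict[str, float]) -> None:
    """Record one training run in MLflow (requires the mlflow package)."""
    import mlflow  # imported lazily so fraud_metrics works without MLflow installed

    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)


if __name__ == "__main__":
    metrics = fraud_metrics(tp=80, fp=20, fn=10)
    print(metrics)
    # Hypothetical parameters; uncomment where MLflow is available:
    # log_run({"model_type": "example", "max_depth": 6}, metrics)
```

Logging the same metrics on every scheduled training run makes it possible to compare runs over time and spot a model whose recall is drifting.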
The workflows that process transactions and build the models used to validate their authenticity run on the Databricks AI/ML platform. There are two main workflows:
1: The Streaming workflow, which runs in the background continuously to consume incoming transactions in real-time using the MongoDB Spark streaming connector. Every transaction first undergoes data preparation and a feature extraction process; the transformed features are then streamed back to the MongoDB collection with the help of a Spark streaming connector.
Figure 7: Streaming workflow
2: The Training workflow is a scheduled process that performs three main tasks/notebooks, as mentioned below. This workflow can be triggered either manually or through Git CI/CD (webhooks).
Figure 8: Training workflow stages
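The streaming workflow (item 1 above) can be sketched with PySpark Structured Streaming and the MongoDB Spark Connector (version 10.x, which registers the `mongodb` format). This is a sketch under stated assumptions: the database name `fraud`, the collection names, the schema, and the two derived features are hypothetical, and the actual feature extraction in the solution is not described in this article.

```python
from typing import Dict


def mongo_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Connection options understood by the MongoDB Spark Connector 10.x."""
    return {
        "spark.mongodb.connection.uri": uri,
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
    }


def start_feature_stream(spark, uri: str, checkpoint_dir: str):
    """Read new transactions, derive features, and stream them back to MongoDB."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import (
        DoubleType, StringType, StructField, StructType, TimestampType,
    )

    # Hypothetical schema; assumes `timestamp` is stored as a BSON date.
    schema = StructType([
        StructField("transaction_id", StringType()),
        StructField("cc_num", StringType()),
        StructField("amount", DoubleType()),
        StructField("timestamp", TimestampType()),
    ])

    raw = (
        spark.readStream.format("mongodb")
        .options(**mongo_options(uri, "fraud", "transactions"))
        # Emit the inserted document itself rather than the change event wrapper.
        .option("spark.mongodb.change.stream.publish.full.document.only", "true")
        .schema(schema)
        .load()
    )

    # Illustrative features only: hour of day and a log-scaled amount.
    features = (
        raw.withColumn("hour_of_day", F.hour("timestamp"))
        .withColumn("log_amount", F.log1p("amount"))
    )

    return (
        features.writeStream.format("mongodb")
        .options(**mongo_options(uri, "fraud", "transaction_features"))
        .option("checkpointLocation", checkpoint_dir)
        .outputMode("append")
        .start()
    )
```

On a Databricks cluster with the connector installed, calling `start_feature_stream(spark, uri, checkpoint_dir)` returns a streaming query that keeps running in the background until it is stopped.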
Modernizing legacy fraud prevention systems using MongoDB and Databricks can provide many benefits, such as improved detection accuracy, increased flexibility and scalability, enhanced security, reduced operational headaches, reduced cost of operation, early pilots and quick iteration, and enhanced customer experience.
Modernizing legacy fraud prevention systems is essential to handling the challenges posed by modern fraud schemes. By incorporating advanced technologies such as MongoDB and Databricks, organizations can improve their fraud prevention capabilities, protect sensitive data, and reduce operational headaches. With the solution proposed, organizations can take a step forward in their fraud prevention journey to achieve their goals.