Credit card application with Generative AI

Learn how the convergence of alternative data, artificial intelligence, and generative AI is reshaping the foundations of credit scoring.
Solution Overview

In this solution, you’ll learn how the convergence of alternative data, artificial intelligence, and generative AI (gen AI) is reshaping the foundations of credit scoring. Alternative credit scoring methods address the limitations of traditional models by offering a more inclusive and nuanced assessment of creditworthiness. The solution walks through an online credit card application process to illustrate how gen AI can be applied and how MongoDB can support credit scoring. The same approach can be applied to other credit products, such as personal loans, mortgages, corporate loans, and trade finance credit lines.

The code to demonstrate all the features of MongoDB for building this solution is available in the following GitHub repo.

Challenges with traditional credit scoring

The pursuit of credit can be a labyrinthine journey, particularly when credit assessment processes and, in particular, credit scoring mechanisms pose significant challenges. Here are some of the challenges or limitations of traditional credit scoring models:

  • Limited credit history: Many individuals encounter hurdles in the form of limited or nonexistent credit history, making it difficult to prove their creditworthiness due to the lack of historical data.
  • Inconsistent income: Irregular income, typical in part-time work or freelancing, poses a challenge for traditional credit scoring models, which label individuals as higher risk, leading to application denials or restrictive credit limits.
  • High utilization of existing credit: Heavy reliance on existing credit, leading to elevated credit utilization ratios, becomes a stumbling block in credit applications as applicants may face rejection or approval with less favorable terms.
  • Lack of clarity in rejection reasons: A lack of transparency in rejection reasons leaves applicants in the dark, making it difficult for them to address the root cause and enhance their creditworthiness for future applications.

Transforming credit scoring and applications with MongoDB

An online application process generally aims to provide convenience and efficiency, allowing individuals to access tailored financial products while ensuring transparency and accuracy in the application information. However, the limitations of traditional credit scoring methods undermine that experience, especially for individuals whose credit profiles look thin under the traditional approach.

In this solution accelerator, we will explore how MongoDB can help transform the credit application in the following key aspects of the process:

  1. Simplify data capture and processing
  2. Enhance credit scoring with AI
  3. Explain the credit application declination
  4. Recommend alternative credit products

Simplify data capture and processing

Applying for credit cards or other credit products can often be a lengthy and intricate process. Let’s delve into the details:

  1. Application process complexity: Obtaining a credit card involves several steps, which can be time-consuming. Here’s a brief overview of the process:

    • Choosing a card: First, you must select a credit card that suits your needs. This involves researching various cards, comparing features, and understanding their terms and conditions.
    • Eligibility check: Next, you must verify if you meet the eligibility criteria the bank sets. These criteria typically consider factors like your credit rating, age, income, and liabilities.
    • Document submission: You’ll need to provide documents such as identity proof (like Social Security ID, passport, and/or driver's license), address proof (rental agreement, utility bills), and income proof (bank statements, salary slips, Form 16).
    • Application form: Filling out the credit card application form can be cumbersome. You can do this online via the bank’s website, net banking, or by visiting a branch. Some banks even require physical documents, although digital processes are becoming more common.
    • Verification and references: Banks verify the authenticity of your documents and cross-check the information provided. This step also involves computing the probability of delinquency using AI/ML algorithms.
  2. Redundant information collection: Unfortunately, banks often collect “redundant” data that they should already have. For instance:

    • KYC details: Even though they have access to your KYC (Know Your Customer) details, they still ask you to submit them repeatedly.
    • Income verification: Despite having your salary details, banking history, utility bills, rental payments, mobile payments, shopping expenditures, etc., from your bank statements, they may request additional proof to verify the same.

In summary, streamlining this process by eliminating redundant requests and leveraging existing data could significantly enhance the user experience.

Application forms for a credit card may be relatively simple, but the complexity increases with other credit products (e.g., auto loans, mortgages, and trade finance). An application form can contain both tabular and hierarchical information, as well as alternative data sourced from third-party providers that the borrower has authorized. MongoDB’s flexible developer data platform natively supports JSON data and does not require documents to have the same schema, which makes it easier to handle these varied types of data.

Leveraging JSON for online credit application forms simplifies data capture and can streamline data processing. JSON's structured data representation is well suited to organizing the multifaceted information within credit applications, encompassing personal, financial, and employment details. Its human-readable format facilitates collaboration among developers, making the data model easier to edit and understand, while interoperability across platforms supports seamless data exchange. The flexibility of JSON also matches the dynamic nature of credit application requirements, enabling straightforward modifications and additions.

MongoDB stands out as an optimal choice for processing JSON documents in credit applications due to its native support for JSON-like BSON format. The database's flexibility allows for dynamic schema adjustments, aligning well with the evolving nature of credit application forms. MongoDB's ability to handle hierarchical data structures, coupled with robust querying and indexing capabilities, ensures efficient retrieval and organization of complex credit application information. As a scalable solution, MongoDB accommodates growing volumes of credit data while maintaining performance. Its seamless integration with JavaScript and other popular programming languages, tools, and technologies (e.g., Spark, Kafka) enhances development workflows, while features such as document validation and support for open banking standards further contribute to data integrity and standardized information exchange. In essence, MongoDB provides a versatile and efficient platform for storing and processing JSON documents that are highly suited for the nature of online credit applications.
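To make the flexible-schema point concrete, here is a minimal Python sketch of how a credit application with nested employment details and optional alternative data might be shaped as a document, together with a `$jsonSchema` validator that enforces only a few core fields. All field names (`applicant`, `annual_income`, `alternative_data`, and so on), the product list, and the helper functions are hypothetical and are not taken from the solution's repository. `save_application` assumes it is handed a PyMongo collection.

```python
from datetime import datetime, timezone

# Hypothetical validator: only a few core fields are required, so optional
# sections (employment history, alternative data) can vary per application.
APPLICATION_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["applicant", "product", "submitted_at"],
        "properties": {
            "applicant": {
                "bsonType": "object",
                "required": ["name", "annual_income"],
                "properties": {
                    "name": {"bsonType": "string"},
                    "annual_income": {
                        "bsonType": ["int", "long", "double"],
                        "minimum": 0,
                    },
                },
            },
            "product": {"enum": ["credit_card", "personal_loan", "mortgage"]},
            "submitted_at": {"bsonType": "date"},
        },
    }
}


def build_application(name, annual_income, product,
                      employment=None, alternative_data=None):
    """Assemble an application document; optional parts are added only if given."""
    doc = {
        "applicant": {"name": name, "annual_income": annual_income},
        "product": product,
        "submitted_at": datetime.now(timezone.utc),
    }
    if employment:
        # Hierarchical data: a list of nested employment records.
        doc["applicant"]["employment"] = employment
    if alternative_data:
        # Free-form alternative data authorized by the borrower.
        doc["alternative_data"] = alternative_data
    return doc


def save_application(collection, doc):
    """Insert the document; `collection` is assumed to be a PyMongo collection."""
    return collection.insert_one(doc).inserted_id
```

With PyMongo, a validator like this would typically be attached when the collection is created, for example `db.create_collection("applications", validator=APPLICATION_VALIDATOR)`. Two applications with different optional sections can then live in the same collection without a schema migration.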

Enhance credit scoring with AI

Leveraging MongoDB’s developer data platform — an integrated suite of data services centered around a cloud database — we can create a comprehensive customer/user banking profile by combining relevant data points.

Below, we will show you how it can be done. Here is an architectural diagram of the data processing pipeline for predicting the probability of delinquency and computing the credit score:

An illustration depicting the risk profile computation pipeline

The data pipeline for credit scoring a customer involves the following steps:

  1. Data collection: The process begins with collecting data from various sources such as credit bureaus, open banking, fraud detection systems, and other relevant sources.
  2. Data processing: The collected data is processed using tools like Spark Streaming Connectors to create a unified view of the customer’s financial profile and store the same data as a single view in MongoDB Atlas. Further, these data points can be converted to features, as shown in the sample schema image above.
  3. Risk profile generation: From this unified view, risk profiles or product suggestions are generated. This involves using statistical methods to perform descriptive analytics and also artificial intelligence (AI) or machine learning (ML) techniques to identify patterns in the data to perform propensity scoring for risk.
  4. Model development: Various machine learning algorithms can be used for credit scoring and decisioning, ranging from logistic regression, decision trees, and support vector machines to neural networks. In this tutorial, for the sake of simplicity, we use XGBoost (eXtreme Gradient Boosting), a gradient-boosted decision tree algorithm known for strong predictive performance. It is a supervised learning method based on function approximation that optimizes a specified loss function and applies several regularization techniques. It is widely used by data scientists on many machine learning challenges. The model can handle high-dimensional data and capture complex patterns for classification and regression. It also provides feature importance scores that support its inference outcome, which is useful in this demonstration for explaining the outcome of the predictive model.
  5. Data transformation: Before risk profile scoring is performed, the raw user data undergoes further processing involving data transformation/creation of new fields using Spark (or any similar managed analytics framework). Data is collated across multiple sources to create a single and materialized view of data, which can be derived from the MongoDB Atlas collection directly to be used in model development and also various descriptive analysis tasks. This step can also involve model inference.
  6. Decision collection: The final transformed data is then populated into a decision collection. This helps banks and financial institutions to support their financial decisions and tracking/auditing purposes.

The goal is to accurately assess the creditworthiness of a customer to make informed lending decisions and financial product recommendations. The pipeline shown here is representative of the risk-scoring pipelines that organizations maintain today.
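The data transformation and decision collection steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the pipeline from the solution's repository: the profile fields (`total_credit_limit`, `monthly_income`, and so on), the derived features, and the 0.5 threshold are all hypothetical. The `delinquency_probability` argument stands in for the output of the trained model, for example the positive-class value returned by an XGBoost classifier's `predict_proba`.

```python
from datetime import datetime, timezone


def build_features(profile):
    """Derive model features from a unified customer profile (hypothetical fields)."""
    limit = profile.get("total_credit_limit", 0)
    balance = profile.get("total_balance", 0)
    income = profile.get("monthly_income", 0)
    debt = profile.get("monthly_debt_payments", 0)
    return {
        # Guard against division by zero for thin or empty profiles.
        "credit_utilization": balance / limit if limit else 0.0,
        "debt_to_income": debt / income if income else 0.0,
        "late_payments_12m": profile.get("late_payments_12m", 0),
        "credit_history_months": profile.get("credit_history_months", 0),
    }


def build_decision(user_id, features, delinquency_probability, threshold=0.5):
    """Build the document that would be written to a decision collection."""
    declined = delinquency_probability >= threshold
    return {
        "user_id": user_id,
        "features": features,  # kept alongside the outcome for auditing
        "delinquency_probability": delinquency_probability,
        "status": "declined" if declined else "approved",
        "scored_at": datetime.now(timezone.utc),
    }
```

Storing the features next to the outcome in the same document is one way to support the tracking and auditing purpose mentioned in the decision collection step.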

Explain the credit application declination

When it comes to credit application declination, understanding the reasons behind it is crucial. Let’s explore how MongoDB and large language models (LLMs) can shed light on XGBoost model predictions (the model used in this tutorial).

Here is the architecture diagram explaining credit scoring using an LLM, e.g., OpenAI GPT.

Architecture diagram explaining credit scoring using an LLM

As explained in the earlier section, the risk profiling ML pipeline produces a probability score that defines the risk associated with the profile and drives the product recommendation. Today, this outcome is typically communicated back to the user through a template that states only the final status of the application. In the proposed architecture, prompt engineering with LLMs can be used to explain the reasons behind the final application status to the end customer.

Here, you can find the code and example responses. A similar message can be generated using Python in a Jupyter Notebook. The details on setting up MongoDB Atlas and fetching a connection string are available at this link.

Below is one example of a rejection explanation.

Example of a rejection explanation illustration

This sort of messaging to the customer can be categorized as a form of explainable AI where the features used in the model to perform risk profiling can be ranked and used as a part of the custom prompt to the LLM. This can help generate more descriptive reasons for the end customer to explain their user profile, as shown above. LLMs can also help summarize the list of descriptive reasons to provide a simplified view of the description. The application can then allow drill-downs to the details if the customer wants to find out more to enhance their experience.
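The idea of ranking model features and passing them into a custom prompt can be sketched as below. This is a hedged illustration only: the function names, the prompt wording, and the feature names are hypothetical, and the importance values are assumed to come from the risk model (for instance, the feature importance scores an XGBoost model exposes). The sketch stops at building the prompt string and does not call any particular LLM API.

```python
def top_features(importances, k=3):
    """Return the k (name, importance) pairs with the highest importance."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


def build_explanation_prompt(profile, importances, status, k=3):
    """Compose an LLM prompt that explains a decision using the top-k features."""
    lines = [
        f"- {name}: applicant value = {profile.get(name, 'unknown')}, "
        f"importance = {weight:.2f}"
        for name, weight in top_features(importances, k)
    ]
    return (
        "You are a customer service assistant at a bank.\n"
        f"The credit card application status is: {status}.\n"
        "The most influential factors in the risk model were:\n"
        + "\n".join(lines)
        + "\nExplain the decision to the applicant in plain, respectful "
        "language and suggest how they could improve each factor."
    )
```

Limiting the prompt to the top-ranked features keeps the explanation focused. A drill-down view could reuse the same function with a larger `k`.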

Recommend alternative credit products

If the credit product applicant is declined, the credit institution should still try to cross-sell to the customer with a relevant product that meets their needs as they are already engaged in the process and application portal.

Financial institutions can implement a product recommendation system that provides a human-friendly explanation of the rationale for the new recommendation, which would open up new revenue opportunities that legacy systems today do not provide. Providing the rationales can create a more personalized relationship with clients and further increase the acceptance of the recommended product. Here is an example of a data architecture that is used to achieve this.

An illustration of data architecture featuring a product recommendation system

MongoDB Atlas Vector Search is a feature that allows you to perform semantic search and build generative AI applications over any type of data. It integrates your operational database and vector search in a single, unified, and fully managed platform with a MongoDB native interface. You can create vector embeddings with machine learning models, then store and index them in MongoDB Atlas for retrieval-augmented generation (RAG), semantic search, recommendation engines, dynamic personalization, and other use cases.

Retrieval-augmented generation (RAG) is a paradigm that uses vector search to retrieve relevant documents based on the input query. It then provides these retrieved documents as context to the LLMs to help generate a more informed and accurate response.
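A minimal sketch of the retrieval half of RAG with Atlas Vector Search might look like the following. It builds an aggregation pipeline around the `$vectorSearch` stage and then formats the retrieved documents as context for an LLM prompt. The index name `product_vector_index`, the `embedding` field, and the `name` and `summary` fields are assumptions made for illustration. The query vector is assumed to come from whichever embedding model produced the stored vectors.

```python
def build_vector_search_pipeline(query_vector, index_name="product_vector_index",
                                 path="embedding", limit=3, num_candidates=50):
    """Build an aggregation pipeline that runs an Atlas $vectorSearch query."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,         # hypothetical index name
                "path": path,                # field holding the stored embedding
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "name": 1,
                "summary": 1,
                # Similarity score computed by the vector search stage.
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def build_rag_prompt(question, retrieved_docs):
    """Ground the LLM by placing the retrieved product data in the prompt."""
    context = "\n".join(f"- {d['name']}: {d['summary']}" for d in retrieved_docs)
    return (
        "Answer using only the product information below.\n"
        f"{context}\n"
        f"Question: {question}"
    )
```

With PyMongo, the pipeline would be passed to `collection.aggregate(...)` on the collection that holds the embedded product documents, and the resulting documents handed to `build_rag_prompt`.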

The tutorial above mentions technologies that can be used to solve a credit card product recommendation use case. The steps involved in the process are described below:

  1. Load private data: Credit card products vary across financial institutions. These products change from time to time, as do the fees charged for lifestyle benefits such as movie tickets and concierge services. Storing product data in MongoDB as an operational data store (ODS) makes it possible to track these changes while maintaining vector indexes alongside the data. Product records can be inserted, updated, replaced, or deleted as needed. Credit card product descriptions are very long, so breaking them into smaller chunks helps retrieve only the relevant information. LLMs can also be used to condense each product description into a summary that carries all the salient product features and costs, making it easier to retrieve and recommend relevant products.
  2. LLM-powered recommendations: In this use case, the LLM is used as a recommender system. The user profile generated in the earlier stage serves as input to generate sub-queries, which are then used to run a semantic similarity search against the product vectors stored in MongoDB Atlas.
  3. Product recommendation with personalized messaging: The recommended products can then be used in a custom prompt to the LLM to generate a summary of relevant product recommendations for the end user. This helps the financial institution personalize its recommendations, which can drive higher conversion rates. This pattern of product recommendation has been observed to increase customer engagement with the products, improve the user experience, and raise the "Likely to Recommend" score of the products on offer.
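Two of the steps above, chunking long product descriptions and composing the personalized recommendation prompt, can be sketched in plain Python. This is an illustrative sketch rather than the repository's implementation: the word-based chunking strategy, the chunk sizes, the function names, and the `name` and `summary` fields are all assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks, overlapping so context is not cut off."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_recommendation_prompt(profile_summary, products):
    """Compose a prompt asking the LLM to recommend from the retrieved products."""
    listing = "\n".join(
        f"{i}. {p['name']}: {p['summary']}" for i, p in enumerate(products, 1)
    )
    return (
        "Customer profile:\n"
        f"{profile_summary}\n"
        "Candidate credit card products:\n"
        f"{listing}\n"
        "Recommend the most suitable products and explain, in a friendly tone, "
        "why each one fits this customer."
    )
```

Each chunk would then be embedded and stored with its product record so that a vector search can return only the passages relevant to a given customer profile.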

Here, you can find the code and examples of alternative product recommendations. Product recommendations and their customized descriptions can be generated using Python in a Jupyter Notebook. Below are a few examples.

An illustration depicting code used to generate a product recommendation

In conclusion, credit scoring is undergoing a transformative phase with the integration of gen AI. As we explore the dynamics of traditional models, challenges faced by borrowers, and the future envisioned with generative AI, it becomes evident that transparency, efficiency, and personalization are at the forefront of the evolving credit scoring landscape. The synergy of technology and financial acumen is shaping a future where credit decisions are not only accurate but also empowering for borrowers.

The code to demonstrate all the features of MongoDB for building such a solution is available in the following GitHub repo.

Key considerations

The proposed solution's functional and nonfunctional features include:

  • Understanding gen AI’s capabilities to synthesize diverse data sets to address the key limitations of traditional credit scoring models.
  • Using prompt engineering with LLMs to explain the reasons for the credit status to the end customer.
  • Recognizing the challenges of traditional credit scoring models, which highlight the need for alternative credit scoring models that can adapt to evolving financial behaviors, handle non-traditional data sources, and provide a more inclusive and accurate assessment of creditworthiness.
  • Alternative data: Understanding the advantages of alternative data for more accurate credit scoring. This credit scoring model, for example, can be further augmented with alternative data points such as utility bills, mobile phone bills, education/certification history, etc.
  • Addressing hallucination: Mitigating hallucination risk with retrieval-augmented generation (RAG), which grounds the model’s responses in factual information from up-to-date sources so that they reflect the most current and accurate information available.

Authors
  • Ashwin Gangadhar, Solutions Architect, Partner Solutions, MongoDB
  • Wei You Pan, Global Director, Financial Industry Solutions, MongoDB
  • Paul Claret, Senior Specialist, Industry Solutions, MongoDB
Related Resources

GitHub Repository: Credit card application

Create this demo by following the instructions and associated models in this solution’s repository.

MongoDB for lending and leasing

Learn how MongoDB’s developer data platform supports a wide range of use cases in the lending and leasing space.

Green lending

Learn how MongoDB can support you to offer greener loans.

Toyota Financial Services

Discover how Toyota is financing the next generation of mobility services with MongoDB.
