Automate Digital Underwriting with Machine Learning

Industries: Insurance, Financial Services, Healthcare

Products and tools: Time Series, MongoDB Atlas Charts, MongoDB Connector for Spark, MongoDB Atlas Database, MongoDB Materialized Views, Aggregation Pipelines

Partners: Databricks

Solution Overview

This solution demonstrates how to use MongoDB, machine learning, and real-time data processing to automate the digital underwriting process for connected cars. You can use this solution to offer customers personalized, usage-based premiums that take into account their habits and behaviors.

To do this, you'll need to gather data, send it to a machine learning platform for analysis, and then use the results to create personalized premiums for your customers. You'll also visualize the data to identify trends and gain insights. This tailored approach gives your customers greater control over their insurance costs and helps you provide more accurate and fair pricing.

The GitHub repo contains detailed, step-by-step instructions on how to load the sample data and build the transformation pipeline in MongoDB Atlas, as well as how to generate, send, and process events to and from Databricks.

By the end of this demo, you’ll create a data visualization with Atlas Charts that tracks automated insurance premium changes in near real time.

You can apply the concepts of this solution to other industries, including:

Financial Services: Banks and financial institutions must be able to make sense of time-stamped financial transactions for trading, fraud detection, and more.
Retail: Retailers require real-time insights into current market data.
Healthcare: From the modes of transportation to the packages themselves, IoT sensors enable supply chain optimization while in-transit and on-site.

Reference Architectures

The following diagram shows the architecture of this solution:

Figure 1. Reference architecture with MongoDB

First, load a dataset that includes the total distance driven in car journeys into MongoDB. A daily cron job runs at midnight to summarize that day's trips and compiles them into a document stored in a new collection called customerTripDaily. A monthly cron job runs on the 25th day of each month, aggregates the daily documents, and writes the result to a new collection called customerTripMonthly. Every time a new monthly summary is created, an Atlas function posts the total distance for the month and the baseline premium to Databricks for ML prediction. The prediction is then sent back to MongoDB and added to customerTripMonthly. As a final step, visualize all of your data with MongoDB Atlas Charts.
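The two schedules described above can be written as cron expressions, the format that Atlas scheduled triggers accept. The following is a minimal sketch rather than the repository's configuration: the constant names and the cron_matches helper are hypothetical, the midnight run time for the monthly job is an assumption (the text only specifies the 25th day), and the matcher handles only plain numbers and "*".

```python
from datetime import datetime

# Standard five-field cron: minute hour day-of-month month day-of-week
DAILY_SUMMARY_SCHEDULE = "0 0 * * *"     # every day at 00:00
MONTHLY_SUMMARY_SCHEDULE = "0 0 25 * *"  # 25th of each month; 00:00 is an assumption


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute selected by `expression`.

    Simplified for illustration: each field must be '*' or a single integer.
    Ranges, lists, and step values are not supported.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    actual = (
        when.minute,
        when.hour,
        when.day,
        when.month,
        when.isoweekday() % 7,  # cron counts Sunday as 0
    )
    return all(f == "*" or int(f) == value for f, value in zip(fields, actual))


if __name__ == "__main__":
    # Check which schedule would fire at a given moment.
    moment = datetime(2024, 3, 25, 0, 0)
    print("daily fires:", cron_matches(DAILY_SUMMARY_SCHEDULE, moment))
    print("monthly fires:", cron_matches(MONTHLY_SUMMARY_SCHEDULE, moment))
```

Checking a schedule string against a few known dates like this is a quick way to confirm that it fires when you expect before you attach it to a trigger.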

Data Model Approach

For this use case, a basic data model covers the customers, the trips they take, the policies they buy, and the vehicles insured by those policies.

This example builds three MongoDB collections and two materialized views. You can find the full data model for defining MongoDB objects in the GitHub repository.

Figure 2. MongoDB data model approach

Build the Solution

To replicate this solution, check its GitHub repository. Follow the repository's README, which covers the following steps in more detail.

Create a data processing pipeline with a materialized view

The data processing pipeline consists of sample data, a daily materialized view, and a monthly materialized view. A sample dataset of IoT vehicle telemetry data represents the motor vehicle trips taken by customers and is loaded into a collection named customerTripRaw. You can find the dataset on GitHub and load it with mongoimport or other methods. To create the daily materialized view, a scheduled trigger executes a function that runs an aggregation pipeline, which generates a daily summary of the raw IoT data and writes it to a materialized view collection named customerTripDaily. Similarly, for the monthly materialized view, a scheduled trigger executes a function that runs an aggregation pipeline, which summarizes the customerTripDaily collection on a monthly basis and writes the result to a materialized view collection named customerTripMonthly.
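To make the daily materialized view more concrete, here is a minimal sketch of what such an aggregation pipeline could look like. It is not the pipeline from the repository: the field names customerId, tripStart, and distance are hypothetical, and the use of $merge to write into customerTripDaily is an assumption based on the usual MongoDB pattern for on-demand materialized views. The pipeline is built as plain Python data so that it can be inspected without a database connection.

```python
from datetime import datetime, timedelta, timezone


def daily_summary_pipeline(day: datetime) -> list[dict]:
    """Build a pipeline that summarizes one day of raw trips per customer.

    Field names (customerId, tripStart, distance) are hypothetical.
    """
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return [
        # Keep only the trips that started within the requested day.
        {"$match": {"tripStart": {"$gte": start, "$lt": end}}},
        # One output document per customer per day.
        {
            "$group": {
                "_id": {
                    "customerId": "$customerId",
                    "day": {"$dateTrunc": {"date": "$tripStart", "unit": "day"}},
                },
                "totalDistance": {"$sum": "$distance"},
                "tripCount": {"$sum": 1},
            }
        },
        # Upsert the summaries into the materialized view collection.
        {
            "$merge": {
                "into": "customerTripDaily",
                "on": "_id",
                "whenMatched": "replace",
                "whenNotMatched": "insert",
            }
        },
    ]


if __name__ == "__main__":
    pipeline = daily_summary_pipeline(datetime(2023, 1, 15, tzinfo=timezone.utc))
    for stage in pipeline:
        print(stage)
    # With a driver such as PyMongo, the pipeline would be run against the
    # raw collection, for example: db.customerTripRaw.aggregate(pipeline)
```

Because $merge replaces a matching summary document instead of duplicating it, rerunning the job for the same day is safe. A monthly pipeline could follow the same shape, reading from customerTripDaily and merging into customerTripMonthly.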

See steps 1-3 in the GitHub repo to create the data processing pipeline.

Figure 3. Create a data processing pipeline

Automate insurance premium calculations with a machine learning model

The decision-processing component consists of a scheduled trigger that collects the necessary data and posts the payload to a Databricks MLflow API endpoint. The model behind this endpoint was previously trained on Databricks using the MongoDB Spark Connector. The trigger then waits for the model to respond with a calculated premium based on the monthly miles driven by a given customer. Finally, the trigger updates the customerPolicy collection, appending the new monthly premium calculation as a subdocument within the monthlyPremium array.
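The request and update steps described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's trigger function: the feature names monthlyDistance and baselinePremium, the subdocument fields month, premium, and calculatedAt, and the customerId filter are all hypothetical. The sketch also assumes that the endpoint accepts the MLflow dataframe_records input format and returns a JSON object with a predictions list, which depends on the MLflow version and on how the model is served.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_scoring_payload(monthly_summary: dict) -> dict:
    """Wrap one monthly summary in the MLflow 'dataframe_records' format."""
    return {
        "dataframe_records": [
            {
                # Feature names are hypothetical; they must match the trained model.
                "monthlyDistance": monthly_summary["totalDistance"],
                "baselinePremium": monthly_summary["baselinePremium"],
            }
        ]
    }


def request_prediction(endpoint_url: str, token: str, payload: dict) -> dict:
    """POST the payload to the serving endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def build_premium_update(customer_id, month, prediction_response, calculated_at):
    """Return the (filter, update) pair that appends a premium to monthlyPremium."""
    # Assumes a reply shaped like {"predictions": [<number>]}.
    premium = float(prediction_response["predictions"][0])
    policy_filter = {"customerId": customer_id}
    update = {
        "$push": {
            "monthlyPremium": {
                "month": month,
                "premium": premium,
                "calculatedAt": calculated_at,
            }
        }
    }
    return policy_filter, update


if __name__ == "__main__":
    payload = build_scoring_payload({"totalDistance": 820.5, "baselinePremium": 100.0})
    print(json.dumps(payload))
    # A canned reply stands in for request_prediction(...) so no network is needed.
    reply = {"predictions": [112.4]}
    print(build_premium_update("C1", "2024-03", reply, datetime.now(timezone.utc)))
```

With a driver such as PyMongo, the returned pair would be passed to db.customerPolicy.update_one(policy_filter, update). Keeping payload construction and update construction separate from the HTTP call makes both easy to check without a live endpoint.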

See the following steps in the GitHub repo to automate the premium calculations:

Step 4: Set up a calculate premium trigger.
Step 5: Set up the Databricks connection.
Step 6: Write the machine learning model prediction to MongoDB.

Figure 4. Automating calculations with machine learning model


Visualize near real-time changes in insurance premiums over time

After the monthly premiums have been appended, you can set up Atlas Charts to visualize your calculated usage-based premiums. Configure different charts to see how premiums have changed over time and to discover patterns.

Key Learnings

Learn how to build materialized views on time series data: refer to steps 1-3 in the GitHub repo.
Schedule aggregation pipelines with cron expressions: refer to steps 2 and 3 in the GitHub repo.
Serve machine learning models with MongoDB Atlas data: refer to step 4 in the GitHub repo.
Write a machine learning model prediction to an Atlas database: refer to steps 5 and 6 in the GitHub repo.
Visualize near-real-time insights of continuously changing model results: refer to the Bonus step in the GitHub repo.

Authors

Jeff Needham, MongoDB
Ainhoa Múgica, MongoDB
Luca Napoli, MongoDB
Karolina Ruiz Rogelj, MongoDB
