AI, Vectors, and the Future of Claims Processing: Why Insurance Needs to Understand The Power of Vector Databases

Oliver Tree, Jeff Needham, and Luca Napoli

It’s been just under a year since OpenAI released ChatGPT, unleashing a wave of hype, investment, and media frenzy around the potential of generative AI to transform how we do business and interact with the world. But while the majority of investment dollars and media attention has zeroed in on the disruptive capabilities of large language models (LLMs), there’s a crucial component underpinning this breakthrough technology that hasn’t received the attention it deserves: the humble vector database.

Vector databases, a type of database that stores numeric representations (or vectors) of your data, allow advanced machine learning algorithms to make sense of unstructured data like images, audio, and free-form text, and to return relevant results.

(You can read more about vector databases and vector search on our Developer Hub.)

For industries dealing with vast amounts of data, such as insurance, the potential impact of vector databases and vector search is immense. In this blog, we will focus on how vectors can speed up and increase the accuracy of claim adjustment.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

The claims process… vectorized!

The process of claim adjustment is time-consuming and error-prone. As one insurance client recently told us, “If an adjuster touches it, we lose money.” For each claim, adjusters need to go through past claims from the client and related guidelines, which are usually scattered across multiple systems and formats, making it difficult to find relevant information and time-consuming to produce an accurate estimate of what needs to be paid.

For this blog, let’s use the example of a car accident claim.

In our example, a car just crashed into another vehicle. The driver gets out and starts taking pictures of the damage, uploading them to their car insurance app, where an adjuster receives the photos. Typically, the adjuster would painstakingly comb through past claims and parse guidelines to work up an estimate of the damage and process the claim.

But with a vector database, the adjuster can simply ask an AI to “show me images similar to this crash,” and a Vector Search-powered system can return photos of car accidents with similar damage profiles from the claims history database. The adjuster is now able to quickly compare the car accident photos with the most relevant ones in the insurer's claim history.

What’s more, with MongoDB it is possible to store vectors as arrays alongside existing fields in a document. In our car crash scenario, this means that our fictional adjuster can not only retrieve the most similar pictures but also have access to complementary information stored in the same database: claim notes, loss amount, car model, car manufacturing year, etc. The adjuster now has a comprehensive view of past accidents and how they were handled by the insurance company, in seconds.
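As a sketch of what that looks like, here is a hypothetical claim document that keeps the image embedding in the same document as the business fields the adjuster needs. The field names and the toy four-dimension vector are illustrative assumptions, not a prescribed schema:

```python
# A hypothetical claim document: the image embedding lives alongside
# the metadata the adjuster needs (notes, loss amount, car model, year).
claim_doc = {
    "claim_id": "CLM-2023-0042",
    "car_model": "Sedan X",
    "manufacturing_year": 2019,
    "loss_amount": 4250.00,
    "claim_notes": "Rear-end collision, shattered windshield.",
    # Vector stored as a plain array field (truncated for readability;
    # a real embedding model would emit hundreds of dimensions).
    "image_embedding": [0.12, -0.57, 0.33, 0.08],
}

# In a real deployment you would insert it with pymongo (connection
# string assumed to be available as an environment variable):
#
#   import os
#   from pymongo import MongoClient
#   client = MongoClient(os.environ["MONGODB_URI"])
#   client["insurance"]["claims"].insert_one(claim_doc)
```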

For this use case, we have focused on image search, but most data formats can be vectorized, including text and sound. This means that an adjuster could query using claim notes and find similar notes in the claim history or related paragraphs in the guidelines.

Vector Search is an extremely powerful tool as it unlocks access to unstructured data that was previously hard to work with such as PDFs, images, or audio files.

How does this work in practice? Let’s go through each step of the process:

  • A search index is configured on an existing collection in MongoDB Atlas
  • An image set is sent to an embedding model that generates the image vectors
  • The vectors are then stored in Atlas, alongside the current metadata found in the collection
Figure 1: A dataset of photos of past accidents is vectorized and stored in Atlas
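The steps above can be sketched in Python. The embedding model is a placeholder (`embed_image` stands in for whatever model you choose, such as a CLIP-style image encoder), and the index, collection, and field names are assumptions; check the Atlas documentation for the exact index definition your cluster supports:

```python
# Step 1: an Atlas Vector Search index definition (created in the Atlas
# UI or via the Atlas Admin API; the dimension must match your model).
vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "image_embedding",
            "numDimensions": 4,  # toy size; real models emit e.g. 512
            "similarity": "cosine",
        }
    ]
}

# Step 2: a placeholder embedder. A real system would call an image
# embedding model here; this deterministic toy keeps the sketch runnable.
def embed_image(image_bytes: bytes) -> list[float]:
    return [float(b) / 255.0 for b in image_bytes[:4]]

# Step 3: attach a vector to each claim document before storing it.
def vectorize_claims(claims: list[dict]) -> list[dict]:
    for claim in claims:
        claim["image_embedding"] = embed_image(claim["image_bytes"])
        del claim["image_bytes"]  # keep raw bytes out of the collection
    return claims

# In a live system, the vectorized batch is then inserted into the
# collection the index is built on:
#   collection.insert_many(vectorize_claims(batch))
```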
  • We run our query against the existing database and Vector Search returns the most similar images
Figure 2: An image similarity query is performed, and the 5 top similar images are returned.
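The similarity query itself can be expressed with MongoDB’s `$vectorSearch` aggregation stage. This is a minimal sketch; the index name and projected fields are the same assumptions as above:

```python
def build_similarity_pipeline(query_vector: list[float], k: int = 5) -> list[dict]:
    """Build an aggregation pipeline returning the k most similar
    accident photos plus the metadata stored alongside each vector."""
    return [
        {
            "$vectorSearch": {
                "index": "image_vector_index",  # assumed index name
                "path": "image_embedding",
                "queryVector": query_vector,
                "numCandidates": 100,  # candidates considered before ranking
                "limit": k,
            }
        },
        {
            "$project": {
                "claim_id": 1,
                "loss_amount": 1,
                "claim_notes": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

# Against a live collection, the adjuster's photo is embedded and run
# through the pipeline:
#   results = collection.aggregate(build_similarity_pipeline(embed_image(photo)))
```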

Example user interface: A claim-adjuster dashboard leveraging Vector Search

Figure 3: UI of the claim adjuster application, showing an active broken-windshield claim alongside the five most similar past claims

We can go a step further and use our vectors to provide an LLM with the context necessary to generate more reliable and accurate outputs, a technique known as retrieval-augmented generation (RAG).

These outputs can include:

  • Natural language processing for tasks such as chatbots and question-answering — think of a claim adjuster that interacts with a conversational interface and asks questions such as: “Give me the average of the loss amount for accidents related to one of the photos of claim XYZ” or “Summarize the content of the guidelines related to this accident”
  • Computer vision and audio processing, from image classification and object detection to speech recognition and translation
  • Content generation, including creating text-based documentation, reports, and computer code, or converting text to an image or video

Figure 4 brings together the workflow enabling RAG for the LLM.

Figure 4: Dynamically combining your custom data with the LLM to generate reliable and relevant outputs
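As a rough sketch of that workflow, the claims returned by vector search become context in the prompt sent to the LLM, so the model answers from real claims data rather than from its training data alone. The function names here are illustrative, not a specific vendor API:

```python
def build_rag_prompt(question: str, retrieved_claims: list[dict]) -> str:
    """Combine the adjuster's question with the claims returned by
    vector search, grounding the LLM's answer in real claims data."""
    context_lines = [
        f"- Claim {c['claim_id']}: loss amount ${c['loss_amount']:.2f}; "
        f"notes: {c['claim_notes']}"
        for c in retrieved_claims
    ]
    return (
        "You are assisting an insurance claim adjuster.\n"
        "Answer using only the claims below.\n\n"
        "Claims:\n" + "\n".join(context_lines) + "\n\n"
        f"Question: {question}\n"
    )

# The assembled prompt is then passed to whichever LLM you use:
#   answer = llm.generate(build_rag_prompt(question, results))
```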

If you’re interested in seeing how to do this in practice and start prototyping, check out our GitHub repository and dive right in!

Go hands-on!

Vector databases and vector search will transform how insurers do business. In this blog, we have explored how vectors can be leveraged to speed up the work of claim adjusters, which translates directly into an improved customer experience and, crucially, cost savings through faster claims processing and enhanced accuracy.

Elsewhere, vector search could be used for:

  • Enhanced customer service. Imagine being able to instantly pull up comprehensive policyholder profiles, their claims history, and any related information with a simple search. Vector search makes this possible, facilitating better interactions and more informed decisions.
  • Personalized Recommendations. As AI-driven personalization becomes the gold standard, vector search aids in accurately matching policyholders with tailor-made insurance products and services that meet their unique needs.
  • Scaled AI Efforts. Scale AI implementations across the organization. From improving customer service chatbots to detecting fraudulent activities, vector-based models can handle tasks more efficiently than traditional methods.

Atlas Vector Search goes one step further. By unifying the operational database and vector store in a single platform, MongoDB Atlas turbocharges the process of building semantic search and AI-powered applications, empowering insurers to quickly build applications that unlock the value of their vast troves of data.