Leaf in the Wild: Qumram Migrates to MongoDB to Deliver Single Customer View for Regulatory Compliance & Customer Experience

Mat Keep

#Customer Stories

Every financial services organization is tasked with two, often conflicting, priorities:

  1. Delivering rapid digital transformation.
  2. Implementing compliance controls that go beyond internal systems, extending all the way to their digital borders.

However, capturing and analyzing billions of customer interactions in real time across web, social, and mobile channels is a major data engineering challenge. Qumram solves that challenge with its Q-Suite portfolio of on-premise and cloud services.

Qumram’s software is used by the most heavily regulated industries in the world to help organizations capture every moment of the customer’s journey: every keystroke, every mouse movement, and every button click, across all digital channels. That data is then stored for years. As you can imagine, this generates an extraordinary volume and variety of data. Some Qumram customers ingest and store multiple terabytes of this sensitive data every day.

Starting out with relational databases, Qumram quickly hit the scalability wall. After extensive evaluation of alternatives, the company selected MongoDB to provide a single source of truth for all customer interactions across any digital channel.

I met with Simon Scheurer, CTO of Qumram AG, to learn more.

Can you start by telling us a little bit about your company?

Qumram provides a single view of all customer interactions across an organization’s digital channels, helping our customers to ensure compliance, prevent fraud, and enrich the experience they deliver to their users. Our unique session recording, replay, and web archival solution captures every user interaction across web, mobile, and social channels. This means that any user session can be replayed at a moment’s notice, in a movie-like form, giving an exact replica of the activity that occurred, when, and for how long. It’s pretty rare to provide a solution that meets the needs of compliance and risk officers while also empowering marketing teams – but that is what our customers can do with Q-Suite, built on modern technologies like MongoDB.

Figure 1: Q-Suite recording of all digital interactions for regulatory compliance

Most of our customers operate in the highly regulated financial services industry, providing banking and insurance services. Qumram customers include UBS, Basler Kantonalbank, Luzerner Kantonalbank, Russell Investments, and Suva.

How are you using MongoDB?

Our solution provides indisputable evidence of all digital interactions, in accordance with the global regulatory requirements of SEC, US Department of Labor (DOL), FTC, FINRA, ESMA, MiFID II, FFSA, and more. Qumram also enables fraud detection and customer experience analysis, which is used to enhance the customer journey through online systems, increasing conversions and growing sales.

Because of the critical nature of regulatory compliance, we cannot afford to lose a single user session or interaction – unlike competitors, our system provides lossless data collection for compliance-mandated recording.

We use MongoDB to ingest, store, and analyze the firehose of data generated by user interactions across our customers’ digital properties. This includes session metadata and the thousands of events that are generated per session, for example, every mouse click, button selection, keystroke, and swipe. MongoDB stores events of all sizes, from those contained in small documents, typically just 100-200 bytes, through to session and web objects that can grow to several megabytes each. We also use GridFS to store binary content such as screenshots, CSS, and HTML.

Capturing and storing all of the session data in a single database, rather than splitting content across a database and a separate file system, massively simplifies our application development and operations. With this design, MongoDB provides a single source of truth, enabling any session to be replayed and analyzed on demand.
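
As a rough illustration of how one database can hold both kinds of content, the sketch below routes a captured item either to a regular document or to GridFS. The 16 MB BSON document limit and the 255 KiB default GridFS chunk size are MongoDB defaults; the function names, the list of content types, and the routing rule itself are assumptions made for this example, not Qumram's actual logic.

```python
import math

MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit
GRIDFS_DEFAULT_CHUNK_BYTES = 255 * 1024     # GridFS default chunk size

# Hypothetical content types treated as assets that belong in GridFS
ASSET_CONTENT_TYPES = {"image/png", "text/css", "text/html"}


def choose_store(content_type: str, size_bytes: int) -> str:
    """Return 'gridfs' or 'document' for a captured item (illustrative rule)."""
    if content_type in ASSET_CONTENT_TYPES or size_bytes >= MAX_BSON_DOCUMENT_BYTES:
        return "gridfs"
    return "document"


def gridfs_chunk_count(size_bytes: int,
                       chunk_bytes: int = GRIDFS_DEFAULT_CHUNK_BYTES) -> int:
    """Number of chunk documents GridFS would need for a file of this size."""
    return math.ceil(size_bytes / chunk_bytes)


# A 150-byte click event stays a plain document; a 2 MiB screenshot is chunked.
print(choose_store("application/json", 150))
print(choose_store("image/png", 2 * 1024 * 1024),
      gridfs_chunk_count(2 * 1024 * 1024))
```

Either way the data lives in the same MongoDB deployment, which is the operational simplification described above.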

You started out with a relational database. What were the challenges you faced there?

We initially built our products on one of the popular relational databases, but we quickly concluded that there was no way to scale the database to support billions of sessions every year, with each session generating thousands of discrete events. Also, as digital channels grew, our data structures evolved to become richer and more complex. These structures were difficult to map into the rigid row and column format of a relational schema. So in Autumn 2014, we started to explore non-relational databases as an alternative.

What databases did you look at?

There was no shortage of choice, but we narrowed our evaluation down to Apache Cassandra, Couchbase, and MongoDB.

What drove your decision to select MongoDB?

We wanted a database that would enable us to break free of the constraints imposed by relational databases. We were also looking for a technology that was best-in-class among modern alternatives. There were three drivers for selecting MongoDB:

  1. Flexible data model with rich analytics. Session data is richly structured: there may be up to four levels of nesting and over 100 different attributes. These complex structures map well to JSON documents, allowing us to embed all related data into a single document, which gives us two advantages:

    1. Boosting developer productivity by representing data in the same structure as objects in our application code.
    2. Making our application faster, as we only need to issue a single query to the database to replay a session.

     At the same time, we need to be able to analyze the data in place, without the latency of moving it to an analytics cluster. MongoDB’s rich query language and secondary indexes allow us to access data by single keys, ranges, full text search, graph traversals, and geospatial queries, through to complex aggregations.

  2. Scalability. The ability to grow seamlessly by scaling the database horizontally across commodity servers deployed on-premise and in the cloud, while at the same time maintaining data consistency and integrity.

  3. Proven. We surveyed customers across our target markets, and the overwhelming feedback was that they wanted us to use a database they were already familiar with. Many global financial institutions had already deployed MongoDB and didn’t want to handle the complexity that came from running yet another database for our application. They knew MongoDB could meet the critical needs of regulatory compliant services, and that it was backed by excellent technical support, coupled with extensive management tools and rock-solid security controls.

As a result, we began development on MongoDB in early 2015.
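
The embedded-document idea behind the first driver can be sketched in a few lines. This is a minimal illustration, not Qumram's schema: the field names (`user`, `startedAt`, `events`, `offsetMs`, `target`) and both helper functions are hypothetical. It shows why a session that lives in one document can be replayed after a single read.

```python
from datetime import datetime, timedelta, timezone


def build_session(session_id, user_id, started_at, events):
    """Embed every event of a session in one document (hypothetical schema)."""
    return {
        "_id": session_id,
        "user": {"id": user_id},
        "startedAt": started_at,
        "events": [
            {"type": kind, "offsetMs": offset_ms, "target": target}
            for kind, offset_ms, target in events
        ],
    }


def replay(session):
    """Yield (timestamp, type, target) in playback order from one document."""
    for event in sorted(session["events"], key=lambda e: e["offsetMs"]):
        moment = session["startedAt"] + timedelta(milliseconds=event["offsetMs"])
        yield moment, event["type"], event["target"]


start = datetime(2017, 1, 1, 9, 0, tzinfo=timezone.utc)
session = build_session(
    "s-1", "u-42", start,
    [("click", 1200, "#submit"), ("keystroke", 300, "#salary")],
)
for moment, kind, target in replay(session):
    print(moment.isoformat(), kind, target)
```

Because the events are nested inside the session, the application object and the stored document have the same shape, and no joins are needed to reassemble a session.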

How do your customers deploy and integrate your solution?

We offer two deployment models: on-premise and as a cloud service.

Many of the larger financial institutions deploy the Q-Suite with MongoDB within their own data centers, due to data sensitivity. From our application, they can instantly replay customer sessions. We also expose the session data from MongoDB with a REST API, which allows them to integrate it with their back-office processes, such as records management systems and CRM suites, often using message queues such as Apache Kafka.

We are also rolling out the Q-Suite as a “Compliance-as-a-Service” offering in the cloud. This option is typically used by smaller banks and insurers, as well as the FinTech community.

How do you handle analytics against the collected session data?

Our application relies heavily on the MongoDB aggregation pipeline for native, in-database analytics, allowing us to roll up session data for analysis and reporting. We use the new $graphLookup operator for graph processing of the session data, identifying complex relationships between events, users, and devices. For example, we can detect if a user keeps returning to a loan application form to adjust salary in order to secure a loan that is beyond his or her capability to repay. Using MongoDB’s built-in text search along with geospatial indexes and queries, we can explore session data to generate behavioral insights and actionable fraud intelligence.

Doing all of this within MongoDB, rather than having to couple the database with separate search engines, graph data stores, and geospatial engines, dramatically simplifies development and ongoing operations. It means our developers have a single API to program against, and operations teams have a single database to deploy, scale, and secure.
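
To give a feel for what such in-database analytics can look like, here is a sketch of two aggregation fragments expressed as plain Python data. The stage operators (`$match`, `$group`, `$sort`, `$graphLookup`) and their option names are standard MongoDB; every collection and field name (`formId`, `field`, `userId`, `devices`, `sharedWith`, and so on) is a hypothetical stand-in, since the interview does not describe the real schema.

```python
def salary_edit_pipeline(form_id: str, min_edits: int) -> list:
    """Stages that count edits to a salary field per user (hypothetical fields)."""
    return [
        # Keep only input events on the salary field of one form
        {"$match": {"formId": form_id, "type": "input", "field": "salary"}},
        # Roll up per user: number of edits and the distinct values entered
        {"$group": {
            "_id": "$userId",
            "edits": {"$sum": 1},
            "values": {"$addToSet": "$value"},
        }},
        # Flag users who changed the field at least min_edits times
        {"$match": {"edits": {"$gte": min_edits}}},
        {"$sort": {"edits": -1}},
    ]


def linked_devices_stage(max_depth: int) -> dict:
    """A $graphLookup stage that follows device-to-device links (hypothetical)."""
    return {
        "$graphLookup": {
            "from": "devices",
            "startWith": "$deviceId",
            "connectFromField": "sharedWith",
            "connectToField": "deviceId",
            "as": "linkedDevices",
            "maxDepth": max_depth,
        }
    }


pipeline = salary_edit_pipeline("loan-application", min_edits=3)
print(pipeline)
print(linked_devices_stage(max_depth=2))
```

With a driver such as PyMongo, a list like `pipeline` would be passed to a collection's `aggregate()` method, so the roll-up runs inside the database rather than in application code.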

I understand you are also using Apache Spark. Can you tell us a little more about that?

We use the MongoDB Connector for Apache Spark to feed session data from the database into Spark processes for machine learning, and then persist the models back into MongoDB. We use Spark to generate user behavior analytics that are applied both to fraud detection and to the optimization of customer experience across digital channels.

We are also starting to use Spark with MongoDB for Natural Language Processing (NLP) to extract customer sentiment from their digital interactions, and to apply other deep learning techniques to anti-money laundering initiatives.

What does a typical installation look like?

The minimum MongoDB configuration for Q-Suite is a 3-node replica set, though we have many customers running larger MongoDB clusters deployed across multiple active data centers for disaster recovery and data locality. Most customers deploy on Linux, but because MongoDB is multi-platform, we can also serve those institutions that run on Windows.

We support both MongoDB 3.2 and the latest MongoDB 3.4 release, which gives our users the new graph processing functionality and faceted navigation with full text search. We recommend customers use MongoDB Enterprise Advanced, especially to access the additional security functionality, including the Encrypted storage engine to protect data at rest.

For our Compliance-as-a-Service offering, we are currently evaluating the MongoDB Atlas managed service in the cloud. This would allow our teams to focus on the application, rather than operations.

What sort of data volumes are you capturing?

Capturing user interactions is a typical time-series data stream. A single MongoDB node can support around 300,000 sessions per day, with each session generating up to 3,000 unique events. To give an indication of scale in production deployments, one of our Swiss customers is ingesting multiple terabytes of data into MongoDB every day. Another in the US needs to retain session data for 10 years, and so they are scaling MongoDB to store around 3 trillion documents.
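
The figures above can be put side by side with some quick arithmetic. The inputs are the numbers quoted in the interview; the 365-day year and the flat ingest rate over the retention period are simplifying assumptions made only for this calculation.

```python
# Figures quoted in the interview
SESSIONS_PER_NODE_PER_DAY = 300_000
MAX_EVENTS_PER_SESSION = 3_000
RETENTION_YEARS = 10
RETAINED_DOCUMENTS = 3_000_000_000_000  # "around 3 trillion"

# Assumption for illustration: 365-day years and a constant daily ingest rate
DAYS_PER_YEAR = 365


def peak_events_per_node_per_day() -> int:
    """Upper bound on events one node handles per day."""
    return SESSIONS_PER_NODE_PER_DAY * MAX_EVENTS_PER_SESSION


def average_documents_per_day() -> float:
    """Average daily document count implied by the 10-year retention target."""
    return RETAINED_DOCUMENTS / (RETENTION_YEARS * DAYS_PER_YEAR)


print(f"{peak_events_per_node_per_day():,}")   # 900,000,000
print(f"{average_documents_per_day():,.0f}")   # 821,917,808
```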

Of course, capturing the data is only part of the solution – we also need to expose it to analytics, without impacting write-volume. MongoDB replica sets enable us to separate out these two workloads within a single database cluster, simultaneously supporting transaction and analytics processing.
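
The interview does not say how the two workloads are separated. One common mechanism is a read preference, which directs analytics reads to secondary members while writes continue to go to the primary. The sketch below builds two connection strings on that assumption. `replicaSet` and `readPreference` are standard MongoDB connection string options; the host names, replica set name, and database name are placeholders.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, database, read_preference="primary"):
    """Build a MongoDB connection string for a replica set."""
    query = urlencode({"replicaSet": replica_set,
                       "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/{database}?{query}"


# Placeholder hosts for a 3-node replica set
HOSTS = ["db1:27017", "db2:27017", "db3:27017"]

# Ingest path: default read preference, writes always go to the primary
ingest_uri = replica_set_uri(HOSTS, "rs0", "qsuite")

# Analytics path: reads are served by secondaries (assumed setup)
analytics_uri = replica_set_uri(HOSTS, "rs0", "qsuite",
                                read_preference="secondary")

print(ingest_uri)
print(analytics_uri)
```

Replication to secondaries is asynchronous, so reads on the analytics path can lag slightly behind the most recent writes.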

Figure 2: Analysis of funnel metrics to monitor customer conversion through digital properties

How are you measuring the impact of MongoDB on your business?

Companies operating in highly regulated industries, from financial services to healthcare to communications, are facing a whole host of new government and industry directives designed to protect digital boundaries. The Q-Suite solution, backed by MongoDB, is enabling us to respond to our customers’ compliance requirements. By using MongoDB, we can accelerate feature development to meet new regulatory demands, and implement solutions faster, with lower operational complexity.

The security controls enforced by MongoDB further enable our customers to achieve regulatory compliance.

Simon, thanks for sharing your time and experiences with the MongoDB community.

To learn more about cybersecurity and MongoDB, download our whitepaper, “Building the Next Generation of Threat Intelligence with MongoDB”.