MongoDB and BigID Deliver Scalable Data Intelligence for Enterprise Data
December 14, 2020 | Updated: October 21, 2022
Data never sleeps. Every time someone clicks on a site or signs up for a newsletter, bytes are created or tracked, in the staggering amount of more than 2.5 quintillion every day. All that data is a key business driver and the foundation for decision-making, but most enterprises aren’t even aware of everything they’re collecting, or where it’s stored. As global protection and privacy regulations evolve and the sheer amount of data grows, they’re struggling to keep up.
As consumers, we’d like to think our data is safe. Companies like BigID are helping businesses discover and manage sensitive, personal, and critical data across the entire data ecosystem – and take action for privacy, protection, and perspective.
I had the opportunity to discuss the BigID platform and the data technology behind it with Eyal Sacharov, BigID’s VP Research & Chief Architect and Oren Ashkenazy, Head of DevOps.
AD: Eyal, Oren, thanks for joining me for this discussion. Can you tell us a bit about yourselves and the genesis of the company?
Eyal Sacharov: I led the R&D team at the founding of BigID. As part of that team, I was responsible for the design and implementation of the product, including the evaluation of relevant technologies. My role has now evolved to BigID's Chief Architect and VP Research, where I oversee the data science team, product integration and customer-driven technology innovation.
Oren Ashkenazy: I’m Head of DevOps at BigID, and have a real passion for development technologies and scalable architecture.
The genesis of BigID in 2016 was based on the realization that enterprises were struggling to safeguard the sensitive data they collect and process on individuals, on customers, on employees, on clients — even as they looked to transition to being data-driven organizations. Founded by Dimitri Sirota and Nimrod Vax, who came at the challenge with decades of experience in enterprise information security and access management, BigID saw the opportunity to provide an accounting-like framework to help organizations better understand, protect, and responsibly utilize the data they collect and process.
AD: For anyone who isn’t familiar with BigID yet, could you describe why you set out to build this and the problem it’s solving?
ES & OA: Our initial intent was to address the challenge that you can’t protect or govern what you can’t find. This challenge became particularly acute when global privacy and protection regulations emerged, especially the EU GDPR. GDPR required enterprises to understand and report on not just what data they have, but whose it is and how it’s being collected and processed. The requirements for enterprises to be more transparent and accountable in how they collect and use personal information have made the demand for technology like BigID’s data intelligence platform more urgent.
When we were starting BigID, we focused on extending beyond traditional discovery approaches built for finding and enumerating a specific set of identifiers (with highly variable degrees of accuracy and efficiency). We looked to go one level deeper and measure how connected and related that data is to a specific person — that’s what defines it as personal.
Gaining that understanding into what is personal data or information is not only important for operationalizing at scale and automating privacy requirements for data access rights and accountability, but crucial for any enterprise that wants to maintain brand trust.
Enterprises are building their future on data. Customers, consumers, partners, and employees need to be assured that they can trust enterprises to not only safeguard their data, but to use it in ways that are consistent with their expectations.
AD: How would you describe the platform and the unique advantages that BigID gives its customers?
ES & OA: We created the first purpose-built platform for the kind of data discovery required for privacy. Designed using a microservices architecture that leverages correlation and machine learning for inference and model augmentation, the platform can scale to support the large volumes of data that enterprises are collecting and processing. By doing so, we also fashioned the first platform that was able to capture context around data: for example, to whom it belonged, whether there was an associated permission, or who had access to it.
Some of these are essential for privacy. But they also play a larger role in providing deeper insights into the what, where, who, why, and when of how data was collected, processed, and shared. As the BigID foundation has evolved to encompass cataloging, classification, and correlation, the platform has delved beyond discovery alone.
Today, BigID is the most comprehensive platform in the market to provide organizations insight and intelligence on their most important assets: the sensitive, personal, and critical data that they collect and process.
AD: How did you land on MongoDB to help you solve these challenges?
ES & OA: As we were architecting our platform, we wanted to ensure that we could scale effectively both horizontally and vertically, and to ensure high performance for indexing and lookups to support our correlation and ML-based approach. From a technology perspective, we were looking for a modern database system that would be easy to install and maintain, and that wouldn't require our customers and partners to build in-house expertise in order to deploy.
MongoDB appealed to us as an option because of the well-balanced, general purpose approach. It has a good set of capabilities that we couldn’t find with other databases, such as the aggregation framework, solid graph capabilities, lookups, and full text indexing. This allows us to use it for various microservices, both as a standard persistence layer as well as a more advanced computational server.
For example, we make extensive use of the aggregation framework across the product to support what we describe as our discovery in depth capabilities that span correlation, multiple forms of classification, and cataloging. Native support for JSON was compelling in this regard as well.
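To make the aggregation framework concrete, here is a minimal sketch of what a correlation-style pipeline might look like. It is not BigID's actual pipeline: the collection names (`findings`, `identities`), the field names, and the `build_correlation_pipeline` helper are all hypothetical. Only the stage operators (`$match`, `$lookup`, `$unwind`, `$group`, `$sort`) are standard MongoDB.

```python
from typing import Any


def build_correlation_pipeline(data_source: str) -> list[dict[str, Any]]:
    """Return an aggregation pipeline as a list of stage documents.

    Collection and field names are illustrative assumptions.
    """
    return [
        # Keep only findings scanned from one data source.
        {"$match": {"source": data_source}},
        # Join each finding to the identity it was correlated with.
        {"$lookup": {
            "from": "identities",
            "localField": "identity_id",
            "foreignField": "_id",
            "as": "identity",
        }},
        # $lookup yields an array; flatten it to one document per match.
        {"$unwind": "$identity"},
        # Count findings per identity and attribute type.
        {"$group": {
            "_id": {"identity": "$identity._id", "attribute": "$attribute"},
            "count": {"$sum": 1},
        }},
        # Most frequently found attributes first.
        {"$sort": {"count": -1}},
    ]


pipeline = build_correlation_pipeline("crm_db")
# With PyMongo this could be run as: db.findings.aggregate(pipeline)
```

Because a pipeline is plain JSON-like data, it can be built, inspected, and tested without a database connection, which is part of why native JSON support pairs well with this style of computation.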
Scaling features, such as replica sets and sharding, also gave us confidence we could address customer requirements.
The flexible schema for documents suited our collection requirements. Our intent from the outset was to support the full range of data sources we encountered at the customer level; the flexible schema allows us to adjust collection for each of the data connectors based on the specific set of fields for each data source.
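As an illustration of that flexibility, the sketch below wraps records from two different connectors in a shared envelope while letting the inner fields differ per source. The connector names, field names, and the `to_document` helper are assumptions made for the example, not BigID's data model.

```python
from typing import Any


def to_document(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Wrap a connector-specific record in a small common envelope."""
    # Drop empty values so each document carries only the fields its source has.
    fields = {key: value for key, value in record.items() if value is not None}
    return {"source": source, "fields": fields}


docs = [
    # A relational source describes tables and columns...
    to_document("postgres", {"table": "users", "column": "email", "nullable": None}),
    # ...while an object store describes buckets and keys.
    to_document("s3", {"bucket": "exports", "key": "2020/users.csv", "size_bytes": 1024}),
]
# Both shapes could live in one collection, e.g. via PyMongo's insert_many(docs).
```

No schema migration is needed when a new connector introduces fields the others lack; only the code that reads `fields` has to know about them.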
From a deployment perspective, we liked the ease of installing. In particular, MongoDB is simple to deploy with Docker, which is important for us as BigID itself is Dockerized. Many of our customers have their own MongoDB servers, which makes our deployment easier.
MongoDB also offers a range of extended capabilities with its commercial products, such as Encrypted Storage and LDAP support. These add value for our customers and ensure that their most stringent requirements are met by the database vendor itself.
We were also confident that the rate of adoption and large developer community would ensure a robust roadmap and support. And, our experience with support services and professional consulting in the initial stages was top notch.
AD: What advice would you give someone who is considering using MongoDB for their next product?
ES & OA: We recommend that you explore and evaluate MongoDB in detail. MongoDB is a very strong platform with some powerful features and tools that should be mastered. Make sure to conduct a comprehensive PoC for the various use cases relevant to your next product. And in any case, invest in designing the data model upfront to ensure that the appropriate considerations and priorities are in scope. Consider factors like how frequently data updates will be performed, concurrency requirements, and the relative distribution of read-intensive vs. write-intensive operations.
AD: Where have you deployed MongoDB? On-premises, in the cloud, via MongoDB Atlas?
ES & OA: We’ve deployed largely on-prem, but a growing number of our customers are opting for MongoDB Atlas. Another advantage of MongoDB is that we can maintain a persistence layer between on-prem and cloud instances.
MongoDB Atlas is also an integral component of the SaaS service we are planning to launch later this year. Having already committed to MongoDB for our core design, Atlas was an easy choice. It stands out not only for the depth of its cloud service platform support, but also for the assurance of version compatibility as new releases roll out.
MongoDB’s native monitoring tools are even more powerful for us in optimizing uptime and performance for a service, relative to the value they provide when BigID is deployed in the customers’ data centers or cloud. MongoDB’s real-time monitoring also provides us with live information on the cluster performance, from hottest collection to frequently used resources to the number of open connections and levels of network utilization. As part of database tuning, the Atlas profile tool is a great feature that helps us to find slow queries that need to be optimized and to detect performance bottlenecks.
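The idea behind slow-query detection can be sketched in a few lines. The example below does not use the Atlas tooling itself; it assumes documents shaped like entries in MongoDB's `system.profile` collection, which record an operation type (`op`), namespace (`ns`), and duration in milliseconds (`millis`). The sample data, the threshold, and the `slow_operations` helper are made up for illustration.

```python
from typing import Any


def slow_operations(
    profile_docs: list[dict[str, Any]], threshold_ms: int = 100
) -> list[dict[str, Any]]:
    """Return operations slower than threshold_ms, slowest first."""
    slow = [doc for doc in profile_docs if doc.get("millis", 0) > threshold_ms]
    return sorted(slow, key=lambda doc: doc["millis"], reverse=True)


# Hypothetical profiler entries; real ones carry many more fields.
sample = [
    {"op": "query", "ns": "app.findings", "millis": 12},
    {"op": "command", "ns": "app.findings", "millis": 850},
    {"op": "update", "ns": "app.identities", "millis": 240},
]

for entry in slow_operations(sample):
    print(entry["ns"], entry["op"], entry["millis"])
```

Ranking by duration like this is usually the first step; the slowest operations then become candidates for new indexes or query rewrites.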
AD: What tools are you using to deploy and monitor MongoDB?
ES & OA: Monitoring our MongoDB clusters is an essential part of being production-ready and ensuring our systems are operating at optimal health. We monitor performance spikes to get a solid view of all of the metrics for the underlying hardware and operating system.
For deployment, we are using the HashiCorp Terraform MongoDB Atlas Provider, which is the official plugin approved and tested by HashiCorp. It enables us to create MongoDB Atlas clusters in our Terraform environment and then peer them to the BigID application running on Amazon EKS (Elastic Kubernetes Service).
AD: Thanks for an informative conversation, Eyal and Oren! We appreciate your time and your partnership.
To learn more about our joint solution, or if you’re evaluating MongoDB Atlas, reach out to ISV@MongoDB.com for more information.
BigID’s data intelligence platform enables organizations to know their enterprise data and take action for privacy, protection, and perspective. By applying advanced machine learning and deep data insight, BigID transforms data discovery and data intelligence to address data privacy, security, and governance challenges across all types of data, in any language, at petabyte-scale, across the data center and the cloud. BigID has raised $146 million in funding since its founding in 2016 and has been recognized for its data intelligence innovation as a 2019 World Economic Forum Technology Pioneer, named to the 2020 Forbes Cloud 100, a Business Insider 2020 AI Startup to Watch, and an RSA Innovation Sandbox winner. Find out more at http://bigid.com or visit us at http://bigid.com/demo to schedule a demo.
MongoDB is the leading modern, general purpose database platform, designed to unleash the power of software and data for developers and the applications they build. Headquartered in New York, MongoDB has more than 18,400 customers in over 100 countries. The MongoDB database platform has been downloaded over 110 million times and there have been more than one million MongoDB University registrations.