RegData & MongoDB: Streamline Data Control and Compliance
While navigating the requirements of keeping data secure in highly regulated markets, organizations can find themselves entangled in a web of costly and complex IT systems. Whether it's the GDPR safeguarding European personal data or the Monetary Authority of Singapore's guidelines on outsourcing and cloud computing, the more regulations an organization is subject to, particularly across multiple geographies, the more intricate its IT infrastructure becomes. Organizations today face the challenge of adapting quickly or facing the consequences.

In addition to regulations, customer expectations have become a major driver of innovation and modernization. In the financial sector, for example, customers demand a fast and convenient user experience with real-time access to transaction information, a fully digitized, mobile-first banking experience, and personalization and accessibility for their specific needs. While these expectations have become the norm, they conflict with the complex infrastructures of modern financial institutions. Many financial institutions are saddled with legacy infrastructure that holds them back from adapting quickly to changing market conditions. Established financial institutions must find a way to modernize, or they risk losing market share to nimble challenger banks with cost-effective solutions.

The banking market today is increasingly populated with nimble fintech companies powered by smaller, more straightforward IT systems, which makes it easier for them to pivot quickly. In contrast, established institutions often operate across borders, meaning they must adhere to a greater number of regulations. Modernizing these complex systems requires introducing new, disruptive technology without violating any regulatory constraints, akin to changing a tire while driving the car.
The primary focus for established banks is safeguarding existing systems to ensure compliance with regulatory constraints while prioritizing customer satisfaction and maintaining business as usual.

RegData: Compliance without risk

RegData, a multi-cloud application security platform, takes this challenge head-on. RegData has expertise across a number of highly regulated markets, from healthcare to public services, human resources, banking, and finance. The company's mission is clear: delivering a robust, auditable, and confidential data protection platform within its comprehensive RegData Protection Suite (RPS), built on MongoDB.

RegData provides its customers with more than 120 protection techniques, including 60 anonymization techniques, as well as custom techniques (protection of IBANs, SSNs, emails, etc.), giving them total control over how sensitive data is managed within each organization. For example, by working with RegData, financial institutions can configure their infrastructure for specific regulations by masking, encrypting, tokenizing, anonymizing, or pseudonymizing data into compliance. With RPS, company-wide reports can be automatically generated for the regulating authorities (e.g., ACPR, ECB, EU-GDPR, FINMA).

To illustrate the impact of RPS, and to debunk some common misconceptions, let's explore before and after scenarios. Figure 1 shows the decentralized management of access control. Some data sources employ features such as Field Level Encryption (FLE) to shield data, restricting access to individuals with the appropriate key. Additionally, certain applications implement Role-Based Access Control (RBAC) to regulate data access within the application. Some even come with an Active Directory (AD) interface to try to centralize the configuration.
Figure 1: Simplified architecture with no centralized access control

However, each of these only addresses part of the challenge: encrypting the actual data and managing access for a single system. Neither FLE nor RBAC can protect data that lives outside their own data source or application. Even centralizing efforts like the AD interface exclude older legacy systems that lack interfacing capabilities. The result in all of these cases is a mosaic of different configurations in which silos stay silos, and modernization is risky and slow because the data may or may not be protected.

RegData, with its RPS solution, can integrate with a plethora of different data sources and provide control regardless of how data is accessed, be it via the web, APIs, files, emails, or other channels. This allows organizations to configure RPS at the company level. All applications, including silos, can and should interface with RPS to protect all of the data with a single global configuration.

Another important aspect of RPS is tokenization, which allows organizations to decide which columns or fields from a given data source should be encrypted according to specific standards, and to govern access to the corresponding tokens. Thanks to tokenization, RPS can track who accesses what data, and when, at a company level, regardless of the data source or the application. This is easy enough to articulate but quite difficult to execute at the data level. To efficiently manage diverse data sources, apply fine-grained authorization, and implement different protection techniques, RegData builds RPS on top of MongoDB's flexible, document-oriented database.

The road to modernization

As noted, to fully leverage RegData's RPS, all data sources should go through the RPS. RPS works like a data filter: raw information goes in on one side, and protected data comes out on the other, ready to support modernization and innovation.
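The protection flow described above, sensitive values in, opaque tokens out, with detokenization restricted to authorized roles, can be sketched in a few lines of Python. This is a minimal illustration of the tokenization pattern, not RegData's API; the class, role names, and sample IBAN are invented for the example.

```python
import secrets

class TokenVault:
    """Minimal tokenization sketch: swaps sensitive values for opaque
    tokens and guards detokenization behind an allow-list of roles."""

    def __init__(self, authorized_roles):
        self._forward = {}   # sensitive value -> token
        self._reverse = {}   # token -> sensitive value
        self._authorized = set(authorized_roles)

    def tokenize(self, value):
        # Reuse the same token for repeated values so joins still work.
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token, role):
        if role not in self._authorized:
            raise PermissionError(f"role {role!r} may not detokenize")
        return self._reverse[token]

# Protect a record before it leaves the trusted zone.
vault = TokenVault(authorized_roles={"compliance-officer"})
record = {"name": "A. Client", "iban": "CH9300762011623852957"}
protected = {**record, "iban": vault.tokenize(record["iban"])}
```

Because every access to `detokenize` passes through one place, this is also where a real system would record who accessed what data and when, which is the audit capability described above.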
Just integrating RegData means being able to make previously siloed data available by masking, encrypting, or anonymizing it before sending it out to other applications and systems. Together, RegData and MongoDB form a robust and proven solution for protecting data and modernizing operations within highly regulated industries.

The illustration below shows the architecture of a private bank utilizing RPS. Data is visible in plain text to database admins only when the request comes from the company's headquarters. This ensures compliance with regulations while still allowing data to be queried and searched from outside the headquarters. This bank goes a step further by migrating its Customer Relationship Management (CRM), core banking, Portfolio Management System (PMS), customer reporting, advisory, tax reporting, and other digital apps into the public cloud, all while remaining compliant and able to automatically generate submittable audit reports for regulating authorities.

Figure 2: Private bank business case

Another possible modernization scheme, given RegData's functionalities, is a hybrid cloud Operational Data Layer (ODL) using MongoDB Atlas. This architectural pattern acts as a bridge between consuming applications and legacy solutions. It centrally integrates and organizes siloed enterprise data, rendering it easily available. Its purpose is to offload legacy systems by providing alternative access to information for consuming applications, thereby breaking down data silos, decreasing latency, and adding scalability, flexibility, and availability, ultimately optimizing operational efficiency and facilitating modernization. RegData integrates, protects, and makes data available, while MongoDB Atlas provides the scalability, flexibility, and availability that empower developers to offload legacy systems.
Figure 3: Example of ODL with both RegData and MongoDB

In conclusion, in a world where finding the right solutions can be difficult, RegData provides a strategic path for financial institutions to modernize securely. By combining RegData's regulatory protection with modern cloud platforms such as MongoDB Atlas, the collaboration takes on the modernization challenge of highly regulated sectors.

Are you prepared to harness these capabilities for your projects? Do you have any questions? Then please reach out to us at firstname.lastname@example.org or email@example.com. You can also take a look at the following resources:

Hybrid Cloud: Flexible Architecture for the Future of Financial Services
Implementing an Operational Data Layer
Safety Champion Builds the Future of Safety Management on MongoDB Atlas, with genAI in Sight
Safety Champion was born in Australia in 2015 out of an RMIT University project that aimed to disrupt the safety management industry, which was still heavily reliant on paper-based processes and lagging in digitisation, and bring it to the cloud.

Most companies today need to comply with strict workplace safety policies. This is true for industries reliant on manual workers, such as manufacturing, transport and logistics, construction, and healthcare, but also for companies whose digital workers operate in the office or remotely. To do this, organisations rely on safety management processes and systems that help them comply with government and industry-led regulations, as well as keep their employees safe. Whether it's legal obligations around safety reporting, management of employees and contractors, or ways to implement company-wide safety programs, Safety Champion's digital platform gives customers more visibility into and tracking of safety programs, and a wealth of data to help make evidence-based safety decisions.

"Data is core to our offering, as well as core to how next-generation safety programs are being designed and implemented. With paper-based processes, you simply can't get access to rich data, connect data sets easily, or uncover organisation-wide insights and patterns that can help drive efficiencies and improve safety outcomes," explains Craig Salter, Founder of Safety Champion.

MongoDB Atlas: Unlocking the power of data and analytics to improve safety outcomes for customers

Safety Champion started on the self-managed version of MongoDB and, shortly after in 2017, moved to MongoDB Atlas, which was more cost-effective, meant less overhead, and removed the day-to-day tasks required to keep a database up and running.
The main challenge is that industry standards and policies around safety vary significantly from company to company: the safety risks of an office-based business of digital workers are vastly different from the risks workers on a manufacturing plant are exposed to, making data collection and itemisation for deeper insights very complex. MongoDB's document model, its flexibility, and its ability to handle complex sets of data, combined with Atlas' ease of use for developers, made it the perfect fit for Safety Champion.

"It was very easy to get started on MongoDB, but also super easy and quick to get applications developed and brought to market," says Sid Jain, Solution Architect for Safety Champion. "The performance optimisation we saw using MongoDB Atlas was significant, and it freed up a lot of time for our developers so they could focus on what matters most to our business, instead of worrying about things like patching, setting up alerts, handling back-ups, and so on."

The use of MongoDB Charts also gives Safety Champion's customers access to important analytics that can be presented visually, fitting very specific use cases and internal audiences. This helps organisations using Safety Champion improve decision-making by presenting concrete data and graphs that can fuel evidence-based safety decisions.

"MongoDB Atlas helps drive efficiencies for our clients, but it also helps safety and operations managers have a voice in making important safety decisions because they are backed by strong data-led evidence. Connecting data sets means the ability to have a much deeper, richer view of what's happening and what needs to be done," says Salter.
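The schema flexibility described above can be illustrated with a small sketch: incident records from different industries carry different fields, yet live side by side and are read with the same code. The field names, sites, and helper below are invented for illustration, not Safety Champion's actual schema.

```python
# Two incident documents with different shapes, as the document model
# allows within a single collection (shown here as plain Python dicts).
office_incident = {
    "type": "ergonomic",
    "site": "Melbourne HQ",
    "workstation": "4F-12",          # only office incidents carry this
}
plant_incident = {
    "type": "machinery",
    "site": "Geelong plant",
    "machine_id": "press-07",        # only plant incidents carry this
    "lockout_applied": True,
}

def summarise(incident):
    """Shared fields are read directly; industry-specific fields are
    optional, so no rigid table schema is needed."""
    detail = incident.get("machine_id") or incident.get("workstation") or "n/a"
    return f"{incident['type']} at {incident['site']} ({detail})"
```

In a relational schema, each new industry would force new nullable columns or side tables; here, each record simply carries the fields that apply to it.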
Managing exponential growth: 2024, the year of scaling up, generative AI, Search, and much more

Before 2020, Safety Champion was still a small start-up, with its platform managing about 5,000 documents a month, including incident records, checklists, inspection reports, actionable tasks, task completion reports, and more. The COVID pandemic forced organisations to move their safety processes online and comply with a whole new set of safety measures and policies, and the company's business exploded: triple-digit annual growth between 2021 and 2023, a dev team that tripled in size, over 2,000 customers, and now up to 100,000 documents handled per month.

"As our company kept growing, with some of our new customers handling tens of thousands of safety documents every month, we knew we needed to enable even more scale and future-proof ourselves for the years to come," explains Salter. "We also knew that if we wanted to take advantage of MongoDB's capabilities in generative AI, search, multi-region, and more, which a lot of our customers were asking for, we needed to set some strong data foundations."

Safety Champion is now in the process of upgrading to MongoDB 6.0, which will offer its clients more speed, especially when handling larger and more complex queries. MongoDB Search is now also available system-wide, allowing search queries to be performed across all the modules a client has added records for. "Since many modules allow linking records to each other, allowing a single search query to find and return records from multiple modules makes a world of difference. Developers no longer have to maintain other data systems, and the extraction, transformation, and sync of data between MongoDB and the search index happens seamlessly, greatly reducing the Ops burden on dev teams," explains Jain.
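A system-wide search of this kind can be sketched as an Atlas Search aggregation pipeline: the `$search` stage below queries every indexed field via a wildcard path, so one query can match records from any module. The index name `"default"` and the example query are assumptions, not Safety Champion's actual configuration.

```python
def cross_module_search(query, limit=20):
    """Build an aggregation pipeline whose $search stage runs a full-text
    query over all indexed fields (wildcard path), so records from any
    module can match. Index name "default" is an assumption."""
    return [
        {
            "$search": {
                "index": "default",
                "text": {
                    "query": query,
                    "path": {"wildcard": "*"},  # search every indexed field
                },
            }
        },
        {"$limit": limit},  # cap the number of results returned
    ]

pipeline = cross_module_search("forklift near-miss")
```

With a live cluster this list would be passed to `collection.aggregate(pipeline)`; because the search index is maintained inside Atlas, there is no separate search system to keep in sync, which is the operational point Jain makes above.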
The use of multi-region functionality within MongoDB Atlas means customers, especially global ones operating in multiple geographic regions, will be able to segregate data and ensure they meet regulatory requirements around data hosting and security. Lastly, Safety Champion is exploring the potential of generative AI, with plans to start using MongoDB Vector Search later in 2024. Use cases the company is already investigating include semantic insights: understanding the textual data that employees enter in forms, applying LLMs to that data, and extracting helpful information from it.

"Every client wants more analytics, more insights, and more high-level meaning out of data. It's not just about making it easier to enter data and see safety incidents, it's about what the data means and the decisions that can be made from a safety perspective," says Salter. "The new version of the Safety Champion platform powered by MongoDB Atlas means we are fully ready to dive into the next phase of our evolution as a business and offer features such as generative AI, which will take both Safety Champion and our customers to the next era of safety management."
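A semantic query over form text of the kind described above can be sketched as a `$vectorSearch` aggregation stage. The index name, field name, and the tiny example embedding below are illustrative assumptions; in practice the query vector would come from an embedding model applied to the user's question.

```python
def semantic_search_stage(query_vector, limit=5):
    """Build a $vectorSearch stage that returns the documents whose
    stored embeddings are closest to the query embedding. The index
    name "form_embeddings" and field "embedding" are hypothetical."""
    return {
        "$vectorSearch": {
            "index": "form_embeddings",   # assumed vector index name
            "path": "embedding",          # assumed field holding the vector
            "queryVector": query_vector,  # embedding of the user's question
            "numCandidates": limit * 20,  # candidates scanned before ranking
            "limit": limit,               # results returned
        }
    }

stage = semantic_search_stage([0.12, -0.03, 0.57], limit=3)
```

The stage would run first in an aggregation pipeline, e.g. `collection.aggregate([stage, {"$project": {"text": 1}}])`, with later stages shaping or enriching the matched documents.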
Spotlight on Two Aussie Start-Ups Building AI Services on MongoDB Atlas
Australian-based Eclipse AI and Pending AI are using the power of MongoDB Atlas to bring their AI ideas to life and blaze new trails in fields including pharmaceutical R&D and customer retention.

With the recent advancements in AI and generative AI, innovation has reached new heights. Many organisations are taking advantage of technologies such as Natural Language Processing (NLP), Large Language Models (LLMs), and more to create AI-driven products, services, and apps. Amongst those blazing new trails in the AI space are two Australian start-ups: Pending AI, which is helping scientists and researchers in the pharmaceutical space improve the early research and development stages, and Eclipse AI, a company that unifies and analyses omnichannel voice-of-customer data to give customers actionable intelligence to drive retention. What they have in common is their choice to use MongoDB Atlas, a multi-cloud developer data platform that unifies operational, analytical, and generative AI data services to streamline building AI-enriched applications. Here is how we are helping these two Australian start-ups create the next generation of AI products faster, with less complexity, and without breaking the bank.

Pending AI improves pharmaceutical R&D by leveraging next-generation technologies

Pending AI has developed a suite of artificial intelligence and quantum mechanics-based capabilities to solve critical problems within the earliest stages of pharmaceutical research and development. The Pending AI platform can dramatically improve the efficiency and effectiveness of the compound discovery pipeline, meaning stakeholders can obtain better, commercially viable scaffolds for further clinical development in a fraction of the time and cost. Building its two artificial intelligence-based capabilities, the Generative Molecule Designer and the Retrosynthesis Engine, was a mammoth task.
The number of pharmacologically relevant molecules in chemical space is exceptionally large, and there are over 50 million known chemical reactions and billions of molecular building blocks; expert scientists have to undergo cost- and time-inefficient trial-and-error processes to design desired molecules and identify optimal synthesis routes to them. Pending AI needed a database that could handle a very large number of records and remain highly performant at the scale the vastness of chemical space requires. Pending AI considered a few databases, but MongoDB kept standing out as a battle-tested, reliable, and easy-to-implement solution, enabling Pending AI's team to build a highly performant deployment on MongoDB Atlas.

"As a startup, getting started with the community edition of MongoDB and being able to run a reliable cluster at scale was a huge benefit. Now that we're starting to leverage the AWS infrastructure in our platform, MongoDB Atlas provides us with a fully managed solution at a low cost, and with a Private Endpoint between our AWS deployment and MongoDB cluster, we have kept latency to a minimum and our data secure," said Dr. David Almeida Cardoso, Vice President, Business Development at Pending AI.

Output of Pending AI's Generative Molecule Designer

Pending AI's Generative Molecule Designer has been built as a machine learning model on MongoDB Atlas, trained to understand the language of pharmaceutical structures, which allows for automated production of novel compound scaffolds that can be focused and tailored to the outputs of biological and/or structural studies. The Retrosynthesis Engine is also built using a set of machine learning models and MongoDB Atlas, trained to understand chemical reactions, which allows for the prediction of multiple valid synthetic routes within a matter of minutes. "We're also excited to explore the new Atlas Search index feature in MongoDB 7.0.
We hope this will allow us to integrate some of the search functionality, which is currently complex to manage and maintain, directly into MongoDB, rather than relying on a separately maintained Elasticsearch cluster," added Cardoso. Being part of the MongoDB AI Innovator program also allowed Pending AI to explore leveraging cloud infrastructure to scale its platform and to test newer versions of MongoDB quickly and easily.

Eclipse AI turns customer interaction insights into revenue

Eclipse AI is a SaaS platform that turns siloed customer interactions from different sources, such as customer calls, emails, surveys, reviews, and support tickets, into insights that drive retention and revenue. It was created to address the frustration of customer experience (CX) teams over the hours and man-weeks of effort needed to consolidate and analyse customer feedback data from different channels. Eclipse AI took on the challenge of solving this issue and worked to offer customers faster, more efficient ways to turn customer feedback into actionable insights. The first problem was consolidating the heavily fragmented voice-of-customer data; the second was analysing that data and turning it into specific improvement actions that enhance the customer experience and prevent churn.

Because MongoDB Atlas is a flexible document database that can also store and index vector embeddings for unstructured data, it was a perfect fit for Eclipse AI and enabled its small dev team to focus on building the product efficiently and quickly, without being burdened with managing infrastructure. MongoDB Atlas also comes with key features such as MongoDB Atlas Device SDKs (formerly Realm) and MongoDB Atlas Search that were instrumental in bringing Eclipse AI's platform to life. "For us, MongoDB is more than just a database, it is data-as-a-service.
This is thanks to tools like Realm and Atlas Search that are seamlessly built into the platform. With minimum effort, we were able to add relevance-based full-text search on top of our data. Without MongoDB Atlas we would not have been able to iterate quickly and ship new features fast," commented Saad Irfani, co-founder of Eclipse AI. "Best of all, horizontal scaling is a breeze with single-click sharding that doesn't require setting up config servers or routers, reducing costs along the way. The unified monitoring and performance advisor recommendations are just the cherry on top."

Eclipse AI - MongoDB dashboard

G2 rated Eclipse AI as the #1 proactive customer retention platform globally for SMEs, a recognition that wouldn't have been possible without the use of MongoDB Atlas.

Exploring your AI potential with MongoDB

MongoDB Atlas is built for AI. Why? Because MongoDB specialises in helping companies and their developer teams manage richly structured data that doesn't neatly fit into the rigid rows and columns of traditional relational databases, and turn it into meaningful, actionable insights that help operationalise AI. More recently, we have added Vector Search, enabling developers to build intelligent applications powered by semantic search and generative AI over any type of data, and enhanced the AWS CodeWhisperer coding assistant, adding to the list of tools companies can use to further their AI exploration. These are just a handful of examples of what is possible in the realm of AI today. Many of our customers around the world, from start-ups to large enterprises like banks and telcos, are investing in MongoDB Atlas and capabilities such as Atlas Search, Vector Search, and more to create what the future of AI and generative AI will look like in the next decade.
If you want to learn more about how to get started with your AI project, or take your AI capabilities to the next level, check out our MongoDB for Artificial Intelligence resources page for the latest best practices to help you turn your idea into an AI-driven reality.
Connected Vehicles: Accelerate Automotive Innovation With MongoDB Atlas and AWS
Capgemini's Trusted Vehicle solution heralds a new era in driver and fleet management experiences. This innovative platform leverages car-to-cloud connectivity, unlocking a world of possibilities in fleet management, electric vehicle charging, predictive maintenance, payments, navigation, and consumer-facing mobile applications. Bridging the gap between disparate systems, Trusted Vehicle fast-tracks the development of software-defined vehicles and ushers in disruptive connectivity, autonomous driving, shared mobility, and electrification (CASE) technologies. In this post, we will explore how MongoDB Atlas and AWS work together to power Capgemini's Trusted Vehicle solution.

What is Trusted Vehicle?

Capgemini's Trusted Vehicle solution accelerates time-to-market with a secure and scalable platform for next-generation driver and fleet-management experiences. Trusted Vehicle excels in fleet management, EV charging, navigation, and more, while also accelerating software-defined vehicle development. By seamlessly connecting disparate systems, it paves the way for disruptive advancements in automotive technology.

AWS for Automotive empowers OEMs, mobility providers, parts suppliers, automotive software companies, and dealerships to use AWS effectively, providing them with tailored solutions and capabilities in areas such as autonomous driving, connected mobility, digital customer engagement, software-defined vehicles, manufacturing, supply chain, product engineering, and sustainability.

Based on its cloud mobility expertise and extensive experience implementing Trusted Vehicle for its clients, Capgemini has developed repeatable, customizable modules that help OEMs and mobility companies accelerate their connected mobility journey. These quick-start modules can be swiftly customized for any organization by adding capabilities.
Here are a few examples of the modules:

- Diagnostics trouble-code tracker for fleet maintenance that bolsters safety and efficiency
- Fleet management software with keyless vehicle remote control for convenience and security
- Predictive maintenance for connected vehicles to detect anomalies and ensure proactive interventions

For automotive OEMs, innovation through digitization of their products and services is of paramount importance. The development of connected and smart vehicles requires cutting-edge technology. Capgemini recognizes the significance of robust data platforms in shaping the future of connected vehicles. At the core of the Trusted Vehicle solution lies the MongoDB Atlas developer data platform. This strategic partnership and integration ensure that automotive OEMs can harness the power of a modern, scalable, and secure data platform, enabling efficiency, secure and robust connectivity, and seamless user experiences.

Benefits of MongoDB Atlas for the Capgemini Trusted Vehicle solution

Faster time-to-market and developer velocity

MongoDB Atlas' core value proposition is a unified data platform for developers to build applications. With MongoDB Atlas, Capgemini built the core data processing, from sensor data to valuable business insights, with one API. Limiting the number of infrastructure components means developers spend less time writing orchestration code and the corresponding automated tests, setting up infrastructure with all the disaster recovery requirements, and monitoring that stack. Relieving developers of those responsibilities allows them to deliver more features, bringing business value to customers rather than spending precious time on technical plumbing.

Cloud agnosticism and customized Trusted Vehicles for customers

MongoDB Atlas is a fully managed database as a service that offers features like multi-cloud clusters, automated data tiering, continuous backups, and many more.
With a multi-cloud cluster in MongoDB Atlas, customers can:

- Use data from an application running in one cloud and analyze it on another cloud, without manually managing data movement
- Use data stored in different clouds to power a single application
- Easily migrate an application from one cloud provider to another

Multi-cloud also enables improved governance by accommodating customers who require data to be stored in a specific country for legal or regulatory reasons, and allows for performance optimization by deploying resources in the regions nearest to users.

Implementing Atlas for the Edge

Atlas for the Edge provides a solution that streamlines the management of data generated across various sources at the edge, including connected cars and user applications. Two key components of this solution are Atlas Device Sync and the Device SDKs. Together, they provide a fully managed backend that facilitates secure data synchronization between devices and the cloud, including out-of-the-box network handling, conflict resolution, authentication, and permissions.

To implement MongoDB's Atlas for the Edge solution, AWS Greengrass was used to facilitate over-the-air updates and manage software deployment onto the vehicles, while Device Sync and the SDKs handled the transmission of data from the car back to the cloud. Greengrass allows executing code through Lambda functions, utilizing data received via MQTT or from the connected device. The Device SDKs, however, overcome AWS Lambda's temporary file system storage limitation by offering significantly greater data storage capacity. Greengrass can now store the telematics data locally in the database provided by the SDKs, so the data is retained even if the device is offline. Once network connectivity is restored, the locally stored telematics data can be synchronized with the MongoDB Atlas cluster.
The storage capabilities of the Device SDKs help ensure that processes run smoothly and continuously.

Syncing telemetry data to Atlas

Dynamic queries with flexible sync

Device Sync lets developers control exactly what data moves between their clients and the cloud. This is made possible by flexible sync, a configuration that allows a query to be defined in the client so that only the objects matching the query are synchronized. These dynamic queries can be executed based on user inputs, eliminating developers' need to discern which query parameters to assign to an endpoint. Moreover, with the Device SDKs, developers can integrate seamlessly with their chosen platform, directly interfacing with its native querying system. This synergy streamlines the development process for enhanced efficiency.

Data ingest for IoT

Data ingest, a sync configuration for applications with heavy client-side, insert-only workloads, facilitates seamless data streaming from the Trusted Vehicle software to a flexible sync-enabled app. This unidirectional data sync is useful in IoT applications, such as a weather sensor transmitting data to the cloud. In the case of vehicles, information specific to each car, such as speed, tire pressure, and oil temperature, is transmitted to the cloud. Data ingest is also helpful for writing other types of immutable data where conflict resolution is unnecessary, such as generating invoices through a retail application or logging events in an application.

Data lifecycle management with Device Sync

Atlas Device Sync completely manages the lifecycle of this data. Data ingest and flexible sync handle the writing and synchronization processes, including removing data that is no longer needed from devices. On-device storage, network handling, and conflict resolution ensure that clients retain data even when offline. Once reconnected to a network, data seamlessly and automatically synchronizes with MongoDB Atlas.
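On the Atlas side, insert-only telemetry of this kind typically lands in a time series collection and is then rolled up into per-vehicle summaries with the aggregation pipeline. The sketch below shows both pieces as plain option and pipeline documents; the field and collection names ("ts", "vehicle_id", "telemetry", "vehicle_summaries") and the 90-day retention are illustrative assumptions, not Capgemini's actual schema. With a live cluster these would be passed to `db.create_collection("telemetry", **telemetry_options)` and `db.telemetry.aggregate(vehicle_summary_pipeline)`.

```python
# Options for a time series collection holding raw vehicle telemetry.
telemetry_options = {
    "timeseries": {
        "timeField": "ts",          # timestamp of each reading
        "metaField": "vehicle_id",  # which vehicle produced it
        "granularity": "seconds",   # readings arrive roughly every 2 s
    },
    "expireAfterSeconds": 90 * 24 * 3600,  # age out raw data after 90 days
}

# Pipeline rolling raw readings up into one summary document per vehicle.
vehicle_summary_pipeline = [
    {"$group": {
        "_id": "$vehicle_id",
        "avg_speed": {"$avg": "$speed"},
        "min_tire_pressure": {"$min": "$tire_pressure"},
        "readings": {"$sum": 1},
    }},
    # Persist the summaries so they can be served as a materialized view.
    {"$merge": {"into": "vehicle_summaries", "whenMatched": "replace"}},
]
```

The `$merge` stage is what turns the computed summaries into an on-demand materialized view that an API endpoint can read cheaply, without re-scanning the raw telemetry on every request.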
Processing and accessing data with aggregation pipelines

The raw data gathered from individual vehicles, metrics such as speed, direction, and tire pressure, lacks meaningful interpretation on its own. MongoDB's aggregation pipeline transforms these individual records into contextualized information such as driver profiles, usage patterns, and trip specifics, yielding actionable insights. For optimal storage and performance efficiency, MongoDB automatically archives individual records after they are processed, ensuring they remain accessible for future retrieval.

Overview of the Atlas for the Edge - AWS architecture

The implementation of Atlas for the Edge for the Trusted Vehicle solution shifts the responsibility of collecting, syncing, and processing data from AWS components to Atlas Device Sync and the SDKs. The Device SDK for Node.js is used in the Lambda function, which runs as soon as the Greengrass core device boots up and stores the vehicle telematics data every two seconds in the local Realm database. Using flexible sync with data ingest, the vehicle automatically syncs the telemetry data from the device to the MongoDB Atlas cluster on AWS, into a time series collection. An aggregated document representing a vehicle's or driver's data can be computed with the aggregation pipeline, stored in a collection or as a materialized view, and accessed via an API endpoint. Historical telemetric data that goes cold can be automatically archived into cold storage using Online Archive, native to the time series collection. This archived data remains accessible when needed, on a dedicated API endpoint, using MongoDB Atlas' federated query capability.

Trusted Vehicle with AWS and MongoDB Atlas

MongoDB Atlas offers a trifecta of benefits when utilized within Capgemini's Trusted Vehicle solution. First, it accelerates time-to-market and enhances developer efficiency by streamlining and simplifying the technology stack.
Second, MongoDB Atlas proves to be more cost-effective as the fleet of vehicles expands. The reduction in cost per vehicle, especially as the fleet scales to 1,000 and 10,000 vehicles, results in a substantial decrease in the total cost of ownership. With those efficiencies of scale in mind, OEMs running millions of cars on the road will certainly benefit from this solution. Third, MongoDB's cloud-agnostic components pave the way for a more flexible and adaptable implementation, breaking free from the constraints of specific cloud environments. Ultimately, MongoDB Atlas not only expedites development and reduces costs but also provides a more versatile solution catering to a wider range of clients. For more information on our partnership with Capgemini, please visit our partner webpage . Additionally, visit our MongoDB in Manufacturing and Automotive page to understand our value proposition for the automotive industry and take a look at our connected vehicle solution video .
Pledging Ourselves to the Future
As MongoDB’s sustainability manager, you could say I think about the climate a lot. After all, doing so is my job. But because it’s January and a time of reflection, I’ve been thinking about climate change more than usual — particularly about the progress we’ve made, but also the work that remains to be done. For example, in December the annual U.N. Climate Change Conference (COP 28) ended with a landmark agreement to transition away from fossil fuels, and the aim of reaching net-zero carbon dioxide emissions by 2050. The COP 28 agreement also calls on countries to triple their renewable energy capacity and reduce other forms of emissions. The agreement was very welcome because, before COP 28 began, the U.N. released a stark report showing that national plans are "insufficient to limit global temperature rise." As worried as I might be some days, I’m also buoyed by the climate action of the last few years. According to the U.S. Energy Information Administration, in 2022 more energy was generated by renewable sources than by coal for the first time. There have also been several regulations passed globally that make the measurement and disclosure of emissions mandatory, a key step in understanding — and reducing — emissions.

MongoDB joins The Climate Pledge

In the same spirit of optimism, I’m delighted to announce that MongoDB recently signed The Climate Pledge, joining hundreds of leading organizations in publicly affirming our commitment to sustainability. The Climate Pledge’s hundreds of signatories commit to regularly report on their emissions and reach net-zero emissions by 2040 through decarbonization strategies and carbon offsets. “We’re thrilled to join the world’s leading companies — like MongoDB customers Verizon and Telefónica — in signing The Climate Pledge,” said MongoDB chief product officer, Sahir Azam.
“MongoDB looks forward to working with the Climate Pledge team to ensure a more sustainable future for everyone.” Signing The Climate Pledge is hardly the first step MongoDB has taken toward ensuring a more sustainable future. In 2023, MongoDB committed to being 100% powered by renewable energy by 2026, and achieving net-zero carbon emissions by 2030. To meet those targets, we’re working to reduce our carbon footprint through product innovation, by adding new sources of renewable energy, and by making MongoDB employees’ commutes more sustainable.

Goodbye waste, hello (energy) savings

In 2023, we also announced MongoDB’s new Sustainable Procurement Policy, which aims to ensure that sustainability is considered at all levels of our supply chain. The policy covers everything from the coffee we purchase (100% sustainably sourced) to the single-use items we use (restrictions leading to a 58% waste reduction in 2023). How MongoDB’s workloads are powered falls under our sustainable procurement efforts. Specifically, we’re currently working with our cloud partners — all of whom share MongoDB’s aim to be 100% powered by renewable energy by 2026 — to reduce our carbon footprint. "MongoDB takes its commitment to carbon reduction seriously, and we're fortunate to work with partners who share our enthusiasm for sustainability,” said MongoDB Lead Performance Engineer Ger Hartnett. “We look forward to continuing to collaborate with our partners on groundbreaking, energy-saving technology that makes real reductions in our carbon intensity." To meet our renewable energy target, we’ve focused our efforts on several areas, such as preferring buildings with renewable energy contracts or on-site solar when considering new office space. We’ve also entered into several virtual power purchase agreements (VPPAs).
Virtual power purchase agreements are a great way for companies like MongoDB to invest in renewable energy without building anything on-site, and are a proven method of adding renewable energy to the grid. Since 2022, MongoDB has worked with the enterprise sustainability platform Watershed to support renewable energy projects through VPPAs. Our first project helped build a solar plant in Texas that Watershed notes, “will avoid 13,000 tons of CO2, equivalent to taking nearly 3,000 gas-powered cars off the road each year.” And MongoDB recently signed a new VPPA that will support the development of solar panels for a factory in India. Solar energy is currently responsible for about 16% of global renewable energy, and only about 3.4% of overall energy in the U.S. Those numbers are sure to change, however. In the last fifteen years, global solar power generation has grown from 11.36 terawatt-hours to 1,289.87 terawatt-hours. What’s more, coal accounts for about 70% of India’s power — versus 20% in the United States — so projects like this will help reduce emissions across Asia. And because many MongoDB employees are directly impacted by air pollution in India, we see VPPAs as a way of benefitting the health and well-being of our employees, as well as the planet.

MongoDB's stubborn optimism

In the early months of the pandemic, Tom Rivett-Carnac, founding partner of Global Optimism — which launched The Climate Pledge with Amazon in 2019 — shared a video about shifting one’s mindset and changing the world. In the face of larger-than-life problems (like climate change), “stubborn optimism,” he said, “animates action, and infuses it with meaning.” “When the optimism leads to a determined action, then they can become self-sustaining … the two together can transform an entire issue and change the world,” he noted.
“Stubborn optimism can fill our lives with meaning and purpose.” Composting is an example of a stubbornly optimistic action that’s both easy to adopt and one that (if enough of us do it) can change the world. Food waste accounts for 6% of global greenhouse emissions, and composting can help reduce those emissions. To put food waste emissions in perspective, 6% of global greenhouse emissions is roughly three times higher than annual global aviation emissions. In 2023, we also began tracking MongoDB’s waste and landfill diversion, and we’re working to improve how we dispose of waste by adding composting services to MongoDB’s hub offices. More than 80% of MongoDB’s offices already have composting services, and we aim to hit 100% in 2024. Not only have composting and single-use purchase reduction helped to decrease waste emissions, but both are highly visible to MongoDB employees. MongoDB employees are increasingly excited about sustainability, inspiring the creation of a mini-garden in our New York office, and the use of more sustainable commuting methods like biking. Though I tend to bike more for exercise than commuting these days (I’ve racked up more than 1,000 miles on my bike pass!), more and more MongoDB team members get to work in sustainable ways. For example, we’re rolling out electric vehicle commuting in India, an e-bike program was recently introduced in our Dublin office, and the bike locker in MongoDB’s New York HQ is generally packed. “I love biking to the office,” said Perry Taylor, a New York-based Information Technology Lead at MongoDB. “In addition to being a great way to stay fit, it’s awesome that how I commute helps the environment.” Looking back on 2023, I’m pleased with how much we accomplished toward MongoDB’s sustainability goals. At the same time, I recognize that more needs to be done. MongoDB enters 2024 with a renewed commitment to sustainability, and we look forward to furthering our progress. 
To learn more about MongoDB’s sustainability progress, please check out our Sustainability webpage and our latest Corporate Sustainability Report . For more information about fellow Climate Pledge signatories and an interactive timeline of progress made, visit The Climate Pledge .
Evolve Your Data Models as You Modernize with Hackolade and Relational Migrator
Application modernization is a constant. For many developers and database administrators, it has become glaringly apparent that the legacy relational databases that have served their apps well to this point are no longer easy or fast to work with, especially as they strive to incorporate emerging use cases like generative AI, search, and edge devices into their customer experience at an increasing rate. While many are turning to MongoDB Atlas for the flexible document model and wide range of integrated data services, migrations are often seen as daunting projects. MongoDB Relational Migrator has simplified several of the key tasks required to successfully migrate from today's popular relational databases to MongoDB. With Relational Migrator, teams can design their target MongoDB schema using their existing relational one as a blueprint, migrate their data to MongoDB while transforming it to their newly designed schema, and get a head start on app code modernization through code template generation and query conversion. But as organizations scale their MongoDB footprint through migrations and new app launches, a new challenge emerges: managing and evolving data models with more teams and stakeholders. Sooner or later, modernization becomes as much about change management as it does technology — keeping teams aligned is critical for keeping everyone moving forward. This is where Hackolade comes in. Hackolade Studio is a visual data modeling and schema design application that enables developers to design and document their MongoDB data models and, more importantly, use those entity-relationship diagrams (ERDs) to collaborate with their counterparts in other areas of the business, like database administration, architecture, and product management.

MongoDB data model in Hackolade Studio

No database is an island, and the teams working with MongoDB cannot afford to work in isolation.
With Hackolade Studio, database teams can use these ERDs to translate their point of view to others, making hand-offs and handshakes with other teams like operations more seamless, driving developer productivity, and accelerating new feature builds.

Jump from Relational Migrator to Hackolade Studio with ease

Hackolade is now making it even easier to transition to Hackolade Studio after using MongoDB Relational Migrator to complete your migrations. Teams can now use Hackolade Studio’s reverse-engineering feature to import their Relational Migrator project (.relmig) files, bringing their MongoDB schema directly over into Hackolade Studio. With this integration, teams can start with Relational Migrator to build their initial schema and execute their data migration, then transition to Hackolade Studio to document, manage, and evolve their schema going forward - giving them the greater degree of control, visibility, and collaboration needed to support modernization initiatives that span many migrations across several applications, teams, and legacy relational environments.

MongoDB Relational Migrator, showing a relational schema on the left and its transformed MongoDB schema on the right

Getting started is incredibly easy. First, you’ll need your Relational Migrator project file, which can be exported from Relational Migrator to your local device. Then, in Hackolade Studio, use the reverse-engineering workflow to import your .relmig file into a new or existing data model. For a detailed walkthrough, dive into Hackolade’s documentation for this integration.

Importing Relational Migrator files in Hackolade Studio

As MongoDB adoption grows within your organization, more apps and more teams will need to interact with your MongoDB data models.
With Relational Migrator and Hackolade together, you will have the tools at your disposal to not only kickstart migration projects but also manage MongoDB data models at scale, giving your teams the insights and visibility needed to drive performance and guide app modernization initiatives. Learn more about how Hackolade can maximize developer productivity and support your modernization initiatives with MongoDB. Download MongoDB Relational Migrator for free to get started with migrating your first databases.
Integrate OPC UA With MongoDB - A Feasibility Study With Codelitt
Introducing the Full Stack FastAPI App Generator for Python Developers
We are thrilled to announce the release of the Full Stack FastAPI, React, MongoDB (FARM) base application generator, coinciding with FastAPI's emerging status as a leading modern Python framework. Known for its high performance and ease of use, FastAPI is quickly becoming a top choice for Python developers. This launch is a significant advancement for Python developers eager to build and maintain progressive web applications using the powerful combination of FastAPI and MongoDB.

Bridging the Development Gap

While it's always been easy and quick to start building modern web applications with MongoDB and FastAPI, in the past developers still had to make many decisions about other parts of the stack, such as authentication, testing, and integration, and manually integrate these components in their application. Our new app generator aims to simplify this process further. It enables you to quickly spin up a production-grade, full-stack application, seamlessly integrating FastAPI and MongoDB, thereby significantly enhancing the developer experience.

Simplifying the Development Journey

With the launch of our Full Stack FastAPI App Generator, MongoDB dramatically simplifies the initial stages of project setup for production-grade applications by providing a well-structured app skeleton, reducing both the initial learning curve and the time spent setting up the project. For new learners and seasoned developers alike, this means less time figuring out the basics and more time building differentiated experiences for their users.

Key Features Included in the App Generator:

- Complete Web Application Stack: Generates a foundation for your project development, integrating both front-end and back-end components.
- Docker Compose Integration: Optimized for local development.
- Built-in Authentication System: Includes user management schemas, models, CRUD, and APIs, with OAuth2 JWT token support and magic link authentication.
- FastAPI Backend Features: MongoDB Motor for database operations; MongoDB ODMantic for ODM creation; common CRUD support via generic inheritance; standards-based architecture, fully compatible with OpenAPI and JSON Schema.
- Next.js/React Frontend: Middleware authorization for page access control; form validation using React Hook Form; state management with Redux; CSS and templates with TailwindCSS, HeroIcons, and HeadlessUI.
- Operational and Monitoring Tools: Celery for task management, Flower for job monitoring, Traefik for seamless load balancing and HTTPS certificate automation, and GitHub Actions for comprehensive front-end and back-end testing.

Start now

Accelerate your web application development with MongoDB and FastAPI today. Visit our GitHub repository for the app generator and start transforming your web development experience. Please note: This tool is an experimental project and not yet officially supported by MongoDB.
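To make the authentication feature concrete, here is a minimal sketch of the kind of signed token (an HS256 JWT) that an OAuth2 JWT authentication system issues and verifies. This is an illustration using only the standard library, not the generator's actual implementation; a real application should rely on a vetted library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json

# Minimal sketch of HS256 JWT signing/verification, the style of token an
# OAuth2 JWT auth system issues. Illustration only -- use a vetted library
# (e.g. PyJWT) in real code.

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, per the JWT spec."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    padded = body + "=" * (-len(body) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = sign_jwt({"sub": "user@example.com"}, "dev-secret")
print(verify_jwt(token, "dev-secret")["sub"])  # user@example.com
```

The generator wires tokens like this into FastAPI dependencies so that protected routes reject requests whose signature does not verify.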
Panel: How MongoDB is Helping Drive Innovation for Indian Organisations
At MongoDB.local Delhi, a panel of CXOs and IT leaders discussed the strategies and challenges of using software to drive innovation in their organisations. Here are the key lessons they shared.

Fostering innovation: tips and challenges

Our panel, which included representatives from Appy Pie, Autodit, Formidium, and Tech Mahindra, agreed that the rapid development of data analytics technology and the scarcity of trained talent on the ground were key challenges when it comes to driving innovation. To stay on top, Tech Mahindra has a dedicated talent acquisition engine to keep tabs on those technologies and customer requirements. “We imbibe these learnings, so we’re equipped to deliver solutions on the ground,” explained Shuchi Agrawal, Global Head for Data Analytics, Pre-Sales, and Solutioning at Tech Mahindra (IT services and consulting). “When I think of data and MongoDB, I think MongoDB BIRT (business intelligence reporting tools),” says Shuchi. For Tech Mahindra, accelerating customers' transformation journeys means building automation on the baseline of analytics workloads, i.e., data. “That’s why MongoDB is one of the key elements in most of our solution designs when we’re looking for some of the advanced analytics workloads,” says Shuchi.

Choosing technology and evaluating products

For Vaibhav Agrawal, Executive Vice President of Formidium, selecting technology to drive innovation comes with key caveats: it must be easy to implement, the talent must exist in the market to do the implementation, zero-trust is essential — as data security is paramount for customers — and it must perform in terms of scalability, efficiency, optimization, and monitoring. “If those things are there, you never have to go to other technology,” says Vaibhav.
“And MongoDB Atlas comes in on all those check marks perfectly — that's why we chose it.”

Enhancing innovation strategies with database features

Vaibhav observed that there are two aspects to any innovation: a) having the idea and creating something, and b) re-innovating it. So, for innovation to perpetuate, ideas must be adapted according to your experience, market changes, and changes in technology — and that means reviewing your products’ performance. “MongoDB Atlas has that amazing 360-degree view of your database activities, and the monitoring of your resources,” says Vaibhav. “Plus, it's very easy to get analytics out of it and change the course of your innovation.” “You always need to keep watch on the performance of a database," he adds. "Then you will be able to keep pace with innovation for years.” To see more announcements and get the latest product updates, visit our What's New page.
Leveraging MongoDB Atlas in your Internal Developer Platform (IDP)
DevOps, a portmanteau of “Developer” and “Operations”, rose to prominence around the early 2010s and established a culture of incorporating automated processes and tools designed to deliver applications and services to users faster than the traditional software development process. A significant part of that was the movement to "shift left" by empowering developers to self-serve their infrastructure needs, in theory offering them more control over the application development lifecycle in a way that reduced the dependency on central operational teams. While these shifts towards greater developer autonomy were occurring, public clouds, specific technologies (like GitHub, Docker, Kubernetes, and Terraform), and microservices architectures proliferated and became standard practice in the industry. As beneficial as these infrastructure advancements were, these technical shifts added complexity to the setups that developers were using as part of their application development processes. As a result, developers needed to have a more in-depth, end-to-end understanding of their toolchain and, more dauntingly, take ownership of a growing breadth of infrastructure considerations. This meant that the "shift left" drastically increased the cognitive load on developers, leading to inefficiencies, because self-managing infrastructure is time-consuming and difficult without a high level of expertise. In turn, this increased the time to market and hindered innovation. Concurrently, the increasing levels of permissions that developers needed within the organization led to a swath of compliance issues, such as inconsistent security controls, improper auditing, unhygienic data practices, and incorrect reporting, all of which created overhead that ate away at department budgets.
Unsurprisingly, the desire to enable developers to self-serve to build and ship applications hadn't diminished, but it became clear that empowering them without adding friction or a high level of required expertise needed to become a priority. With this goal in mind, investment was required to quickly and efficiently abstract away the complexities of the operational side for developers. From this investment comes the rise of platform engineering and internal developer platforms (whether companies are labeling it as such or not).

Platform engineering and the rise of internal developer platforms

Within a developer organization, platform engineering (or even a central platform team) is tasked with creating golden paths for developers to build and ship applications at scale while keeping infrastructure spend and cognitive load on developers low. At the core of the platform engineering ethos is the goal of optimizing the developer experience to accelerate the delivery of applications to customers. Like teaching someone to fish, platform teams help pave the way for greater developer efficiency by providing developers with pipelines that they can take and run with, reducing time to build and paving the way for greater developer autonomy without burdening developers with complexity. To do this, platform teams strive to design toolchains and workflows based on the end goals of the developers in their organization. Therefore, it’s critical for the folks tasked with platform engineering to understand the needs of their developers, and then build a platform that is useful to the target audience. The end result is what is often (but not exclusively) known as an Internal Developer Platform.

What is an IDP?

An IDP is a collection of tools and services, sourced and stitched together by central teams to create golden paths for developers, who will then use the IDP to simplify and streamline application building.
IDPs reduce complexity and lower cognitive load on developers, often by dramatically simplifying the experience of configuring infrastructure and services that are not a direct part of the developer's application. They encourage developers to move away from spending excess time managing the tools they use and allow them to focus on delivering applications at speed and scale. IDPs give developers the freedom to quickly and easily build, deploy, and manage applications while reducing risk and overhead costs for the organization by centralizing oversight and iteration of development practices. An IDP is tailored with developers in mind and will often consist of the following tools:

- Infrastructure platform that enables running a wide variety of workloads with the highest degree of security, resilience, and scalability, and a high degree of automation (e.g., Kubernetes)
- Source code repository system that allows teams to establish a single source of truth for configurations, ensuring version control, data governance, and compliance (e.g., GitHub, GitLab, Bitbucket)
- Control interface that enables everyone working on the application to interact with and manage its resources (e.g., Port or Backstage)
- Continuous integration and continuous deployment (CI/CD) pipeline that applies code and infrastructure configuration to an infrastructure platform (e.g., ArgoCD, Flux, CircleCI, Terraform, CloudFormation)
- Data layer that can handle changes to schemas and data structures (e.g., MongoDB Atlas)
- Security layer to manage permissions in order to maintain compliance, such as role-based compliance tools or secrets management tools (e.g., Vault)
While some tools overlap and not all of them will be part of a specific IDP, the goal of platform engineering efforts is to build an IDP for developers that is tightly integrated with infrastructure resources and services to maximize automation, standardization, self-service, and scale for developers, as well as maximizing security while minimizing overhead for the enterprise. While different organizations and teams use many different terms to refer to their IDP story, at its core, an IDP is a tailored set of tech, tools, and processes, built and managed by a central team, and used to provide developers with golden paths that enable greater developer self-service, lower cognitive load, and reduced risk.

How does MongoDB Atlas fit into this story?

Developers often cite working with data as one of the most difficult aspects of building applications. Rigid and unintuitive data technologies impede building applications and can lead to project failure if they don’t deliver the data model flexibility and query functionality that your applications demand. A data layer that isn’t integrated into your workflows slows deployments, and manual operations are a never-ending drag on productivity. Failures and downtime lead to on-call emergencies – not to mention the enormous potential risk of a data breach. Therefore, making it easy to work with data is critical to improving the developer experience. IDPs are in part about giving developers the autonomy to build applications. For this reason, MongoDB Atlas is a natural fit for an IDP: it serves as a developer data platform that can easily fit into any team’s existing toolstack and abstracts away the complexities associated with self-managing a data layer.
MongoDB’s developer data platform is a step beyond a traditional database in that it helps organizations drive innovation at scale by providing a unified way to work with data that addresses transactional workloads, app-driven analytics, full-text search, vector search, stream data processing, and more, prioritizing an intuitive developer experience and automating security, resilience, and performance at scale. This simplification and broad coverage of different use cases make a monumental difference to the developer experience. By incorporating MongoDB Atlas within an IDP, developer teams have a fully managed developer data platform at their disposal that enables them to build and underpin best-in-class applications. This way, teams won’t have to worry about the overhead and manual work involved in self-hosting a database and then building all the supporting functionality that comes out of the box with MongoDB Atlas. Lastly, MongoDB Atlas can be hosted on more cloud regions than any other cloud database in the market today, with support for AWS, Azure, and Google Cloud.

How can I incorporate MongoDB Atlas into my IDP?

MongoDB Atlas offers many ways to integrate into an IDP through tools that leverage the MongoDB Atlas Admin API. The Atlas Admin API can be used independently or via one of these tools and integrations, and provides a programmatic interface to directly manage and automate various aspects of MongoDB Atlas without needing to switch between UIs or incorporate manual scripts. These tools include:

- Atlas Kubernetes Operator
- HashiCorp Terraform Atlas Provider
- AWS CloudFormation Atlas Resources
- Atlas CDKs
- Atlas CLI
- Atlas Go SDK
- Atlas Admin API

With the Atlas Kubernetes Operator, platform teams are able to seamlessly integrate MongoDB Atlas into the current Kubernetes deployment pipeline within their IDP, allowing their developers to manage Atlas in the same way they manage their applications running in Kubernetes.
First, configurations are stored and managed in a git repository and applied to Kubernetes via CD tools like ArgoCD or Flux. Then, the Atlas Operator's custom resources are applied to Atlas using the Atlas Admin API and support all the building blocks you need, including projects, clusters, database users, IP access lists, private endpoints, backup, and more. For teams that want to take the IaC route in connecting Atlas to their IDP, Atlas offers integrations with HashiCorp Terraform and AWS CloudFormation, which can be used to programmatically spin up Atlas services in the cloud environment of their choice; both integrations are built on the Atlas Admin API. Through provisioning with Terraform, teams can deploy, update, and manage Atlas configurations as code with either the Terraform Provider or the CDKTF. MongoDB also makes it easier for Atlas customers who prefer AWS CloudFormation to manage, provision, and deploy MongoDB Atlas services in three ways: through resources from the CloudFormation Public Registry, AWS Quick Starts, and the AWS CDK. Other programmatic ways that Atlas can be incorporated into an IDP include:

- Atlas CLI, which interacts with Atlas from a terminal with short and intuitive commands and accomplishes complex operational tasks, such as creating a cluster or setting up an access list, interactively
- Atlas Go SDK, which provides platform-specific and Go language-specific tools, libraries, and documentation to help build applications quickly and easily
- Atlas Admin API, which provides a RESTful API, accessed over HTTPS, to interact directly with MongoDB Atlas control plane resources

Get started with MongoDB Atlas today

The fastest way to get started is to create a MongoDB Atlas account from the AWS Marketplace, Azure Marketplace, or Google Cloud Marketplace. Go build with MongoDB Atlas today!
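As a rough sketch of what direct Admin API access looks like, the snippet below builds a request for a project's cluster list using only the standard library. The project ID and API keys are placeholders, the exact endpoint path is an assumption to verify against the current Atlas Admin API reference, and no network call is made here; Atlas authenticates programmatic access with API keys over HTTP digest auth.

```python
import urllib.request

# Sketch: preparing an Atlas Admin API call with the standard library.
# Endpoint path, project ID, and keys below are illustrative placeholders;
# check the Atlas Admin API reference for the current versioned paths.
BASE = "https://cloud.mongodb.com/api/atlas/v2"

def build_clusters_request(project_id: str, public_key: str, private_key: str):
    """Return the clusters-list URL and an opener configured for digest auth."""
    url = f"{BASE}/groups/{project_id}/clusters"
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, public_key, private_key)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr)
    )
    return url, opener  # opener.open(url) would issue the authenticated GET

url, opener = build_clusters_request(
    "5f1a2b3c4d5e6f7a8b9c0d1e", "example-public-key", "example-private-key"
)
print(url)
```

In an IDP, calls like this usually sit behind the platform's control interface or CI/CD pipeline rather than being issued by developers directly.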
MongoDB Design Reviews Help Customers Achieve Transformative Results
The pressure to deliver flawless software can weigh heavily on developers' minds and cause teams to second-guess their processes. While no amount of preparation can guarantee success, we've found that a design review conducted by members of the MongoDB Developer Relations team can go a long way in ensuring best practices have been followed and that optimizations are in place to help the team deliver confidently. Design reviews are hour-long sessions where we partner with our customers to help them fine-tune their data models for specific projects or use cases. They serve to give our customers a jump start in the early stages of application design when the development team is new to MongoDB and trying to understand how best to model their data to achieve their goals. A design review is a valuable enablement session that leverages the development team’s own workload as a case study to illustrate performant and efficient MongoDB design. We also help customers explore the art of the possible and put them on the right path toward achieving their desired outcomes. When participants leave these sessions, they carry the knowledge and confidence to evolve their designs independently. The underlying principle that characterizes these reviews is the domain-driven design ethos, an indispensable concept in software engineering. Design isn't merely a box to tick; it's a daily routine for developers. Design reviews are more than just academic exercises; they hold tangible goals. A primary aim is to enable and educate developers on a global scale, transitioning them away from legacy systems like Oracle. It's about supporting developers, helping them overcome obstacles, and imparting critical education and training. Mastery of the tools is essential, and our sessions delve deep into addressing access patterns and optimizing schema for performance. At its core, a design review is a catalyst for transformation. 
It's a collaborative endeavor, merging expertise and fostering an environment where innovation thrives. It's not just about reviewing. When our guidance and expertise are combined with developer innovation and talent, the journey from envisioning to implementing a robust data model becomes a shared success. During the session, our experts look at the workload's data-related functional requirements — like data entities and, in particular, reads and writes — along with non-functional requirements like growth rates, performance, and scalability. With these insights in hand, we can recommend target document schemas that help developers achieve the goals they established before committing their first lines of code. A properly designed document schema is fundamental for performant and cost-efficient operations. Getting schema wrong is often the number one reason why projects fail. Design reviews help customers avoid lost time and effort due to poor schemas.

Design reviews in practice

Not long ago, we were approached by a customer in financial services who wanted us to conduct a design review for an application they were building in MongoDB Atlas. The application was designed to give regional account managers a comprehensive view of aggregated performance data. Specifically, it aimed to provide insights into individual stock performance within a customer's portfolio across a specified time frame within a designated region. When we talked to them, the customer highlighted an issue with their aggregation pipeline, which was taking longer than expected — between 20 and 40 seconds to complete. Their SLA demanded a response time of under two seconds. Most design reviews involve a couple of steps to assess and diagnose the problem. The first involves assessing the workload.
During this step, a few of the things we look at include:

- Number of collections
- The documents in the collections
- How many records the documents contain
- How frequently data is being written or updated in the collections
- What hours of the day see the most activity
- How much storage is being consumed
- Whether and how old data is being purged from collections
- The cluster size the customer is running in MongoDB

Once we performed this assessment for our finserv customer, we had a better understanding of the nature and scale of the workload. The next step was examining the structure of the aggregation pipeline. What we found was that the way data was being collected had a few unnecessary steps, such as breaking down the data and then reassembling it through various $unwind and $group stages. The MongoDB DevRel experts suggested using arrays to reduce the number of steps involved to just two: first, finding the right data, and then looking up the necessary information. Eliminating the $group stage reduced the response time to 19 seconds — a significant improvement, but still short of the target.

In the next step of the design review, the MongoDB DevRel team looked to determine which schema design patterns could be applied to optimize the pipeline's performance. In this particular case, there was a high volume of stock activity documents being written to the database every minute, but users were querying only a limited number of times per day. With this in mind, our DevRel team decided to apply the computed design pattern. The computed pattern is ideal when you have data that needs to be computed repeatedly in an application. By pre-calculating and saving commonly requested data, it avoids having to do the same calculation each time the data is requested. With our finserv customer, we were able to pre-calculate the trading volume and the starting, closing, high, and low prices for each stock. These values were then stored in a new collection that the $lookup pipeline could access.
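To make the computed pattern concrete, here is a minimal sketch in plain Python. The field names (symbol, price, qty, ts) are hypothetical, since the customer's actual schema isn't shown here; the point is that summary values are calculated once at write time rather than re-aggregated from raw activity on every read.

```python
def compute_daily_stats(trades):
    """Pre-calculate per-symbol daily stats (the 'computed' pattern).

    `trades` is a list of dicts with hypothetical fields, e.g.:
    {"symbol": "ABC", "price": 10.5, "qty": 100, "ts": 1}
    """
    stats = {}
    for t in sorted(trades, key=lambda t: t["ts"]):  # process in time order
        s = stats.setdefault(t["symbol"], {
            "symbol": t["symbol"],
            "open": t["price"], "close": t["price"],
            "high": t["price"], "low": t["price"],
            "volume": 0,
        })
        s["close"] = t["price"]                      # last trade wins
        s["high"] = max(s["high"], t["price"])
        s["low"] = min(s["low"], t["price"])
        s["volume"] += t["qty"]
    # Each summary document would be upserted into a separate collection
    # that the read-side $lookup stage can hit directly, instead of
    # re-running $unwind/$group over raw ticks on every query.
    return list(stats.values())
```

For example, three trades for one symbol at prices 10, 12, and 9 with quantities 5, 3, and 2 collapse into a single summary document with open 10, close 9, high 12, low 9, and volume 10.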
This resulted in a response time of 1800 ms, below our two-second target SLA, but our DevRel team wasn't finished. They performed additional optimizations, including using the extended reference pattern to embed region data in the pre-computed stock activity documents so that all the related data can be retrieved with a single query, avoiding a $lookup-based join. After the team was finished with their optimizations, the final test execution of the pipeline resulted in a response time of 377 ms — a 60x improvement in the performance of the aggregation pipeline and more than four times faster than the application's target response time.

Read the complete story, including a step-by-step breakdown with code examples of how we helped one of our financial services customers achieve a 60x performance improvement. If you'd like to learn more about MongoDB data modeling and aggregation pipelines, we recommend the following resources:

- Daniel Coupal and Ken Alger's excellent series of blog posts on MongoDB schema patterns
- Daniel Coupal and Lauren Schaefer's equally excellent series of blog posts on MongoDB anti-patterns
- Paul Done's ebook, Practical MongoDB Aggregations
- The MongoDB University course "M320 - MongoDB Data Modeling"

If you're interested in a design review, please contact your account representative.
Data Governance for Building Generative AI Applications with MongoDB
Generative AI (GenAI) has been evolving at a rapid pace. After OpenAI's ChatGPT, powered by GPT-3.5, reached 100 million monthly active users in just two months, other major large language models (LLMs) followed in ChatGPT's footsteps: Cohere's LLM supports more than 100 languages and is now available on their AI platform, Google's Med-PaLM was designed to provide high-quality answers to medical questions, OpenAI introduced GPT-4 (a 40% improvement over GPT-3.5), Microsoft integrated GPT-4 within its Office 365 suite, and Amazon introduced Bedrock, a fully managed service that makes foundation models available via API. These are just a few advancements in the generative AI market, and many enterprises and startups are adopting AI tools to solve their specific use cases. The developer community and the ecosystem of open-source models are also growing as companies adapt to this paradigm shift in the market.

Building intelligent GenAI applications requires flexibility with data. One of the core requirements is data governance, which is the subject of this blog. Data governance is a broad term encompassing everything you do to ensure data is secure, private, accurate, available, and usable. It includes the processes, policies, measures, technology, tools, and controls around the data lifecycle. When organizations build applications and transition to a production environment, they often deal with personally identifiable information (PII) or commercially sensitive data, such as data related to intellectual property, and want to make sure all the necessary controls are in place.
When organizations look to build GenAI-powered apps, a few capabilities are required to deliver intelligent and modern app experiences:

- Handling data for both operational and analytical workloads
- A data platform that is highly scalable and performant
- An expressive query API that can work with any kind of data type
- Tight integrations with established and open-source LLMs
- Native vector search capabilities, such as embeddings that enable semantic search and retrieval-augmented generation (RAG)

To learn more about the MongoDB developer data platform and how to embed generative AI in applications with MongoDB, you can refer to this paper. This blog goes into detail on the security controls of MongoDB Atlas that modern AI applications need. Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

What are some of the potential security risks while building GenAI applications?

According to the recent State of AI 2023 report by Retool, data security and data accuracy are the top two pain points when developing AI applications. In the survey, a third of respondents cited data security as a primary pain point, and that share increases almost linearly with company size (refer to the MongoDB blog for more details).

Top pain points around developing AI apps. Source: State of AI 2023 report by Retool

While organizations leverage AI technology to improve their businesses, they should be wary of the potential risks. The unintended consequences of generative AI are more likely to surface as companies experiment with various models and AI tools. Even when organizations follow best practices and are deliberate and structured in developing production-ready generative AI applications, they need strict security controls in place to address the key security considerations that AI applications pose.
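As a rough illustration of the semantic-search capability listed above, the sketch below ranks documents by cosine similarity between embedding vectors in plain Python. This is illustrative only: in practice the embeddings come from an embedding model, and a native vector search engine such as Atlas Vector Search performs this ranking server-side at scale over an index rather than scanning every document.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_search(query_vec, docs, k=2):
    """Return the k documents whose 'embedding' field is most similar
    to the query vector. A toy stand-in for a vector search index."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["embedding"]),
                    reverse=True)
    return ranked[:k]
```

In a RAG pipeline, the top-k documents returned here would be injected into the LLM prompt as grounding context.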
Here are some considerations for securing AI applications and systems:

- Data security and privacy: Generative AI foundation models rely on large amounts of data both to train against and to generate new content. If the training data, or the data available for the retrieval-augmented generation (RAG) process, includes personal or confidential data, that data may turn up in outputs in unpredictable ways. It is therefore very important to have strong governance and controls in place so that confidential data does not wind up in outputs.
- Intellectual property infringement: Organizations need to avoid the unauthorized use, duplication, or sale of works legally regarded as protected intellectual property. They also have to make sure to train AI models so the output does not resemble existing works and thereby infringe the copyrights of the originals. Since this is still a new area for AI systems, the laws are evolving.
- Regulatory compliance: AI applications have to comply with industry standards and policies like HIPAA in healthcare, PCI in finance, GDPR for data protection for EU citizens, CCPA, and more.
- Explainability: AI systems and algorithms are sometimes perceived as opaque, making non-deterministic decisions. Explainability is the concept that a machine learning model and its output can be explained in a way that makes sense to a human being at an acceptable level, and that the model provides repeatable outputs given the same inputs. This is crucial for building trust and accountability in AI applications, especially in domains like healthcare, finance, and security.
- AI hallucinations: AI models may generate inaccurate information, also known as hallucinations, often caused by limitations in training data and algorithms. Hallucinations can result in regulatory violations in industries like finance, healthcare, and insurance, and, in the case of individuals, could be reputationally damaging or even defamatory.

These are just some of the considerations when using AI tools and systems.
There are additional concerns when it comes to physical security, organizational measures, technical controls for the workforce (both internal and partners), and monitoring and auditing of systems. By addressing each of these critical issues, organizations can ensure the AI applications they roll out to production are compliant and secure. Let us look at how MongoDB's developer data platform can help with some of these security controls and measures.

How does MongoDB address the security risks and data governance around GenAI?

MongoDB's developer data platform, built on MongoDB Atlas, unifies operational, analytical, and generative AI data services to streamline building intelligent applications. At the core of MongoDB Atlas is its flexible document data model and developer-native query API. Together, they enable developers to dramatically accelerate the speed of innovation, outpace competitors, and capitalize on new market opportunities presented by GenAI. Developers and data science teams around the world are innovating with AI-powered applications on top of MongoDB. They span multiple use cases in various industry sectors and rely on the security controls MongoDB Atlas provides. Here is a library of sample case studies, white papers, and other resources about how MongoDB is helping customers build AI-powered applications.

MongoDB security & compliance capabilities

MongoDB Atlas offers built-in security controls for all organizational data. That data can be application data as well as vector embeddings and their associated metadata, giving holistic protection of all the data you are using for GenAI-powered applications. Atlas provides enterprise-grade features that integrate with your existing security protocols and compliance standards. In addition, Atlas simplifies deploying and managing your databases while offering the versatility developers need to build resilient applications.
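To give a flavor of the role-based access control idea discussed in the next section, here is a toy sketch in plain Python. The role names and permission sets below are hypothetical, invented for illustration; MongoDB Atlas ships its own built-in and custom roles rather than anything resembling this table.

```python
# Hypothetical role-to-permission table -- illustration only, not
# MongoDB's actual role definitions.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "app_service": {"read", "write"},
    "db_admin": {"read", "write", "manage_indexes"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role grants the requested action.
    Unknown roles get no permissions (deny by default)."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

The key property RBAC provides is shown in the deny-by-default lookup: access decisions flow from the role, not from the individual user, so tightening a role tightens every user holding it.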
MongoDB allows security administrators to integrate easily with external systems, while developers can focus on their business requirements. Along with key security features being enabled by default, MongoDB Atlas is designed with security controls that meet enterprise security requirements. Here's how these controls help organizations build their AI applications on MongoDB's platform and meet the considerations discussed above:

Data security

MongoDB has access and authentication controls enabled by default. Customers can authenticate to the platform using mechanisms including SCRAM, x.509 certificates, LDAP, passwordless authentication with AWS IAM, and OpenID Connect. MongoDB also provides role-based access control (RBAC) to determine a user's access privileges to various resources within the platform. Data scientists and developers building AI applications can leverage any of these access controls to fine-tune user access and privileges while training or prompting their AI models, and organizations can implement access control mechanisms to restrict access to the data to only authorized personnel.

End-to-end encryption of data

MongoDB's data encryption tools offer robust features to protect your data while in transit (network), at rest (storage), and in use (memory and logs). Customers can use automatic encryption of key data fields like personally identifiable information (PII), protected health information (PHI), or any data deemed sensitive, ensuring data is encrypted throughout its lifecycle. Going beyond encryption at rest and in transit, MongoDB has released Queryable Encryption to encrypt data in use. Queryable Encryption enables an application to encrypt sensitive data on the client side, store the encrypted data in the MongoDB database, and run server-side queries on the encrypted data without having to decrypt it, making sensitive data opaque while keeping it usable.
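Queryable Encryption itself is a MongoDB driver- and server-side feature, but the underlying idea (making a sensitive field opaque while still supporting equality matches) can be sketched with a deterministic keyed hash in plain Python. This is a simplified illustration only, not the actual Queryable Encryption protocol, and the document field names are hypothetical.

```python
import hashlib
import hmac

# Illustration only: in a real deployment the key would live in a key
# management service, never hardcoded in source.
SECRET_KEY = b"demo-key-for-illustration-only"

def pseudonymize(value: str) -> str:
    """Map a sensitive value to a deterministic opaque token.
    Determinism means equal plaintexts yield equal tokens, so equality
    lookups still work without revealing the plaintext."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"),
                    hashlib.sha256).hexdigest()

def protect_fields(doc: dict, sensitive: set) -> dict:
    """Return a copy of `doc` with the named string fields tokenized
    before the document is stored or fed into a RAG pipeline."""
    return {k: pseudonymize(v) if k in sensitive else v
            for k, v in doc.items()}
```

Because the tokens are deterministic, a query for the token of a known value still finds matching documents, which loosely mirrors how equality queries remain possible over encrypted fields; the real feature additionally protects against frequency analysis and manages keys properly.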
This technology can be leveraged when company-specific data containing confidential information is retrieved from the MongoDB database for the RAG process and needs to be anonymized, or when you are storing sensitive data in the database.

Regulatory compliance and data privacy

Many uses of generative AI are subject to existing laws and regulations that govern data privacy, intellectual property, and other related areas, and new laws and regulations aimed specifically at AI are in the works around the world. The MongoDB developer data platform undergoes independent verification of platform security, privacy, and compliance controls to help customers meet their regulatory and policy objectives, including the unique compliance needs of highly regulated industries and U.S. government agencies. Refer to the MongoDB Atlas Trust Center for our current certifications and assessments.

Regular security audits

Organizations should conduct regular security audits to identify potential vulnerabilities in their data security practices, helping to ensure that any security weaknesses are identified and addressed promptly. Audits help to identify and mitigate risks and errors in your AI models and data, as well as ensure that you are compliant with regulations and standards. MongoDB offers granular auditing that provides a trail of how and what data was used and is designed to monitor and detect any unauthorized access to data.

What are additional best practices and considerations while working with AI models?

While it is essential to work with a trusted data platform, it is also important to prioritize security and data governance as discussed. In addition to data security, compliance, and data privacy as covered above, here are additional best practices and considerations.

Data quality

Monitor and assess the quality of input data to avoid biases in foundation models.
Make sure that your training data is representative of the domain in which your model will be applied. If your model is expected to generalize to real-world scenarios, your training data, and any data made available for the RAG process, should be monitored.

Secure deployment

Use secure and encrypted channels for deploying foundation models. Implement robust authentication and authorization mechanisms to ensure that only authorized users and systems can access sensitive data and AI models, and enforce mechanisms to anonymize sensitive information to protect user privacy.

Audit trails and monitoring

Maintain detailed audit trails and logs of model training, evaluation, and deployment activities, and implement continuous monitoring of both data inputs and model outputs for unexpected patterns or deviations. MongoDB maintains audit trails and logs of all data operations and data processing. Customers can use the audit logs for monitoring, troubleshooting, and security purposes, including intrusion detection. We utilize a combination of automated scanning, automated alerting, and human review to monitor the data.

Secure data storage

Implement secure storage practices for both raw and processed data, using encryption for data at rest and in transit as discussed above. Encryption at rest is turned on automatically on MongoDB servers. The encryption occurs transparently in the storage layer; that is, all data files are fully encrypted from a filesystem perspective, and data exists in an unencrypted state only in memory and during transmission.

Conclusion

As generative AI tools grow in popularity, it matters more than ever how an organization understands and protects its data, and puts it to use — defining the roles, controls, processes, and policies for interacting with data. As modern enterprises use generative AI and LLMs to better serve customers and extract insights from their data, strong data governance becomes essential.
By understanding the potential risks and carefully evaluating the capabilities of the platform the data is hosted on, organizations can confidently harness the power of these tools. For more details on MongoDB's trusted platform, refer to these resources:

- MongoDB Security Hub
- Platform Trust Center
- Atlas Technical and Organizational Security Measures
- MongoDB Compliance & Assessments
- MongoDB Data Privacy