Pramod Borkar


Data Governance for Building Generative AI Applications with MongoDB

Generative AI (GenAI) has been evolving at a rapid pace. With the introduction of OpenAI’s ChatGPT powered by GPT-3.5 reaching 100 million monthly active users in just two months, other major large language models (LLMs) have followed in ChatGPT's footsteps. Cohere’s LLM supports more than 100 languages and is now available on their AI platform, Google’s Med-PaLM was designed to provide high-quality answers to medical questions, OpenAI introduced GPT-4 (a 40% improvement over GPT-3.5), Microsoft integrated GPT-4 within its Office 365 suite, and Amazon introduced Bedrock , a fully managed service that makes foundation models available via API. These are just a few advancements in the Generative AI market, and a lot of enterprises and startups are adopting AI tools to solve their specific use cases. The developer community and open-source models are also growing as companies adapt to the new technology paradigm shift in the market. Building intelligent GenAI applications requires flexibility with data. One of the core requirements is data governance , which will be discussed in this blog. Data governance is a broad term encompassing everything you do to ensure data is secure, private, accurate, available, and usable. It includes the processes, policies, measures, technology, tools, and controls around the data lifecycle. When organizations build applications and transition to a production environment, they often deal with personal data (PII) or commercially sensitive data, such as data related to intellectual property, and want to make sure all the controls are in place. 
When organizations are looking to build GenAI-powered apps, a few capabilities are required to deliver intelligent and modern app experiences:

- Handle data for both operational and analytical workloads
- A data platform that is highly scalable and performant
- An expressive query API that can work with any kind of data type
- Tight integrations with established and open-source LLMs
- Native vector search over embeddings, enabling semantic search and retrieval-augmented generation (RAG)

To learn more about the MongoDB developer data platform and how to embed generative AI applications with MongoDB, you can refer to this paper. This blog goes into detail on the security controls of MongoDB Atlas that modern AI applications need.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

What are some of the potential security risks while building GenAI applications?

As per the recent State of AI, 2023 report by Retool, data security and data accuracy are the top two pain points when developing AI applications. In the survey, a third of respondents cited data security as a primary pain point, and that share increases almost linearly with company size (refer to the MongoDB blog for more details).

[Figure: Top pain points around developing AI apps. Source: State of AI 2023 report by Retool]

While organizations leverage AI technology to improve their businesses, they should be wary of the potential risks. The unintended consequences of generative AI are more likely to expose these risks as companies experiment with various models and AI tools. Even organizations that follow best practices and are deliberate and structured in developing production-ready generative AI applications need strict security controls in place to address the key security considerations that AI applications pose.
Here are some considerations for securing AI applications and systems:

- Data security and privacy: Generative AI foundation models rely on large amounts of data to both train against and generate new content. If the training data or the data available for the retrieval-augmented generation (RAG) process includes personal or confidential data, that data may turn up in outputs in unpredictable ways. Hence it is very important to have strong governance and controls in place so that confidential data does not wind up in outputs.
- Intellectual property infringement: Organizations need to avoid the unauthorized use, duplication, or sale of works legally regarded as protected intellectual property. They also have to make sure to train the AI models so that the output does not resemble existing works and thereby infringe the copyright of the originals. Since this is still a new area for AI systems, the laws are evolving.
- Regulatory compliance: AI applications have to comply with industry standards and policies like HIPAA in healthcare, PCI in finance, GDPR for data protection for EU citizens, CCPA, and more.
- Explainability: AI systems and algorithms are sometimes perceived as opaque, making non-deterministic decisions. Explainability is the concept that a machine learning model and its output can be explained in a way that makes sense to a human being at an acceptable level and provides repeatable outputs given the same inputs. This is crucial for building trust and accountability in AI applications, especially in domains like healthcare, finance, and security.
- AI hallucinations: AI models may generate inaccurate information, also known as hallucinations. These are often caused by limitations in training data and algorithms. Hallucinations can result in regulatory violations in industries like finance, healthcare, and insurance, and, in the case of individuals, could be reputationally damaging or even defamatory.

These are just some of the considerations when using AI tools and systems.
There are additional concerns when it comes to physical security, organizational measures, technical controls for the workforce — both internal and partners — and monitoring and auditing of the systems. By addressing each of these critical issues, organizations can ensure the AI applications they roll out to production are compliant and secure. Let us look at how MongoDB’s developer data platform can help with some of these considerations around security controls and measures. How does MongoDB address the security risks and data governance around GenAI? MongoDB's developer data platform, built on MongoDB Atlas , unifies operational, analytical, and generative AI data services to streamline building intelligent applications. At the core of MongoDB Atlas is its flexible document data model and developer-native query API. Together, they enable developers to dramatically accelerate the speed of innovation, outpace competitors, and capitalize on new market opportunities presented by GenAI. Developers and data science teams around the world are innovating with AI-powered applications on top of MongoDB. They span multiple use cases in various industry sectors and rely on the security controls MongoDB Atlas provides. Here is the library of sample case studies, white papers, and other resources about how MongoDB is helping customers build AI-powered applications. MongoDB security & compliance capabilities MongoDB Atlas offers built-in security controls for all organizational data. The data can be application data as well as vector embeddings and their associated metadata — giving holistic protection of all the data you are using for GenAI-powered applications. Atlas enables enterprise-grade features to integrate with your existing security protocols and compliance standards. In addition, Atlas simplifies deploying and managing your databases while offering the versatility for developers to build resilient applications. 
MongoDB allows easy integration for security administrators with external systems, while developers can focus on their business requirements. Along with key security features being enabled by default, MongoDB Atlas is designed with security controls that meet enterprise security requirements. Here's how these controls help organizations build their AI applications on MongoDB’s platform and meet the considerations we discussed above: Data security MongoDB has access and authentication controls enabled by default. Customers can authenticate to the platform using mechanisms including SCRAM, x.509 certificates, LDAP, passwordless authentication with AWS-IAM, and OpenID Connect. MongoDB also provides role-based access control (RBAC) to determine the user's access privilege to various resources within the platform. Data scientists and developers building AI applications can leverage any of these access controls to fine-tune user access and privileges while training or prompting their AI models. Organizations can implement access control mechanisms to restrict access to the data to only authorized personnel. End-to-end encryption of data: MongoDB’s data encryption tools offer robust features to protect your data while in transit (network), at rest (storage), and in use (memory and logs). Customers can use automatic encryption of key data fields like personally identifiable information (PII), protected health information (PHI), or any data deemed sensitive, ensuring data is encrypted throughout its lifecycle. Going beyond encryption at rest and in transit, MongoDB has released Queryable Encryption to encrypt data in use. Queryable Encryption enables an application to encrypt sensitive data from the client side, store the encrypted data in the MongoDB database, and run server-side queries on the encrypted data without having to decrypt it. Queryable Encryption is an excellent anonymization technique that makes sensitive data opaque. 
This technology can be leveraged when you are using company-specific data that contain confidential information from the MongoDB database for the RAG process and that data needs to be anonymized or when you are storing sensitive data in the database. Regulatory compliance and data privacy Many uses of generative AI are subject to existing laws and regulations that govern data privacy, intellectual property, and other related areas. New laws and regulations aimed specifically at AI are in the works around the world. The MongoDB developer data platform undergoes independent verification of platform security, privacy, and compliance controls to help customers meet their regulatory and policy objectives, including the unique compliance needs of highly regulated industries and U.S. government agencies. Refer to the MongoDB Atlas Trust Center for our current certifications and assessments. Regular security audits Organizations should conduct regular security audits to identify potential vulnerabilities in their data security practices. This can help ensure that any security weaknesses are identified and addressed promptly. Audits help to identify and mitigate any risks and errors in your AI models and data, as well as ensure that you are compliant with regulations and standards. MongoDB offers granular auditing that provides a trail of how and what data was used and is designed to monitor and detect any unauthorized access to data. What are additional best practices and considerations while working with AI models? While it is essential to work with a trusted data platform, it is also important to prioritize security and data governance as discussed. In addition to data security , compliance , and data privacy as mentioned above, here are additional best practices and considerations. Data quality Monitor and assess the quality of input data to avoid biases in foundation models. 
Make sure that your training data is representative of the domain in which your model will be applied. If your model is expected to generalize to real-world scenarios, your training data or data made available for the RAG process should be monitored. Secure deployment Use secure and encrypted channels for deploying foundation models. Implement robust authentication and authorization mechanisms to ensure that only authorized users and systems can access sensitive data and AI models. Enforce mechanisms to anonymize sensitive information to protect user privacy. Audit trails and monitoring Maintain detailed audit trails and logs of model training, evaluation, and deployment activities. Implement continuous monitoring of both data inputs and model outputs for unexpected patterns or deviations. MongoDB maintains audit trails and logs of all the data operations and data processing. Customers can use the audit logs for monitoring, troubleshooting, and security purposes, including intrusion detection. We utilize a combination of automated scanning, automated alerting, and human review to monitor the data. Secure data storage Implement secure storage practices for both raw and processed data. Use encryption for data at rest and in transit as discussed above. Encryption at-rest is turned on automatically on MongoDB servers. The encryption occurs transparently in the storage layer; i.e. all data files are fully encrypted from a filesystem perspective, and data only exists in an unencrypted state in memory and during transmission. Conclusion As generative AI tools grow in popularity, it matters more than ever how an organization understands and protects its data, and puts it to use — defining the roles, controls, processes, and policies for interacting with data. As modern enterprises use generative AI and LLMs to better serve customers and extract insights from the data, strong data governance becomes essential. 
By understanding the potential risks and carefully evaluating the capabilities of the platform that hosts the data, organizations can confidently harness the power of these tools.

For more details on MongoDB’s trusted platform, refer to these links:

- MongoDB Security Hub
- Platform Trust Center
- Atlas Technical and Organization Security Measures
- MongoDB Compliance & Assessments
- MongoDB Data Privacy
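
As a small illustration of the role-based access control described under Data security above, the sketch below builds the document for MongoDB's `createRole` database command so that a retrieval service for a RAG pipeline can only read one collection. The role, database, and collection names (`ragReader`, `support_kb`, `articles`) are hypothetical, and this is a minimal sketch of least-privilege scoping rather than a recommended production setup.

```python
from typing import Any


def build_read_only_role(role_name: str, database: str, collection: str) -> dict[str, Any]:
    """Build a createRole command document granting only `find` on one collection."""
    return {
        "createRole": role_name,
        "privileges": [
            {
                # Scope the privilege to a single namespace.
                "resource": {"db": database, "collection": collection},
                # Read-only: no insert, update, or remove actions.
                "actions": ["find"],
            }
        ],
        # No inherited roles, so the role carries nothing beyond the privilege above.
        "roles": [],
    }


# Hypothetical names, chosen for illustration only.
rag_reader_role = build_read_only_role("ragReader", "support_kb", "articles")
```

On a self-managed deployment, an administrator could send a document like this with PyMongo's `Database.command`. In Atlas, custom roles are typically managed through the Atlas UI or administration API instead, so treat the command form here as an assumption about the deployment type.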

December 14, 2023

Why Queryable Encryption Matters to Developers and IT Decision Makers

Enterprises face new challenges in protecting data as modern applications constantly change requirements. There are new technologies, advances in cryptography, regulatory constraints, and architectural complexities. The threat landscape and attack techniques are also changing, making it harder for developers to be experts in data protection.

Client-side field level encryption, sometimes referred to as end-to-end encryption, provides another layer of security that enables enterprises to protect sensitive data. Although client-side encryption fulfills many modern requirements, architects and developers face challenges in implementing these solutions to protect their data efficiently, for several reasons:

- Multiple cryptographic tools to choose from: Identifying the relevant libraries, selecting the appropriate encryption algorithms, configuring the selected algorithms, and correctly setting up the API for interaction are some of the challenges around tools.
- Encryption key management challenges: How and where to store the encryption keys, how to manage access, and how to manage the key lifecycle, such as rotation and revocation.
- Application customization: Developers might have to write custom code to encrypt, decrypt, and query the data, requiring widespread application changes.

With Queryable Encryption now generally available, MongoDB helps customers protect data throughout its lifecycle: data is encrypted at the client side and remains encrypted in transit, at rest, and in use while in memory, in logs, and in backups. Also, MongoDB is the only database provider that allows customers to run rich queries on encrypted data, just like they can on unencrypted data. This is a huge advantage for customers as they can query and secure the data confidently.

Why does Queryable Encryption matter to IT decision-makers and developers?
Here are a few reasons:

- Security teams within enterprises deal with protecting their customers’ sensitive data: financial records, personal data, medical records, and transaction data. Queryable Encryption provides a high level of security: because sensitive fields are encrypted on the client side, the data remains encrypted while in transit, at rest, and in use, and is only ever decrypted back at the client.
- With Queryable Encryption, customers can run expressive queries on encrypted data using an industry-first fast, encrypted search algorithm. This allows the server to process and retrieve matching documents without the server understanding the data or why the document should be returned.
- Queryable Encryption was designed by the pioneers of encrypted search with decades of research and experience in cryptography and uses NIST-standard cryptographic primitives such as AES-256, SHA2, and HMACs.
- Queryable Encryption allows a faster and easier development cycle: developers can easily encrypt sensitive data without making changes to their application code by using language-specific drivers provided by MongoDB. No cryptography experience is required, and it is intuitive and easy for developers to set up and use. Developers need not be cryptography experts to encrypt, format, and transmit the data, and they don't have to figure out how to use the right algorithms or encryption options to implement a secure encryption solution. MongoDB has built a comprehensive encryption solution including key management.
- Queryable Encryption helps enterprises meet strict data privacy requirements such as HIPAA, GDPR, CCPA, PCI, and more using strong data protection techniques. It offers customer-managed and controlled keys. The MongoDB driver handles all cryptographic operations and communication with the customer-provisioned key provider. Queryable Encryption supports AWS KMS, Google Cloud KMS, Azure Key Vault, and KMIP-compliant key providers.
MongoDB also provides APIs for key rotation and key migration that customers can leverage to make key management seamless.

* With automatic encryption enabled
** Equality query type is supported in 7.0 GA

For more information on Queryable Encryption, refer to the following resources:

- Queryable Encryption documentation
- Queryable Encryption FAQ
- Download drivers
- Queryable Encryption Datasheet
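
To make the "no changes to application code" point more concrete, here is a hedged sketch of the kind of encrypted-fields description a driver is configured with, so that marked fields are encrypted automatically and can be queried by equality. The overall shape (`fields`, `path`, `bsonType`, `keyId`, `queries` with `queryType`) follows the Queryable Encryption documentation as best understood here; the field path `patientRecord.ssn` and the helper name are hypothetical.

```python
from typing import Any, Optional


def equality_encrypted_field(
    path: str, bson_type: str, key_id: Optional[Any] = None
) -> dict[str, Any]:
    """Describe one field to encrypt and make queryable by equality."""
    return {
        "path": path,           # dotted path of the field inside the document
        "bsonType": bson_type,  # BSON type of the plaintext value
        # Data encryption key identifier. None is a placeholder here; a real
        # deployment supplies a key id or lets a driver helper create one.
        "keyId": key_id,
        # Equality is the query type named as supported in the 7.0 GA release.
        "queries": {"queryType": "equality"},
    }


# Hypothetical schema: only the SSN is encrypted; other fields stay plaintext.
encrypted_fields = {
    "fields": [
        equality_encrypted_field("patientRecord.ssn", "string"),
    ]
}
```

PyMongo's `ClientEncryption.create_encrypted_collection` helper is documented to generate data keys for fields whose `keyId` is `None`; verify that behavior against the driver version in use before relying on it.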

September 18, 2023

MongoDB Announces Queryable Encryption with Equality Query Type Support

The general availability of Queryable Encryption offers end-to-end encryption of sensitive data while preserving the ability to run equality queries on that encrypted data, helping customers meet the strictest data privacy requirements. This technology allows developers to query encrypted sensitive data in a simple, intuitive way. We are releasing the equality query type with the 7.0 release and in future releases will add support for the range, prefix, suffix, and substring query types.

First announced in preview in MongoDB 6.0 in 2022, Queryable Encryption introduced a fast state-of-the-art encrypted search algorithm using innovative cryptography engineering built and designed by MongoDB’s Cryptography Research Group, which has decades of experience designing encrypted search algorithms. Since its initial release last year, MongoDB has worked in partnership with its customers, including leading Fortune 500 companies in the healthcare and insurance industries, to fine-tune the release for general availability.

This client-side encryption approach uses novel encrypted data structures that allow developers to run efficient, expressive queries on encrypted workloads for the first time. Data remains encrypted at all times on the database, including in memory and in the CPU; keys never leave the application and cannot be accessed by the database server.

Queryable Encryption: How it works

Here is a sample flow of operations where an authorized user wants to query the encrypted data. In this example, let’s assume we are retrieving the records for an SSN.

1. An authorized user runs an equality query to get the records for a specific SSN.
2. Recognizing the query is against an encrypted field, the driver requests the encryption keys from the customer-provisioned key provider, such as AWS Key Management Service (AWS KMS), Google Cloud KMS, Azure Key Vault, or any KMIP-enabled provider, such as HashiCorp Vault.
3. The MongoDB driver gets the encryption keys from the key provider.
4. The driver submits the encrypted query along with a cryptographic token to the MongoDB server with the encrypted fields rendered as ciphertext.
5. Queryable Encryption implements a fast encrypted search algorithm that allows the server to process queries on the encrypted data, without knowing the data. The data and the query itself remain encrypted at all times on the server.
6. The MongoDB server returns the encrypted results of the query to the driver.
7. The query results are decrypted with the keys held by the driver and returned to the client and shown as plaintext.

Here are some of the key benefits of Queryable Encryption technology:

- Run equality queries on encrypted data: With Queryable Encryption, customers can run equality queries on encrypted data using a fast state-of-the-art encrypted search algorithm. This algorithm allows the server to process and retrieve matching documents without the server understanding anything about the data or why the document should be returned.
- Groundbreaking query technology based on standards-based cryptography: Queryable Encryption introduces a fast state-of-the-art encrypted search algorithm that uses NIST standards-based primitives. These are well-tested and established public standards to ensure the confidentiality and integrity of data.
- Faster application development cycle: Queryable Encryption allows developers to easily encrypt sensitive data without changes to their application code, with many language-specific drivers to choose from. No cryptography experience is required, and it is intuitive and easy for developers to set up and use. Developers don't have to figure out how to use the right algorithms, encryption options, etc. to implement the right encryption solution. MongoDB has done all that complex work for them.
- Reduce operational risk as sensitive workloads are protected on the cloud: Eliminate common security concerns when moving database workloads to the cloud. Customers can keep their data on any of the cloud providers and be assured that their data is protected. Since encryption keys are only accessible within the customer environment, the data cannot be decrypted by a third party or the cloud provider. The only place where the data is unencrypted is in the application.
- Strong technical controls for critical data privacy use cases: Queryable Encryption can help customers meet strict data privacy requirements such as HIPAA, GDPR, CCPA, PCI, and more. It uses strong data protection techniques and end-to-end encryption.

Resources

For more information on Queryable Encryption, refer to the following resources:

- Queryable Encryption Documentation
- Queryable Encryption Quick Start
- Queryable Encryption FAQ
- Queryable Encryption Driver Compatibility
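
The "How it works" flow in this post can be sketched with a toy model that uses only the Python standard library. This is not MongoDB's algorithm: it is a simplified illustration in which a keyed HMAC token stands in for the cryptographic token the driver sends, and an HMAC-based keystream stands in for real field encryption. Unlike Queryable Encryption, which stores fully randomized ciphertext, this toy stores a deterministic token next to each value and therefore leaks which documents share a value. All class and function names are hypothetical, and the code should not be used to protect real data.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` bytes with HMAC-SHA256 in counter mode (toy construction)."""
    block_count = (length + 31) // 32
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range(block_count)
    ]
    return b"".join(blocks)[:length]


class ToyClient:
    """Stands in for the application plus driver: the only place keys exist."""

    def __init__(self) -> None:
        self._enc_key = secrets.token_bytes(32)
        self._token_key = secrets.token_bytes(32)

    def query_token(self, value: str) -> bytes:
        # Keyed token that lets the server test equality without seeing the value.
        return hmac.new(self._token_key, value.encode(), hashlib.sha256).digest()

    def encrypt(self, value: str) -> dict:
        nonce = secrets.token_bytes(16)  # fresh nonce, so ciphertexts differ per insert
        data = value.encode()
        stream = _keystream(self._enc_key, nonce, len(data))
        ciphertext = bytes(a ^ b for a, b in zip(data, stream))
        return {"nonce": nonce, "ciphertext": ciphertext, "token": self.query_token(value)}

    def decrypt(self, blob: dict) -> str:
        stream = _keystream(self._enc_key, blob["nonce"], len(blob["ciphertext"]))
        return bytes(a ^ b for a, b in zip(blob["ciphertext"], stream)).decode()


class ToyServer:
    """Stores and compares opaque bytes; never holds a key or a plaintext SSN."""

    def __init__(self) -> None:
        self._docs: list = []

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def find_by_token(self, token: bytes) -> list:
        return [d for d in self._docs if hmac.compare_digest(d["ssn"]["token"], token)]


client, server = ToyClient(), ToyServer()
server.insert({"name": "A", "ssn": client.encrypt("123-45-6789")})
server.insert({"name": "B", "ssn": client.encrypt("987-65-4321")})

# The client tokenizes the query, the server matches on the token alone,
# and only the client decrypts the encrypted result.
matches = server.find_by_token(client.query_token("123-45-6789"))
decrypted = [client.decrypt(m["ssn"]) for m in matches]
```

The split mirrors the roles in the flow: key handling, encryption, and decryption stay in `ToyClient`, while `ToyServer` only stores and compares bytes it cannot interpret.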

August 15, 2023

MongoDB Introduces Workforce Identity Federation with OpenID Connect Support for Database Access

Update June 5, 2024: Workforce Identity Federation is now GA. Head to our docs page to learn more.

Workforce users within organizations, including DBAs, analysts, and developers, need to authenticate to the database and be authorized to perform their job functions. Organizations need to manage the identity lifecycle of these workforce users and enforce appropriate requirements such as password complexity, credential rotation, MFA, and so on.

MongoDB supports LDAPS and AWS-IAM as two primary mechanisms for workforce access. LDAPS predates the cloud and requires organizations to establish network connectivity between their LDAP server and MongoDB Atlas deployments. Workforce users can use AWS-IAM to authenticate with MongoDB Atlas deployments, but this mechanism is limited to AWS.

MongoDB Atlas now supports workforce identity federation with the Atlas deployments using OpenID Connect (OIDC). OpenID Connect is a modern and open authentication protocol built on the OAuth 2.0 framework. This protocol is agnostic to a cloud provider. Any identity provider such as Okta, Azure AD, or Ping Identity that supports OIDC can be configured in Atlas for workforce authentication and authorization to MongoDB Atlas deployments.

To use this feature, organizations configure OpenID Connect once in the Atlas Federation Management application and apply it to all deployments across Atlas projects. They also define access rights for the users in the corresponding Atlas projects and map them to the groups defined in their identity provider.

Workforce identity federation with OpenID Connect provides the following benefits:

- User credentials are centrally managed within your existing identity provider. MongoDB Atlas deployments never see or store the long-lived credentials of your users.
- Security policies such as password rotation, password complexity, and MFA are centrally managed by your identity provider.
- You keep complete control over lifecycle management for the users in your organization who need to access Atlas deployments.
- You can enforce policies that keep access tokens short-lived in order to minimize the risk of long-lived database connections.

OpenID Connect support is currently in preview starting with MongoDB 7.0, which releases later this summer.

Atlas Data Federation support

Now, with a single setup, customers will be able to access Atlas Data Federation through Shell and Compass using OpenID Connect authentication, enabling it for both dedicated clusters and Data Federation. Refer to the documentation for more details.

Try it with the 7.0 RC in Atlas.
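
The short-lived access token policy mentioned in the benefits can be illustrated with a minimal sketch that reads the standard `iat` (issued at) and `exp` (expiry) claims from a JWT-style token and checks the lifetime against a limit. It deliberately does not verify the signature, so it is an inspection aid only; real validation is done by the identity provider and the database. The sample token, the `groups` claim, and the 15-minute and one-hour figures are assumptions made for illustration.

```python
import base64
import json
import time


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def lifetime_within_policy(claims: dict, max_lifetime_seconds: int) -> bool:
    """Return True when the token's total lifetime does not exceed the limit."""
    return claims["exp"] - claims["iat"] <= max_lifetime_seconds


def _encode_segment(segment: dict) -> str:
    raw = json.dumps(segment).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Build an unsigned sample token locally so the sketch is self-contained.
now = int(time.time())
sample_token = ".".join(
    [
        _encode_segment({"alg": "none"}),
        _encode_segment(
            {
                "sub": "analyst@example.com",  # hypothetical workforce user
                "groups": ["db-readers"],      # hypothetical identity provider group
                "iat": now,
                "exp": now + 900,              # 15-minute lifetime
            }
        ),
        "",  # empty signature segment
    ]
)

claims = jwt_claims(sample_token)
short_lived = lifetime_within_policy(claims, max_lifetime_seconds=3600)
```

A group claim of this kind is what would be mapped to database access rights in the Atlas project, as described above; the exact claim name depends on how the identity provider is configured.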

June 30, 2023

MongoDB Releases Queryable Encryption Preview

Today we are announcing the Preview release of Queryable Encryption , which allows customers to encrypt sensitive data from the client side, store it as fully randomized encrypted data on the database server side, and run expressive queries on the encrypted data. With the introduction of Queryable Encryption, MongoDB is the only database provider that allows customers to run expressive queries, such as equality (available now in preview) and range, prefix, suffix, substring, and more (coming soon) on fully randomized encrypted data. This is a huge advantage for organizations that need to run expressive queries while also confidently securing their data. Why is Queryable Encryption an important technology? With the proliferation of different types of data being transmitted and stored in the cloud, protecting data is increasingly important for companies. Enterprises with high-sensitivity workloads require additional technical options to control and limit access to confidential and regulated data. For many enterprise and federal customers, compliance obligations dictate that the sensitivity of certain workloads requires the separation of duties of personnel. For example, analysts at a stock brokerage firm may query to find clients and the number of shares, the broker can make stock transactions on behalf of the investor, and database administrators (DBAs) manage the data, while the sensitive and personally identifiable information (PII), such as social security number (SSN), should be completely hidden. Another important focus area for organizations is complying with data privacy and customer data protection mandates. This applies both to customers who use the data, and vendors who store the data for them. Data privacy regulations can involve complying with laws within and outside your industry that help protect sensitive data. Making sure that you are following all necessary measures to protect your customers’ most sensitive data is a process. 
Data protection and privacy are typically applied to high-sensitivity information, such as personal health information (PHI) and PII. Current state and challenges around data security Although existing encryption solutions (in-transit and at-rest) cover many regulatory use cases, none of them protects sensitive data while it is in use. In-use data encryption often is a requirement for high-sensitivity workloads for customers in financial services, healthcare, and critical infrastructure organizations. Currently, challenges around in-use encryption technologies include: In-use encryption is highly complex, involving custom code from the application side in order to encrypt, process, filter, and decrypt the data to show it to the users. It also involves managing encryption keys in order to encrypt/decrypt the data. Developers need cryptography experience in order to design a secure encryption solution. Current solutions have limited or no querying capabilities, which makes using encrypted data in applications difficult. Some of the existing tools, such as homomorphic encryption or secure enclaves have performance unsuited to scalable encrypted search, require proprietary hardware, or have uncertain security properties. Introducing Queryable Encryption Queryable Encryption removes operational heavy-lifting, resulting in faster app development without sacrificing data protection, compliance, and data privacy security requirements. Here is a sample flow of operations in which an authenticated user wants to query the data, but now the user is able to query on fully randomly encrypted data. In this example, let’s assume we are retrieving the SSN number of a user. When the application submits the query, MongoDB drivers first analyze the query. 
Recognizing the query is against an encrypted field, the driver requests the encryption keys from the customer-provisioned key provider, such as AWS Key Management Service (AWS KMS), Google Cloud KMS, Azure Key Vault, or any KMIP-enabled provider, such as HashiCorp Vault. The driver submits the query to the MongoDB server with the encrypted fields rendered as ciphertext. Queryable Encryption implements a fast, searchable scheme that allows the server to process queries on fully encrypted data, without knowing anything about the data. The data and the query itself remain encrypted at all times on the server. The MongoDB server returns the encrypted results of the query to the driver. The query results are decrypted with the keys held by the driver and returned to the client and shown as plaintext. Advantages of Queryable Encryption Rich querying capabilities on encrypted data: MongoDB is the only database provider that allows customers to run rich query expressions like range, equality, prefix, suffix, and more on encrypted data. (equality search is in the Preview release and the rest will follow in future releases) This is a huge advantage for customers as they can run expressive queries while securing their data confidently. Data encrypted throughout its lifecycle: Queryable Encryption adds another layer of security for your most sensitive data, where data remains secure in-transit, at-rest, in memory, in logs, and in backups. Additionally, Queryable Encryption encrypts data as fully randomized on the server-side. Strong technical controls for critical data privacy use cases: Strong technical controls allow customers to meet the strictest data privacy requirements for confidentiality and integrity using standards-based cryptography. Customers maintain control of encryption keys at all times, and data encryption/decryption happens only on the client-side. 
This guarantees that only authorized users with access to the client-side application and the encryption keys are able to see the plaintext data. These strong controls can help customers meet data privacy requirements mandated by HIPAA, GDPR, CCPA, and more. Faster application development: Developers don't need to be experts in cryptography to protect data with the highest levels of confidentiality and integrity. Unlike an SDK, where the wrong design choice could lead to weakened security, Queryable Encryption is a comprehensive encryption solution using standard-based cryptography and strong key management built-in. It is easy to set up and is supported on popular MongoDB drivers. Reduce institutional risk: Customers who are migrating to the cloud can confidently store their more sensitive data in MongoDB Atlas. Queryable Encryption allows customers to maintain control of their data while allowing rich, expressive querying capabilities on fully randomized encrypted data. MongoDB enables strong security defaults to ensure that security configurations such as authentication, authorization, in-transit and at-rest encryption are always on, to make it easy for customers to develop and focus on their business needs. Queryable Encryption adds another layer of security, which is a strong form of technical control enabling our customers to protect data throughout its lifecycle, and you’ll have the ability to run rich queries on the encrypted data. Advanced Cryptography Research Group Queryable Encryption was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz, who are pioneers in the field of encrypted search. The Group conducts cutting-edge peer-reviewed research in cryptography and works with MongoDB engineering teams to transfer and deploy the latest innovations in cryptography and privacy to the MongoDB data platform. 
Resources

For more information on Queryable Encryption, refer to the following resources:

- MongoDB’s Queryable Encryption
- MongoDB Documentation
- MongoDB Atlas Security Controls

June 7, 2022