December 18, 2023
Atlas serverless instances now offer auto-index creation, a new capability that automatically generates indexes to help optimize performance and reduce the cost of your queries.
Auto-index creation is now available in public preview and enabled by default for all serverless instance deployments, allowing developers to spend less time manually optimizing their serverless databases.
Simplify development with Atlas serverless instances
Developers love serverless technology primarily because of its unparalleled ease of use. By abstracting away infrastructure management, serverless allows developers to focus on what they do best: writing code and building amazing applications. It’s expected that any great serverless offering just works out of the box, without a large learning curve or emphasis on implementation and setup.
Atlas serverless instances, first announced as generally available in June 2022, deliver on this promise by allowing you to deploy a database that seamlessly scales with demand in seconds with minimal configuration and a consumption-based pricing model that only charges for what you use.
The addition of auto-index creation now further reduces management overhead by automating index creation for common queries to ensure fast response time.
How auto-indexing works
Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. Without indexes, MongoDB must perform a collection scan—i.e., scan every document in a collection—to select those documents that match the query statement. By adding an index that supports your common queries, you can dramatically reduce the number of documents the query engine must inspect to return a result, which improves query performance and reduces the read operations you are charged for.
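The difference is easy to see with a toy model. The sketch below is plain Python, not MongoDB's actual B-tree indexes or storage engine: a collection scan touches every document, while a precomputed lookup structure on the queried field narrows the work to just the matching ones.

```python
# Toy illustration of a collection scan vs. an index lookup.
# Plain Python sketch only — not MongoDB's actual index implementation.

docs = [{"_id": i, "status": "active" if i % 10 == 0 else "inactive"}
        for i in range(10_000)]

# Collection scan: every document must be inspected.
scanned = 0
scan_results = []
for doc in docs:
    scanned += 1
    if doc["status"] == "active":
        scan_results.append(doc)

# "Index" on status: a precomputed mapping from value to documents.
index = {}
for doc in docs:
    index.setdefault(doc["status"], []).append(doc)

# Index lookup: only the matching documents are touched.
index_results = index.get("active", [])

print(scanned)             # 10000 documents examined by the scan
print(len(index_results))  # 1000 matches returned via the lookup
```

The scan examines all 10,000 documents to find 1,000 matches; the lookup goes straight to the matches, which is why an index also shrinks the read work you are billed for.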
With auto-index creation enabled, Atlas will analyze your recent query workload and automatically create high-impact indexes based on index suggestions in the Performance Advisor. This helps promote good index hygiene for your data by creating high-impact indexes without requiring you to regularly check for suggestions or create indexes manually.
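As a loose analogy only — a toy heuristic in plain Python, not the actual algorithm behind the Performance Advisor — an auto-indexer might tally which fields recent slow queries filter on and suggest indexes for the fields that recur:

```python
from collections import Counter

# Toy heuristic sketch: count the fields that recent slow queries filtered
# on, and suggest an index for any field seen often enough. Illustration
# only — not Atlas's actual Performance Advisor logic.

slow_queries = [
    {"filter": {"status": "active"}},
    {"filter": {"status": "pending"}},
    {"filter": {"status": "active", "region": "emea"}},
    {"filter": {"created_at": {"$gt": "2023-01-01"}}},
]

THRESHOLD = 2  # minimum occurrences before an index is suggested

field_counts = Counter(
    field for q in slow_queries for field in q["filter"]
)
suggested = [f for f, n in field_counts.items() if n >= THRESHOLD]

print(suggested)  # ['status'] — the only field filtered on repeatedly
```

The real Performance Advisor weighs query shapes, impact, and existing indexes rather than raw field counts, but the principle is the same: observed workload drives the index suggestions.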
You can view newly created indexes in the Atlas UI in the Collections tab of the Data Explorer.
You can also continue to manually add additional indexes in the Collections tab or via the Performance Advisor at any time. To learn more about auto-index creation, visit our documentation.
Create a serverless instance in Atlas today.
Data Governance for Building Generative AI Applications with MongoDB
Generative AI (GenAI) has been evolving at a rapid pace. OpenAI’s ChatGPT, powered by GPT-3.5, reached 100 million monthly active users in just two months, and other major large language models (LLMs) have followed in its footsteps. Cohere’s LLM supports more than 100 languages and is now available on their AI platform, Google’s Med-PaLM was designed to provide high-quality answers to medical questions, OpenAI introduced GPT-4 (a 40% improvement over GPT-3.5), Microsoft integrated GPT-4 within its Office 365 suite, and Amazon introduced Bedrock, a fully managed service that makes foundation models available via API. These are just a few advancements in the generative AI market, and many enterprises and startups are adopting AI tools to solve their specific use cases. The developer community and open-source models are also growing as companies adapt to the new technology paradigm shift in the market.

Building intelligent GenAI applications requires flexibility with data. One of the core requirements is data governance, the subject of this blog. Data governance is a broad term encompassing everything you do to ensure data is secure, private, accurate, available, and usable. It includes the processes, policies, measures, technology, tools, and controls around the data lifecycle. When organizations build applications and transition to a production environment, they often deal with personal data (PII) or commercially sensitive data, such as data related to intellectual property, and want to make sure all the controls are in place.
When organizations are looking to build GenAI-powered apps, a few capabilities are required to deliver intelligent and modern app experiences:

- Handling data for both operational and analytical workloads
- A data platform that is highly scalable and performant
- An expressive query API that can work with any kind of data type
- Tight integrations with established and open-source LLMs
- Native vector search capabilities like embeddings that enable semantic search and retrieval-augmented generation (RAG)

To learn more about the MongoDB developer data platform and how to embed generative AI applications with MongoDB, you can refer to this paper. This blog goes into detail on the security controls of MongoDB Atlas that modern AI applications need. Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

What are some of the potential security risks while building GenAI applications?

According to the recent State of AI 2023 report by Retool, data security and data accuracy are the top two pain points when developing AI applications. In the survey, a third of respondents cited data security as a primary pain point, and that concern increases almost linearly with company size (refer to the MongoDB blog for more details).

Top pain points around developing AI apps. Source: State of AI 2023 report by Retool

While organizations leverage AI technology to improve their businesses, they should be wary of the potential risks. The unintended consequences of generative AI are more likely to surface as companies experiment with various models and AI tools. Even when organizations follow best practices to be deliberate and structured in developing production-ready generative AI applications, they need strict security controls in place to address the key security considerations that AI applications pose.
Here are some considerations for securing AI applications and systems:

Data security and privacy: Generative AI foundation models rely on large amounts of data both to train against and to generate new content. If the training data, or the data available for the retrieval-augmented generation (RAG) process, includes personal or confidential data, that data may turn up in outputs in unpredictable ways. Hence it is very important to have strong governance and controls in place so that confidential data does not wind up in outputs.

Intellectual property infringement: Organizations need to avoid the unauthorized use, duplication, or sale of works legally regarded as protected intellectual property. They also have to make sure to train AI models so the output does not resemble existing works and thereby infringe the copyrights of the originals. Since this is still a new area for AI systems, the laws are evolving.

Regulatory compliance: AI applications have to comply with industry standards and policies like HIPAA in healthcare, PCI in finance, GDPR for data protection for EU citizens, CCPA, and more.

Explainability: AI systems and algorithms are sometimes perceived as opaque, making non-deterministic decisions. Explainability is the concept that a machine learning model and its output can be explained in a way that makes sense to a human being at an acceptable level, and that the model provides repeatable outputs given the same inputs. This is crucial for building trust and accountability in AI applications, especially in domains like healthcare, finance, and security.

AI hallucinations: AI models may generate inaccurate information, also known as hallucinations. These are often caused by limitations in training data and algorithms. Hallucinations can result in regulatory violations in industries like finance, healthcare, and insurance, and, in the case of individuals, could be reputationally damaging or even defamatory.

These are just some of the considerations when using AI tools and systems.
There are additional concerns when it comes to physical security, organizational measures, technical controls for the workforce — both internal and partners — and monitoring and auditing of the systems. By addressing each of these critical issues, organizations can ensure the AI applications they roll out to production are compliant and secure. Let us look at how MongoDB’s developer data platform can help with these security controls and measures.

How does MongoDB address the security risks and data governance around GenAI?

MongoDB’s developer data platform, built on MongoDB Atlas, unifies operational, analytical, and generative AI data services to streamline building intelligent applications. At the core of MongoDB Atlas is its flexible document data model and developer-native query API. Together, they enable developers to dramatically accelerate the speed of innovation, outpace competitors, and capitalize on new market opportunities presented by GenAI. Developers and data science teams around the world are innovating with AI-powered applications on top of MongoDB. These applications span multiple use cases across industry sectors and rely on the security controls MongoDB Atlas provides. Here is the library of sample case studies, white papers, and other resources about how MongoDB is helping customers build AI-powered applications.

MongoDB security & compliance capabilities

MongoDB Atlas offers built-in security controls for all organizational data. That data can be application data as well as vector embeddings and their associated metadata — giving holistic protection of all the data you are using for GenAI-powered applications. Atlas enables enterprise-grade features to integrate with your existing security protocols and compliance standards. In addition, Atlas simplifies deploying and managing your databases while offering the versatility developers need to build resilient applications.
MongoDB allows security administrators to integrate easily with external systems, while developers can focus on their business requirements. Along with key security features being enabled by default, MongoDB Atlas is designed with security controls that meet enterprise security requirements. Here's how these controls help organizations build their AI applications on MongoDB’s platform and meet the considerations discussed above:

Data security

MongoDB has access and authentication controls enabled by default. Customers can authenticate to the platform using mechanisms including SCRAM, x.509 certificates, LDAP, passwordless authentication with AWS IAM, and OpenID Connect. MongoDB also provides role-based access control (RBAC) to determine a user's access privileges to various resources within the platform. Data scientists and developers building AI applications can leverage any of these access controls to fine-tune user access and privileges while training or prompting their AI models, and organizations can implement access control mechanisms to restrict data access to authorized personnel only.

End-to-end encryption of data: MongoDB’s data encryption tools offer robust features to protect your data in transit (network), at rest (storage), and in use (memory and logs). Customers can use automatic encryption of key data fields like personally identifiable information (PII), protected health information (PHI), or any data deemed sensitive, ensuring data is encrypted throughout its lifecycle. Going beyond encryption at rest and in transit, MongoDB has released Queryable Encryption to encrypt data in use. Queryable Encryption enables an application to encrypt sensitive data on the client side, store the encrypted data in the MongoDB database, and run server-side queries on the encrypted data without having to decrypt it. Queryable Encryption is an excellent anonymization technique that makes sensitive data opaque.
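As a greatly simplified sketch of the equality-searchable idea — plain Python with HMAC tokens, not MongoDB's actual Queryable Encryption protocol — a client can store a keyed, deterministic token of a sensitive field and later query by recomputing the token, so the server never sees the plaintext:

```python
import hashlib
import hmac

# Greatly simplified sketch of equality-searchable encryption: the client
# stores a keyed, deterministic token of a sensitive field, and the server
# matches on the token without ever seeing the plaintext. This is NOT
# MongoDB's actual Queryable Encryption scheme — just the core idea.

CLIENT_KEY = b"client-side-secret-key"  # never leaves the client

def token(value: str) -> str:
    """Deterministic keyed token: same input + key -> same token."""
    return hmac.new(CLIENT_KEY, value.encode(), hashlib.sha256).hexdigest()

# Client side: tokenize the SSN before sending documents to the server.
server_store = [
    {"name": "Alice", "ssn_token": token("123-45-6789")},
    {"name": "Bob",   "ssn_token": token("987-65-4321")},
]

# Equality query: the client computes the token, the server compares
# tokens only — the plaintext SSN is never sent or stored server-side.
query_token = token("123-45-6789")
matches = [d["name"] for d in server_store if d["ssn_token"] == query_token]
print(matches)  # ['Alice']
```

Note that deterministic tokens leak which documents share a value; production schemes such as Queryable Encryption use more sophisticated cryptography to limit that kind of leakage while still supporting server-side queries.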
This technology can be leveraged when company-specific data containing confidential information is drawn from the MongoDB database for the RAG process and needs to be anonymized, or when you are storing sensitive data in the database.

Regulatory compliance and data privacy

Many uses of generative AI are subject to existing laws and regulations that govern data privacy, intellectual property, and other related areas, and new laws and regulations aimed specifically at AI are in the works around the world. The MongoDB developer data platform undergoes independent verification of platform security, privacy, and compliance controls to help customers meet their regulatory and policy objectives, including the unique compliance needs of highly regulated industries and U.S. government agencies. Refer to the MongoDB Atlas Trust Center for our current certifications and assessments.

Regular security audits

Organizations should conduct regular security audits to identify potential vulnerabilities in their data security practices, so that any weaknesses are found and addressed promptly. Audits help identify and mitigate risks and errors in your AI models and data, and ensure that you are compliant with regulations and standards. MongoDB offers granular auditing that provides a trail of how and what data was used and is designed to monitor and detect any unauthorized access to data.

What are additional best practices and considerations while working with AI models?

While it is essential to work with a trusted data platform, it is also important to prioritize security and data governance as discussed. In addition to data security, compliance, and data privacy as covered above, here are additional best practices and considerations.

Data quality

Monitor and assess the quality of input data to avoid biases in foundation models.
Make sure that your training data is representative of the domain in which your model will be applied. If your model is expected to generalize to real-world scenarios, your training data, or the data made available for the RAG process, should be monitored.

Secure deployment

Use secure and encrypted channels for deploying foundation models. Implement robust authentication and authorization mechanisms to ensure that only authorized users and systems can access sensitive data and AI models. Enforce mechanisms that anonymize sensitive information to protect user privacy.

Audit trails and monitoring

Maintain detailed audit trails and logs of model training, evaluation, and deployment activities, and implement continuous monitoring of both data inputs and model outputs for unexpected patterns or deviations. MongoDB maintains audit trails and logs of all data operations and data processing. Customers can use the audit logs for monitoring, troubleshooting, and security purposes, including intrusion detection. We utilize a combination of automated scanning, automated alerting, and human review to monitor the data.

Secure data storage

Implement secure storage practices for both raw and processed data, using encryption for data at rest and in transit as discussed above. Encryption at rest is turned on automatically on MongoDB servers. The encryption occurs transparently in the storage layer; i.e., all data files are fully encrypted from a filesystem perspective, and data only exists in an unencrypted state in memory and during transmission.

Conclusion

As generative AI tools grow in popularity, it matters more than ever how an organization understands, protects, and puts its data to use — defining the roles, controls, processes, and policies for interacting with data. As modern enterprises use generative AI and LLMs to better serve customers and extract insights from their data, strong data governance becomes essential.
By understanding the potential risks and carefully evaluating the capabilities of the platform the data is hosted on, organizations can confidently harness the power of these tools. For more details on MongoDB’s trusted platform, refer to these links:

- MongoDB Security Hub
- Platform Trust Center
- Atlas Technical and Organization Security Measures
- MongoDB Compliance & Assessments
- MongoDB Data Privacy
Aussie Fintech Monoova Leads the Way on “Multi” “Cloud” (Not “Multi-Cloud”) to Solve Data Security and Compliance Conundrums
Monoova is a fast-growing Australian fintech scale-up providing real-time payment solutions to businesses. Having grown from 200,000 to 6 million business accounts in 5 years, Monoova has also created 13% of Australia’s PayIDs since the inception of the NPP and processed over $100 billion in payments. Dealing with critical financial information from hundreds of organisations in highly regulated industries means Monoova needs to put data security and compliance first. But this is easier said than done.

Navigating an increasingly complex security and compliance landscape

The increased reliance on the cloud, combined with more regulations requiring extra resilience capabilities, means that financial services organisations are facing increasingly complex data security and compliance challenges. APRA’s chair John Lonsdale recently warned the financial sector about cybersecurity non-compliance, mentioning CPS 230 as one of the upcoming regulations organisations should start preparing to comply with ahead of the July 2025 deadline. On November 3rd, 2023, Brad Jones, Assistant Governor of the RBA, gave a speech in which he listed “Outside Operational Risk (Cloud Concentration Risk)” as one of the main threats to financial stability.

“Multi” “cloud” or “multi-cloud”?

A significant number of financial services institutions today aren’t using multi-cloud in a way that would make them resilient in the event of a data security or outage issue. Many say that they are using “multi-cloud”, but what they are actually doing is hosting individual workloads and data sets in different clouds, which doesn’t provide full resilience and data security or meet the requirements of industry regulators and government. True multi-cloud resilience means having critical data hosted in different clouds at the same time.
For Monoova’s CTO, Nicholas Tan, future-proofing compliance and data security lies in adopting a true multi-cloud approach, and this is exactly the path Monoova has taken by working with MongoDB Atlas. “Whether it’s in critical sectors like financial services or telecommunications, time and time again we see events such as outages seriously impact Australian organisations and their sometimes millions of users - the Optus outage from November 2023 is a perfect example,” explains Tan. “There are great operational risks in not having diversity in an organisation’s core infrastructure, and this is why building real resilience with a proper multi-cloud approach should be a no-brainer.”

Working with MongoDB Atlas: a game changer

MongoDB Atlas, the operational database underpinning all of Monoova’s services, was chosen by Monoova to support extra scale requirements as the company was - and still is - fast growing, and to boost developer productivity as it brings new products to market and innovates fast. Another key driver in working with MongoDB Atlas was the unique functionality that lets users simply “turn a switch on” to enable selected critical data and workloads to go multi-cloud, allowing data to be easily distributed across different clouds and providing resilience and protection for the workload. Tan and his team simply use the UI to reconfigure their posture to be multi-cloud, which takes only a few minutes.
Monoova’s MongoDB Atlas multi-cloud console

According to Tan: “Our multi-cloud approach is pretty unique in the current Australian financial services landscape, and this is what has set Monoova up to be one of the first Australian financial services organisations to align with the CPS 230 framework, as well as to be at the forefront of ensuring compliance and resilience in an environment heavily reliant on third parties.”

“Working with MongoDB has been a game changer because it means we were able to quickly scale up at a fraction of the manpower required; it saved us from recruiting a full team, which we would have needed if we had done all that multi-cloud work in-house.”

Learn more about MongoDB Atlas and our multi-cloud feature on our resources page.