Building AI with MongoDB: Cultivating Trust with Data

Mat Keep
October 3, 2023 | Updated: April 21, 2025

“Trust is like the air we breathe – when it’s present, nobody really notices; when it’s absent, everybody notices.” - Warren Buffett

The issue of trust dominates discussions around the safe and responsible adoption of AI across business and society. Another Warren, this time Warren Bennis, a pioneer of modern leadership principles, is credited with saying, "Trust is the lubrication that makes it possible for organizations to work." That is particularly relevant when we think about how organizations are starting to embed AI into the very fabric of their businesses.

Governments around the world are at varying stages of regulating their way to trustworthy AI. However, this will not be a quick process, and enterprises can’t afford to wait. Businesses need to make progress now if they are going to unlock the opportunities presented by AI.

In our latest roundup of AI innovators building with MongoDB, we’re going to focus on three companies tackling trust from different angles. Nomic is working to make AI more explainable. Robust Intelligence is focused on securing AI models against prompt injections, data poisoning, bias, PII leakage, and more. Finally, VISO TRUST comes at this issue from a totally different perspective: it uses AI to help its customers reduce cybersecurity risks and improve trust across the supply chain.

Let's dig in.

Check out our AI Learning Hub to learn more about building AI-powered apps with MongoDB.

Making AI explainable and accessible

Despite the huge advances in AI and its use in almost every industry, very little is known about how the most popular models actually work. What data are they trained on? What are they learning? How can we compare accuracy between different models? These are the questions Nomic AI is seeking to help us answer through its Atlas and GPT4All products.

Nomic Atlas is a data engine that allows users to explore, label, search, share, and build on massive datasets using their web browser. With Atlas, users can begin to understand what data their chosen AI models are learning from and the associations they are making during the training phase. Atlas can be used for exploratory data analysis, data labeling and cleansing, and visualizations of vector embeddings.

To see Nomic Atlas in action, take a look at the recent blog post with Hugging Face announcing IDEFICS, an open-access reproduction of the visual language model based on Flamingo. The model takes image and text inputs and produces text outputs from them. For example, it can answer questions about images, describe visual content, and create stories grounded in multiple images. Nomic allows users to visually explore the content of the training data, as illustrated in the image below.

Screenshot from Nomic's blog, showing how Nomic's platform can help you visually explore content

Atlas can be used to curate high-quality training and instruction-tuned datasets for the GPT4All models. Nomic GPT4All is an ecosystem for training and deploying powerful and customized large language models that run locally on consumer-grade CPUs in Windows, Mac, and Ubuntu Linux clients. With GPT4All, users have access to a free-to-use, locally running, privacy-aware chatbot that doesn’t require expensive and scarce GPUs to train and infer on, or an internet connection. It can power question-answering systems, personal writing assistants, document summarization, and code generation. Demand for GPT4All has been explosive, accruing more than 20,000 GitHub stars within its first week of launch.

“Every month MongoDB is adding hundreds of organizations and thousands of developers who are building AI-enabled apps on its multi-cloud developer data platform,” said Brandon Duderstadt, CEO of Nomic. “It makes sense for us to partner with MongoDB Ventures. They are helping us accelerate our vision of making AI explainable and accessible to everyone.”

Update, February 6th 2024:

On February 1, 2024, Nomic released its Nomic Embed open-source embedding model and a fully managed inference endpoint. This allows anyone to build their own powerful RAG applications for generative AI using a text embedding model with an 8,192-token context length that outperforms proprietary alternatives on a variety of benchmarks.

To demonstrate its new endpoint and model in action, the Nomic engineers wrote the blog post Building a RAG LLM with Nomic Embed and MongoDB. By following the post, you will learn:

How to use Nomic to generate embeddings for your data sources.
How to add them to MongoDB Atlas Vector Search. (Note that this runs in the Atlas free tier, so there is no cost to you!)
How to use an open-source LLM to generate text from your retrieved documents.
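
As a rough illustration of the retrieval and generation steps listed above, the sketch below builds an Atlas Vector Search aggregation pipeline and then assembles an LLM prompt from the retrieved documents. It is not taken from the Nomic post: the index name `vector_index`, the fields `embedding` and `text`, and both helper functions are hypothetical, and the query vector is assumed to come from whichever embedding model you use.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    path: str = "embedding",           # hypothetical field holding the stored vectors
    limit: int = 3,
) -> list[dict[str, Any]]:
    """Return an aggregation pipeline that retrieves the nearest documents."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than we return, to improve recall.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        # Keep only the text and the similarity score of each match.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


def build_prompt(question: str, retrieved: list[dict[str, Any]]) -> str:
    """Place the retrieved text in front of the question for the LLM."""
    context = "\n".join(f"- {doc['text']}" for doc in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

With a driver such as PyMongo, the pipeline would typically be passed to `collection.aggregate(...)`, and the resulting documents handed to `build_prompt` before calling the LLM of your choice.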

Because you have access to the code and data behind the Nomic Embed model, you can easily customize it for even better performance.

Securing generative AI, supercharged by your data

Robust Intelligence delivers end-to-end AI risk management to protect organizations from security, ethical, and operational risks. The company’s platform automates testing and compliance across the AI lifecycle through continuous validation and protects models in real time with AI Firewall. This combined approach enables Robust Intelligence to proactively manage risk for any model type, including generative AI, and gives organizations the confidence to unleash the true potential of AI. Robust Intelligence is trusted by leading companies including ADP, JPMorgan Chase, Expedia, Deloitte, PwC, and the U.S. Department of Defense.

Recent advancements in generative AI have motivated companies to experiment with potential applications, but a lack of security controls has exposed companies to unmanaged risks. This challenge is exacerbated when sensitive company information is used to enrich pre-trained models, for example by connecting vector databases, in order to increase the relevance of responses to the end user.

Robust Intelligence’s AI Firewall protects large language models (LLMs) in production by validating inputs and outputs in real-time. It assesses and mitigates operational risks such as hallucinations; ethical risks, including model bias and toxic outputs; and security risks such as prompt injections and PII extraction. AI Firewall stops bad or malicious inputs from reaching AI models and prevents undesired AI-generated results from reaching the application.
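
To make the idea of validating both inputs and outputs concrete, here is a deliberately simplified sketch of a guard wrapped around a model call. It is a toy illustration only and does not represent Robust Intelligence's product or API: the phrase list, the email pattern, and the function names are all assumptions, and a production system would rely on far more sophisticated detection.

```python
import re
from typing import Callable

# Hypothetical, minimal rules for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
INJECTION_PHRASES = ("ignore previous instructions", "disregard the system prompt")


def validate_input(prompt: str) -> bool:
    """Reject prompts containing a known injection phrase (case-insensitive)."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in INJECTION_PHRASES)


def redact_output(text: str) -> str:
    """Replace anything that looks like an email address before returning text."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)


def guarded_call(model: Callable[[str], str], prompt: str) -> str:
    """Check the input, call the model, then filter the output."""
    if not validate_input(prompt):
        return "Request blocked by input validation."
    return redact_output(model(prompt))
```

The point of the design is that the application never talks to the model directly: every request and every response passes through the same checkpoint.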

Customers can confidently connect MongoDB Atlas Vector Search to any commercial or open-source LLM for secure retrieval-augmented generation with the AI Firewall integration. Atlas Vector Search serves as the memory and fact database for AI Firewall, ensuring the AI model provides enriched responses without hallucinating. Additionally, it serves as the memory and database to store historical data points. This is important in the context of identifying more advanced security attacks, such as data poisoning and model extraction, which often manifest across a cluster of data points as opposed to a single data point.

Yaron Singer, CEO and co-founder at Robust Intelligence, commented: “By incorporating MongoDB’s Atlas Vector Search into the AI validation process, customers can confidently use their databases to enhance LLM responses knowing that sensitive information will remain secure. The integration provides seamless protection against a comprehensive set of security, ethical, and operational risks.”

Graphic showing the flow of information into and from the Vector Search, core and metadata store.

Being part of the MongoDB Partner Program provides Robust Intelligence with access to specialist technical support to optimize product integrations and provides visibility to the MongoDB customer base.

Transforming cyber risk intelligence

VISO TRUST is an AI-powered third-party cyber risk and trust platform that enables any company to access actionable vendor security information in minutes. VISO TRUST delivers the fast and accurate intelligence needed to make informed cybersecurity risk decisions at scale. Today VISO TRUST has many great enterprise customers like Instacart, Gusto, and Upwork, and they all say the same thing: 90% less work, 80% reduction in time to assess risk, and near 100% vendor adoption.

How does VISO TRUST achieve these results? Pierce Lamb, Senior Software Engineer on the Data and Machine Learning team at VISO TRUST, provides more detail:

“VISO TRUST Platform easily engages third parties, saving everyone time and resources. In a 5-minute web-based session, third parties are prompted to upload relevant artifacts of the security program that already exists, and our supervised AI – which we call Artifact Intelligence – does the rest.

First, VISO TRUST deploys discriminator models that produce high-confidence predictions about features of the artifact.
Secondly, artifacts have text content parsed out of them which we embed and store in MongoDB Atlas to become part of our dense retrieval system. This dense retrieval system performs Retrieval-Augmented Generation (RAG) using MongoDB features like Atlas Vector Search to provide ranked context to large language model (LLM) prompts.
Thirdly, we use RAG results to seed LLM prompts and chain together their outputs to produce extremely accurate factual information about the artifact in the pipeline. This information is able to provide instant intelligence to customers that previously took weeks to produce.”
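
The "ranked context" idea in the quote above can be illustrated with a small, self-contained sketch of dense retrieval: stored text chunks are scored against a query embedding by cosine similarity and the best matches are kept. This is not VISO TRUST's code. The function names and the `text` and `embedding` fields are hypothetical, and in practice the ranking would be done by Atlas Vector Search rather than in application code.

```python
import math
from typing import Any


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(
    query_vec: list[float], chunks: list[dict[str, Any]], top_k: int = 2
) -> list[dict[str, Any]]:
    """Return the top_k chunks most similar to the query, best first."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]
```

The top-ranked chunks are what would then be used to seed the LLM prompt in the third step of the pipeline.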

Screenshot of the VISO TRUST dashboard displaying analytical insights

VISO TRUST is the only SaaS third-party cyber risk management platform that delivers the rapid security intelligence needed for modern companies to make critical risk decisions early in the procurement process.

VISO TRUST uses state-of-the-art models from OpenAI, Hugging Face, Anthropic, Google, and AWS, augmented by vector search and retrieval from MongoDB Atlas. Read our interview blog post with VISO TRUST to learn more.

What's next?

If you are getting started with building AI-enabled apps on MongoDB, sign up for our AI Innovators Program. Successful applicants get access to expert technical advice, free MongoDB Atlas credits, co-marketing opportunities, and, for eligible startups, introductions to potential venture investors.

In the spirit of "Trust, but verify" (Ronald Reagan), if you’re not sure how the program, or indeed MongoDB, could deliver value to you, take a look at earlier blog posts in this series:

Building AI with MongoDB: first qualifiers includes AI at the network edge for computer vision and augmented reality; risk modeling for public safety; and predictive maintenance paired with question-answer generation for maritime operators.
Building AI with MongoDB: compliance to copilots features AI in healthcare along with intelligent assistants that help product managers specify better products and help sales teams compose emails that convert 2x higher.
Building AI with MongoDB: unlocking value from multimodal data showcases open source libraries that transform unstructured data into a usable JSON format; entity extraction for contracts management; and making sense of “dark data” to build customer service apps.

You should look at the MongoDB for Artificial Intelligence resources page for the latest best practices that get you started in turning your idea into an AI-driven reality.

← Previous

Performance Best Practices: Indexing

Welcome to the third in our series of blog posts covering performance best practices for MongoDB. In this series, we cover the key considerations for achieving performance at scale across a number of important dimensions, including:

Data modeling and sizing memory (the working set)
Query patterns and profiling
Indexing, which we'll cover today
Sharding
Transactions and read/write concerns
Hardware and operating system configuration
Benchmarking

Having both worked for a few different database vendors over the past 15 years, we can safely say that failing to define the appropriate indexes is the number one performance issue that technical support teams have to address with users. So we need to get it right. Here are the best practices to help you.

Indexes in MongoDB

In any database, indexes support the efficient execution of queries. Without them, the database must scan every document in a collection or table to select those that match the query statement. If an appropriate index exists for a query, the database can use the index to limit the number of documents it must inspect.

MongoDB offers a broad range of index types and features with language-specific sort orders to support complex access patterns to your data. MongoDB indexes can be created and dropped on demand to accommodate evolving application requirements and query patterns, and can be declared on any field within your documents, including fields nested within arrays.

So let's cover how you make the best use of indexes in MongoDB.

Use compound indexes

Compound indexes are indexes composed of several different fields.
For example, instead of having one index on "Last name" and another on "First name", it is typically more efficient to create a single index that includes both "Last name" and "First name" if you query against both names. Our compound index can still be used to filter queries that specify only the last name.

Follow the ESR rule

For compound indexes, this rule of thumb is helpful in deciding the order of fields in the index:

First, add the fields against which equality queries are run
The next fields to be indexed should reflect the sort order of the query
The last fields represent the range of data to be accessed

Use covered queries when possible

Covered queries return results directly from an index, without having to access the source documents, and are therefore very efficient. For a query to be covered, all the fields needed for filtering, sorting, and/or being returned to the client must be present in an index. To determine whether a query is covered, use the explain() method. If the explain() output displays totalDocsExamined as 0, this shows the query is covered by an index. Read more in the documentation for explain results.

A common gotcha when trying to achieve covered queries is that the _id field is always returned by default. You need to explicitly exclude it from query results, or add it to the index.

In sharded clusters, MongoDB internally needs to access the fields of the shard key. This means covered queries are only possible when the shard key is part of the index. It is usually a good idea to do this anyway.

Use caution when considering indexes on low-cardinality fields

Queries on fields with a small number of unique values (low cardinality) can return large result sets. Compound indexes may include fields with low cardinality, but the value of the combined fields should exhibit high cardinality.
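
The ESR ordering and the covered-query check described in this section can be sketched in a few lines. The helper names and the sample field names are hypothetical; the only details taken from the post are the Equality, Sort, Range ordering and the totalDocsExamined test. The sketch assumes the explain output was produced in a mode that includes an executionStats section.

```python
from typing import Any


def esr_index_keys(
    equality: list[str],
    sort: list[tuple[str, int]],
    range_fields: list[str],
) -> list[tuple[str, int]]:
    """Order compound-index keys as Equality, then Sort, then Range."""
    keys = [(field, 1) for field in equality]
    keys += list(sort)  # sort fields keep the direction used by the query
    keys += [(field, 1) for field in range_fields]
    return keys


def is_covered(explain_output: dict[str, Any]) -> bool:
    """Apply the post's rule of thumb: no documents examined means covered."""
    stats = explain_output.get("executionStats", {})
    return stats.get("totalDocsExamined") == 0
```

For a query that filters on a hypothetical `status` field, sorts by `createdAt` descending, and ranges over `amount`, `esr_index_keys(["status"], [("createdAt", -1)], ["amount"])` yields a key list in the form most drivers accept when creating an index.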
Eliminate unnecessary indexes

Indexes are resource-intensive: even with compression in the MongoDB WiredTiger storage engine, they consume RAM and disk. As fields are updated, associated indexes must be maintained, incurring additional CPU and disk I/O overhead. MongoDB provides tooling to help you understand index usage, which we will cover later in this post.

Wildcard indexes are not a replacement for workload-based index planning

For workloads with many ad hoc query patterns or that handle highly polymorphic document structures, wildcard indexes give you a lot of extra flexibility. You can define a filter that automatically indexes all matching fields, subdocuments, and arrays in a collection. As with any index, they also need to be stored and maintained, so they will add overhead to the database. If your application's query patterns are known in advance, you should use more selective indexes on the specific fields accessed by the queries.

Use text search to match words inside a field

Regular indexes are useful for matching the entire value of a field. If you only want to match on a specific word in a field with a lot of text, use a text index. If you are running MongoDB in the Atlas service, consider using Atlas Full Text Search, which provides a fully managed Lucene index integrated with the MongoDB database. FTS provides higher performance and greater flexibility to filter, rank, and sort through your database to quickly surface the most relevant results to your users.

Use partial indexes

Reduce the size and performance overhead of indexes by including only the documents that will be accessed through the index.
For example, create a partial index on the orderID field that only includes order documents with an orderStatus of "In progress", or only index the emailAddress field for documents where it exists.

Take advantage of multikey indexes for querying arrays

If your query patterns require accessing individual array elements, use a multikey index. MongoDB creates an index key for each element in the array, and the index can be built over arrays that hold both scalar values and nested documents.

Avoid regular expressions that are not anchored or rooted

Indexes are ordered by value. Leading wildcards are inefficient and may result in full index scans. Trailing wildcards can be efficient if there are sufficient case-sensitive leading characters in the expression.

Avoid case-insensitive regular expressions

If the sole reason for using a regex is case insensitivity, use a case-insensitive index instead, as those are faster.

Use index optimizations available in the WiredTiger storage engine

If you are self-managing MongoDB, you can optionally place indexes on their own separate volume, allowing for faster disk paging and lower contention. See the WiredTiger options for more information.

Use the explain plan

We covered the use of the MongoDB explain plan in the previous post on query patterns and profiling, and this is the best tool for checking index coverage for individual queries. Working from the explain plan, MongoDB provides visualization tools to help further improve understanding of your indexes, and provides intelligent and automatic recommendations on which indexes to add.
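
Returning to the two partial-index examples given earlier in this section (the orderID index limited to in-progress orders, and the emailAddress index limited to documents where the field exists), the sketch below expresses them as index definitions. The `partial_index_spec` helper and the shape of the dictionary it returns are hypothetical conveniences; `partialFilterExpression` is the index option MongoDB uses for partial indexes.

```python
from typing import Any


def partial_index_spec(field: str, filter_expression: dict[str, Any]) -> dict[str, Any]:
    """Describe a single-field ascending index restricted by a filter."""
    return {
        "keys": [(field, 1)],
        "options": {"partialFilterExpression": filter_expression},
    }


# Index orderID only for orders that are still in progress.
in_progress_orders = partial_index_spec("orderID", {"orderStatus": "In progress"})

# Index emailAddress only for documents where the field exists.
existing_emails = partial_index_spec("emailAddress", {"emailAddress": {"$exists": True}})
```

With PyMongo, a spec like this could be applied as `collection.create_index(spec["keys"], **spec["options"])`.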
Visualize index coverage with MongoDB Compass and Atlas Data Explorer

As the free GUI for MongoDB, Compass provides many features to help you optimize query performance, including exploring your schema and visualizing query explain plans, two areas covered earlier in this series. The indexes tab in Compass adds another tool to your arsenal. It lists the existing indexes for a collection, reporting the name and keys of the index, along with its type, size, and any special properties. Through the indexes tab you can also add and drop indexes as needed.

A really useful feature is index usage, which shows you how often an index has been used. Having too many indexes can be nearly as damaging to your performance as having too few, making this feature especially valuable in helping you identify and remove indexes that are not being used. This helps free up working set space and eliminates database overhead that comes from maintaining the index.

If you are running MongoDB in our fully managed Atlas service, the indexes view in the Data Explorer gives you the same functionality as Compass, without you having to connect to your database with a separate tool. You can also retrieve index statistics using the $indexStats aggregation pipeline stage.

Automated index recommendations

Even with all of the telemetry provided by MongoDB's tools, you are still responsible for pulling and analyzing the required data to make decisions on which indexes to add. The threshold for slow queries varies based on the average time of operations on your cluster to provide recommendations pertinent to your workload.
Recommended indexes are accompanied by sample queries, grouped by query shape (i.e., queries with similar predicate structure, sort, and projection), that were run against a collection that would benefit from the addition of a suggested index. The Performance Advisor does not negatively affect the performance of your Atlas cluster. If you are happy with the recommendation, you can then roll out the new indexes automatically, without incurring any application downtime.

What's next

That wraps up this latest installment of the performance best practices series. MongoDB University offers a free, web-based training course on MongoDB performance. This is a great way to learn more about the power of indexing.

October 2, 2023

Next →

MongoDB Announces Leadership Transition

Dev Ittycheria, President and Chief Executive Officer, shared the following message with MongoDB employees this morning. This is the hardest email I have ever had to write to all of you. If you have not seen the announcement, I have decided to retire as CEO. Effective November 10, 2025, Chirantan “CJ” Desai will become the new CEO of MongoDB. This was not an easy decision for me. The process to get to this point has been deeply emotional, as I care profoundly about MongoDB and the people who have made the company what it is today. This news may come as a surprise, and for some, perhaps even a shock. That’s natural. Leadership transitions can evoke a range of reactions. I want to share why this is happening, and why it’s the right thing for MongoDB. Every personnel change, including the most senior leadership changes, involves two key decisions: first, recognizing that it is the right time for change, and second, selecting the best person to replace the person leaving. This email is intended to explain both decisions. Earlier this year, as part of our regular succession planning process, the Board and I discussed my long-term commitment. They asked if I would continue as CEO for another five years. After many conversations with my family and the Board, I realized I could not make that commitment. Some CEOs see their title as their identity. I do not. My core responsibility is to serve in the company's best interests. The company is primed for a new leader. One with a fresh perspective, grounded in experience and skills needed to guide MongoDB through its next evolution as a company, what we call MongoDB 3.0. Consequently, I informed the Board that I would commit to two more years to help find a successor. That began the search process for a suitable successor. To our surprise and delight, what we thought would easily take 12 to 24 months happened much faster than anyone expected. After engaging with multiple qualified candidates, we found the right successor in CJ. 
CJ is uniquely qualified for this role. CJ brings the rare growth-at-scale experience that will help continue to build MongoDB into an iconic technology company. At ServiceNow, he was the only executive to work directly with three of its highly regarded public company CEOs and played a pivotal role in organically scaling the company from just over $1 billion to more than $10 billion in revenue. Only a handful of independent software companies have ever reached that milestone. CJ helped transform ServiceNow from a product company to a platform company, scaled engineering, drove go-to-market excellence, and engaged deeply with investors. More recently, as President of Product and Engineering at Cloudflare, he helped fuel strong growth and stock performance. CJ also possesses the personal qualities needed to succeed as CEO. He is humble, eager to learn, and wants to draw on the perspectives of the people at MongoDB and other stakeholders to inform his thinking. This blend of experience, judgment, and character gives me full confidence that he is well-equipped to lead MongoDB through its next phase of growth. I often think of MongoDB’s journey as a long and extraordinary expedition. For the past eleven years, I have had the privilege of serving as its guide, helping chart the course, rally the team, and climb together through both calm and challenging terrain. Along the way, we have reached remarkable summits and proven what is possible through relentless innovation, persistence, and teamwork. Now it is time for a new guide to lead the next stage of the ascent and take MongoDB to even greater heights. CJ is the right leader to take MongoDB to the next summit. MongoDB is on a strong footing, with a clear strategy, an exceptional leadership team, a product platform that is more relevant than ever, and a business that is executing well. The rise of AI and the explosion of data-intensive applications play directly to MongoDB’s strengths. 
Our technology sits at the center of how modern applications are built and how organizations will harness data to power intelligent, adaptive systems. I am confident MongoDB is perfectly positioned to capture this next wave of innovation. As for me, I am not running away from MongoDB or leaving to join another company as CEO. I will remain on the Board and work closely with CJ to ensure a seamless transition. Over the years, this role has demanded an enormous amount of focus and energy; as a result, there are many things I’ve missed doing along the way. I’m looking forward to being more present for those moments — from simple time with my family to experiences and travel we’ve long put off. I plan to hold on to my MongoDB stock, as I firmly believe in the people and the opportunity, knowing that MongoDB’s best days are ahead of it. Yes, change can be unsettling. I’m sure you will have many questions about this change, such as why now, why CJ is the best person to lead the company, and what this means for you. We will hold an all-hands meeting tomorrow at 10:30AM ET to discuss this transition, introduce CJ and take your questions. That being said, I want to emphasize that the right change at the right time is how great companies get stronger. Just as a championship team refreshes its roster to stay competitive, MongoDB is bringing in new leadership, including other recent C-suite leaders who came before CJ, to drive our next phase of growth. This is not an ending; it’s the founding of a new moment. I am incredibly proud of what we have built together and genuinely excited about what lies ahead with CJ leading us forward. I also want to thank each of you for making this journey so meaningful. Words cannot fully capture my gratitude for your passion, creativity, and belief in building something truly special. 
I have often said that I want MongoDB to be an inflection point in people’s careers, a place where they can grow, take risks, and do the best work of their lives. I can say without hesitation that it has been exactly that for me. The skills I have developed, the experiences I have gained, and the relationships I have formed here have shaped me more than any other chapter in my professional life. I will carry them with me always, and will continue to cheer for and support MongoDB every step of the way. --Dev

November 3, 2025