Building AI with MongoDB: How Metaphor Data Uses Atlas Vector Search to Change the World Through Data

Elliott Gluck
October 10, 2023 | Updated: April 21, 2025


Since announcing MongoDB Atlas Vector Search in preview back in June, we’ve already seen rapid adoption from developers building a wide range of AI-enabled apps. Today we're highlighting another customer who has increased efficiency while removing architectural complexity by adopting Atlas Vector Search.

Metaphor is a search and discovery tool built for data scientists, data engineers, and AI practitioners. The company’s mission is to empower individuals and companies of all types to change the world through data. Metaphor is the next evolution of the Data Catalog with fully automated support for Data Governance, Data Literacy, and Data Enablement using an intuitive user interface.

We recently caught up with Mars Lan, Co-founder and CTO, to learn more about the company’s journey with MongoDB and their adoption of Atlas Vector Search.

Check out our AI Learning Hub to learn more about building AI-powered apps with MongoDB.

Tell us a little bit about your company and what you and the team are building

We’re an early-stage startup with a mission to empower individuals and organizations to change the world through data. We refer to ourselves as the social platform for data and have a range of products that support both data teams and data consumers. Our main product is a SaaS Data Catalog that enables governance and enablement of data across the organization. We’re a small team of around 15 with a keen focus on product and engineering. The company was founded about 2.5 years ago.

What role does search play at the company and where did your search story begin?

Well, I will start by saying that we almost ended up with a very different story to tell you than the one that actually transpired! We started our journey using DocumentDB and Elasticsearch on AWS for our database and search needs. After some time we ran into scalability issues that caused us to evaluate (and eventually move to) MongoDB Atlas for our database needs. When we saw that MongoDB offered Atlas Search, which is based on the same underlying Lucene technology as Elasticsearch, we got very excited and began migrating our search efforts over to Atlas. This eventually laid the groundwork for adopting Atlas Vector Search later on.

So starting with those initial search needs, what got you excited about Atlas Search with MongoDB? What were your use cases?

We started to face a significant amount of maintenance and upkeep in keeping our database and Elasticsearch in sync. We previously had to build data pipelines so that if something changed in the database, it would also change in search. Once we migrated everything to MongoDB Atlas Search, we no longer had to manage those pipelines. This resulted in lower latency and fewer opportunities for bugs, which excited our team.

The other component to this was the scalability disconnect of having two different systems. We realized if we ever needed to spin up more storage or compute, we could just spin up a larger MongoDB cluster and get that extra scalability right away with the Atlas platform. Of course one less thing to worry about is also a huge benefit — Elasticsearch is not the easiest thing to manage, so having it all in MongoDB was another big plus for us.

How did you initially learn about Atlas Vector Search and what piqued your interest?

We started experimenting with Pinecone as the AI stuff really started to explode a while back, just to try out the tool, as one of our interns had initially started playing around with it. It turned out not to be cost-effective to spin up a Pinecone instance for each customer, and it was quite difficult to scale up due to API throttling.

After some time, we started looking around for other vector search vendors. However, once we learned that MongoDB had Vector Search, we got excited at the prospect of using our existing tech stack for this additional functionality. It quickly became a no-brainer for us: since we knew we were going to move everything to Atlas, it was obvious we should consolidate everything there, so we ended up migrating to Atlas Vector Search for all of our semantic search needs. This means one query API, one set of dependencies, and built-in sync, all in a single platform.
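To make the "one query API" point concrete, here is a minimal sketch of how a semantic query can be expressed as an ordinary aggregation pipeline using the `$vectorSearch` stage. This is an illustration under stated assumptions, not Metaphor's actual code: the index name, the field names, and the helper function are hypothetical, and the query vector would come from whatever embedding model you use.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "embedding",           # hypothetical field that stores the vectors
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Return an aggregation pipeline whose first stage is $vectorSearch."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be greater than or equal to limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # nearest neighbors considered
                "limit": limit,                   # documents actually returned
            }
        },
        {
            # Keep a few (hypothetical) metadata fields plus the similarity score.
            "$project": {
                "_id": 0,
                "name": 1,
                "description": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.12, -0.03, 0.57], limit=3)
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)` on a collection that has an Atlas Vector Search index defined on the chosen field, which is the same call used for any other aggregation.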

What were the key factors that made you pull the trigger and adopt Atlas Vector Search? What were the problems you were trying to solve?

So one key unlock for us was the semantic search side of things, where someone can ask a natural language question and get a natural language answer. For us, this is a much better user experience than Google-style keyword search.

From day one we always wanted to best serve our core customer, the engineer, but another huge constituency for us is the business or non-technical audience. These folks prefer a tool that is more intuitive to use.

To best serve them we have a first-class integration into Slack and Microsoft Teams, so they can ask a question and don't have to go to another place or switch tools to get the answer. We didn’t always have the capability to do natural language question and response, but with Atlas Vector Search this is now possible. Using Vector Search, we can ask the Slack bot questions like “where can I find this type of data” or “where is this one table on revenue from last quarter and who is using it” and get a natural language response back.
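For readers new to semantic search, the core idea behind answering a question like the ones above is to compare an embedding of the question with embeddings of catalog entries and return the closest matches. The toy sketch below shows that ranking step with cosine similarity in plain Python. The three-dimensional vectors and table names are invented for illustration; a real system would use embeddings from a model and let the database perform the search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: list[float],
    documents: dict[str, list[float]],
    top_k: int = 2,
) -> list[tuple[str, float]]:
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented toy "embeddings" for three catalog entries.
catalog = {
    "quarterly_revenue": [0.9, 0.1, 0.0],
    "user_signups": [0.1, 0.9, 0.1],
    "server_logs": [0.0, 0.1, 0.9],
}

# Pretend this vector encodes a question about last quarter's revenue.
question_vector = [0.8, 0.2, 0.0]
top_matches = rank_by_similarity(question_vector, catalog, top_k=2)
```

Here `top_matches` lists `quarterly_revenue` first because its vector points in nearly the same direction as the question vector. The matched entries can then be handed to a language model to phrase the natural language reply.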

One of the key considerations for us when looking at vendors was cost, but not just cost in terms of what shows up on an invoice. I would rather scale one system and get the benefits for both search and vector search. We saw that having to scale two systems independently was just not going to be very efficient in the long run.

Can you talk about some of the initial benefits you’ve seen so far both on the Atlas Search side as well as with Vector Search specifically? How do you think about and quantify these benefits?

Well, one obvious thing that stands out on the search side is increased speed and being able to move quickly. MongoDB in general has a great developer experience. Our data model tends to be highly complex documents, and all the metadata tends to be highly structured and complex, so the MongoDB document model fits us very well.

In terms of productivity, it’s never an exact science. I will say that with the adoption of Atlas we were able to keep our engineering team size relatively constant while serving many more customers and scaling our development efforts faster, so we probably saw a 2x to 3x increase in productivity.

One last item of note. We adopt the most rigorous security practices because we deal with so much customer data, and we want to ensure the highest security possible. We chose to have dedicated MongoDB clusters per customer, so every customer’s data is totally isolated from every other customer’s. When we were on Pinecone, this meant spinning up a new Pinecone pod for each customer, which would be both really hard to do and not at all financially viable. Because we are centralizing all of this under MongoDB, it becomes so much easier: you can dynamically scale your cluster sizes up and down depending on the needs of small vs. large customers. There’s not the sort of waste you’d get with multiple discrete systems.
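As a rough illustration of the cluster-per-customer pattern described above, the sketch below keeps a small registry that maps each customer to its own dedicated cluster, so application code can only ever resolve the connection details for the customer it is serving. Everything here is hypothetical (class names, customer IDs, placeholder URIs, and tier labels); it shows the shape of the idea, not how Metaphor implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantCluster:
    customer_id: str
    uri: str   # placeholder connection string, not a real cluster
    tier: str  # illustrative cluster size label


class TenantRegistry:
    """Maps each customer to exactly one dedicated cluster record."""

    def __init__(self) -> None:
        self._clusters: dict[str, TenantCluster] = {}

    def register(self, cluster: TenantCluster) -> None:
        # Refuse duplicates so two records never claim the same customer.
        if cluster.customer_id in self._clusters:
            raise ValueError(f"{cluster.customer_id} already has a cluster")
        self._clusters[cluster.customer_id] = cluster

    def resolve(self, customer_id: str) -> TenantCluster:
        # Unknown customers fail loudly instead of falling back to a shared cluster.
        try:
            return self._clusters[customer_id]
        except KeyError:
            raise LookupError(f"no cluster registered for {customer_id}") from None


registry = TenantRegistry()
registry.register(
    TenantCluster("small-co", "mongodb+srv://small-co.example.invalid", "M10")
)
registry.register(
    TenantCluster("large-co", "mongodb+srv://large-co.example.invalid", "M50")
)

cluster = registry.resolve("large-co")
```

In a real service, the resolved URI would be used to open a client for that customer only, and the tier would be changed through the platform's own scaling controls rather than in application code.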

Getting started

A big thank you to Mars and the entire Metaphor Data team for sharing more about their story and use of Atlas Vector Search.

Want to learn more? Head over to our quick-start guide to get started with Atlas Vector Search today. And if you’re a startup building with AI please check out our MongoDB AI Innovators program for Atlas credits, one-on-one technical advice, access to our partner network, and more!
