Build AI Agents Worth Keeping: The Canvas Framework

Mikiko Bazeley

Why 95% of enterprise AI agent projects fail

Development teams across enterprises are stuck in the same cycle: They start with "Let's try LangChain" before figuring out what agent to build. They explore CrewAI without defining the use case. They implement RAG before identifying what knowledge the agent actually needs. Months later, they have an impressive technical demo showcasing multi-agent orchestration and tool calling—but can't articulate ROI or explain how it addresses actual business needs.

According to McKinsey's latest research, while nearly eight in 10 companies report using generative AI, fewer than 10% of deployed use cases ever make it past the pilot stage. MIT researchers studying this challenge identified a "gen AI divide"—a gap between organizations successfully deploying AI and those stuck in perpetual pilots. In their sample of 52 organizations, researchers found patterns suggesting failure rates as high as 95% (p. 3). Whether the true failure rate is 50% or 95%, the pattern is clear: Organizations lack clear starting points, initiatives stall after pilot phases, and most custom enterprise tools fail to reach production.

6 critical failures killing your AI agent projects

The gap between agentic AI's promise and its reality is stark. Understanding these failure patterns is the first step toward building systems that actually work.

1. The technology-first trap

MIT's research found that while 60% of organizations evaluated enterprise AI tools, only 5% reached production (p. 6)—a clear sign that businesses struggle to move from exploration to execution. Teams rush to implement frameworks before defining business problems. While most organizations have moved beyond ad hoc approaches (down from 19% to 6%, according to IBM), they've replaced chaos with structured complexity that still misses the mark.

Meanwhile, one in four companies taking a true "AI-first" approach—starting with business problems rather than technical capabilities—report transformative results. The difference has less to do with technical sophistication and more to do with strategic clarity.

2. The capability reality gap

Carnegie Mellon's TheAgentCompany benchmark exposed the uncomfortable truth: Even our best AI agents would make terrible employees. The best-performing model (Claude 3.5 Sonnet) completes only 24% of office tasks, rising to 34.4% when given partial credit. Agents struggle with basic obstacles, such as pop-up windows, which humans navigate instinctively.

More concerning, when faced with challenges, some agents resort to deception, like renaming existing users instead of admitting they can't find the right person. These issues reflect not just technical limitations but fundamental reasoning gaps that make autonomous deployment dangerous in real business environments.

3. Leadership vacuum

The disconnect is glaring: Fewer than 30% of companies report CEO sponsorship of the AI agenda despite 70% of executives saying agentic AI is important to their future. This leadership vacuum creates cascading failures—AI initiatives fragment into departmental experiments, lack authority to drive organizational change, and can't break through silos to access necessary resources.

Contrast this with Moderna, where CEO buy-in drove the deployment of 750+ AI agents and radical restructuring of HR and IT departments. As with the earlier waves of Big Data, data science, and machine learning adoption, leadership buy-in is the deciding factor in whether generative AI initiatives survive.

4. Security and governance barriers

Organizations are paralyzed by a governance paradox: 92% believe governance is essential, but only 44% have policies (SailPoint, 2025). The result is predictable—80% experienced AI acting outside intended boundaries, with top concerns including privileged data access (60%), unintended actions (58%), and sharing privileged data (57%). Without clear ethical guidelines, audit trails, and compliance frameworks, even successful pilots can't move to production.

5. Infrastructure chaos

The infrastructure gap creates a domino effect of failures. While 82% of organizations already use AI agents, 49% cite data concerns as primary adoption barriers (IBM). Data remains fragmented across systems, making it impossible to provide agents with complete context.

Teams end up managing multiple databases—one for operational data, another for vector data and workloads, a third for conversation memory—each with different APIs and scaling characteristics. This complexity kills momentum before agents can actually prove value.

6. The ROI mirage

The optimism-reality gap is staggering. Nearly 80% of companies report no material earnings impact from gen AI (McKinsey), while 62% expect 100%+ ROI from deployment (PagerDuty). Companies measure activity (number of agents deployed) rather than outcomes (business value created). Without clear success metrics defined upfront, even successful implementations look like expensive experiments.

The AI development paradigm shift: from data-first to product-first

There's been a fundamental shift in how successful teams approach agentic AI development, and it mirrors what Shawn Wang (Swyx) observed in his influential "Rise of the AI Engineer" post about the broader generative AI space.

The old way: data → model → product

In the traditional paradigm practiced during the early years of machine learning, teams would spend months architecting datasets, labeling training data, and preparing for model pre-training. Only after training custom models from scratch could they finally incorporate these into product features.

The trade-offs were severe: massive upfront investment, long development cycles, high computational costs, and brittle models with narrow capabilities. This sequential process created high barriers to entry—only organizations with substantial ML expertise and resources could deploy AI features.

Figure 1. The Data → Model → Product Lifecycle.
This diagram has the title traditional flow: pre-foundation model era. On the left center of this diagram is a box for data, which takes 1-6 months and is for architect datasets, clean & label, and prepare training. This box connects to a box for model through a line titled heavy investment. The model box takes 7-9 months and is for train from scratch, high compute costs, and narrow capabilities. This box then connects to a box titled product through a line titled single purpose. The product box takes 10+ months and has the descriptors of finally integrate, limited features, and brittle deployment. At the bottom is a box that lists the challenges of this approach: 10+ months to first value & signal, massive upfront investment, brittle single-purpose models, long cycles with limited value, and high infrastructure requirements.
Traditional AI development required months of data preparation and model training before shipping products.

The new way: product → data → model

The emergence of foundation models changed everything.

Figure 2. The Product → Data → Model Lifecycle.
The title of this diagram is modern flow: foundation model era. The flow begins on the left with product, which is associated with week 1 and is for defining user need, building an MVP, and fast iteration. This then leads to data via fast experimentation. Data is associated with week 2 and is for identifying needed knowledge, collecting examples, and structuring for retrieval. Data then connects to model via immediate capability. Model occurs over week 3+ and is for selecting providers, optimizing prompts, and testing performance. The box at the bottom lists the benefits of this approach, which include days to first value & signal, easy model swapping, data requirements driving model choice, and product hypotheses that can be tested with near-immediate feedback.
Foundation model APIs flipped the traditional cycle, enabling rapid experimentation before data and model optimization.

Powerful LLMs became commoditized through providers like OpenAI and Anthropic. Now, teams could:

  1. Start with the product vision and customer need.
  2. Identify what data would enhance it (examples, knowledge bases, RAG content).
  3. Select the appropriate model that could process that data effectively.

This enabled zero-shot and few-shot capabilities via simple API calls. Teams could build MVPs in days, define their data requirements based on actual use cases, then select and swap models based on performance needs. Developers now ship experiments quickly, gather insights to improve data (for RAG and evaluation), then fine-tune only when necessary. This democratized cutting-edge AI to all developers, not just those with specialized ML backgrounds.
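
To make this concrete, here is a minimal sketch of the product-first loop using OpenAI's Python SDK: a few-shot prompt ships the first version of a feature, and data collection or fine-tuning happens only if this baseline falls short. The model name, system prompt, and ticket examples are illustrative assumptions, not prescriptions.

```python
# A minimal product-first sketch: ship a few-shot baseline via API call,
# then invest in data and fine-tuning only if it underperforms.
# Assumes OPENAI_API_KEY is set; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

FEW_SHOT_EXAMPLES = [
    {"role": "user", "content": "Ticket: App crashes on login."},
    {"role": "assistant", "content": "Category: bug | Priority: high"},
]

def classify_ticket(ticket_text: str) -> str:
    """Label a support ticket with a few-shot prompt -- no training run required."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; swap models or providers freely
        messages=[
            {"role": "system", "content": "Label tickets as 'Category: ... | Priority: ...'."},
            *FEW_SHOT_EXAMPLES,
            {"role": "user", "content": f"Ticket: {ticket_text}"},
        ],
    )
    return response.choices[0].message.content

print(classify_ticket("Export to CSV silently fails for large reports."))
```

A working baseline like this surfaces real user feedback in days, which in turn tells you what data (examples, RAG content, evaluation sets) is actually worth collecting.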

The agentic evolution: product → agent → data → model

But for agentic systems, there's an even more important insight: Agent design sits between product and data.

Figure 3. The Product → Agent → Data → Model Lifecycle.
This diagram is titled agentic flow: foundation model era. This diagram begins on the left with product, where you define the problem. This connects to agent via user-first design, and the agent is for designing behavior. Agent then goes to data via determines requirements, and data is for enhancing performance. Data connects to model via match to agent needs, and the model step is for selecting a provider. The new considerations are that the agent layer orchestrates everything, tools & workflows come before model selection, and data enhances rather than enables.
Agent design now sits between product and data, determining downstream requirements for knowledge, tools, and model selection.

Now, teams follow this progression:

  1. Product: Define the user problem and success metrics.
  2. Agent: Design agent capabilities, workflows, and behaviors.
  3. Data: Determine what knowledge, examples, and context the agent needs.
  4. Model: Select external providers and optimize prompts for your data.

With external model providers, the "model" phase is really about selection and integration rather than deployment. Teams choose which provider's models best handle their data and use case, then build the orchestration layer to manage API calls, handle failures, and optimize costs.
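
As a rough illustration of that orchestration layer, the sketch below retries a primary provider with backoff, fails over to a secondary one, and logs token usage for cost tracking. The provider pairing, model names, and retry counts are assumptions for the example, not recommendations.

```python
# Sketch of a thin orchestration layer: retry the primary provider with
# backoff, fail over to a secondary, and record token usage for cost tracking.
# Model names and retry counts are illustrative assumptions.
import time

import anthropic
from openai import OpenAI

openai_client = OpenAI()
anthropic_client = anthropic.Anthropic()

def call_with_fallback(prompt: str, retries: int = 2) -> str:
    for attempt in range(retries):
        try:
            resp = openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
            )
            print("tokens used:", resp.usage.total_tokens)  # feed cost dashboards
            return resp.choices[0].message.content
        except Exception:
            time.sleep(2 ** attempt)  # exponential backoff before retrying
    # Primary exhausted -- fail over to the secondary provider.
    msg = anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text
```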

The agent layer shapes everything downstream—determining what data is needed (knowledge bases, examples, feedback loops), what tools are required (search, calculation, code execution), and ultimately, which external models can execute the design effectively.

This evolution means teams can start with a clear user problem, design an agent to solve it, identify necessary data, and then select appropriate models—rather than starting with data and hoping to find a use case. This is why the canvas framework follows this exact flow.

The canvas framework: A systematic approach to building AI agents

Rather than jumping straight into technical implementation, successful teams use structured planning frameworks. Think of them as "business model canvases for AI agents"—tools that help teams think through critical decisions in the right order.

Two complementary frameworks directly address the common failure patterns:

Figure 4. The Agentic AI Canvas Framework.
This diagram is titled agentic AI canvas framework: From idea to production. The process goes from business problem, to POC canvas, to prototype & launch, to production canvas, and finally to production agent.
A structured five-phase approach moving from business problem definition through POC, prototype, production canvas, and production agent deployment. Please see the “Resources” section at the end for links to the corresponding templates, hosted in the gen AI Showcase.

Canvas #1 - The POC canvas for validating your agent idea

The POC canvas implements the product → agent → data → model flow through eight focused squares designed for rapid validation:

Figure 5. The Agent POC Canvas V1.
This table is titled agent POC: Canvas 1. The description at the top of the table says the canvas helps teams systematically work through all aspects of an agentic AI project while avoiding redundancy and ensuring nothing critical is missed.
Eight focused squares implementing the product → agent → data → model flow for rapid validation of AI agent concepts.

Phase 1: Product validation—who needs this and why?

Before building anything, you must validate that a real problem exists and that users actually want an AI agent solution. This phase prevents the common mistake of building impressive technology that nobody needs. If you can't clearly articulate who will use this and why they'll prefer it to current methods, stop here.

Square Purpose Key Questions
Product vision & user problem Define the business problem and establish why an agent is the right solution.
  • Core problem: What specific workflow frustrates users today?
  • Target users: Who experiences this pain and how often?
  • Success vision: What would success look like for users?
  • Value hypothesis: Why would users prefer an agent to current solutions?
User validation & interaction Map how users will engage with the agent and identify adoption barriers.
  • User journey: What's the complete interaction from start to finish?
  • Interface preference: How do users want to interact?
  • Feedback mechanisms: How will you know it's working?
  • Adoption barriers: What might prevent users from trying it?

Phase 2: Agent design—what will it do and how?

With a validated problem, design the agent's capabilities and behavior to solve that specific need. This phase defines the agent's boundaries, decision-making logic, and interaction style before any technical implementation. The agent design directly determines what data and models you'll need, making this the critical bridge between problem and solution.

Square Purpose Key Questions
Agent capabilities & workflow Design what the agent must do to solve the identified problem.
  • Core tasks: What specific actions must the agent perform?
  • Decision logic: How should complex requests be broken down?
  • Tool requirements: What capabilities does the agent need?
  • Autonomy boundaries: What can it decide versus escalate?
Agent interaction & memory Establish communication style and context management.
  • Conversation flow: How should the agent guide interactions?
  • Personality and tone: What style fits the use case?
  • Memory requirements: What context must persist?
  • Error handling: How should confusion be managed?
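
One way to make these phase-2 answers enforceable is to capture them in a small spec that your orchestration code reads at runtime, rather than burying them in prompt text. The sketch below is illustrative; the field names are this example's own invention, not a standard schema.

```python
# A lightweight agent spec capturing phase-2 decisions so code can enforce
# them. Field names are this sketch's own, not a standard schema.
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    core_tasks: list[str]
    tools: list[str]                       # capabilities the agent may invoke
    can_decide: list[str]                  # autonomy boundary: acts alone
    must_escalate: list[str]               # autonomy boundary: hands off to a human
    tone: str = "concise and professional"
    memory_keys: list[str] = field(default_factory=list)  # context that must persist

support_agent = AgentSpec(
    core_tasks=["triage tickets", "draft replies"],
    tools=["kb_search", "ticket_api"],
    can_decide=["categorize ticket", "suggest reply"],
    must_escalate=["refunds", "account deletion"],
    memory_keys=["customer_id", "open_tickets"],
)
```

Checking every proposed action against must_escalate in code is a cheap way to keep autonomy boundaries from living in prompt wording alone.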

Phase 3: Data requirements—what knowledge does it need?

Agents are only as good as their knowledge base, so identify exactly what information the agent needs to complete its tasks. This phase maps existing data sources and gaps before selecting models, ensuring you don't choose technology that can't handle your data reality. Understanding data requirements upfront prevents the costly mistake of selecting models that can't work with your actual information.

Square Purpose Key Questions
Knowledge requirements & sources Identify essential information and where to find it.
  • Essential knowledge: What information must the agent have to complete tasks?
  • Data sources: Where does this knowledge currently exist?
  • Update frequency: How often does this information change?
  • Quality requirements: What accuracy level is needed?
Data collection & enhancement strategy Plan data gathering and continuous improvement.
  • Collection strategy: How will initial data be gathered?
  • Enhancement priority: What data has the biggest impact?
  • Feedback loops: How will interactions improve the data?
  • Integration method: How will data be ingested and updated?
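
A simple way to operationalize this phase is a knowledge-source inventory that records each answer next to the source it describes. The structure below is a hedged sketch; the source names, locations, and schedules are hypothetical.

```python
# A lightweight knowledge-source inventory: one entry per phase-3 answer.
# All names, locations, and schedules here are hypothetical.
KNOWLEDGE_SOURCES = [
    {
        "name": "product_docs",
        "location": "https://example.com/docs",  # hypothetical URL
        "update_frequency": "weekly",
        "quality_bar": "must match the released product version",
        "ingestion": "crawl -> chunk -> embed",
    },
    {
        "name": "support_macros",
        "location": "internal CRM export",
        "update_frequency": "daily",
        "quality_bar": "approved replies only",
        "ingestion": "export -> dedupe -> embed",
    },
]

def sources_due_for_refresh(frequency: str) -> list[str]:
    """Return the sources whose ingestion pipelines should run on this schedule."""
    return [s["name"] for s in KNOWLEDGE_SOURCES if s["update_frequency"] == frequency]
```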

Phase 4: External model integration—which provider and how?

Only after defining data needs should you select external model providers and build the integration layer. This phase tests whether available models can handle your specific data and use case while staying within budget. The focus is on prompt engineering and API orchestration rather than model deployment, reflecting how modern AI agents actually get built.

Square Purpose Key Questions
Provider selection & prompt engineering Choose external models and optimize for your use case.
  • Provider evaluation: Which models handle your requirements best?
  • Prompt strategy: How should you structure requests for optimal results?
  • Context management: How should you work within token limits?
  • Cost validation: Is this economically viable at scale?
API integration & validation Build orchestration and validate performance.
  • Integration architecture: How do you connect to providers?
  • Response processing: How do you handle outputs?
  • Performance testing: Does it meet requirements?
  • Production readiness: What needs hardening?
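
Context management is often the trickiest square here. The sketch below packs retrieved chunks into a fixed token budget before the provider call, using the tiktoken tokenizer; the budget size and instruction wording are illustrative assumptions.

```python
# Sketch of prompt assembly under a token budget: greedily pack the
# highest-ranked retrieved chunks, then append the question.
# The budget and instruction wording are illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def build_prompt(question: str, ranked_chunks: list[str], context_budget: int = 2000) -> str:
    """Assumes ranked_chunks is sorted most-relevant first."""
    context, used = [], 0
    for chunk in ranked_chunks:
        cost = len(enc.encode(chunk))
        if used + cost > context_budget:
            break
        context.append(chunk)
        used += cost
    return (
        "Answer using only the context below. If it is insufficient, say so.\n\n"
        + "\n---\n".join(context)
        + f"\n\nQuestion: {question}"
    )
```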

Figure 6. The Agent POC Canvas V1 (Detailed).
Table diagram titled Agent POC: Canvas V1 - detailed. The description for the table says the canvas helps teams systematically work through all aspects of an agentic AI project while avoiding redundancy and ensuring nothing critical is missed.
Expanded view with specific guidance for each of the eight squares covering product validation, agent design, data requirements, and external model integration.

Unified data architecture: solving the infrastructure chaos

Remember the infrastructure problem—teams managing three separate databases with different APIs and scaling characteristics? This is where a unified data platform becomes critical.

Agents need three types of data storage:

  • Application database: For business data, user profiles, and transaction history
  • Vector store: For semantic search, knowledge retrieval, and RAG
  • Memory store: For agent context, conversation history, and learned behaviors

Instead of juggling multiple systems, teams can use a unified platform like MongoDB Atlas that provides all three capabilities—flexible document storage for application data, native vector search for semantic retrieval, and rich querying for memory management—all in a single platform.
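
In practice, that can be as simple as three collections in one cluster. The sketch below uses PyMongo and assumes a deployed Atlas cluster with an Atlas Vector Search index named vector_index on the knowledge collection's embedding field; the connection string, database, and collection names are all illustrative.

```python
# Sketch of the three stores as three collections in one MongoDB Atlas
# cluster. Assumes an Atlas Vector Search index named "vector_index" exists
# on knowledge.embedding; all names here are illustrative.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
db = client["agent_app"]

# 1. Application database: business data and user profiles.
db.users.insert_one({"user_id": "u42", "plan": "pro"})

# 2. Vector store: semantic retrieval over embedded knowledge.
def semantic_search(query_vector: list[float], k: int = 5) -> list[dict]:
    return list(db.knowledge.aggregate([
        {"$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": k,
        }}
    ]))

# 3. Memory store: conversation history the agent reads back as context.
db.memory.insert_one({"session_id": "s1", "role": "user", "content": "Hi"})
```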

This unified approach means teams can focus on prompt engineering and orchestration rather than model infrastructure, while maintaining the flexibility to evolve their data model as requirements become clearer. The data platform handles the complexity while you optimize how external models interact with your knowledge.

For embeddings and search relevance, specialized models like Voyage AI can provide domain-specific understanding, particularly for technical documentation where general-purpose embeddings fall short. The combination of unified data architecture with specialized embedding models addresses the infrastructure chaos that kills projects.
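
As a hedged illustration of that combination, the snippet below embeds a query with Voyage AI's Python client and feeds it to the semantic_search helper from the sketch above; the model name is an assumption.

```python
# Pairing a domain-tuned embedding model with the vector store sketched above.
# Assumes VOYAGE_API_KEY is set; the model name is illustrative.
import voyageai

vo = voyageai.Client()

def embed_query(text: str) -> list[float]:
    result = vo.embed([text], model="voyage-3", input_type="query")
    return result.embeddings[0]

hits = semantic_search(embed_query("How do I rotate my API keys?"))
```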


Canvas #2 - The production canvas for scaling your validated AI agent

When a POC succeeds, the production canvas guides the transition from "it works" to "it works at scale" through 11 squares that follow the same product → agent → data → model flow, with additional operational concerns:

Figure 7. The Productionize Agent Canvas V1.
Table diagram titled productionize agent: Canvas V1. The description is this canvas guides enterprise teams through the complete journey from validated POC to production-ready agentic systems, addressing technical architecture, business requirements, and operational excellence.
Eleven squares guiding the transition from validated POC to production-ready systems, addressing scale, architecture, operations, and governance.

Phase 1: Product and scale planning

Transform POC learnings into concrete business metrics and scale requirements for production deployment. This phase establishes the economic case for investment and defines what success looks like at scale. Without clear KPIs and growth projections, production systems become expensive experiments rather than business assets.

Square Purpose Key Questions
Business case & scale planning Translate POC validation into production metrics.
  • Proven value: What did the POC validate?
  • Business KPIs: What metrics measure ongoing success?
  • Scale requirements: How many users and interactions?
  • Growth strategy: How will usage expand over time?
Production requirements & constraints Define performance standards and operational boundaries.
  • Performance standards: Response time, availability, throughput?
  • Reliability requirements: Recovery time and failover?
  • Budget constraints: Cost limits and optimization targets?
  • Security needs: Compliance and data protection requirements?

Phase 2: Agent architecture

Design robust systems that handle complex workflows, multiple agents, and inevitable failures without disrupting users. This phase addresses the orchestration and fault tolerance that POCs ignore but production demands. The architecture decisions here determine whether your agent can scale from 10 users to 10,000 without breaking.

Square Purpose Key Questions
Robust agent architecture Design for complex workflows and fault tolerance.
  • Workflow orchestration: How do you manage multi-step processes?
  • Multi-agent coordination: How do specialized agents collaborate?
  • Fault tolerance: How do you handle failures gracefully?
  • Update rollouts: How do you update without disruption?
Production memory & context systems Implement scalable context management.
  • Memory architecture: Session, long-term, and organizational knowledge?
  • Context persistence: Storage and retrieval strategies?
  • Cross-session continuity: How do you maintain user context?
  • Memory lifecycle management: Retention, archival, and cleanup?
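
For the memory lifecycle row in particular, a TTL index is often enough to start: session memory expires automatically, while facts promoted to long-term memory live in a collection without a TTL. The sketch below uses PyMongo; the retention window, connection string, and collection names are illustrative assumptions.

```python
# Sketch of memory lifecycle management: a TTL index expires stale session
# memory automatically; promoted facts live in a non-expiring collection.
# Retention window and names are illustrative.
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")["agent_app"]

# Session memory: documents expire 30 days after their last update.
db.session_memory.create_index("updated_at", expireAfterSeconds=30 * 24 * 3600)

db.session_memory.insert_one({
    "session_id": "s1",
    "turns": [{"role": "user", "content": "Where is my order?"}],
    "updated_at": datetime.now(timezone.utc),
})

# Long-term memory: no TTL, so promoted facts survive session cleanup.
db.long_term_memory.insert_one({"user_id": "u42", "fact": "prefers email follow-ups"})
```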

Phase 3: Data infrastructure

Build the data foundation that unifies application data, vector storage, and agent memory in a manageable platform. This phase solves the "three database problem" that kills production deployments through complexity. A unified data architecture reduces operational overhead while enabling the sophisticated retrieval and context management that production agents require.

Square Purpose Key Questions
Data architecture & management Build a unified platform for all data types.
  • Platform architecture: Application, vector, and memory data?
  • Data pipelines: Ingestion, processing, and updates?
  • Quality assurance: Validation and freshness monitoring?
  • Knowledge governance: Version control and approval workflows?
Knowledge base & pipeline operations Maintain and optimize knowledge systems.
  • Update strategy: How does knowledge evolve?
  • Embedding approach: Which models for which content?
  • Retrieval optimization: Search relevance and reranking?
  • Operational monitoring: Pipeline health and costs?

Phase 4: Model operations

Implement strategies for managing multiple model providers, fine-tuning, and cost optimization at production scale. This phase covers API management, performance monitoring, and the continuous improvement pipeline for model performance. The focus is on orchestrating external models efficiently rather than deploying your own, including when and how to fine-tune.

Square Purpose Key Questions
Model strategy & optimization Manage providers and fine-tuning strategies.
  • Provider selection: Which models for which tasks?
  • Fine-tuning approach: When and how to customize?
  • Routing logic: Base versus fine-tuned model decisions?
  • Cost controls: Caching and intelligent routing?
API management & monitoring Handle external APIs and performance tracking.
  • API configuration: Key management and failover?
  • Performance tracking: Accuracy, latency, and costs?
  • Fine-tuning pipeline: Data collection for improvement?
  • Version control: A/B testing and rollback strategies?
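
Here is a sketch of the routing and cost-control squares, under the assumption that you can score request complexity upstream: route routine traffic to a cheaper model, harder requests to a stronger one, and short-circuit repeats with a hash-keyed response cache. The threshold, model names, and the call_fn callable are all illustrative.

```python
# Sketch of cost-aware routing plus a response cache. The complexity score,
# threshold, model names, and call_fn callable are illustrative assumptions.
import hashlib

_cache: dict[str, str] = {}  # swap for Redis or a MongoDB collection in production

def route_model(complexity: float) -> str:
    """Send routine requests to a cheaper model, complex ones to a stronger one."""
    return "gpt-4o-mini" if complexity < 0.5 else "gpt-4o"

def cached_call(prompt: str, complexity: float, call_fn) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]  # cache hit: zero API spend
    answer = call_fn(prompt, model=route_model(complexity))
    _cache[key] = answer
    return answer
```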

Phase 5: Hardening and operations

Add the security, compliance, user experience, and governance layers that transform a working system into an enterprise-grade solution. This phase addresses the non-functional requirements that POCs skip but enterprises demand. Without proper hardening, even the best agents remain stuck in pilot purgatory due to security or compliance concerns.

Square Purpose Key Questions
Security & compliance Implement enterprise security and regulatory controls.
  • Security implementation: Authentication, encryption, and access management?
  • Access control: User and system access management?
  • Compliance framework: Which regulations apply?
  • Audit capabilities: Logging and retention requirements?
User experience & adoption Drive usage and gather feedback.
  • Workflow integration: How do you fit existing processes?
  • Adoption strategy: Rollout and engagement plans?
  • Support systems: Documentation and help channels?
  • Feedback integration: How does user input drive improvement?
Continuous improvement & governance Ensure long-term sustainability.
  • Operational procedures: Maintenance and release cycles?
  • Quality gates: Testing and deployment standards?
  • Cost management: Budget monitoring and optimization?
  • Continuity planning: Documentation and team training?
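
For the audit capabilities question above, an append-only log of agent actions is the usual starting point. The sketch below writes one document per event; the field names are illustrative and should be mapped to whatever compliance framework applies to you.

```python
# Sketch of an append-only audit trail: one document per agent action.
# Field names are illustrative; align them with your compliance framework.
from datetime import datetime, timezone

def log_agent_action(db, session_id: str, user_id: str, action: str, outcome: str) -> None:
    db.audit_log.insert_one({
        "timestamp": datetime.now(timezone.utc),
        "session_id": session_id,
        "user_id": user_id,
        "action": action,    # e.g., "tool_call:ticket_api"
        "outcome": outcome,  # e.g., "success", "escalated_to_human"
    })
```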

Figure 8. The Productionize Agent Canvas V1 (Detailed).
Table diagram titled productionize agent: Canvas V1 - Detailed. The description is this canvas guides enterprise teams through the complete journey from validated POC to production-ready agentic systems, addressing technical architecture, business requirements, and operational excellence.
Expanded view with specific guidance for each of the eleven squares covering scale planning, architecture, data infrastructure, model operations, and hardening requirements.

Next steps: start building AI agents that deliver ROI

MIT's research found that 66% of executives want systems that learn from feedback, while 63% demand context retention (p. 14). Memory, adaptability, and the capacity to learn are what separate the agents users embrace from the ones they abandon.

The canvas framework directly addresses the failure patterns plaguing most projects by forcing teams to answer critical questions in the right order—following the product → agent → data → model flow that successful teams have discovered.

For your next agentic AI initiative:

  • Start with the POC canvas to validate concepts quickly.
  • Focus on user problems before technical solutions.
  • Leverage AI tools to rapidly prototype after completing your canvas.
  • Only scale what users actually want with the production canvas.
  • Choose a unified data architecture to reduce complexity from day one.

Remember: The goal isn't to build the most sophisticated agent possible—it's to build agents that solve real problems for real users in production environments.

For hands-on guidance on memory management, check out our webinar on YouTube, which covers essential concepts and proven techniques for building memory-augmented agents.

Head over to the MongoDB AI Learning Hub to learn how to build and deploy AI applications with MongoDB.

Resources

Full reference list

  1. McKinsey & Company. (2025). "Seizing the agentic AI advantage." https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage
  2. MIT NANDA. (2025). "The GenAI Divide: State of AI in Business 2025." Report
  3. Gartner. (2025). "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027." https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
  4. IBM. (2025). "IBM Study: Businesses View AI Agents as Essential, Not Just Experimental." https://newsroom.ibm.com/2025-06-10-IBM-Study-Businesses-View-AI-Agents-as-Essential,-Not-Just-Experimental
  5. Carnegie Mellon University. (2025). "TheAgentCompany: Benchmarking LLM Agents." https://www.cs.cmu.edu/news/2025/agent-company
  6. Swyx. (2023). "The Rise of the AI Engineer." Latent Space. https://www.latent.space/p/ai-engineer
  7. SailPoint. (2025). "SailPoint research highlights rapid AI agent adoption, driving urgent need for evolved security." https://www.sailpoint.com/press-releases/sailpoint-ai-agent-adoption-report
  8. SS&C Blue Prism. (2025). "Generative AI Statistics 2025." https://www.blueprism.com/resources/blog/generative-ai-statistics-2025/
  9. PagerDuty. (2025). "State of Digital Operations Report." https://www.pagerduty.com/newsroom/2025-state-of-digital-operations-study/
  10. Wall Street Journal. (2024). "How Moderna Is Using AI to Reinvent Itself." https://www.wsj.com/articles/at-moderna-openais-gpts-are-changing-almost-everything-6ff4c4a5