Peek at your MongoDB Clusters like a Pro with Keyhole: Part 3

Ken Chen
June 11, 2020

In part 3 of this blog series, you will learn about Keyhole's new Full Time Diagnostic Data Capture (FTDC) Assessment panel and how FTDC scoring works. In parts 1 and 2, we discussed how to use Keyhole and Maobi to analyze MongoDB clusters. One of the most exciting features is the visualized presentation of FTDC data. Understanding these charts is similar to following the shadow of the moon, which has 30 different shapes and even more shades of gray. Interpreting these FTDC charts can also be subjective and time-consuming, so I added a scoring feature to help identify potential problems quickly.

I split the FTDC analytics feature out of Keyhole into a separate GitHub repository, mongo-ftdc, to better support people who opt to skip the Grafana installation. As you might expect, Keyhole imports the mongo-ftdc package.

Five Boldest Strokes

There are many software options available to display MongoDB metrics. Most focus on reflecting server status in charts, but few provide the additional information required to flag problems. Flagging problems has no simple answer because there are many factors to consider, including resource availability and quality, transaction rates, and configuration specifics.

I recall a movie line, “The difference between a good painting and a great painting is the five boldest strokes.” This inspired me to add a few bold strokes to Keyhole by applying scoring algorithms to metrics in an assessment panel.

How Scoring Works

To filter out anomalies in the data, only data points between the 5th and 95th percentiles are evaluated. A metric can score from 0 to 100, and the higher the score, the better. The color codes are defined in the Grafana configurations. By default, a score under 20 is highlighted in red, a score over 50 is shown in green, and a score between 20 and 50 is flagged in orange.
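As a minimal sketch, the default color bands described above could be expressed as a small Go function. The name colorFor is hypothetical, and treating a score of exactly 20 or exactly 50 as orange is an assumption; the real thresholds live in the Grafana configuration.

```go
package main

import "fmt"

// colorFor maps a 0-100 score to the default color bands described in the text.
// Assumption: scores of exactly 20 and exactly 50 fall into the orange band.
func colorFor(score float64) string {
	switch {
	case score < 20:
		return "red"
	case score > 50:
		return "green"
	default:
		return "orange"
	}
}

func main() {
	// Hypothetical scores, one per band.
	for _, s := range []float64{12, 35, 88} {
		fmt.Printf("score %.0f -> %s\n", s, colorFor(s))
	}
}
```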

The formula to determine the score of a metric varies depending on the nature of that metric. In the initial release of this new feature, the formulas are based on my experience. I am open to feedback and will fine-tune the metrics as we go. The formula for each metric is available by clicking on the metric's link; in the meantime, let's discuss a few details of these formulas.

Low and High Watermarks

All formulas have low and high usage watermarks. If the usage is below the low watermark, a score of 100 is given. On the other hand, if the usage is above the high watermark, a score of 0 is assigned. For example, the score of ticket_avail_read is calculated as:

100 * (p5 of ticket_avail_read) / 128

where p5 indicates the 5th percentile of the data points.
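To make the p5 notation concrete, here is a minimal Go sketch that computes a percentile and applies the ticket_avail_read formula. The nearest-rank percentile method, the function names, and the sample values are assumptions for illustration only; Keyhole may compute percentiles differently.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of data using the
// nearest-rank method. The method is an assumption, not necessarily Keyhole's.
func percentile(data []float64, p float64) float64 {
	if len(data) == 0 {
		return 0
	}
	sorted := append([]float64(nil), data...) // copy so the caller's slice is not reordered
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// ticketScore applies the formula from the text: 100 * p5 / 128.
func ticketScore(avail []float64) float64 {
	return 100 * percentile(avail, 5) / 128
}

func main() {
	// Hypothetical samples of available read tickets.
	samples := []float64{128, 128, 127, 120, 96, 128, 64, 128, 125, 128}
	fmt.Printf("ticket_avail_read score: %.1f\n", ticketScore(samples))
}
```

Using the 5th percentile rather than the minimum means a single momentary dip in available tickets does not dominate the score.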

Another example is cpu_iowait; in that case the score is bounded as follows:

value := (p95 of cpu_iowait)
if value < low_wm {
  score = 100
} else if value > high_wm {
  score = 0
} else {
  score = 100 * (1 - (value - low_wm) / (high_wm - low_wm))
}

where p95 indicates the 95th percentile of the data points.

Metrics with Known Behavior

For metrics with known behavior, I use the given thresholds as low and high watermarks to calculate scores. For example, when WiredTiger cache usage (wt_cache_used) is over 80%, background threads begin evicting pages. If the cache usage is over 95%, application threads are also used for eviction. Therefore, to calculate the score, Keyhole uses 80% as the low watermark and 95% as the high watermark.
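The bounded formula can be sketched as a runnable Go function and applied to wt_cache_used with the 80% and 95% watermarks from the text. The function name watermarkScore and the sample readings are hypothetical; this is an illustration of the formula, not Keyhole's actual code.

```go
package main

import "fmt"

// watermarkScore returns 100 at or below lowWM, 0 at or above highWM,
// and a linear interpolation between the two watermarks otherwise.
func watermarkScore(value, lowWM, highWM float64) float64 {
	switch {
	case value <= lowWM:
		return 100
	case value >= highWM:
		return 0
	default:
		return 100 * (1 - (value-lowWM)/(highWM-lowWM))
	}
}

func main() {
	// Hypothetical p95 readings of wt_cache_used, in percent.
	for _, used := range []float64{72, 86, 97} {
		fmt.Printf("wt_cache_used p95 %.0f%% -> score %.0f\n",
			used, watermarkScore(used, 80, 95))
	}
}
```

With these watermarks, a 95th-percentile cache usage of 86% sits 6 points into the 15-point band, so the score works out to 100 * (1 - 6/15) = 60.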

Metrics Calculated with Derived Values

The scores of a few metrics depend on derived values, for example, the total number of connections (conn_current). It is less than ideal to provide a score simply based on the number of connections. We instead evaluate it by calculating how much memory is used by all connections. Each connection accounts for roughly 1 MB of memory. Keyhole first calculates the percentage of memory allocated to connections and uses 5% as the low watermark and 20% as the high watermark to calculate its score.
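A minimal Go sketch of this derived-value scoring follows. It assumes that the percentage is taken against total system memory and that each connection uses 1 MB, as stated above; the function name connScore and the sample numbers are hypothetical.

```go
package main

import "fmt"

// connScore scores conn_current by the share of memory its connections consume.
// Assumptions: about 1 MB per connection, and totalMemMB is total system memory in MB.
func connScore(connections int, totalMemMB float64) float64 {
	const lowWM, highWM = 5.0, 20.0 // percent of memory, from the text
	if totalMemMB <= 0 {
		return 0 // no usable memory figure, so report the worst score
	}
	pct := 100 * float64(connections) / totalMemMB
	switch {
	case pct <= lowWM:
		return 100
	case pct >= highWM:
		return 0
	default:
		return 100 * (1 - (pct-lowWM)/(highWM-lowWM))
	}
}

func main() {
	const totalMemMB = 16384 // a hypothetical 16 GB server
	for _, conns := range []int{500, 2048, 4000} {
		fmt.Printf("%d connections -> score %.0f\n", conns, connScore(conns, totalMemMB))
	}
}
```

On the hypothetical 16 GB server, 2,048 connections consume 12.5% of memory, which is halfway between the two watermarks and therefore scores 50.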

Bottleneck Patterns

MongoDB is like a coworker you can only dream of: one who is smart, friendly, stylish but not flashy, and speaks multiple languages. Does one have a happy life? That’s on Instagram. Under stress, one shows a few symptomatic patterns. Below are a few patterns for discussion, and you can find charts and examples on the Keyhole Wiki page.

Lost in Space

You did not get the expected performance from properly provisioned resources, even though the provisioned memory was enough to contain the working set. In this pattern, all metrics were in healthy states except for a large number of scanned objects. This could be a textbook case of improper or even missing indexes. Even when the entire working set fits in memory, without proper indexes there could be an excessive number of object scans. Imagine a spaceship roaming in space without a navigation system; there will be more space to cover, thus resulting in a much slower response from a query.

Dream Weaver

Another common cause of underperforming read operations is a less than ideal data access pattern. In this case, most read-related metrics were flagged, such as low WiredTiger available read tickets, high WiredTiger cache usage, and a large number of scanned objects.

Architects and developers are attracted to MongoDB because of its modern and flexible technologies. With simple transformations, one can quickly turn XML into JSON data or directly map relational database tables to MongoDB collections. It's like a dream come true, and Dream Weavers love to use the $lookup operator and allow many collections to get carelessly intertwined. Such implementations do not work efficiently. An important principle of using MongoDB is to have a proper schema to support your use cases. MongoDB is not "schema-less"; rather, it allows for multiple concurrent schemas to be present. These schemas, however, need to be designed appropriately.

Vikings Attack

Everything worked well for most of the day except for a short period of time. During that period, the cluster experienced high CPU I/O wait, a high WiredTiger dirty data ratio, and high disk IOPS. Resource provisioning should be based on loads during peak hours. Many businesses must process a large number of transactions within a limited time. As such, the database operations burst in like a sudden Viking attack, arriving in fast dragon ships and flooding in on the crest of morning tides. Vikings were enormous, and they charged fearlessly with long swords, axes, and round wooden shields. In this case, these short 'attacks' raided the village, burned up disk IOPS, and brought the oplog collection to the ground before retrieval was complete.

New York, New York

Many people rush to New York to melt away little town blues, so the City can use a little bit of breathing room at times. Similarly, hosting an excessive number of collections and indexes in a MongoDB cluster creates additional overhead in WiredTiger. We often see this in multi-tenant implementations. The large number of tables maintained to support collections and indexes in WiredTiger results in a high number of WiredTiger data handles, which can cause extremely long checkpoints and block all running operations.

Angel Has Fallen

Almost all metrics revealed that resources were under stress. The number of queued operations was high and the cluster was restlessly catching up. Many charts showed high-flying plateau lines, suggesting that all resources were simply pegged. If there is no room for tuning, consider scaling up resources or sharding.
