mongot is a Lucene-based, JVM-hosted, memory-mapped search engine. It has different operational characteristics than mongod:
Latency-sensitive to storage. Random-read disk latency directly affects query and indexing performance. Storage class is one of the highest-impact decisions for a deployment.
Memory-mapped index access. Filesystem cache pressure correlates with query latency in a non-linear way. Small memory shortfalls cause large latency degradation.
Replication-driven. mongot is a downstream consumer of mongod change streams. Replication lag, oplog reads, and connection health all affect index freshness.
Multiple workload phases. Initial sync, merge, steady-state replication, and query each stress different resources. A signal that is healthy under one phase may indicate a problem under another.
Signal Categories
Category | What It Tells You |
|---|---|
Health | Whether the |
Replication | Whether the |
Indexing | Whether the mongot process is building and maintaining indexes, whether merges are progressing, and whether initial sync is progressing. |
Query | Latency, throughput, and error rate across |
Executor Pools | Thread-pool utilization and queue depth. Saturation here predicts query latency degradation. |
JVM | Metrics related to Heap, Garbage Collection, and threads on the JVM. |
System | Metrics related to CPU, memory, disk I/O, and network at the process or container level. |
Storage | Metrics related to IOPS, page-fault rate, and free space. |
Embedding | Throughput and error rate against the Voyage AI endpoint when Automated Embedding is enabled. |
Key Signals for mongot
These are the key signals to monitor. If you can monitor only a few signals, start with these:
Health and Process Status
Whether or not the mongot process is up. A mongot that is restarting cannot serve traffic. Crash loops signal that there is a problem with mongot availability.
Whether or not health has reached the SERVING state. A mongot that started but never finished initialization cannot respond to queries.
Query Latency
50th percentile (p50) and 99th percentile (p99) query latency for $search and $vectorSearch. Watch p99 especially because mongot deployments tend to degrade non-linearly under storage or memory pressure. p99 surfaces problems before p50 does.
Latency for index-management requests such as createSearchIndex, getSearchIndexes, and the $listSearchIndexes aggregation stage. Spikes in these requests can indicate mongot is busy or unreachable from mongod.
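As a rough illustration of why p99 surfaces problems before p50 does, the following sketch computes both percentiles from a list of raw latency samples using the nearest-rank method. The sample values and the percentile helper are hypothetical. In practice, a metrics backend usually derives percentiles from histograms rather than raw samples.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical query latencies in milliseconds; one query hit cold index pages.
latencies_ms = [12, 14, 13, 15, 11, 13, 12, 14, 250, 13]

p50 = percentile(latencies_ms, 50)  # barely moved by the single slow query
p99 = percentile(latencies_ms, 99)  # dominated by the single slow query
print(p50, p99)
```

A single slow query leaves p50 unchanged but sets p99, which is why tail latency is the earlier warning.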
Replication Lag
Time since the last applied change event from mongod. This signal is exposed directly as mongot_index_stats_indexing_replicationLagMs (per-index, milliseconds). A small steady-state lag, from sub-second to seconds, is normal. However, a growing lag indicates mongot cannot keep up. If lag is sustained, the position that mongot is reading from eventually falls off the oplog, which forces a re-sync. A re-sync is much worse than lag itself because it requires a full rebuild of the index. A full index rebuild is computationally expensive and time-consuming.
mongot_index_stats_* metrics are emitted per search index and only appear once at least one index exists. On a fresh deployment with no indexes, the replication-lag series is absent rather than zero. This is normal.
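The distinction between steady, growing, and absent lag can be sketched as a small classifier over recent samples of the replication-lag metric for one index. This is a minimal sketch, not a prescribed alert rule: the classify_lag helper, the half-window comparison, and the 5,000 ms ceiling for "seconds" are all assumptions for illustration.

```python
def classify_lag(samples_ms, steady_state_ms=5000):
    """Classify recent replication-lag samples (milliseconds) for one index.

    steady_state_ms is an assumed ceiling for "normal" lag, not a documented value.
    """
    if not samples_ms:
        # No series at all: expected when no search index exists yet.
        return "absent"
    half = len(samples_ms) // 2
    if half == 0:
        return "steady"
    early = sum(samples_ms[:half]) / half   # mean of the older half of the window
    late = sum(samples_ms[-half:]) / half   # mean of the newer half of the window
    if late > steady_state_ms and late > early:
        return "growing"
    return "steady"

print(classify_lag([]))                        # absent
print(classify_lag([300, 450, 280, 500]))      # steady
print(classify_lag([800, 4000, 9000, 20000]))  # growing
```

Treating an empty series as its own state avoids alerting on a fresh deployment where the metric does not exist yet.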
Indexing Progress
Active indexing operations and their state (PENDING, BUILDING, READY, FAILED). Stuck builds indicate either resource exhaustion or a data issue.
Merge throughput. Merges are background work. If merges fall behind, query latency rises.
Executor Pool Saturation
Queue depth for the query executor and indexing executor. Sustained non-zero queue depth indicates saturation and that queries are waiting for a worker thread. This is a very early warning that latency will rise.
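One way to turn "sustained non-zero queue depth" into a check is to look for an unbroken run of non-zero samples. The sketch below assumes queue depth is scraped at a fixed interval. The is_saturated helper and the run length of five samples are hypothetical choices, not documented thresholds.

```python
def is_saturated(queue_depths, min_consecutive=5):
    """Return True if queue depth stayed above zero for min_consecutive samples in a row."""
    run = 0
    for depth in queue_depths:
        run = run + 1 if depth > 0 else 0  # reset the run on any idle sample
        if run >= min_consecutive:
            return True
    return False

print(is_saturated([0, 3, 0, 0, 2, 0]))     # False: brief bursts only
print(is_saturated([0, 1, 2, 4, 4, 6, 3]))  # True: six non-zero samples in a row
```

Requiring a run rather than a single non-zero sample keeps short bursts from triggering the warning.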
JVM Health
Heap utilization after Garbage Collection. A heap consistently above 85% post-Garbage Collection can result in an OutOfMemoryError.
Garbage Collection pause time. Long GC pauses directly translate to query-latency spikes.
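The post-GC heap rule can be expressed as a ratio check over recent collections. This sketch assumes you already collect heap-used-after-GC and maximum heap size in bytes from your JVM metrics. The function names are hypothetical, and "consistently" is interpreted here as every sample in the window.

```python
def post_gc_ratio(used_after_gc_bytes, max_heap_bytes):
    """Fraction of the configured maximum heap still in use after a collection."""
    if max_heap_bytes <= 0:
        raise ValueError("max_heap_bytes must be positive")
    return used_after_gc_bytes / max_heap_bytes

def heap_at_risk(ratios, threshold=0.85):
    """True when every recent post-GC sample is above the threshold."""
    return bool(ratios) and all(r > threshold for r in ratios)

GIB = 1024 ** 3
# Hypothetical post-GC readings against an 8 GiB heap.
window = [post_gc_ratio(used, 8 * GIB) for used in (7.0 * GIB, 7.2 * GIB, 7.1 * GIB)]
print(heap_at_risk(window))
```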
Storage Pressure
Sustained disk IOPS above the device's safe operating point is a sign that storage is the bottleneck. As an advisory threshold for storage class, treat 1,000 sustained IOPS as a flag. For details, see Storage Class Recommendations for mongot.
Page fault rate. Sustained search page faults above 1,000/s indicate the OS is repeatedly pulling index pages from disk rather than serving them from cache. With elevated IOPS, this is the canonical signal of memory pressure on the critical path.
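The two storage signals are most useful when read together. The following sketch combines them under the 1,000 IOPS and 1,000 page faults per second flags described above. The diagnose_storage helper and its return labels are hypothetical, and "sustained" is interpreted as every sample in the window exceeding the flag.

```python
IOPS_FLAG = 1000        # advisory sustained-IOPS flag from this page
PAGE_FAULT_FLAG = 1000  # page faults per second flag from this page

def diagnose_storage(iops_samples, page_fault_samples):
    """Combine sustained IOPS and page-fault rate into a coarse diagnosis."""
    high_iops = bool(iops_samples) and min(iops_samples) > IOPS_FLAG
    high_faults = bool(page_fault_samples) and min(page_fault_samples) > PAGE_FAULT_FLAG
    if high_iops and high_faults:
        return "memory pressure"        # index pages are being re-read from disk
    if high_iops:
        return "storage bottleneck"
    if high_faults:
        return "elevated page faults"
    return "ok"

print(diagnose_storage([1500, 1800, 1600], [2200, 1900, 2500]))  # memory pressure
print(diagnose_storage([1500, 1800, 1600], [40, 55, 30]))        # storage bottleneck
```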
Free Disk Space on the Index Volume
Having less than 20% free on the mongot dataPath volume can cause availability issues. Merges might require disk space beyond the live index footprint. If you undersize the volume, this can cause silent failures.
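A minimal sketch of the 20% free-space check, using the Python standard library's shutil.disk_usage. The helper names are hypothetical, and the path passed in the example is a placeholder: point it at the volume that holds your mongot dataPath.

```python
import shutil

def free_fraction(path):
    """Fraction of the volume containing path that is currently free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def volume_at_risk(path, min_free=0.20):
    """True when free space on the volume is below the min_free fraction."""
    return free_fraction(path) < min_free

# "." is a placeholder; substitute the mongot dataPath volume.
print(f"free: {free_fraction('.'):.1%}, at risk: {volume_at_risk('.')}")
```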
Healthy Signals
A healthy mongot deployment in steady state has the following signals:
Process is up, health is SERVING.
Replication lag is consistently sub-second.
Query p99 latency is stable across days and is not climbing.
Executor pool queue depth is consistently near zero.
JVM post-GC heap is well below the configured maximum.
Indexing operations finish in expected time and complete in READY.
No errors in the mongot log beyond known benign messages.
If all of these are true, the deployment is considered healthy. If there is any drift, see the relevant page for deeper metrics to investigate.
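Part of the checklist above can be sketched as a function that reports which signals have drifted. The Snapshot type and its field names are hypothetical, not a mongot API. The 0.85 heap ratio is borrowed from the JVM Health guidance as a stand-in for "well below the configured maximum". Checks that need history or log inspection (p99 stability, indexing duration, log errors) are omitted.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Hypothetical point-in-time view of a mongot deployment."""
    process_up: bool
    health_state: str
    replication_lag_ms: float
    queue_depth: int
    post_gc_heap_ratio: float

def drift(s):
    """Return the checklist items that this snapshot fails."""
    problems = []
    if not s.process_up:
        problems.append("process down")
    if s.health_state != "SERVING":
        problems.append("not SERVING")
    if s.replication_lag_ms >= 1000:   # steady state should be sub-second
        problems.append("replication lag")
    if s.queue_depth > 0:              # assumed reading of "near zero"
        problems.append("executor queue")
    if s.post_gc_heap_ratio > 0.85:    # assumed ceiling, see lead-in
        problems.append("heap pressure")
    return problems

healthy = Snapshot(True, "SERVING", 240.0, 0, 0.55)
lagging = Snapshot(True, "SERVING", 45000.0, 7, 0.60)
print(drift(healthy))
print(drift(lagging))
```

An empty list corresponds to a healthy deployment for the signals covered here.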
Atlas vs. Self-Managed Observability
Atlas users have:
The Atlas Search Metrics UI.
MongoDB-managed alerting on a curated set of thresholds.
Atlas support with direct access to mongot FTDC and logs.
Self-managed users have:
The metrics and logs you configure mongot to export.
The alerting platform that you wire up.
Responsibility for FTDC capture and forwarding to MongoDB support if you need help.
Self-managed users must stand up monitoring before going to production.