The mongot process exposes Prometheus metrics that describe its runtime health and performance across core areas of operation. This reference page describes key metrics that are relevant to day-to-day monitoring and troubleshooting. For the complete metric set, scrape the mongot Prometheus metrics endpoint at http://<mongot-host>:9946/metrics.
View Metrics
To view the raw metrics that mongot exposes, send an HTTP GET request to the following mongot Prometheus metrics endpoint:
http://<mongot-host>:9946/metrics
In this endpoint:
<mongot-host> is the hostname or IP address of the mongot process.
9946 is the default port for the metrics endpoint. To configure the metrics endpoint port, see the metrics.address setting in the mongot configuration file.
/metrics is the path for the metrics endpoint.
The /metrics endpoint returns metrics in plain Prometheus text format. To monitor mongot metrics over time, configure your Prometheus instance to scrape this endpoint.
Important
The /metrics endpoint requires no authentication by default. For production deployments, restrict access at the network layer.
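As a rough illustration of reading the raw output, the following Python sketch sends the GET request and parses each sample line into a name, a label dictionary, and a value. The function names `parse_metrics` and `fetch_metrics` are hypothetical helpers, not part of mongot, and the parser is simplified: it skips comment lines and does not unescape label values.

```python
import re
from urllib.request import urlopen

# One sample line looks like: name{label="value",...} 1.5   (labels are optional)
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:  # ignore anything that is not a sample line
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def fetch_metrics(host, port=9946, timeout=5.0):
    """Send an HTTP GET to the metrics endpoint and parse the response."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics("localhost")` would scrape a mongot process on the local machine, assuming the default port is unchanged. For continuous monitoring, a Prometheus server scraping the endpoint remains the better fit.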
Metric Naming Conventions
mongot metric names use a consistent naming pattern:
All metric names start with the mongot_ prefix.
Metric names generally follow the pattern mongot_<area>_<measurement>[_<unit>], where:
<area> indicates the subsystem or component the metric belongs to, such as process, jvm, replication, or index.
<measurement> indicates what is being measured, such as cpu_usage, heap_memory, or index_size.
<unit> (optional) indicates either the unit that the metric is measured in, such as seconds, bytes, or ms, or the type of counter the metric represents, such as total, events, or operations.
Note
Some metric name suffixes don't reflect the actual reported unit for the metric. For example, mongot_index_stats_query_latency_seconds has the suffix _seconds, but mongot reports the metric in milliseconds, as indicated by the timeUnit=milliseconds label in the metric output. To confirm the unit for each metric, check the Unit value in the metric reference tables below.
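The naming pattern and the unit caveat can be sketched in Python as follows. This is an illustration under assumptions, not mongot's own logic: `split_metric_name` and `effective_unit` are hypothetical helpers, the area is assumed to be the first token after the prefix, and `KNOWN_SUFFIXES` contains only the example suffixes named above.

```python
# Suffixes the naming convention lists as examples; the real set may be larger.
KNOWN_SUFFIXES = ("seconds", "bytes", "ms", "total", "events", "operations")


def split_metric_name(name):
    """Split a mongot metric name into (area, measurement, unit_suffix)."""
    prefix = "mongot_"
    if not name.startswith(prefix):
        raise ValueError(f"not a mongot metric: {name}")
    # Assumption: the area is the first underscore-separated token.
    area, _, rest = name[len(prefix):].partition("_")
    head, _, tail = rest.rpartition("_")
    if head and tail in KNOWN_SUFFIXES:
        return area, head, tail
    return area, rest, None  # no recognized unit suffix


def effective_unit(name, labels):
    """Prefer the timeUnit label over the name suffix when both exist."""
    _, _, suffix = split_metric_name(name)
    return labels.get("timeUnit", suffix)
```

With the example from the note, `effective_unit("mongot_index_stats_query_latency_seconds", {"timeUnit": "milliseconds"})` yields `"milliseconds"` even though the name ends in `_seconds`.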
In addition to the metric name, mongot metrics can include labels (also called dimensions). Labels distinguish multiple time series that share the same base metric name. For example, a metric might use labels to identify a state, status, index type, quantile, or a specific index.
For some metrics, you must interpret the metric as the combination of the metric name and its labels, not by the metric name alone. For example, mongot_replication_mongodb_indexManagerState uses the state label to expose one series for each replication state, such as STEADY_STATE or FAILED. Exactly one of those labeled series has the value 1 at a time. Per-index metrics similarly use labels such as generationId_logString and indexId_logString to distinguish one index from another.
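To make the one-series-per-state encoding concrete, this Python sketch picks the active state out of a list of already-parsed samples. The `(name, labels, value)` tuple layout and the `active_state` helper are assumptions made for illustration.

```python
def active_state(samples,
                 metric="mongot_replication_mongodb_indexManagerState",
                 label="state"):
    """Return the label value of the single series whose value is 1."""
    active = [
        labels[label]
        for name, labels, value in samples
        if name == metric and label in labels and value == 1
    ]
    # Exactly one labeled series is expected to be 1 at a time.
    if len(active) != 1:
        raise ValueError(f"expected exactly one active series, found {len(active)}")
    return active[0]
```

Raising on zero or multiple active series is a design choice for this sketch; it surfaces a partial or inconsistent scrape instead of silently picking a state.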
For distribution metrics, the suffix of the metric name indicates the Prometheus series type:
Histograms expose _bucket, _count, _sum, and _max.
Summaries expose _count, _sum, and _max. Some summaries also include quantile labels such as {quantile="0.5"}.
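A small Python sketch of that suffix rule: given a base metric name and the series names seen in a scrape, guess whether the metric is a histogram or a summary. `series_kind` is a hypothetical helper, and the heuristic relies only on the suffixes listed above.

```python
def series_kind(base, series_names):
    """Guess whether `base` is a histogram or a summary from its suffixes."""
    suffixes = {
        name[len(base):] for name in series_names if name.startswith(base + "_")
    }
    if "_bucket" in suffixes:  # only histograms expose bucket series
        return "histogram"
    if {"_count", "_sum"} <= suffixes:
        return "summary"
    return "unknown"
```

The `# TYPE` comment lines in the raw endpoint output state the type directly, so a check like this is mainly useful when only series names are at hand.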
Common Metric Labels
The following table describes common labels that appear in mongot metrics.
Label Name | Metric Scope | Possible Values |
|---|---|---|
| All executor pools | |
| Cross-cutting | |
| Most | Internal opaque IDs (the per-index identifier that the logs use) |
| Many index metrics | |
| Indexing and initial-sync metrics | |
| Index size and document metrics | |
| Latency summary metrics | |
| Summary metrics | |
Key Metric Groups
Process and JVM Metrics
Use process and JVM metrics to confirm that mongot is running normally and to identify heap or garbage collection pressure.
Process
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | seconds | The uptime of the Java Virtual Machine. |
| gauge | unix seconds | Start time of the process since unix epoch. |
| counter | nanoseconds | The "cpu time" used by the Java Virtual Machine process. Use |
| gauge | 0-1 | The "recent cpu usage" for the Java Virtual Machine process. |
JVM Memory
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | bytes | The amount of used memory. Labels: |
| gauge | bytes | The amount of memory committed for the Java virtual machine to use. |
| gauge | bytes | The maximum memory that can be used. For heap, |
| gauge | count | NIO buffer pool counts. Labels: |
| gauge | bytes | Memory the JVM uses for NIO buffer pools. |
| gauge | bytes | NIO buffer pool capacity. |
JVM Garbage Collection
Metric | Type | Unit | Description |
|---|---|---|---|
| summary | seconds | Time spent in GC pause. No quantile labels. Use |
| summary | seconds | Time spent in concurrent GC phase. |
| gauge | bytes | Size of long-lived heap memory pool after reclamation. The "live heap" to watch for memory pressure. |
| gauge | bytes | Max size of long-lived heap memory pool. |
| counter | bytes | Increase in young heap pool size between GCs. |
| counter | bytes | Promotions from young into old generation. |
System Metrics
Use system metrics to monitor host-level CPU, disk, memory, paging, and network conditions that can affect mongot.
CPU
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Processors available to the JVM. |
| gauge | 0-1 | Recent system CPU usage. |
| gauge | unitless | OS 1-minute load average. |
Disk
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | bytes | Free disk space on the |
| gauge | bytes | Total disk space on the |
| gauge | bytes | Free and total disk space across the file system (different scope than |
| gauge | bytes | Bytes read from disk per device. Label: |
| gauge | bytes | Bytes written per device. |
| gauge | count | Read I/O count per device. Use |
| gauge | count | Write I/O count per device. |
| gauge | count | Disk queue length (I/Os in progress) per device. |
| gauge | milliseconds | Time spent reading or writing per device. |
Memory
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | bytes | Total physical memory on the host. |
| gauge | bytes | Physical memory available. |
| gauge | bytes | Physical memory in use. |
| gauge | bytes | Total physical and virtual memory in use. |
| gauge | bytes | Swap state. |
| gauge | count | Swap in/out activity. |
| gauge | count | Number of memory mappings (relevant for Lucene mmap counts). |
| gauge | bytes | System page size. |
Page Faults
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Major page faults. Use this metric with the storage class advisory threshold. |
| gauge | count | Minor page faults. |
Network
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | bytes | Bytes received and sent per interface ( |
| gauge | count | Packets received and sent. |
| gauge | count | Error, drop, and collision counters. |
| gauge | bits/sec | Negotiated interface speed. |
Replication Metrics
Use replication metrics to determine whether mongot is healthy, syncing normally, and staying caught up with mongod.
Overall State
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | 0/1 | |
| gauge | 0/1 | |
| counter | count | State transitions. Labels: |
Session Refresher
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Active sessions. |
| counter | count | Total session refreshes. |
| counter | count | Failed refreshes. |
| summary | seconds | Refresh duration distribution. |
Optime Updater
Metric | Type | Unit | Description |
|---|---|---|---|
| counter | count | Optime-update errors. |
| multiple | N/A | Executor metrics for the optime updater. |
Per-index Metrics
mongot emits the following metrics per index and includes generationId_logString and indexId_logString labels to identify the specific index. Filter by those labels to inspect a specific index, or aggregate across labels to understand fleet-wide behavior.
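As one way to filter and aggregate by those labels, the Python sketch below groups replication-lag samples by indexId_logString and converts the values from milliseconds to seconds. The `(name, labels, value)` tuple layout and the `lag_seconds_by_index` helper are assumptions made for this example.

```python
from collections import defaultdict

LAG_METRIC = "mongot_index_stats_indexing_replicationLagMs"


def lag_seconds_by_index(samples):
    """Return the highest replication lag per index, converted to seconds."""
    lag = defaultdict(float)
    for name, labels, value in samples:
        if name != LAG_METRIC:
            continue
        # Series without the label are grouped under a placeholder key.
        index_id = labels.get("indexId_logString", "unknown")
        lag[index_id] = max(lag[index_id], value / 1000.0)
    return dict(lag)
```

Taking `max(lag_seconds_by_index(samples).values())` then gives a fleet-wide worst case, while a lookup by one index ID inspects a single index.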
Index Status and Size
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | 0/1 | Per-index status. One-hot encoded across the |
| gauge | number | On-disk index format version. For example, |
| gauge | bytes | Total on-disk size of the index. |
| gauge | bytes | Largest single file in the index. |
| gauge | count | Number of Lucene segment files. |
| gauge | count | Lucene documents in the index. |
| gauge | count | Maximum Lucene document ID (includes deleted-not-merged). |
| gauge | count | Number of indexed Lucene fields. |
| gauge | count | Number of Lucene segments. |
| gauge | bytes | Estimated required memory for the index. |
Indexing Metrics
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | milliseconds | Replication lag per index, in milliseconds. The unit is in the metric name ( |
| gauge | BSON Timestamp | Last applied replication optime (numeric encoding). |
| gauge | BSON Timestamp | Cap on advance, set by |
| counter | count | Indexing operation counts. Label: |
| counter | bytes | Total bytes processed by indexing. |
| counter | count | Vector fields indexed. |
| summary | seconds ( | Per-index commit durations. |
| summary | seconds | Batch duration distribution. |
| counter | count | Oversized change-stream events. Label: |
| counter | count | Documents rejected for invalid geometry. |
| counter | count | Truncated sortable strings. |
| counter | count | Exceptions during initial sync. |
| counter | count | Steady-state exceptions. |
| gauge | count | Consecutive initial-sync resync exceptions for this index. |
Query Metrics
Metric | Type | Unit | Description |
|---|---|---|---|
| counter | count | Total queries issued against the index. |
| counter | count | Total hits returned. |
| counter | count | Queries that failed. |
| counter | count | Specific failure-class counters. |
| summary | seconds ( | Search batch latency. This is the headline query-latency metric. |
| summary | seconds ( | Latency inside Lucene's TopDocs search. Use this metric to distinguish Lucene-internal latency from total latency. |
| summary | seconds ( | Vector search result latency. |
| summary | seconds ( | Vector search latency phases. |
| summary | seconds ( | Facets state-refresh latency. |
| summary | bytes / count | Per-batch payload size and document count. |
| histogram | count ( | Distribution of |
| histogram | count | Vector candidates per query, bucketed by quantization. |
| counter / summary | count | |
| counter | count | Result batches with score ties. |
| counter | count | Queries that benefited from index sort optimization. |
| counter | count | Limit-extraction optimizations triggered. |
| counter | count | Phantom-searcher cleanups. |
| summary | ratio | Deleted-document ratio in returned results. |
| counter | count | Batches that made no forward progress. |
| counter | count | Vector-specific counters. |
| counter | count | Per-query-feature usage. Label: |
| counter | count | Failed |
Per-index Replication Breakdown
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | 0/1 | The canonical replication-state signal. |
| summary | bytes / count | Steady-state batch sizes. |
| summary | seconds | Steady-state decoding duration. |
| summary | seconds | Steady-state |
| summary | — | Initial-sync change-stream phase metrics (mirrors steady state). |
| summary | — | Initial-sync collection-scan phase metrics. |
Lucene Refresh Latency
Metric | Type | Unit | Description |
|---|---|---|---|
| summary | seconds ( | Lucene IndexReader refresh latency. |
Command Metrics
mongot accepts the following set of named commands from mongod:
buildinfo
getMore
hello
isMaster
ismaster
killCursors
manageSearchIndex
ping
planShardedSearch
search
vectorSearch
For each command, mongot exposes the following metrics, where <name> is a placeholder for the command name:
Pattern | Type | Description |
|---|---|---|
| counter | Failure count for the command. |
| summary | End-to-end latency including serialization. |
| summary | Serialization latency (subset; not all commands). |
Tip
Monitor Search and Vector Search Latency Across Indexes
mongot_command_searchCommandTotalLatency_seconds and mongot_command_vectorSearchCommandTotalLatency_seconds are the primary metrics for monitoring $search and $vectorSearch latency. They aggregate latency for all search and vectorSearch commands across all indexes.
Indexing Scheduler and Dispatcher Metrics
Use indexing scheduler and dispatcher metrics to identify backlog, saturation, and slow work in replication and indexing pipelines.
Steady-state Change-stream Pipeline
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Batches currently being applied. Label: |
| summary | seconds | Duration distribution for in-flight batches. |
| summary | seconds | |
| gauge | count | |
| gauge | count | Scheduled |
| summary | seconds | |
| summary | seconds | Pre-processing duration per batch. |
| counter | count | Total change-stream events observed. Label: |
| counter | count | Events that |
| gauge | 0/1 | Dispatcher status. Labels: |
| counter | count | Events skipped due to missing metadata. |
| counter | count | Unexpected batch failures. |
| counter | count | Rescheduled embedding getMores. This metric is only available when you configure Automated Embedding. |
| counter | count | Failed change-stream mode sampling attempts. |
Indexing Work Scheduler
Indexing work scheduler metrics monitor the queueing and execution of indexing batches.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Scheduler queue depth. |
| counter | count | Enqueue and dequeue counts. |
| summary | count | Distribution of batch sizes. Label: |
| summary | seconds | Batch durations. |
| summary | seconds | Scheduling overhead. |
Decoding Work Scheduler
Decoding work scheduler metrics monitor the queueing and execution of change-stream batch decoding.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Scheduler queue depth. |
| counter | count | Enqueue and dequeue counts. |
| summary | count | Distribution of batch sizes. Label: |
| summary | seconds | Batch durations. |
| summary | seconds | Scheduling overhead. |
Initial Sync, Lifecycle, and Config Metrics
Use these metrics to track index startup work, recovery, and catalog state.
Initial Sync (Resumable Initial Sync)
Note
Some mongot metrics are phase-specific and populate only when the corresponding code path is active. For example, steady-state replication metrics, such as mongot_index_stats_indexing_replicationLagMs and the mongot_index_stats_replication_steadyState_* series, do not populate while an index is in initial sync. Conversely, initial-sync-specific metrics, such as mongot_initialsync_* and mongot_index_stats_replication_initialSync_*, are only relevant while initial sync is running or has run.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Queued initial syncs. Label: |
| counter | count | Embedding initial syncs that were requeued. This metric is only available when you configure Automated Embedding. |
| gauge | count | Initial syncs currently in progress. Label: |
| gauge | count | Initial syncs queued at the dispatcher. |
| gauge | count | In-progress syncs that resumed from a checkpoint. |
| gauge | 0/1 | Active collection-scan mode. Label: |
| summary | seconds | Completed sync duration distribution. |
| summary | seconds | Ongoing sync duration. |
| gauge | seconds | Min, max, and sum of in-progress initial sync durations. |
| counter | count | Indexes dropped because their on-disk segments could not be read. |
| counter | count | Indexes recovered after unreadable segments. Label: |
Lifecycle
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Indexes currently in the initialized state. |
| summary | seconds | Initialization durations. |
| counter | count | Index downloads that failed. |
| counter | count | Index drops that failed. |
| counter | count | Index initializations that failed. |
Config State
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Indexes currently in the catalog. Labels: |
| gauge | count | Indexes being phased out. |
| gauge | count | Staged but not yet active indexes. |
| gauge | count | Feature-version-4-specific equivalents. |
Cursors and Index Factory
Use these metrics to monitor open cursor state and to detect indexes that mongot dropped or recovered because their on-disk segments were unreadable.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Currently tracked open cursors. |
| counter | count | Indexes dropped because their segments were unreadable. |
| counter | count | Recoveries after unreadable segments. Label: |
Lucene Merge
Use these metrics to monitor Lucene segment merge activity, including the number and size of merges in progress, merge input and output sizes, merge durations, and merges discarded by the disk-utilization merge policy.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Active merges. Label: |
| gauge | count | Documents currently being merged. |
| counter | count | Total merges executed since startup. |
| counter | count | Segments folded by merges. |
| summary | bytes | Distribution of merge input sizes. |
| summary | bytes | Distribution of merge output sizes. |
| summary | count | Documents-per-merge distribution. |
| summary | seconds | Merge duration distribution. |
| counter | count | Merges discarded by the disk-utilization-aware policy. |
MongoDB Client Connection Pool Metrics
mongot opens multiple named connection pools to mongod, and labels each pool with a clientName label that identifies the role of each pool. The following table lists possible clientName label values and their corresponding role:
clientName | Purpose |
|---|---|
| Steady-state change-stream replication. |
| Initial sync and session refresh. The |
| Internal metadata service. |
| Optime polling. |
| Database metadata lookups. |
| Server-info lookups. |
| Lease manager. |
| Automated embedding writes. This connection pool only appears when you configure Automated Embedding. |
The following table lists the available metrics for mongot connection pools:
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Currently open connections in the pool. |
| gauge | count | Connections currently checked out. |
| gauge | count | Configured max pool size. |
| gauge | count | Configured min pool size. |
| counter | count | Successful native OpenSSL link attempts. |
| counter | count | Failed native OpenSSL link attempts. |
Synonyms
Use these metrics to monitor synonym synchronization activity, including collection scans, scan and sync durations, queue depth, and exceptions encountered during synonym sync.
Metric | Type | Unit | Description |
|---|---|---|---|
| counter | count | Total collection scans performed for synonyms. |
| counter | count | Synonym scans triggered by change-stream events. |
| summary | seconds | Scan duration distribution. |
| summary | seconds | Sync duration distribution. |
| gauge | count | Current synonym sync queue depth. |
| counter | count | Synonym sync exceptions. |
Executor Pools
Use these metrics to monitor the named executor pools that mongot uses to run background work. Each pool exposes the same set of sub-metrics, prefixed with the pool name, so you can track thread activity, pool sizing, queue depth, task throughput, and per-task execution time across all pools.
The following table lists the sub-metrics that every executor pool exposes, where <pool> is the pool name prefix. All executor-pool sub-metrics carry the label name="executorMetrics".
Sub-Metric Suffix | Type | Description |
|---|---|---|
| gauge | Threads currently executing tasks. |
| gauge | Pool sizing. |
| gauge | Tasks waiting for a thread — the saturation signal. |
| gauge | Remaining queue capacity. |
| counter | Tasks completed since startup. |
| summary | Time threads spent idle between tasks. |
| summary | Per-task execution time. |
| counter | Scheduled task counts (for scheduling pools). |
The following table lists the prefixes for all available named executor pools and their respective purposes:
Executor Pool Prefix | What it runs |
|---|---|
| Blob-store lifecycle work. |
| Blocking gRPC server worker threads. |
| Change-stream mode selection. |
| Change-stream sync dispatching (one of the busiest in steady state). |
| Config-monitor polling. |
| Decoding pipeline workers. |
| Disk-monitor polling. |
| gRPC health check timer. |
| Idle cursor reaping. |
| Index commit operations. |
| Per-index lifecycle work. |
| Lucene IndexReader refreshes. |
| Indexing pipeline workers (the busiest indexing pool in steady state). |
| Indexing-lifecycle work. |
| Automated-embedding indexing path. This executor pool only appears when you configure Automated Embedding. |
| Init-time lifecycle work. |
| Materialized-view tracking and lifecycle. These metrics are only available when Automated Embedding or other materialized-view-backed features are configured. |
| Optime updater (background). |
| Session refresher. |
| System metrics updater. |
Tip
Watch Saturation Across All Executor Pools
To monitor saturation across all executor pools, run the following PromQL query:
max by (pool) (
  label_replace(
    {__name__=~"mongot_.+_executor_queued_tasks"},
    "pool", "$1", "__name__", "mongot_(.+)_executor_queued_tasks"
  )
)
This query returns the queued-task count for each executor pool.
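For scripts that work on scraped samples rather than PromQL, the same pool-name extraction can be sketched in Python. The regular expression mirrors the one in the query; the `(name, labels, value)` tuple layout and the `queued_tasks_by_pool` helper are assumptions for illustration.

```python
import re

# Same pattern as the PromQL label_replace call, applied to the whole name.
QUEUED = re.compile(r"mongot_(.+)_executor_queued_tasks")


def queued_tasks_by_pool(samples):
    """Return the highest queued-task count seen for each executor pool."""
    queued = {}
    for name, _labels, value in samples:
        match = QUEUED.fullmatch(name)
        if match is None:  # not an executor queued-tasks series
            continue
        pool = match.group(1)
        queued[pool] = max(queued.get(pool, value), value)
    return queued
```

`fullmatch` is used because the regular expression in `label_replace` is anchored at both ends, so a partial match on a longer metric name should not count.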
Prometheus Server Self-Metrics
The following metric is available for the embedded Prometheus server in mongot:
Metric | Type | Unit | Description |
|---|---|---|---|
| summary | seconds ( | How long |
Configuration-Specific Metrics
The following metric families appear in the /metrics output only when you enable specific features.
Metric Family | Description | Availability in Self-Managed mongot |
|---|---|---|
| Metrics related to Automated Embedding. For example, | Appear only when you configure Automated Embedding. |
| Failure count for the FTDC executor. | Appears only when you enable the |
PromQL Examples
Most latency metrics in this catalog are summaries, not histograms, so use their published quantile labels directly when they exist. A smaller number of metrics, such as mongot_index_stats_query_limitPerQuery and mongot_index_stats_query_numCandidatesPerQuery, are histograms and expose _bucket series.
# Replication state
max by (state) (mongot_replication_mongodb_indexManagerState == 1)

# Maximum replication lag across all indexes, converted to seconds
max(mongot_index_stats_indexing_replicationLagMs) / 1000

# Index count by status
count by (status) (mongot_index_stats_indexStatusCode == 1)

# Search query p99 latency across all indexes
max(mongot_index_stats_query_searchResultBatchLatencies_seconds{quantile="0.99"})

# Worst recent GC pause
max(mongot_jvm_gc_pause_seconds_max)

# Average GC pause over 5 minutes
rate(mongot_jvm_gc_pause_seconds_sum[5m]) / rate(mongot_jvm_gc_pause_seconds_count[5m])

# Free disk percentage on dataPath
100 * mongot_system_disk_space_data_path_free_bytes / mongot_system_disk_space_data_path_total_bytes

# Major page fault rate
rate(mongot_system_process_majorPageFaults_operations[5m])

# Steady-state exceptions over 15 minutes
sum(rate(mongot_index_stats_indexing_steadyStateExceptions_total[15m]))

# Initial sync exceptions over 15 minutes
sum(rate(mongot_index_stats_indexing_initialSyncExceptions_total[15m]))
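The average-GC-pause query divides the rate of a summary's _sum by the rate of its _count. As a rough sketch of the arithmetic behind that ratio, the Python function below computes the same average from two scrapes of the _sum and _count series taken at the start and end of a window. `average_over_window` is a hypothetical helper, and the result only approximates what Prometheus computes, because `rate` also extrapolates to the window boundaries.

```python
def average_over_window(sum_start, count_start, sum_end, count_end):
    """Average observation between two scrapes of a summary's _sum and _count."""
    observations = count_end - count_start
    if observations <= 0:  # no new observations, or a counter reset
        return None
    return (sum_end - sum_start) / observations
```

Because both rates cover the same window, the window length cancels out of the ratio, which is why the function needs only the differences.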