The mongot process exposes Prometheus metrics that describe its runtime health and performance across core areas of operation. This reference page describes key metrics that are relevant to day-to-day monitoring and troubleshooting. For the complete metric set, scrape the mongot Prometheus metrics endpoint at http://<mongot-host>:9946/metrics.
View Metrics
To view the raw metrics that mongot exposes, send an HTTP GET request to the following mongot Prometheus metrics endpoint:
http://<mongot-host>:9946/metrics
In this endpoint:
- <mongot-host> is the hostname or IP address of the mongot process.
- 9946 is the default port for the metrics endpoint. To configure the metrics endpoint port, see the metrics.address setting in the mongot configuration file.
- /metrics is the path for the metrics endpoint.
The /metrics endpoint returns metrics in plain Prometheus text format. To monitor mongot metrics over time, configure your Prometheus instance to scrape this endpoint.
Important
The /metrics endpoint requires no authentication by default. For production deployments, restrict access at the network layer.
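As a rough illustration of reading the endpoint outside of Prometheus, the following Python sketch sends the GET request and parses the plain-text response into tuples. The helper names (parse_metrics, fetch_metrics) are hypothetical, the parser handles only simple sample lines, and it does not unescape label values.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then the value.
# A trailing timestamp, if present, is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP, and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def fetch_metrics(host, port=9946, timeout=5.0):
    """Send an HTTP GET to http://<host>:<port>/metrics and parse the body."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling fetch_metrics("<mongot-host>") with a reachable host returns every sample that mongot currently exposes. For monitoring over time, a Prometheus scrape job remains the better fit.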
Metric Naming Conventions
mongot metric names use a consistent naming pattern:
- All metric names start with the mongot_ prefix.
- Metric names generally follow the pattern mongot_<area>_<measurement>[_<unit>], where:
  - <area> indicates the subsystem or component the metric belongs to, such as process, jvm, replication, or index.
  - <measurement> indicates what is being measured, such as cpu_usage, heap_memory, or index_size.
  - <unit> (optional) indicates either the unit that the metric is measured in, such as seconds, bytes, or ms, or the type of counter the metric represents, such as total, events, or operations.
Note
Some metric name suffixes don't reflect the actual reported unit for the metric. For example, mongot_index_stats_query_latency_seconds has the suffix _seconds, but mongot reports the metric in milliseconds, as indicated by the timeUnit=milliseconds label in the metric output. To confirm the unit for each metric, check the Unit value in the metric reference tables below.
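Because a name suffix can disagree with the reported unit, a consumer may prefer to normalize values by the timeUnit label instead of the name. The Python sketch below shows one way to do that. Only the milliseconds label value is confirmed by the note above; the other unit spellings and the to_seconds helper are assumptions.

```python
# Divisors that convert a reported value to seconds. Only "milliseconds" is
# confirmed by the note above; the other keys are assumed spellings.
UNITS_PER_SECOND = {
    "seconds": 1,
    "milliseconds": 1_000,
    "microseconds": 1_000_000,
    "nanoseconds": 1_000_000_000,
}


def to_seconds(value, labels):
    """Convert a sample value to seconds using its timeUnit label."""
    unit = labels.get("timeUnit")
    if unit not in UNITS_PER_SECOND:
        # Refuse to guess: fall back to the Unit column of the reference tables.
        raise ValueError(f"missing or unknown timeUnit label: {unit!r}")
    return value / UNITS_PER_SECOND[unit]
```

Raising an error for a missing label is deliberate here: silently trusting the _seconds suffix is exactly the mistake the note warns about.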
In addition to the metric name, mongot metrics can include labels (also called dimensions). Labels distinguish multiple time series that share the same base metric name. For example, a metric might use labels to identify a state, status, index type, quantile, or a specific index.
For some metrics, you must interpret the metric as the combination of the metric name and its labels, not by the metric name alone. For example, mongot_replication_mongodb_indexManagerState uses the state label to expose one series for each replication state, such as STEADY_STATE or FAILED. Exactly one of those labeled series has the value 1 at a time. Per-index metrics similarly use labels such as generationId_logString and indexId_logString to distinguish one index from another.
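To make the one-hot convention concrete, the following Python sketch picks out the label value whose series currently equals 1. The sample tuples are hypothetical parsed output in the form (name, labels, value), and active_label_values is an illustrative helper, not part of mongot.

```python
def active_label_values(samples, metric_name, label):
    """Return the values of `label` on each series of `metric_name` that equals 1."""
    return [
        labels[label]
        for name, labels, value in samples
        if name == metric_name and label in labels and value == 1.0
    ]


# Hypothetical parsed samples: (metric name, labels, value).
samples = [
    ("mongot_replication_mongodb_indexManagerState", {"state": "STEADY_STATE"}, 1.0),
    ("mongot_replication_mongodb_indexManagerState", {"state": "FAILED"}, 0.0),
]

current = active_label_values(
    samples, "mongot_replication_mongodb_indexManagerState", "state"
)
print(current)  # ['STEADY_STATE']
```

Returning a list rather than a single value makes it easy to assert that exactly one state is active, which is the invariant described above.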
For distribution metrics, the suffix of the metric name indicates the Prometheus series type:
- Histograms expose _bucket, _count, _sum, and _max.
- Summaries expose _count, _sum, and _max. Some summaries also include quantile labels such as {quantile="0.5"}.
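Since _sum and _count are cumulative, a mean over a time window comes from the difference between two scrapes, not from a single reading. The Python sketch below illustrates that arithmetic. The windowed_mean helper and its inputs are hypothetical, and it treats any decrease as a counter reset after a restart.

```python
from typing import Optional


def windowed_mean(prev_sum, prev_count, curr_sum, curr_count) -> Optional[float]:
    """Mean observation between two scrapes of a series' _sum and _count.

    Returns None when there were no new observations or when a counter went
    backwards, which usually means the process restarted between scrapes.
    """
    delta_count = curr_count - prev_count
    delta_sum = curr_sum - prev_sum
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count
```

For example, if _sum grows from 1.0 to 1.5 while _count grows from 10 to 20, the mean over that window is 0.5 / 10. This mirrors the rate(..._sum) / rate(..._count) pattern used in PromQL.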
Common Metric Labels
The following table describes common labels that appear in mongot metrics.
Label Name | Metric Scope | Possible Values |
|---|---|---|
| All executor pools | |
| Cross-cutting | |
| Most | Internal opaque IDs (the per-index identifier that the logs use) |
| Many index metrics | |
| Indexing and initial-sync metrics | |
| Index size and document metrics | |
| Latency summary metrics | |
| Summary metrics | |
Key Metric Groups
Process and JVM Metrics
Use process and JVM metrics to confirm that mongot is running normally and to identify heap or garbage collection pressure.
Process
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | seconds | The uptime of the Java Virtual Machine. |
| gauge | unix seconds | Start time of the process since unix epoch. |
| counter | nanoseconds | The "cpu time" used by the Java Virtual Machine process. Use |
| gauge | 0-1 | The "recent cpu usage" for the Java Virtual Machine process. |
JVM Memory
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | bytes | The amount of used memory. Labels: |
| gauge | bytes | The amount of memory committed for the Java Virtual Machine to use. |
| gauge | bytes | The maximum memory that can be used. For heap, |
| gauge | count | NIO buffer pool counts. Labels: |
| gauge | bytes | Memory the JVM uses for NIO buffer pools. |
| gauge | bytes | NIO buffer pool capacity. |
JVM Garbage Collection
Metric | Type | Unit | Description |
|---|---|---|---|
| summary | seconds | Time spent in GC pause. No quantile labels. Use |
| summary | seconds | Time spent in concurrent GC phase. |
| gauge | bytes | Size of long-lived heap memory pool after reclamation. The "live heap" to watch for memory pressure. |
| gauge | bytes | Max size of long-lived heap memory pool. |
| counter | bytes | Increase in young heap pool size between GCs. |
| counter | bytes | Promotions from young into old generation. |
System Metrics
Use system metrics to monitor host-level CPU, disk, memory, paging, and network conditions that can affect mongot.
CPU
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Processors available to the JVM. |
| gauge | 0-1 | Recent system CPU usage. |
| gauge | unitless | OS 1-minute load average. |
Disk
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | bytes | Free disk space on the |
| gauge | bytes | Total disk space on the |
| gauge | bytes | Free and total disk space across the file system (different scope than |
| gauge | bytes | Bytes read from disk per device. Label: |
| gauge | bytes | Bytes written per device. |
| gauge | count | Read I/O count per device. Use |
| gauge | count | Write I/O count per device. |
| gauge | count | Disk queue length (I/Os in progress) per device. |
| gauge | milliseconds | Time spent reading or writing per device. |
Memory
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | bytes | Total physical memory on the host. |
| gauge | bytes | Physical memory available. |
| gauge | bytes | Physical memory in use. |
| gauge | bytes | Total physical and virtual memory in use. |
| gauge | bytes | Swap state. |
| gauge | count | Swap in/out activity. |
| gauge | count | Number of memory mappings (relevant for Lucene mmap counts). |
| gauge | bytes | System page size. |
Page Faults
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Major page faults. Use this metric with the storage class advisory threshold. |
| gauge | count | Minor page faults. |
Network
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | bytes | Bytes received and sent per interface ( |
| gauge | count | Packets received and sent. |
| gauge | count | Error, drop, and collision counters. |
| gauge | bits/sec | Negotiated interface speed. |
Replication Metrics
Use replication metrics to determine whether mongot is healthy, syncing normally, and staying caught up with mongod.
Overall State
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | 0/1 | |
| gauge | 0/1 | |
| counter | count | State transitions. Labels: |
Session Refresher
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Active sessions. |
| counter | count | Total session refreshes. |
| counter | count | Failed refreshes. |
| summary | seconds | Refresh duration distribution. |
Optime Updater
Metric | Type | Unit | Description |
|---|---|---|---|
| counter | count | Optime-update errors. |
| various | N/A | Executor metrics for the optime updater. |
Per-index Metrics
mongot emits the following metrics per index and includes generationId_logString and indexId_logString labels to identify the specific index. Filter by those labels to inspect a specific index, or aggregate across labels to understand fleet-wide behavior.
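The filter-or-aggregate choice described above can be sketched in Python as a group-by on the index label. The samples below are hypothetical parsed output in the form (name, labels, value), the index identifiers are invented for illustration, and max_by_label is an illustrative helper.

```python
def max_by_label(samples, metric_name, label):
    """Group the series of one metric by a label and keep the largest value per group."""
    result = {}
    for name, labels, value in samples:
        if name != metric_name or label not in labels:
            continue
        key = labels[label]
        result[key] = max(value, result.get(key, value))
    return result


# Hypothetical samples; the index identifiers are invented for illustration.
samples = [
    ("mongot_index_stats_indexing_replicationLagMs", {"indexId_logString": "idx-a"}, 120.0),
    ("mongot_index_stats_indexing_replicationLagMs", {"indexId_logString": "idx-b"}, 4500.0),
]

lag_by_index = max_by_label(
    samples, "mongot_index_stats_indexing_replicationLagMs", "indexId_logString"
)
worst_lag_ms = max(lag_by_index.values())  # fleet-wide view across all indexes
```

Looking up lag_by_index["idx-b"] inspects a single index, while worst_lag_ms collapses the label to answer the fleet-wide question.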
Index Status and Size
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | 0/1 | Per-index status. One-hot encoded across the |
| gauge | number | On-disk index format version. For example, |
| gauge | bytes | Total on-disk size of the index. |
| gauge | bytes | Largest single file in the index. |
| gauge | count | Number of Lucene segment files. |
| gauge | count | Lucene documents in the index. |
| gauge | count | Maximum Lucene document ID (includes deleted-not-merged). |
| gauge | count | Number of indexed Lucene fields. |
| gauge | count | Number of Lucene segments. |
| gauge | bytes | Estimated required memory for the index. |
Indexing Metrics
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | milliseconds | Replication lag per index, in milliseconds. The unit is in the metric name ( |
| gauge | BSON Timestamp | Last applied replication optime (numeric encoding). |
| gauge | BSON Timestamp | Cap on advance, set by |
| counter | count | Indexing operation counts. Label: |
| counter | bytes | Total bytes processed by indexing. |
| counter | count | Vector fields indexed. |
| summary | seconds ( | Per-index commit durations. |
| summary | seconds | Batch duration distribution. |
| counter | count | Oversized change-stream events. Label: |
| counter | count | Documents rejected for invalid geometry. |
| counter | count | Truncated sortable strings. |
| counter | count | Exceptions during initial sync. |
| counter | count | Steady-state exceptions. |
| gauge | count | Consecutive initial-sync resync exceptions for this index. |
Query Metrics
Metric | Type | Unit | Description |
|---|---|---|---|
| counter | count | Total queries issued against the index. |
| counter | count | Total hits returned. |
| counter | count | Queries that failed. |
| counter | count | Specific failure-class counters. |
| summary | seconds ( | Search batch latency. This is the headline query-latency metric. |
| summary | seconds ( | Latency inside Lucene's TopDocs search. Use this metric to distinguish Lucene-internal latency from total latency. |
| summary | seconds ( | Vector search result latency. |
| summary | seconds ( | Vector search latency phases. |
| summary | seconds ( | Facets state-refresh latency. |
| summary | bytes / count | Per-batch payload size and document count. |
| histogram | count ( | Distribution of |
| histogram | count | Vector candidates per query, bucketed by quantization. |
| counter / summary | count | |
| counter | count | Result batches with score ties. |
| counter | count | Queries that benefited from index sort optimization. |
| counter | count | Limit-extraction optimizations triggered. |
| counter | count | Phantom-searcher cleanups. |
| summary | ratio | Deleted-document ratio in returned results. |
| counter | count | Batches that made no forward progress. |
| counter | count | Vector-specific counters. |
| counter | count | Per-query-feature usage. Label: |
| counter | count | Failed |
Per-index Replication Breakdown
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | 0/1 | The canonical replication-state signal. |
| summary | bytes / count | Steady-state batch sizes. |
| summary | seconds | Steady-state decoding duration. |
| summary | seconds | Steady-state |
| summary | N/A | Initial-sync change-stream phase metrics (mirrors steady state). |
| summary | N/A | Initial-sync collection-scan phase metrics. |
Lucene Refresh Latency
Metric | Type | Unit | Description |
|---|---|---|---|
| summary | seconds ( | Lucene IndexReader refresh latency. |
Command Metrics
mongot accepts the following set of named commands from mongod:
- buildinfo
- getMore
- hello
- isMaster
- ismaster
- killCursors
- manageSearchIndex
- ping
- planShardedSearch
- search
- vectorSearch
For each command, mongot exposes the following metrics, where <name> is a placeholder for the command name:
Pattern | Type | Description |
|---|---|---|
| counter | Failure count for the command. |
| summary | End-to-end latency including serialization. |
| summary | Serialization latency (subset; not all commands). |
Tip
Monitor Search and Vector Search Latency Across Indexes
mongot_command_searchCommandTotalLatency_seconds and mongot_command_vectorSearchCommandTotalLatency_seconds are the primary metrics for monitoring $search and $vectorSearch latency. They aggregate latency for all search and vectorSearch commands across all indexes.
Indexing Scheduler and Dispatcher Metrics
Use indexing scheduler and dispatcher metrics to identify backlog, saturation, and slow work in replication and indexing pipelines.
Steady-state Change-stream Pipeline
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Batches currently being applied. Label: |
| summary | seconds | Duration distribution for in-flight batches. |
| summary | seconds | |
| gauge | count | |
| gauge | count | Scheduled |
| summary | seconds | |
| summary | seconds | Pre-processing duration per batch. |
| counter | count | Total change-stream events observed. Label: |
| counter | count | Events that |
| gauge | 0/1 | Dispatcher status. Labels: |
| counter | count | Events skipped due to missing metadata. |
| counter | count | Unexpected batch failures. |
| counter | count | Rescheduled embedding getMores. This metric is only available when you configure Automated Embedding. |
| counter | count | Failed change-stream mode sampling attempts. |
Indexing Work Scheduler
Indexing work scheduler metrics monitor the queueing and execution of indexing batches.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Scheduler queue depth. |
| counter | count | Enqueue and dequeue counts. |
| summary | count | Distribution of batch sizes. Label: |
| summary | seconds | Batch durations. |
| summary | seconds | Scheduling overhead. |
Decoding Work Scheduler
Decoding work scheduler metrics monitor the queueing and execution of change-stream batch decoding.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Scheduler queue depth. |
| counter | count | Enqueue and dequeue counts. |
| summary | count | Distribution of batch sizes. Label: |
| summary | seconds | Batch durations. |
| summary | seconds | Scheduling overhead. |
Initial Sync, Lifecycle, and Config Metrics
Use these metrics to track index startup work, recovery, and catalog state.
Initial Sync
Note
Some mongot metrics are phase-specific and populate only when the corresponding code path is active. For example, steady-state replication metrics, such as mongot_index_stats_indexing_replicationLagMs and the mongot_index_stats_replication_steadyState_* series, do not populate while an index is in initial sync. Conversely, initial-sync-specific metrics, such as mongot_initialsync_* and mongot_index_stats_replication_initialSync_*, are only relevant while initial sync is running or has run.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Queued initial syncs. Label: |
| counter | count | Embedding initial syncs that were requeued. This metric is only available when you configure Automated Embedding. |
| gauge | count | Initial syncs currently in progress. Label: |
| gauge | count | Initial syncs queued at the dispatcher. |
| gauge | count | In-progress syncs that resumed from a checkpoint. |
| gauge | 0/1 | Active collection-scan mode. Label: |
| summary | seconds | Completed sync duration distribution. |
| summary | seconds | Ongoing sync duration. |
| gauge | seconds | Min, max, and sum of in-progress initial sync durations. |
| counter | count | Indexes dropped because their on-disk segments could not be read. |
| counter | count | Indexes recovered after unreadable segments. Label: |
Lifecycle
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Indexes currently in the initialized state. |
| summary | seconds | Initialization durations. |
| counter | count | Index downloads that failed. |
| counter | count | Index drops that failed. |
| counter | count | Index initializations that failed. |
Config State
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Indexes currently in the catalog. Labels: |
| gauge | count | Indexes being phased out. |
| gauge | count | Staged but not yet active indexes. |
| gauge | count | Feature-version-4-specific equivalents. |
Cursors and Index Factory
Use these metrics to monitor open cursor state and to detect indexes that mongot dropped or recovered because their on-disk segments were unreadable.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Currently tracked open cursors. |
| counter | count | Indexes dropped because their segments were unreadable. |
| counter | count | Recoveries after unreadable segments. Label: |
Lucene Merge
Use these metrics to monitor Lucene segment merge activity, including the number and size of merges in progress, merge input and output sizes, merge durations, and merges discarded by the disk-utilization merge policy.
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Active merges. Label: |
| gauge | count | Documents currently being merged. |
| counter | count | Total merges executed since startup. |
| counter | count | Segments folded by merges. |
| summary | bytes | Distribution of merge input sizes. |
| summary | bytes | Distribution of merge output sizes. |
| summary | count | Documents-per-merge distribution. |
| summary | seconds | Merge duration distribution. |
| counter | count | Merges discarded by the disk-utilization-aware policy. |
MongoDB Client Connection Pool Metrics
mongot opens multiple named connection pools to mongod, and labels each pool with a clientName label that identifies the role of each pool. The following table lists possible clientName label values and their corresponding role:
clientName | Purpose |
|---|---|
| Steady-state change-stream replication. |
| Initial sync and session refresh. The |
| Internal metadata service. |
| Optime polling. |
| Database metadata lookups. |
| Server-info lookups. |
| Lease manager. |
| Automated embedding writes. This connection pool only appears when you configure Automated Embedding. |
The following table lists the available metrics for mongot connection pools:
Metric | Type | Unit | Description |
|---|---|---|---|
| gauge | count | Currently open connections in the pool. |
| gauge | count | Connections currently checked out. |
| gauge | count | Configured max pool size. |
| gauge | count | Configured min pool size. |
| counter | count | Successful native OpenSSL link attempts. |
| counter | count | Failed native OpenSSL link attempts. |
Synonyms
Use these metrics to monitor synonym synchronization activity, including collection scans, scan and sync durations, queue depth, and exceptions encountered during synonym sync.
Metric | Type | Unit | Description |
|---|---|---|---|
| counter | count | Total collection scans performed for synonyms. |
| counter | count | Synonym scans triggered by change-stream events. |
| summary | seconds | Scan duration distribution. |
| summary | seconds | Sync duration distribution. |
| gauge | count | Current synonym sync queue depth. |
| counter | count | Synonym sync exceptions. |
Executor Pools
Use these metrics to monitor the named executor pools that mongot uses to run background work. Each pool exposes the same set of sub-metrics, prefixed with the pool name, so you can track thread activity, pool sizing, queue depth, task throughput, and per-task execution time across all pools.
The following table lists the sub-metrics that every executor pool exposes, where <pool> is the pool name prefix. All executor-pool sub-metrics carry the label name="executorMetrics".
Sub-Metric Suffix | Type | Description |
|---|---|---|
| gauge | Threads currently executing tasks. |
| gauge | Pool sizing. |
| gauge | Tasks waiting for a thread. This is the saturation signal. |
| gauge | Remaining queue capacity. |
| counter | Tasks completed since startup. |
| summary | Time threads spent idle between tasks. |
| summary | Per-task execution time. |
| counter | Scheduled task counts (for scheduling pools). |
The following table lists the prefixes for all available named executor pools and their respective purposes:
Executor Pool Prefix | What it runs |
|---|---|
| Blob-store lifecycle work. |
| Blocking gRPC server worker threads. |
| Change-stream mode selection. |
| Change-stream sync dispatching (one of the busiest in steady state). |
| Config-monitor polling. |
| Decoding pipeline workers. |
| Disk-monitor polling. |
| gRPC health check timer. |
| Idle cursor reaping. |
| Index commit operations. |
| Per-index lifecycle work. |
| Lucene IndexReader refreshes. |
| Indexing pipeline workers (the busiest indexing pool in steady state). |
| Indexing-lifecycle work. |
| Automated-embedding indexing path. This executor pool only appears when you configure Automated Embedding. |
| Init-time lifecycle work. |
| Materialized-view tracking and lifecycle. These metrics are only available when Automated Embedding or other materialized-view-backed features are configured. |
| Optime updater (background). |
| Session refresher. |
| System metrics updater. |
Tip
Watch Saturation Across All Executor Pools
To monitor saturation across all executor pools, run the following PromQL query:
max by (pool) (
  label_replace(
    {__name__=~"mongot_.+_executor_queued_tasks"},
    "pool", "$1",
    "__name__", "mongot_(.+)_executor_queued_tasks"
  )
)
This query returns the queued-task count for each executor pool.
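For readers less familiar with label_replace, the same extraction can be sketched in Python against parsed samples: the regular expression captures the pool name from the metric name, and the maximum is kept per pool. The two pool metric names in the example are hypothetical, as is the queued_tasks_by_pool helper.

```python
import re

# Same pattern as the label_replace call; fullmatch mirrors PromQL's
# fully anchored regular expressions.
POOL_RE = re.compile(r"mongot_(.+)_executor_queued_tasks")


def queued_tasks_by_pool(samples):
    """Return {pool name: highest queued-task value} from (name, labels, value) samples."""
    result = {}
    for name, _labels, value in samples:
        match = POOL_RE.fullmatch(name)
        if match is None:
            continue
        pool = match.group(1)
        result[pool] = max(value, result.get(pool, value))
    return result


# Hypothetical samples; real pool prefixes come from the executor pool table.
samples = [
    ("mongot_indexing_executor_queued_tasks", {"name": "executorMetrics"}, 7.0),
    ("mongot_decoding_executor_queued_tasks", {"name": "executorMetrics"}, 0.0),
    ("mongot_jvm_gc_pause_seconds_max", {}, 0.2),
]
saturation = queued_tasks_by_pool(samples)
```

A pool whose value stays above zero across scrapes has tasks waiting for a thread and is a candidate for closer inspection.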
Prometheus Server Self-Metrics
The following metric is available for the embedded Prometheus server in mongot:
Metric | Type | Unit | Description |
|---|---|---|---|
| summary | seconds ( | How long |
Configuration-Specific Metrics
The following metric families appear in the /metrics output only when you enable specific features.
Metric Family | Description | Availability in Self-Managed mongot |
|---|---|---|
| Metrics related to Automated Embedding. For example, | Appear only when you configure Automated Embedding. |
| Failure count for the FTDC executor. | Appears only when you enable the |
PromQL Examples
Most latency metrics in this catalog are summaries, not histograms, so use their published quantile labels directly when they exist. A smaller number of metrics, such as mongot_index_stats_query_limitPerQuery and mongot_index_stats_query_numCandidatesPerQuery, are histograms and expose _bucket series.
# Replication state
max by (state) (mongot_replication_mongodb_indexManagerState == 1)

# Maximum replication lag across all indexes, converted to seconds
max(mongot_index_stats_indexing_replicationLagMs) / 1000

# Index count by status
count by (status) (mongot_index_stats_indexStatusCode == 1)

# Search query p99 latency across all indexes
max(mongot_index_stats_query_searchResultBatchLatencies_seconds{quantile="0.99"})

# Worst recent GC pause
max(mongot_jvm_gc_pause_seconds_max)

# Average GC pause over 5 minutes
rate(mongot_jvm_gc_pause_seconds_sum[5m]) / rate(mongot_jvm_gc_pause_seconds_count[5m])

# Free disk space on dataPath as a fraction of total (multiply by 100 for a percentage)
mongot_system_disk_space_data_path_free_bytes / mongot_system_disk_space_data_path_total_bytes

# Major page fault rate
rate(mongot_system_process_majorPageFaults_operations[5m])

# Steady-state and initial sync exceptions over 15 minutes
sum(rate(mongot_index_stats_indexing_steadyStateExceptions_total[15m]))
sum(rate(mongot_index_stats_indexing_initialSyncExceptions_total[15m]))
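To run queries like these from a script, one option is the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). The Python sketch below assumes a Prometheus server that already scrapes mongot; the base URL in the usage note and the helper names are assumptions, not part of mongot.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_base_url.rstrip('/')}/api/v1/query?{query}"


def extract_vector(payload):
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each result carries [timestamp, "value as string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_instant_query(prometheus_base_url, promql, timeout=10.0):
    """Execute the query against a Prometheus server and parse the result."""
    url = build_query_url(prometheus_base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_vector(json.load(response))
```

For example, run_instant_query("http://localhost:9090", "max(mongot_jvm_gc_pause_seconds_max)") would return the worst recent GC pause, assuming Prometheus listens on its default port on the local host.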