The mongot process exposes Prometheus metrics that describe its runtime health and performance across core areas of operation. This reference page describes key metrics that are relevant to day-to-day monitoring and troubleshooting. For the complete metric set, scrape the mongot Prometheus metrics endpoint at http://<mongot-host>:9946/metrics.
View Metrics
To view the raw metrics that mongot exposes, send an HTTP GET request to the following mongot Prometheus metrics endpoint:
http://<mongot-host>:9946/metrics
In this endpoint:
- <mongot-host> is the hostname or IP address of the mongot process.
- 9946 is the default port for the metrics endpoint. To configure the metrics endpoint port, see the metrics.address setting in the mongot configuration file.
- /metrics is the path for the metrics endpoint.
The /metrics endpoint returns metrics in plain Prometheus text format. To monitor mongot metrics over time, configure your Prometheus instance to scrape this endpoint.
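As a minimal sketch of the request described above, the following Python uses only the standard library to fetch the endpoint and split each sample line into a name, a label dictionary, and a value. The function names `parse_metrics` and `fetch_metrics` are illustrative and not part of mongot, and the parser handles only the common `name{labels} value` line shape rather than the full exposition format.

```python
import re
from urllib.request import urlopen

# Matches sample lines such as: metric_name{label="value"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(host, port=9946):
    """Send an HTTP GET request to the metrics endpoint and parse the body."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, `fetch_metrics("localhost")` would return the parsed samples if a mongot process is listening on the default port of that host. For monitoring over time, a Prometheus scrape job remains the better fit than ad hoc polling.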
Important
The /metrics endpoint requires no authentication by default. For production deployments, restrict access at the network layer.
Metric Naming Conventions
mongot metric names use a consistent naming pattern:
- All metric names start with the mongot_ prefix.
- Metric names generally follow the pattern mongot_<area>_<measurement>[_<unit>], where:
  - <area> indicates the subsystem or component the metric belongs to, such as process, jvm, replication, or index.
  - <measurement> indicates what is being measured, such as cpu_usage, heap_memory, or index_size.
  - <unit> (optional) indicates the unit or counter semantics for the metric. This optional suffix indicates either the unit that the metric is measured in, such as seconds, bytes, or ms, or the type of counter the metric represents, such as total, events, or operations.
Note
Some metric name suffixes don't reflect the actual reported unit for the metric. For example, mongot_index_stats_query_latency_seconds has the suffix _seconds, but mongot reports the metric in milliseconds, as indicated by the timeUnit=milliseconds label in the metric output. To confirm the unit for each metric, check the Unit value in the metric reference tables below.
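The precedence described in this note can be sketched as a small helper that trusts the timeUnit label when a sample has one and falls back to the metric name only as a hint. The function name `time_unit_for` and the fallback heuristics are assumptions made for illustration. The reference tables remain the authoritative source for units.

```python
def time_unit_for(name, labels):
    """Return the time unit to assume for a sample, preferring the timeUnit label."""
    label_unit = labels.get("timeUnit")
    if label_unit:
        # The label reflects what mongot actually reports.
        return label_unit
    # Fallback heuristics (assumptions): infer the unit from the metric name.
    if "_seconds" in name:
        return "seconds"
    if name.endswith("Ms") or name.endswith("_ms"):
        return "milliseconds"
    return None
```

With this helper, a mongot_index_stats_query_latency_seconds sample that carries timeUnit=milliseconds resolves to milliseconds despite its _seconds suffix.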
In addition to the metric name, mongot metrics can include labels (also called dimensions). Labels distinguish multiple time series that share the same base metric name. For example, a metric might use labels to identify a state, status, index type, quantile, or a specific index.
For some metrics, you must interpret the metric as the combination of the metric name and its labels, not by the metric name alone. For example, mongot_replication_mongodb_indexManagerState uses the state label to expose one series for each replication state, such as STEADY_STATE or FAILED. Exactly one of those labeled series has the value 1 at a time. Per-index metrics similarly use labels such as generationId_logString and indexId_logString to distinguish one index from another.
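To illustrate reading a metric by name and label together, the following sketch finds the active replication state among the one-hot series. It assumes samples are already available as (name, labels, value) tuples. The helper name `active_state` is hypothetical.

```python
def active_state(samples, metric="mongot_replication_mongodb_indexManagerState"):
    """Return the value of the state label on the single series that equals 1."""
    active = [
        labels["state"]
        for name, labels, value in samples
        if name == metric and "state" in labels and value == 1
    ]
    # One-hot encoding means exactly one state should be active at a time.
    if len(active) != 1:
        raise ValueError(f"expected exactly one active state, found {active}")
    return active[0]


# Example input using the two states named in the text above.
example = [
    ("mongot_replication_mongodb_indexManagerState", {"state": "STEADY_STATE"}, 1.0),
    ("mongot_replication_mongodb_indexManagerState", {"state": "FAILED"}, 0.0),
]
current = active_state(example)  # "STEADY_STATE"
```

Raising an error when zero or several series equal 1 makes a broken scrape or a partial sample visible instead of silently reporting a wrong state.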
For distribution metrics, the suffix of the metric name indicates the Prometheus series type:
- Histograms expose _bucket, _count, _sum, and _max.
- Summaries expose _count, _sum, and _max. Some summaries also include quantile labels such as {quantile="0.5"}.
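Because both series types expose _sum and _count, a mean can be derived from one scrape. The sketch below assumes samples are (name, labels, value) tuples, and the helper name `summary_mean` is hypothetical. The result is the average since the process started, not a windowed average.

```python
def summary_mean(samples, base_name):
    """Compute the lifetime mean of a summary or histogram from _sum and _count."""
    total = 0.0
    count = 0.0
    for name, _labels, value in samples:
        if name == base_name + "_sum":
            total += value
        elif name == base_name + "_count":
            count += value
    # Avoid dividing by zero when nothing has been observed yet.
    return total / count if count else None


# Illustrative values only, not real mongot output.
example = [
    ("mongot_jvm_gc_pause_seconds_sum", {}, 1.5),
    ("mongot_jvm_gc_pause_seconds_count", {}, 6.0),
]
mean_pause = summary_mean(example, "mongot_jvm_gc_pause_seconds")  # 0.25
```

For an average over a recent window, divide the rate of _sum by the rate of _count in Prometheus instead.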
Common Metric Labels
The following table describes common labels that appear in mongot metrics.
Label Name | Metric Scope | Possible Values |
|---|---|---|
| All executor pools |
|
| Cross-cutting |
|
| Most | Internal opaque Ids (the per-index identifier that the logs use) |
| Many index metrics |
|
| Indexing and initial-sync metrics |
|
| Index size and document metrics |
|
| Latency summary metrics |
|
| Summary metrics |
|
Key Metric Groups
Process and JVM Metrics
Use process and JVM metrics to confirm that mongot is running normally and to identify heap or garbage collection pressure.
Process
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | seconds | The uptime of the Java Virtual Machine. |
| Gauge | unix seconds | Start time of the process, in seconds since the Unix epoch. |
| Counter | nanoseconds | The "cpu time" used by the Java Virtual Machine process. Use |
| Gauge | 0-1 | The "recent cpu usage" for the Java Virtual Machine process. |
JVM Memory
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | bytes | The amount of used memory. Labels: |
| Gauge | bytes | The amount of memory committed for the Java virtual machine to use. |
| Gauge | bytes | The maximum memory that can be used. For heap, |
| Gauge | count | NIO buffer pool counts. Labels: |
| Gauge | bytes | Memory the JVM uses for NIO buffer pools. |
| Gauge | bytes | NIO buffer pool capacity. |
JVM Garbage Collection
Metric | Type | Unit | Description |
|---|---|---|---|
| Summary | seconds | Time spent in GC pause. No quantile labels. Use |
| Summary | seconds | Time spent in concurrent GC phase. |
| Gauge | bytes | Size of long-lived heap memory pool after reclamation. The "live heap" to watch for memory pressure. |
| Gauge | bytes | Max size of long-lived heap memory pool. |
| Counter | bytes | Increase in young heap pool size between GCs. |
| Counter | bytes | Promotions from young into old generation. |
System Metrics
Use system metrics to monitor host-level CPU, disk, memory, paging, and network conditions that can affect mongot.
CPU
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Processors available to the JVM. |
| Gauge | 0–1 | Recent system CPU usage. |
| Gauge | unitless | OS 1-minute load average. |
Disk
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | bytes | Free disk space on the |
| Gauge | bytes | Total disk space on the |
| Gauge | bytes | Free and total disk space across the file system (different scope than |
| Gauge | bytes | Bytes read from disk per device. Label: |
| Gauge | bytes | Bytes written per device. |
| Gauge | count | Read I/O count per device. Use |
| Gauge | count | Write I/O count per device. |
| Gauge | count | Disk queue length (I/Os in progress) per device. |
| Gauge | milliseconds | Time spent reading or writing per device. |
Memory
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | bytes | Total physical memory on the host. |
| Gauge | bytes | Physical memory available. |
| Gauge | bytes | Physical memory in use. |
| Gauge | bytes | Total physical and virtual memory in use. |
| Gauge | bytes | Swap state. |
| Gauge | count | Swap in/out activity. |
| Gauge | count | Number of memory mappings (relevant for Lucene mmap counts). |
| Gauge | bytes | System page size. |
Page Faults
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Major page faults. Use this metric with the storage class advisory threshold. |
| Gauge | count | Minor page faults. |
Network
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | bytes | Bytes received and sent per interface ( |
| Gauge | count | Packets received and sent. |
| Gauge | count | Error, drop, and collision counters. |
| Gauge | bits/sec | Negotiated interface speed. |
Replication Metrics
Use replication metrics to determine whether mongot is healthy, syncing normally, and staying caught up with mongod.
Overall State
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | 0/1 |
|
| Gauge | 0/1 |
|
| Counter | count | State transitions. Labels: |
Session Refresher
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Active sessions. |
| Counter | count | Total session refreshes. |
| Counter | count | Failed refreshes. |
| Summary | seconds | Refresh duration distribution. |
Optime Updater
Metric | Type | Unit | Description |
|---|---|---|---|
| Counter | count | Optime-update errors. |
| Various | N/A | Executor metrics for the optime updater. |
Per-index Metrics
mongot emits the following metrics per index and includes generationId_logString and indexId_logString labels to identify the specific index. Filter by those labels to inspect a specific index, or aggregate across labels to understand fleet-wide behavior.
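The two approaches described above, filtering by label or aggregating across labels, can be sketched as follows. The example assumes samples are (name, labels, value) tuples. The helper name `max_replication_lag_ms` and the index identifiers are placeholders, not real mongot output.

```python
LAG_METRIC = "mongot_index_stats_indexing_replicationLagMs"


def max_replication_lag_ms(samples, index_id=None):
    """Return the largest replication lag, optionally restricted to one index."""
    lags = [
        value
        for name, labels, value in samples
        if name == LAG_METRIC
        and (index_id is None or labels.get("indexId_logString") == index_id)
    ]
    return max(lags) if lags else None


# Hypothetical per-index samples; "index-a" and "index-b" are placeholders.
example = [
    (LAG_METRIC, {"indexId_logString": "index-a", "generationId_logString": "gen-a"}, 120.0),
    (LAG_METRIC, {"indexId_logString": "index-b", "generationId_logString": "gen-b"}, 4500.0),
]
fleet_worst = max_replication_lag_ms(example)          # 4500.0
single_index = max_replication_lag_ms(example, "index-a")  # 120.0
```

Passing no index identifier gives the fleet-wide worst case, which is usually the value to alert on. Passing an identifier narrows the result to the index under investigation.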
Index Status and Size
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | 0/1 | Per-index status. One-hot encoded across the |
| Gauge | number | On-disk index format version. For example, |
| Gauge | bytes | Total on-disk size of the index. |
| Gauge | bytes | Largest single file in the index. |
| Gauge | count | Number of Lucene segment files. |
| Gauge | count | Lucene documents in the index. |
| Gauge | count | Maximum Lucene document ID (includes deleted-not-merged). |
| Gauge | count | Number of indexed Lucene fields. |
| Gauge | count | Number of Lucene segments. |
| Gauge | bytes | Estimated required memory for the index. |
Indexing Metrics
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | milliseconds | Replication lag per index, in milliseconds. The unit is in the metric name ( |
| Gauge | BSON Timestamp | Last applied replication optime (numeric encoding). |
| Gauge | BSON Timestamp | Cap on advance, set by |
| Counter | count | Indexing operation counts. Label: |
| Counter | bytes | Total bytes processed by indexing. |
| Counter | count | Vector fields indexed. |
| Summary | seconds ( | Per-index commit durations. |
| Summary | seconds | Batch duration distribution. |
| Counter | count | Oversized change-stream events. Label: |
| Counter | count | Documents rejected for invalid geometry. |
| Counter | count | Truncated sortable strings. |
| Counter | count | Exceptions during initial sync. |
| Counter | count | Steady-state exceptions. |
| Gauge | count | Consecutive initial-sync resync exceptions for this index. |
Query Metrics
Metric | Type | Unit | Description |
|---|---|---|---|
| Counter | count | Total queries issued against the index. |
| Counter | count | Total hits returned. |
| Counter | count | Queries that failed. |
| Counter | count | Specific failure-class counters. |
| Summary | seconds ( | Search batch latency. This is the headline query-latency metric. |
| Summary | seconds ( | Latency inside Lucene's TopDocs search. Use this metric to distinguish Lucene-internal latency from total latency. |
| Summary | seconds ( | Vector search result latency. |
| Summary | seconds ( | Vector search latency phases. |
| Summary | seconds ( | Facets state-refresh latency. |
| Summary | bytes / count | Per-batch payload size and document count. |
| Histogram | count ( | Distribution of |
| Histogram | count | Vector candidates per query, bucketed by quantization. |
| Counter / Summary | count |
|
| Counter | count | Result batches with score ties. |
| Counter | count | Queries that benefited from index sort optimization. |
| Counter | count | Limit-extraction optimizations triggered. |
| Counter | count | Phantom-searcher cleanups. |
| Summary | ratio | Deleted-document ratio in returned results. |
| Counter | count | Batches that made no forward progress. |
| Counter | count | Vector-specific counters. |
| Counter | count | Per-query-feature usage. Label: |
| Counter | count | Failed |
Per-index Replication Breakdown
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | 0/1 | The canonical replication-state signal. |
| Summary | bytes / count | Steady-state batch sizes. |
| Summary | seconds | Steady-state decoding duration. |
| Summary | seconds | Steady-state |
| Summary | N/A | Initial-sync change-stream phase metrics (mirrors steady state). |
| Summary | N/A | Initial-sync collection-scan phase metrics. |
Lucene Refresh Latency
Metric | Type | Unit | Description |
|---|---|---|---|
| Summary | seconds ( | Lucene IndexReader refresh latency. |
Command Metrics
mongot accepts the following set of named commands from mongod:
- buildinfo
- getMore
- hello
- isMaster
- ismaster
- killCursors
- manageSearchIndex
- ping
- planShardedSearch
- search
- vectorSearch
For each command, mongot exposes the following metrics, where <name> is a placeholder for the command name:
Pattern | Type | Description |
|---|---|---|
| Counter | Failure count for the command. |
| Summary | End-to-end latency including serialization. |
| Summary | Serialization latency (subset; not all commands). |
Tip
Monitor Search and Vector Search Latency Across Indexes
mongot_command_searchCommandTotalLatency_seconds and mongot_command_vectorSearchCommandTotalLatency_seconds are the primary metrics for monitoring $search and $vectorSearch latency. They aggregate latency for all search and vectorSearch commands across all indexes.
Indexing Scheduler and Dispatcher Metrics
Use indexing scheduler and dispatcher metrics to identify backlog, saturation, and slow work in replication and indexing pipelines.
Steady-state Change-stream Pipeline
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Batches currently being applied. Label: |
| Summary | seconds | Duration distribution for in-flight batches. |
| Summary | seconds |
|
| Gauge | count |
|
| Gauge | count | Scheduled |
| Summary | seconds |
|
| Summary | seconds | Pre-processing duration per batch. |
| Counter | count | Total change-stream events observed. Label: |
| Counter | count | Events that |
| Gauge | 0/1 | Dispatcher status. Labels: |
| Counter | count | Events skipped due to missing metadata. |
| Counter | count | Unexpected batch failures. |
| Counter | count | Rescheduled embedding getMores. This metric is only available when you configure Automated Embedding. |
| Counter | count | Failed change-stream mode sampling attempts. |
Indexing Work Scheduler
Indexing work scheduler metrics monitor the queueing and execution of indexing batches.
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Scheduler queue depth. |
| Counter | count | Enqueue and dequeue counts. |
| Summary | count | Distribution of batch sizes. Label: |
| Summary | seconds | Batch durations. |
| Summary | seconds | Scheduling overhead. |
Decoding Work Scheduler
Decoding work scheduler metrics monitor the queueing and execution of change-stream batch decoding.
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Scheduler queue depth. |
| Counter | count | Enqueue and dequeue counts. |
| Summary | count | Distribution of batch sizes. Label: |
| Summary | seconds | Batch durations. |
| Summary | seconds | Scheduling overhead. |
Initial Sync, Lifecycle, and Config Metrics
Use these metrics to track index startup work, recovery, and catalog state.
Initial Sync
Note
Some mongot metrics are phase-specific and populate only when the corresponding code path is active. For example, steady-state replication metrics, such as mongot_index_stats_indexing_replicationLagMs and the mongot_index_stats_replication_steadyState_* series, do not populate while an index is in initial sync. Conversely, initial-sync-specific metrics, such as mongot_initialsync_* and mongot_index_stats_replication_initialSync_*, are only relevant while initial sync is running or has run.
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Queued initial syncs. Label: |
| Counter | count | Embedding initial syncs that were requeued. This metric is only available when you configure Automated Embedding. |
| Gauge | count | Initial syncs currently in progress. Label: |
| Gauge | count | Initial syncs queued at the dispatcher. |
| Gauge | count | In-progress syncs that resumed from a checkpoint. |
| Gauge | 0/1 | Active collection-scan mode. Label: |
| Summary | seconds | Completed sync duration distribution. |
| Summary | seconds | Ongoing sync duration. |
| Gauge | seconds | Min, max, and sum of in-progress initial sync durations. |
| Counter | count | Indexes dropped because their on-disk segments could not be read. |
| Counter | count | Recovered after unreadable segments. Label: |
Lifecycle
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Indexes currently in the initialized state. |
| Summary | seconds | Initialization durations. |
| Counter | count | Index downloads that failed. |
| Counter | count | Index drops that failed. |
| Counter | count | Index initializations that failed. |
Config State
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Indexes currently in the catalog. Labels: |
| Gauge | count | Indexes being phased out. |
| Gauge | count | Staged but not yet active indexes. |
| Gauge | count | Feature-version-4-specific equivalents. |
Cursors and Index Factory
Use these metrics to monitor open cursor state and to detect indexes that mongot dropped or recovered because their on-disk segments were unreadable.
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Currently tracked open cursors. |
| Counter | count | Indexes dropped because their segments were unreadable. |
| Counter | count | Recoveries after unreadable segments. Label: |
Lucene Merge
Use these metrics to monitor Lucene segment merge activity, including the number and size of merges in progress, merge input and output sizes, merge durations, and merges discarded by the disk-utilization merge policy.
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Active merges. Label: |
| Gauge | count | Documents currently being merged. |
| Counter | count | Total merges executed since startup. |
| Counter | count | Segments folded by merges. |
| Summary | bytes | Distribution of merge input sizes. |
| Summary | bytes | Distribution of merge output sizes. |
| Summary | count | Documents-per-merge distribution. |
| Summary | seconds | Merge duration distribution. |
| Counter | count | Merges discarded by the disk-utilization-aware policy. |
MongoDB Client Connection Pool Metrics
mongot opens multiple named connection pools to mongod, and labels each pool with a clientName label that identifies the role of each pool. The following table lists possible clientName label values and their corresponding role:
clientName | Purpose |
|---|---|
| Steady-state change-stream replication. |
| Initial sync and session refresh. The |
| Internal metadata service. |
| Optime polling. |
| Database metadata lookups. |
| Server-info lookups. |
| Lease manager. |
| Automated embedding writes. This connection pool only appears when you configure Automated Embedding. |
The following table lists the available metrics for mongot connection pools:
Metric | Type | Unit | Description |
|---|---|---|---|
| Gauge | count | Currently open connections in the pool. |
| Gauge | count | Connections currently checked out. |
| Gauge | count | Configured max pool size. |
| Gauge | count | Configured min pool size. |
| Counter | count | Successful native OpenSSL link attempts. |
| Counter | count | Failed native OpenSSL link attempts. |
Synonyms
Use these metrics to monitor synonym synchronization activity, including collection scans, scan and sync durations, queue depth, and exceptions encountered during synonym sync.
Metric | Type | Unit | Description |
|---|---|---|---|
| Counter | count | Total collection scans performed for synonyms. |
| Counter | count | Synonym scans triggered by change-stream events. |
| Summary | seconds | Scan duration distribution. |
| Summary | seconds | Sync duration distribution. |
| Gauge | count | Current synonym sync queue depth. |
| Counter | count | Synonym sync exceptions. |
Executor Pools
Use these metrics to monitor the named executor pools that mongot uses to run background work. Each pool exposes the same set of sub-metrics, prefixed with the pool name, so you can track thread activity, pool sizing, queue depth, task throughput, and per-task execution time across all pools.
The following table lists the sub-metrics that every executor pool exposes, where <pool> is the pool name prefix. All executor-pool sub-metrics carry the label name="executorMetrics".
Sub-Metric Suffix | Type | Description |
|---|---|---|
| Gauge | Threads currently executing tasks. |
| Gauge | Pool sizing. |
| Gauge | Tasks waiting for a thread. This is the saturation signal. |
| Gauge | Remaining queue capacity. |
| Counter | Tasks completed since startup. |
| Summary | Time threads spent idle between tasks. |
| Summary | Per-task execution time. |
| Counter | Scheduled task counts (for scheduling pools). |
The following table lists the prefixes for all available named executor pools and their respective purposes:
Executor Pool Prefix | What it runs |
|---|---|
| Blob-store lifecycle work. |
| Blocking gRPC server worker threads. |
| Change-stream mode selection. |
| Change-stream sync dispatching (one of the busiest in steady state). |
| Config-monitor polling. |
| Decoding pipeline workers. |
| Disk-monitor polling. |
| gRPC health check timer. |
| Idle cursor reaping. |
| Index commit operations. |
| Per-index lifecycle work. |
| Lucene IndexReader refreshes. |
| Indexing pipeline workers (the busiest indexing pool in steady state). |
| Indexing-lifecycle work. |
| Automated-embedding indexing path. This executor pool only appears when you configure Automated Embedding. |
| Init-time lifecycle work. |
| Materialized-view tracking and lifecycle. These metrics are only available when Automated Embedding or other materialized-view-backed features are configured. |
| Optime updater (background). |
| Session refresher. |
| System metrics updater. |
Tip
Watch Saturation Across All Executor Pools
To monitor saturation across all executor pools, run the following PromQL query:
max by (pool) (
  label_replace(
    {__name__=~"mongot_.+_executor_queued_tasks"},
    "pool", "$1", "__name__", "mongot_(.+)_executor_queued_tasks"
  )
)
This query returns the queued-task count for each executor pool.
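The same name-to-pool extraction can be sketched outside Prometheus, for example when inspecting a single scrape by hand. The code below applies the regular expression from the query to (name, labels, value) tuples. The helper name `queued_tasks_by_pool` and the pool names in the example are placeholders chosen for illustration, not actual mongot pool prefixes.

```python
import re

# Same capture group as the label_replace call in the PromQL query.
POOL_RE = re.compile(r"^mongot_(.+)_executor_queued_tasks$")


def queued_tasks_by_pool(samples):
    """Map each executor pool name to its largest queued-task value."""
    queued = {}
    for name, _labels, value in samples:
        match = POOL_RE.match(name)
        if match is None:
            continue
        pool = match.group(1)
        queued[pool] = max(value, queued.get(pool, value))
    return queued


# Hypothetical pool prefixes "poolA" and "poolB"; values are illustrative.
example = [
    ("mongot_poolA_executor_queued_tasks", {"name": "executorMetrics"}, 3.0),
    ("mongot_poolB_executor_queued_tasks", {"name": "executorMetrics"}, 0.0),
    ("mongot_jvm_gc_pause_seconds_max", {}, 0.2),
]
by_pool = queued_tasks_by_pool(example)  # {"poolA": 3.0, "poolB": 0.0}
```

A pool whose queued-task value stays above zero across scrapes is a candidate for saturation.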
Prometheus Server Self-Metrics
The following metric is available for the embedded Prometheus server in mongot:
Metric | Type | Unit | Description |
|---|---|---|---|
| Summary | seconds ( | How long |
Configuration-Specific Metrics
The following metric families appear in the /metrics output only when you enable specific features.
Metric Family | Description | Availability in Self-Managed mongot |
|---|---|---|
| Metrics related to Automated Embedding. For example, | Appear only when you configure Automated Embedding. |
| Failure count for the FTDC executor. | Appears only when you enable the |
PromQL Examples
Most latency metrics in this catalog are summaries, not histograms, so use their published quantile labels directly when they exist. A smaller number of metrics, such as mongot_index_stats_query_limitPerQuery and mongot_index_stats_query_numCandidatesPerQuery, are histograms and expose _bucket series.
# Replication state
max by (state) (mongot_replication_mongodb_indexManagerState == 1)

# Maximum replication lag across all indexes, converted to seconds
max(mongot_index_stats_indexing_replicationLagMs) / 1000

# Index count by status
count by (status) (mongot_index_stats_indexStatusCode == 1)

# Search query p99 latency across all indexes
max(mongot_index_stats_query_searchResultBatchLatencies_seconds{quantile="0.99"})

# Worst recent GC pause
max(mongot_jvm_gc_pause_seconds_max)

# Average GC pause over 5 minutes
rate(mongot_jvm_gc_pause_seconds_sum[5m]) / rate(mongot_jvm_gc_pause_seconds_count[5m])

# Free disk percentage on dataPath
100 * mongot_system_disk_space_data_path_free_bytes / mongot_system_disk_space_data_path_total_bytes

# Major page fault rate
rate(mongot_system_process_majorPageFaults_operations[5m])

# Steady-state exception rate over 15 minutes
sum(rate(mongot_index_stats_indexing_steadyStateExceptions_total[15m]))

# Initial sync exception rate over 15 minutes
sum(rate(mongot_index_stats_indexing_initialSyncExceptions_total[15m]))