Integrate with Prometheus

Prometheus collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when it observes specific conditions.

Our integration allows you to configure Cloud Manager to send metric data about your deployment to your Prometheus instance.

Prerequisites

  • Prometheus integration is available in Automation-managed clusters that use MongoDB Agent 12.0.15.7646 or later. MongoDB Agent 12.0.15.7646 was released with Cloud Manager 6.0.7.
  • Have a working Prometheus instance. To set one up, see the Prometheus Installation Guide.
  • (Optional) Use Grafana to visualize your Prometheus metrics.

Procedure

To integrate Cloud Manager with Prometheus:

1
2

Click Configure for the Prometheus integration card.

3

Enter your preferred username and password.

Important

Copy your username and password to a secure location. You can’t access the password after you leave this screen.

4

Enter your IP address and port.

Tip

The default value, 0.0.0.0:9216, exposes metrics for scraping on port 9216 on all IPv4 addresses of the local machine.
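The following is a minimal sketch, not part of Cloud Manager, of how you might sanity-check an address before you enter it. The helper name parse_listen_address is hypothetical, and the sketch assumes an IPv4 address in host:port form.

```python
import ipaddress


def parse_listen_address(value):
    """Split 'host:port' into (ip_address, port). Raises ValueError on bad input."""
    host, sep, port_text = value.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {value!r}")
    port = int(port_text)  # raises ValueError if the port is not a number
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return ipaddress.ip_address(host), port


addr, port = parse_listen_address("0.0.0.0:9216")
# is_unspecified is True for 0.0.0.0, the "all IPv4 addresses" wildcard.
print(addr.is_unspecified, port)  # prints: True 9216
```

If is_unspecified is True, the endpoint is reachable on every IPv4 interface of the host, so you may prefer a specific address when the host has interfaces that Prometheus does not use.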

5

(Optional) Encrypt all Prometheus metrics.

If you enable this setting, Cloud Manager ensures that your Prometheus instance uses HTTPS to scrape metrics.

Fields Description
TLS Certificate Key File Path

Path to the PEM file that contains the certificate and key required to start an HTTPS Prometheus scraping endpoint.

Note

You are responsible for the following:

  • TLS Certificate Key File issuance and renewal.
  • Checking if the endpoint started correctly in the automation agent logs.
TLS Certificate Key File Password

Required if the certificate key file is encrypted.

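Because you are responsible for the certificate key file, a quick pre-flight check can help before you enter its path. The following is a minimal sketch that only inspects PEM block markers. It does not validate the certificate or the key, and the function names are hypothetical.

```python
import re

# Matches PEM block headers such as "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----\s*$", re.MULTILINE)


def pem_labels(text):
    """Return the labels of all PEM blocks found in the text, in order."""
    return PEM_BEGIN.findall(text)


def has_cert_and_key(text):
    """True if the text contains at least one certificate and one private key."""
    labels = pem_labels(text)
    has_cert = any(label == "CERTIFICATE" for label in labels)
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_cert and has_key


def key_is_encrypted(text):
    """True if the key looks encrypted (PKCS#8 label or legacy Proc-Type header)."""
    return ("ENCRYPTED PRIVATE KEY" in pem_labels(text)
            or "Proc-Type: 4,ENCRYPTED" in text)
```

If key_is_encrypted returns True for your file, you also need to fill in the TLS Certificate Key File Password field. For a stricter check, the standard library's ssl.SSLContext.load_cert_chain can attempt to load the file with the password.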
6

Select your preferred service discovery method.

Discovery Method Description
HTTP SD This method requires Prometheus v2.28 or later. It generates the scrape_config part of your configuration file to discover targets over an HTTP endpoint.
File Service Discovery

This method allows Prometheus to read YAML or JSON documents to configure the targets to scrape from.

You are responsible for providing the targets by making a request to the Discovery API and storing its results in a targets.json file.

To make the request, substitute the placeholder text in one of the following tabs or create your own script in another language.

# Requests the current scrape targets from the Discovery
# API, authenticating with the username and password from
# the previous step.
# Replace <username> and <password> with those values, and
# {GROUP-ID} with the ID of your project.

curl --header 'Accept: application/json' \
     --user <username>:<password> \
     --request GET "https://cloud.mongodb.com/prometheus/v1.0/groups/{GROUP-ID}/discovery"

Tip

If you need to install the requests library, see the Requests Installation Guide.

import json
import time

import requests

# This script requests the current scrape targets from the
# Discovery API, authenticating with HTTP basic auth, and
# writes them to your targets.json file.
#
# Note: Replace <username> and <password> with the values
# from the previous step, and {GROUP-ID} with the ID of
# your project.

basic_auth_user = "<username>"
basic_auth_password = "<password>"
discovery_api_url = "https://cloud.mongodb.com/prometheus/v1.0/groups/{GROUP-ID}/discovery"

# The script updates your targets.json file every
# minute, if it successfully retrieves targets.
#
# Note: Replace <path-to-targets.json> with the
# path to your targets.json file.

starttime = time.time()
while True:
    try:
        r = requests.get(
            discovery_api_url,
            auth=(basic_auth_user, basic_auth_password),
            timeout=30,
        )
        if r.status_code == 200:
            # Parse before opening the file so that a bad
            # response can't truncate the existing targets.
            targets = r.json()
            with open("<path-to-targets.json>", "w") as f:
                json.dump(targets, f)
    except (requests.RequestException, ValueError) as exc:
        # Keep the previous targets.json and retry next cycle.
        print(f"Discovery request failed: {exc}")
    # Sleep until the next 60-second tick relative to starttime.
    time.sleep(60.0 - ((time.time() - starttime) % 60.0))

To learn more about the Discovery API, see Return the Latest Targets for Prometheus.
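If you write your own script, you might validate the response and replace targets.json in a single step so that Prometheus never reads a half-written file. The following is a minimal sketch. It assumes the Discovery API response follows the Prometheus file-based service discovery format (a list of objects with a targets list and an optional labels map), and the function names are hypothetical.

```python
import json
import os
import tempfile


def validate_targets(doc):
    """Raise ValueError unless doc looks like a file_sd list of target groups."""
    if not isinstance(doc, list):
        raise ValueError("top level must be a list of target groups")
    for group in doc:
        if not isinstance(group, dict):
            raise ValueError("each target group must be an object")
        targets = group.get("targets")
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            raise ValueError("'targets' must be a list of strings")
        labels = group.get("labels", {})
        if not isinstance(labels, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in labels.items()
        ):
            raise ValueError("'labels' must map strings to strings")
    return doc


def write_atomically(path, doc):
    """Write doc as JSON to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with mode 0600; adjust the permissions
    # if Prometheus runs as a different user.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(doc, f)
        os.replace(tmp_path, path)  # atomic when both paths share a file system
    except BaseException:
        os.unlink(tmp_path)
        raise
```

In the polling loop, you could call write_atomically(path, validate_targets(r.json())) in place of the direct json.dump call.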

7

Click Save.

8

View Your Cluster Metrics on Prometheus.

  1. Copy the generated snippet into the scrape_configs section of your configuration file and substitute the placeholder text.

    For an example of the configuration file in either method, see Example Configurations.

  2. Restart your Prometheus instance.

  3. In your Prometheus instance, click Status in the top navigation bar, and click Targets to see the metrics of your deployment.

Example Configurations

The following examples show the configuration file for the HTTP Service Discovery and File Service Discovery methods.

The configuration file in both methods contains the following fields:

Field Description
scrape_interval Time that indicates how frequently to scrape targets. This setting supports a minimum time of 10s.
job_name Human-readable label assigned to scraped metrics.
metrics_path HTTP resource path that indicates where to fetch metrics from targets.
scheme Your Prometheus protocol scheme configured for requests, either http or https. If you configure https, you must specify tlsPemPath.
basic_auth Authorization header to use on every scrape request.
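To check a scrape_interval value against the 10s minimum before you restart Prometheus, you can parse the duration string yourself. The following is a minimal sketch that assumes the Prometheus duration syntax (units y, w, d, h, m, s, and ms, in that order). The function names are hypothetical.

```python
import re

# Milliseconds per unit, largest unit first, matching the order in the pattern.
_UNITS_MS = {
    "y": 365 * 24 * 3600 * 1000,
    "w": 7 * 24 * 3600 * 1000,
    "d": 24 * 3600 * 1000,
    "h": 3600 * 1000,
    "m": 60 * 1000,
    "s": 1000,
    "ms": 1,
}
_DURATION = re.compile(
    r"^(?:(\d+)y)?(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)ms)?$"
)


def duration_ms(text):
    """Convert a duration such as '10s' or '1m30s' to milliseconds."""
    if text == "0":
        return 0
    match = _DURATION.match(text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(
        int(value) * _UNITS_MS[unit]
        for unit, value in zip(_UNITS_MS, match.groups())
        if value is not None
    )


def meets_minimum(text, minimum="10s"):
    """True if the duration is at least the minimum (10s by default)."""
    return duration_ms(text) >= duration_ms(minimum)


print(meets_minimum("15s"), meets_minimum("5s"))  # prints: True False
```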

HTTP Service Discovery

The HTTP Service Discovery method also contains the http_sd_configs field with the following sub-fields:

Field Description
url URL from which Prometheus fetches the targets.
refresh_interval Time that indicates when to re-query the endpoint.
basic_auth Credentials to use for authenticating to the API server.
global:
  scrape_interval: 15s

scrape_configs:

  - job_name: "CM-Testing-mongo-metrics"
    scrape_interval: 10s
    metrics_path: /metrics
    scheme: https
    basic_auth:
      username: prom_user_61e6e34e93eac1632d39f457
      password: V7hTyLfkjwiWQbv
    http_sd_configs:
      - url: https://cloud.mongodb.com/prometheus/v1.0/groups/61e6e34e93eac1632d39f457/discovery
        refresh_interval: 60s
        basic_auth:
          username: prom_user_61e6e34e93eac1632d39f457
          password: V7hTyLfkjwiWQbv

File Service Discovery

The File Service Discovery method also contains the file_sd_configs field with the following sub-field:

Field Description
files List that contains the files from which to extract the metrics scraping targets.
global:
  scrape_interval: 15s

scrape_configs:

  - job_name: "CM-Testing-mongo-metrics"
    scrape_interval: 10s
    metrics_path: /metrics
    scheme: https
    basic_auth:
      username: prom_user_61e6e34e93eac1632d39f457
      password: V7hTyLfkjwiWQbv
    file_sd_configs:
      - files:
        - /usr/local/etc/targets.json

Performance Metrics Available to Prometheus

The following metrics are available when you use the Prometheus integration with your MongoDB deployment:

MongoDB Metric Labels

Each MongoDB metric contains the following labels:

Label Description
group_id Unique hexadecimal digit string that identifies the project.
org_id Unique hexadecimal digit string that identifies the organization.
cl_role Human-readable label that defines the cluster role.
cl_name Human-readable label that identifies the cluster.
rs_nm Human-readable label that identifies the replica set.
rs_state Number that indicates the replica set state.
process_port Port on which the process runs.

MongoDB Information Metrics

mongodb_info is a gauge that always has the value of 1. This metric contains all the MongoDB Metric Labels and also the following labels:

Label Description
mongodb_version String that represents the major, minor, and patch versions.
replica_state_name String that indicates the replica set member status.
process_type String that indicates the type of process that is running. Its values can be mongod, mongos, or config.
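Because mongodb_info always has the value 1, its useful content is in its labels. The following is a minimal sketch of reading those labels from one line of the Prometheus text exposition format. The label values in the sample line are hypothetical, the parser handles only a single well-formed sample line, and it does not unescape label values.

```python
import re

_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (metric_name, labels_dict, value) for one exposition line."""
    match = _SAMPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(_LABEL.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


# Hypothetical sample line; real label values depend on your deployment.
line = (
    'mongodb_info{cl_name="Cluster0",rs_nm="rs0",process_port="27017",'
    'mongodb_version="6.0.5",replica_state_name="PRIMARY",process_type="mongod"} 1'
)
name, labels, value = parse_sample(line)
print(name, labels["process_type"], labels["mongodb_version"], value)
```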

Hardware Metrics

Note

You can also view descriptions of each hardware metric in the Prometheus expression browser.

Name Operating System Type Description
hardware_system_cpu_nice Unix, Darwin Counter Time spent in user mode with low priority.
hardware_system_cpu_io_wait Unix Counter Time waiting for I/O to complete.
hardware_system_cpu_irq Unix Counter Time spent servicing interrupts.
hardware_system_cpu_soft_irq Unix Counter Time spent servicing softirqs.
hardware_system_cpu_steal Unix Counter Time spent in other operating systems when running in a virtual environment.
hardware_system_cpu_guest Unix Counter Time spent running a virtual CPU for the guest operating systems under the control of the Linux kernel.
hardware_system_cpu_guest_nice Unix Counter Time spent running a guest with an adjusted niceness.
hardware_system_cpu_kernel_milliseconds All Counter Time spent in system mode.
hardware_system_cpu_user_milliseconds All Counter Time spent in user mode.
hardware_disk_metrics_weighted_time_io Unix Counter Weighted time spent doing I/Os.
hardware_disk_metrics_physical_write_count Unix Counter Number of physical write I/Os processed.
hardware_disk_metrics_physical_read_count Unix Counter Number of physical read I/Os processed.
hardware_disk_metrics_total_time Unix Counter Total time this block device is active.
hardware_disk_metrics_idle_time Windows Counter Time spent in the idle task.
hardware_disk_metrics_disk_space_free_bytes All Gauge Disk space available in the mounted file system.
hardware_disk_metrics_disk_space_used_bytes All Gauge Disk space used in the mounted file system.
hardware_disk_metrics_read_count All Counter Number of read I/Os processed.
hardware_disk_metrics_read_time_milliseconds All Counter Total wait time for read requests.
hardware_disk_metrics_write_count All Counter Number of write I/Os processed.
hardware_disk_metrics_write_time_milliseconds All Counter Total wait time for write requests.
hardware_process_cpu_children_user Unix Counter Amount of time scheduled in user mode for this process to wait for children.
hardware_process_cpu_children_kernel Unix Counter Amount of time scheduled in kernel mode for this process to wait for children.
hardware_process_cpu_kernel_milliseconds All Counter Amount of time scheduled in kernel mode for this process.
hardware_process_cpu_user_milliseconds All Counter Amount of time scheduled in user mode for this process.
hardware_system_vm_page_swap_in Unix Counter Number of pages the system has swapped in from disk.
hardware_system_vm_page_swap_out Unix Counter Number of pages the system has swapped out to disk.
hardware_system_memory_mem_total Unix Gauge Total usable RAM (physical RAM minus a few reserved bits and the kernel binary code).
hardware_system_memory_mem_free Unix Gauge Sum of LowFree + HighFree.
hardware_system_memory_mem_available Unix Gauge An estimate of how much memory is available for starting new applications, without swapping.
hardware_system_memory_buffers Unix Gauge Temporary storage for raw disk blocks that shouldn’t get tremendously large.
hardware_system_memory_cached Unix Gauge In-memory cache for files read from the disk. This doesn’t include SwapCached.
hardware_system_memory_swap_total Unix Gauge Total amount of swap space available.
hardware_system_memory_swap_free Unix Gauge Total amount of swap space unused.
hardware_system_memory_shared_mem Unix Gauge Amount of memory consumed in file systems whose contents reside in virtual memory.
hardware_system_memory_swap_free_kilobytes All Gauge Total amount of swap space unused.
hardware_system_memory_swap_total_kilobytes All Gauge Total amount of swap space available.
hardware_platform_num_logical_cpus All Gauge Number of logical CPUs usable by the current process.
hardware_system_network_eth0_bytes_in_bytes All Counter Number of bytes of data received by the interface.
hardware_system_network_eth0_bytes_out_bytes All Counter Number of bytes of data transmitted by the interface.
hardware_system_network_lo_bytes_in_bytes All Counter Number of bytes of data received by the interface.
hardware_system_network_lo_bytes_out_bytes All Counter Number of bytes of data transmitted by the interface.
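Most of the metrics above are counters, so their raw values only ever grow. What you usually chart is the rate of change between scrapes. The following is a simplified sketch of that calculation with hypothetical sample values. It treats any decrease as a counter reset, and it omits the extrapolation that the PromQL rate() function performs.

```python
def counter_rate(samples):
    """Per-second increase of a counter.

    samples is a list of (timestamp_seconds, value) pairs ordered by time.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count cur in full.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return increase / elapsed


# Hypothetical hardware_system_cpu_user_milliseconds samples, 15 seconds apart.
cpu_user_ms = [(0, 1000), (15, 4000), (30, 7000)]
print(counter_rate(cpu_user_ms))  # prints: 200.0
```

Here 200 milliseconds of user-mode CPU time per second of wall-clock time corresponds to 20% of one CPU.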

Hardware Metric Labels

Each hardware metric contains the following labels:

Label Description
group_id Unique hexadecimal digit string that identifies the project.
org_id Unique hexadecimal digit string that identifies the organization.
process_port Port on which the process runs.
disk_name Human-readable label that identifies the disk.