Peek at your MongoDB Clusters like a Pro with Keyhole: Part 1

Ken Chen

Performance evaluation and tuning is an important part of the application development life cycle. An unresponsive application can be caused by a combination of inadequately provisioned resources and poorly tuned software. When application performance turns sluggish, many people are quick to blame the database, but few can identify the actual issues.

I created Keyhole, available as an open-source project, to have a tool that quickly collects statistics from a MongoDB cluster and produces performance analytics summaries within a few minutes. A few open-source tools address specific areas, but none is as complete as keyhole for performance evaluation. Using keyhole is like having a CAT scan done on a MongoDB cluster. The information includes MongoDB configurations, cluster statistics, database schema, indexes, and index usage. You can also identify whether your performance issues are caused by a shortage of hardware resources (such as physical RAM, CPU, and disk IOPS), by slow queries without proper indexes, or by both.

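In part 1 of this blog post, you will learn how to download keyhole, validate a new MongoDB installation, run load tests, check the health of an existing cluster, generate an HTML summary report, and identify redundant and unused indexes.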

You can view the complete keyhole codebase in this GitHub repository.

Download Keyhole

Keyhole is written in Go, and pre-compiled executables are available to download for Linux, Windows, and macOS. Download instructions are detailed in the keyhole GitHub repository. To verify your keyhole installation, use:

keyhole --version

To list all keyhole options and their usage, use:

keyhole --help

New Installation Validation

Let’s begin with validating a new MongoDB installation. After installing and configuring a MongoDB cluster, you will usually connect to the cluster and perform sanity checks. Keyhole connects to your cluster using a MongoDB connection string. The example below prints many cluster configuration settings and metrics:

keyhole --info "mongodb://user:secret@host.local/test?replicaSet=rs"

If you leave the password out of the connection string, keyhole prompts for it. This comes in handy during a demo, when you do not want to reveal your password to a broad audience. For example:

keyhole --info "mongodb://user@host.local/test?replicaSet=rs"
Enter Password:
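The password prompt comes down to one check: does the connection string already carry a password? The following is a minimal Go sketch of that check and of splicing a password back into the URI. It is not keyhole's code, the function names hasPassword and withPassword are hypothetical, and it has only been reasoned through for single-host URIs like the ones above (production code should use a dedicated MongoDB URI parser). Reading the password without echo, for example with ReadPassword from golang.org/x/term, is left out so the sketch needs only the standard library.

```go
package main

import (
	"fmt"
	"net/url"
)

// hasPassword reports whether a connection string already carries a password.
func hasPassword(uri string) (bool, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return false, err
	}
	if u.User == nil {
		return false, nil
	}
	_, ok := u.User.Password()
	return ok, nil
}

// withPassword returns uri with password added after the user name.
// url.UserPassword percent-encodes characters such as '@' and ':'.
func withPassword(uri, password string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	if u.User == nil {
		return "", fmt.Errorf("no user name in %q", uri)
	}
	u.User = url.UserPassword(u.User.Username(), password)
	return u.String(), nil
}

func main() {
	uri := "mongodb://user@host.local/test?replicaSet=rs"
	ok, err := hasPassword(uri)
	if err != nil {
		panic(err)
	}
	if !ok {
		// A real tool would prompt here and read the password without echo.
		full, err := withPassword(uri, "secret")
		if err != nil {
			panic(err)
		}
		fmt.Println(full)
	}
}
```

For the password-less URI above, main prints mongodb://user:secret@host.local/test?replicaSet=rs. Letting net/url build the userinfo avoids hand-escaping passwords that contain reserved characters.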

Load Tests

To evaluate MongoDB performance in isolation from the application, I use keyhole to run load tests and collect a few metrics from new installations.

Load Tests with Random Data

A Keyhole load test, by default, runs for a period of 5 minutes with three different stages: initialization, thrashing, and teardown.

  • Initialization: inserts documents into a mongo cluster at a consistent rate to measure cluster throughput. This also populates data in memory and in the WiredTiger cache.
  • Thrashing: a combination of CRUD operations and aggregation queries to measure database operation times.
  • Teardown: document deletion in batches.
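Deleting in batches trades one round trip per document for one round trip per batch. Below is a minimal Go sketch of the batching step only. It assumes the test tracks the _id values it inserted and uses a batch size of 1000; both are illustrative assumptions, not documented keyhole behavior, and the helper name batches is hypothetical.

```go
package main

import "fmt"

// batches splits ids into consecutive slices of at most size elements.
// Each batch would become one DeleteMany call with a {_id: {$in: batch}} filter.
func batches[T any](ids []T, size int) [][]T {
	if size <= 0 {
		panic("batch size must be positive")
	}
	var out [][]T
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	ids := make([]int, 2500)
	for i := range ids {
		ids[i] = i
	}
	for _, b := range batches(ids, 1000) {
		// Print the size, first ID, and last ID of each batch.
		fmt.Println(len(b), b[0], b[len(b)-1])
	}
}
```

This prints 1000 0 999, 1000 1000 1999, and 500 2000 2499: two full batches and a final partial one.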

To begin a load test, execute keyhole with a connection string. Several flags let you tailor a load test to your configuration:

  • --duration: defines the length of test in minutes
  • --conn: defines the number of concurrent connections
  • --tps: defines transactions per second for each connection
  • --tx: defines customized transactions in a file

Additional details about load tests are available in the load test section of the keyhole wiki. I usually run two load tests: one on the primary node of a replica set (or on the server hosting mongos) and the other on one of the application servers. Comparing the two results shows the overhead of network latency. Below are sample outputs:

keyhole "mongodb://user@host.local/test?replicaSet=rs"
2019/11/25 09:10:39 Duration in minute(s): 5
2019/11/25 09:10:42 Total TPS: 300 (tps) * 10 (conns) = 3000, duration: 5 (mins)
...
2019/11/25 09:11:13 [replset] Storage: 665.3 -> 982.2, rate: 31.7 MB/sec
2019/11/25 09:11:14 [replset] replication lags:  - localhost:27018: 7 - localhost:27019: 3
2019/11/25 09:11:14 [replset] Memory - resident: 776, virtual: 6240, page faults: 0, iops: 1815.1
2019/11/25 09:11:14 [replset] CRUD+  - insert: 361900, find: 1, update: 58, delete: 0, getmore: 775, command: 818
2019/11/25 09:11:14 [replset] Latency- read: 0.5, write: 2.5, command: 0.5 (ms)
...
2019/11/25 09:12:43 Average Executions Time (including network latency):
2019/11/25 09:12:43 	[        Ping] 1.049222ms
2019/11/25 09:12:43 	[  InsertMany] 6.742749ms
2019/11/25 09:12:43 	[     FindOne] 3.091258ms
2019/11/25 09:12:43 	[        Find] 3.231524ms
2019/11/25 09:12:43 	[   UpdateOne] 3.894997ms
2019/11/25 09:12:43 	[  UpdateMany] 4.143732ms
2019/11/25 09:12:43 	[   DeleteOne] 3.797362ms
2019/11/25 09:12:43 	[  DeleteMany] 4.94698ms
...
<server status summaries>
...
2019/11/25 09:15:44 stats written to ./keyhole_stats.2019-11-25T091039-replset.gz

From the output, you get an idea of your cluster's write throughput (31.7 MB/sec on the Storage line of the sample output above), read/write latencies, and replication lags under stress. At the end of a test, summaries are printed (shown as <server status summaries> in the output above) and saved to a file. Keep the file for future reference. Keyhole can read the file back with the --diag flag and print a text summary. With the additional --web flag, it can also provide a data feed to Grafana. I will go into more detail in Part 2 of this blog, which discusses integrating with a Grafana instance to better explain the summaries.

Load Tests with Customer Data

By default, keyhole generates random documents before sending them to a mongo server. For a more realistic test using your customer's document structure, include the --file flag, and keyhole will generate randomized documents with the same fields and data types as the provided JSON document. First, create a file containing a JSON document, for example, a users.json file from the namespace example.users, using the following command:

mongo --quiet "mongodb://user:xxx@host.local/example?replicaSet=rs&authSource=admin" \
    --eval 'db.users.findOne()' > users.json

Then, execute your load test as:

keyhole --file users.json "mongodb://user@host.local/test?replicaSet=rs"

Existing Cluster Health Check

For an existing mongo cluster, Keyhole collects additional information to assess the cluster performance.

Detailed Cluster Info

Keyhole collects additional configurations and stats with the --info and -v flags. A JSON document containing all collected info is written to a gzipped file.

keyhole -v --info "mongodb://user@host.local/test?replicaSet=rs"
JSON is written to host.local.json.gz

Decompress the file and view it in a browser. It contains the following information:

  • cluster: standalone, replica or sharded
  • config: cluster configurations details
    • buildInfo
    • getCmdLineOpts
    • hostInfo
    • replSetGetStatus
    • rolesInfo
    • serverStatus
    • usersInfo
  • databases:
    • DB: database name
    • collections: an array of collection information
      • NS: namespace
      • collection: collection name
      • document: a sample document
      • indexes: all indexes of the collection and duplicate index removal recommendations
    • stats: storage stats
  • host: hostname
  • process: connected process: mongod or mongos
  • sharding: sharding info if applicable
  • storage
    • databases: an array of database storage stats
    • totalDataSize (MB): total data size
    • totalIndexSize (MB): total index size
  • version: MongoDB version

For a large cluster, collecting all the stats takes a while, and loading the JSON document in a browser may also take some time.

Generate HTML Summary Report

You can generate HTML reports using my Maobi Docker image, an HTML report generator built for Keyhole. It accepts MongoDB cluster information collected by keyhole and MongoDB log files as input. After installing Docker, execute the command below:

docker run -d -p 3030:3030 simagix/maobi

Open http://localhost:3030/ in a browser and drag the gzipped file onto the Keyhole HTML Report Generator window. You will then be prompted to download the HTML report.

The generated report reveals stats and configurations of your MongoDB cluster, and two stats interest me in particular. The first is whether your indexes fit entirely in RAM, so that the server can avoid reading indexes from disk. Check this by comparing the total index size under Storage to the system memory (system.memSizeMB under Server Info). Below is an example of the Storage section:

Database Storage Report

and of the Server Info section:

Server Info
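The comparison itself is simple arithmetic. The report compares against system memory, but by default (MongoDB 3.4 and later) the WiredTiger internal cache is the larger of 50% of (RAM - 1 GB) and 256 MB, so a stricter check against that default is also worth running. A minimal Go sketch; the function names and the 9000 MB and 16384 MB inputs are hypothetical, and a cluster configured with a custom cache size needs that value instead.

```go
package main

import "fmt"

// defaultWiredTigerCacheMB returns the default WiredTiger internal cache size
// (MongoDB 3.4 and later): the larger of 50% of (RAM - 1 GB) and 256 MB.
func defaultWiredTigerCacheMB(memSizeMB float64) float64 {
	cache := 0.5 * (memSizeMB - 1024)
	if cache < 256 {
		return 256
	}
	return cache
}

// indexFit compares the total index size (MB) with system memory (MB)
// and with the default WiredTiger cache derived from it.
func indexFit(totalIndexSizeMB, memSizeMB float64) (fitsRAM, fitsCache bool) {
	return totalIndexSizeMB < memSizeMB,
		totalIndexSizeMB < defaultWiredTigerCacheMB(memSizeMB)
}

func main() {
	// Hypothetical values; read the real ones from the Storage and Server Info sections.
	totalIndexSizeMB, memSizeMB := 9000.0, 16384.0
	ram, cache := indexFit(totalIndexSizeMB, memSizeMB)
	fmt.Printf("default WiredTiger cache: %.0f MB\n", defaultWiredTigerCacheMB(memSizeMB))
	fmt.Printf("indexes fit in RAM: %v, in default cache: %v\n", ram, cache)
}
```

With 16384 MB of RAM, the default cache is 7680 MB, so this hypothetical 9000 MB index set fits in RAM but not in the default cache.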

The second stat is index duplication and usage for each collection. Removing redundant and unused indexes improves write performance, because MongoDB must maintain every index on writes.

Identify Index Redundancy and Usage

Developers love creating indexes for their obvious read performance benefits. However, I have often seen indexes that were never used, or that were redundant because they were a prefix of other indexes. Below is an example from the generated HTML summary report:

Index Report

An index marked with an ✘ can be deleted because an existing compound index also covers it. An index marked with a ❓ should be evaluated because it has not been used since the MongoDB server last started.

Keyhole can also display index usage directly from the command line, marking indexes to be evaluated (with a ? prefix) or dropped (with an x prefix). For example:

keyhole --index "mongodb://user@host.local/test?replicaSet=rs"
test.numbers:
  { _id: 1 }
	host: host.local:27017, ops: 123,456, since: 2019-11-13 14:48:04.473 +0000 UTC
x { a: 1 }
	host: host.local:27017, ops: 0, since: 2019-11-13 14:48:04.473 +0000 UTC
x { a: 1, b: 1 }
	host: host.local:27017, ops: 0, since: 2019-11-13 14:48:04.473 +0000 UTC
  { a: 1, b: 1, c: 1 }
	host: host.local:27017, ops: 54,168, since: 2019-11-13 14:48:04.473 +0000 UTC
? { region: 1 }
	host: host.local:27017, ops: 0, since: 2019-11-13 14:48:04.473 +0000 UTC

The same information is also available from the HTML report, but this is a shortcut to obtaining index information without waiting for a report to be generated.

Next Steps

In Part 1 of this blog, I presented several ways to collect MongoDB cluster stats so you can get a clear picture of your cluster and whether it is adequately provisioned to support your application. In Part 2, we’ll dive into application performance tuning by analyzing mongo logs and Full Time Diagnostic Data Capture (FTDC) data, MongoDB's internal diagnostic data. I would love your feedback on the Keyhole tool. Please get in touch to let me know your thoughts.