Measuring MongoDB Kafka Connector Performance

Juan Soto, Robert Walters • Published Feb 15, 2022 • Updated May 09, 2022
With today’s need for flexible event-driven architectures, companies across the globe choose best-of-breed technologies like MongoDB and Apache Kafka. While these two complementary technologies provide the power and flexibility to solve large-scale challenges, performance has always been at the forefront of concerns. In this blog, we will cover how to measure the performance of the MongoDB Connector for Apache Kafka in both a source and a sink configuration.

Measuring Sink Performance

Recall that the MongoDB sink connector writes data from a Kafka topic into MongoDB. Writes by default use the ReplaceOneModel, where the data is either updated if it's present on the destination cluster or created as a new document if it is not present. You are not limited to this upsert behavior. In fact, you can change the sink to perform deletes or inserts only. These write behaviors are defined by the Write Model Strategy setting in the sink configuration.
To determine the performance of the sink connector, we need a timestamp of when the document was written to MongoDB. Currently, the only write model strategies that write a timestamp field on behalf of the user are UpdateOneTimestampsStrategy and UpdateOneBusinessKeyTimestampStrategy. These two write models insert a new field named _insertedTS, which can be used to query the lag between Kafka and MongoDB.
In this example, we’ll use MongoDB Atlas. MongoDB Atlas is a public cloud MongoDB data platform providing out-of-the-box capabilities such as MongoDB Charts, a tool to create visual representations of your MongoDB data. If you wish to follow along, you can create a free forever tier cluster.
Generate Sample Data
We will generate sample data using the datagen Kafka Connector provided by Confluent. Datagen is a convenient way of creating test data in the Kafka ecosystem. There are a few quickstart schema specifications bundled with this connector. We will use a quickstart called users.
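As a rough illustration, the Datagen connector can be registered through the Kafka Connect REST API. The sketch below builds the connector definition in Python and includes a helper that posts it. The connector name, the REST endpoint `http://localhost:8083/connectors`, the message interval, and the task count are assumptions chosen for illustration; adjust them to your environment.

```python
import json
import urllib.request

# Assumed Kafka Connect REST endpoint; adjust to your environment.
CONNECT_URL = "http://localhost:8083/connectors"


def datagen_connector(topic: str, iterations: int) -> dict:
    """Build a Datagen connector definition that uses the bundled 'users' quickstart."""
    return {
        "name": "datagen-users",  # hypothetical connector name
        "config": {
            "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
            "kafka.topic": topic,
            "quickstart": "users",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
            "max.interval": "50",  # assumed: at most 50 ms between messages
            "iterations": str(iterations),  # number of messages to generate
            "tasks.max": "1",  # assumed: a single task
        },
    }


def register(connector: dict, url: str = CONNECT_URL) -> int:
    """POST the definition to Kafka Connect and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Print the payload; call register(...) once a Connect worker is reachable.
    print(json.dumps(datagen_connector("topic333", 40000), indent=2))
```

Calling `register(datagen_connector("topic333", 40000))` against a running Connect worker would start the generator; the payload can also be saved to a file and posted with any HTTP client.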
Configure Sink Connector
Now that the data is generated and written to the Kafka topic, “topic333,” let’s create our MongoDB sink connector to write this topic data into MongoDB Atlas. As stated earlier, we will add a field _insertedTS for use in calculating the lag between the message timestamp and this value. To perform the insert, let’s use the UpdateOneTimestampsStrategy write model strategy.
Note: The field _insertedTS is populated with the time value of the Kafka Connect server.
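A minimal sketch of such a sink definition, again expressed as a Python dictionary that can be posted to the Kafka Connect REST API. The connector name and the `timestampColumn` field name are assumptions made for illustration. The field is filled by Kafka Connect's `InsertField` single message transform, which copies the Kafka record timestamp into each document so that it can later be compared with _insertedTS.

```python
import json

STRATEGY_PACKAGE = "com.mongodb.kafka.connect.sink.writemodel.strategy"


def sink_connector(connection_uri: str) -> dict:
    """Build a MongoDB sink definition that stamps each document with _insertedTS."""
    return {
        "name": "mongo-sink-perf",  # hypothetical connector name
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
            "topics": "topic333",
            "connection.uri": connection_uri,
            "database": "kafka",
            "collection": "datagen",
            # Write model strategy that adds the _insertedTS field.
            "writemodel.strategy": f"{STRATEGY_PACKAGE}.UpdateOneTimestampsStrategy",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
            # Copy the Kafka record timestamp into the document.
            # "timestampColumn" is an assumed field name.
            "transforms": "InsertField",
            "transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
            "transforms.InsertField.timestamp.field": "timestampColumn",
        },
    }


if __name__ == "__main__":
    # Placeholder URI; substitute your Atlas connection string.
    print(json.dumps(sink_connector("<MONGODB CONNECTION STRING>"), indent=2))
```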
Viewing Results with MongoDB Charts
Take a look at the MongoDB Atlas collection “datagen” and familiarize yourself with the added fields.
Figure 1: Datagen collection as seen in MongoDB Atlas Collections page
In this blog, we will use MongoDB Charts to display a performance graph. To make it easy to build the chart, we will create a view.
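One way such a view could be defined is with an aggregation pipeline that subtracts the Kafka record timestamp from _insertedTS. The sketch below assumes the record timestamp was stored as epoch milliseconds in a field named `timestampColumn` (an assumed name) and that `db` is a PyMongo `Database` handle for the `kafka` database. The `diff_ms` helper mirrors the `$subtract` stage locally so the arithmetic can be checked without a cluster.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Pipeline for the view: keep the two timestamps and add their difference.
SINK_VIEW_PIPELINE = [
    {"$sort": {"_insertedTS": 1, "timestampColumn": 1}},
    {"$project": {"_insertedTS": 1, "timestampColumn": 1, "_id": 0}},
    {
        "$addFields": {
            "diff": {
                "$subtract": [
                    "$_insertedTS",
                    # Convert epoch milliseconds to a date before subtracting.
                    {"$convert": {"input": "$timestampColumn", "to": "date"}},
                ]
            }
        }
    },
]


def create_sink_view(db) -> None:
    """Create SinkView on the datagen collection (db: assumed PyMongo Database)."""
    db.command("create", "SinkView", viewOn="datagen", pipeline=SINK_VIEW_PIPELINE)


def diff_ms(inserted_ts: datetime, kafka_timestamp_ms: int) -> int:
    """Local equivalent of the $subtract stage; expects a timezone-aware datetime."""
    inserted_ms = (inserted_ts - EPOCH) // timedelta(milliseconds=1)
    return inserted_ms - kafka_timestamp_ms
```

Subtracting one date from another in `$subtract` yields milliseconds, so the `diff` field in the view is directly usable as a lag value in a chart.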
To create a chart, click on the Charts tab in MongoDB Atlas:
Click on Datasources and “Add Data Source.” The dialog will show the view that was created.
Select the SinkView and click Finish.
Download the MongoDB Sink performance Chart from Gist.
Choose Import Dashboard from the Add Dashboard dropdown and select the downloaded file.
Load the sink-perfromance.chart file.
Select the kafka.SinkView as the data source at the destination then click Save.
Now the KafkaPerformance chart is ready to view. When you click on the chart, you will see something like the following:
This chart shows statistics on the difference between the timestamp of each message in the Kafka topic and the time the connector wrote it to MongoDB. In the above example, the maximum time delta is approximately one second (997ms) from inserting 40,000 documents.

Measuring Source Performance

To measure the source, we will take a different approach using KSQL to create a stream of the clusterTime timestamp from the MongoDB change stream and the time the row was written in the Kafka topic. From here, we can push this data into a MongoDB sink and display the results in a MongoDB Chart.
Configure Source Connector
The first step will be to create the MongoDB Source connector that will be used to push data onto the Kafka topic.
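A minimal sketch of a source connector definition follows. The connector name, topic prefix, and collection name are hypothetical and passed in as parameters. Leaving `publish.full.document.only` set to `false` keeps the whole change stream event in the message, which is what makes clusterTime available downstream. The connector publishes to a topic named after the prefix, database, and collection, joined with dots.

```python
import json


def source_connector(connection_uri: str, database: str,
                     collection: str, topic_prefix: str) -> dict:
    """Build a MongoDB source definition that publishes full change stream events."""
    return {
        "name": "mongo-source-perf",  # hypothetical connector name
        "config": {
            "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
            "connection.uri": connection_uri,
            "database": database,
            "collection": collection,
            "topic.prefix": topic_prefix,
            # Publish the whole change event, not only fullDocument,
            # so that clusterTime reaches the Kafka topic.
            "publish.full.document.only": "false",
        },
    }


def topic_name(topic_prefix: str, database: str, collection: str) -> str:
    """Topic the source connector writes to: <prefix>.<database>.<collection>."""
    return f"{topic_prefix}.{database}.{collection}"


if __name__ == "__main__":
    # All argument values here are placeholders for illustration.
    definition = source_connector("<MONGODB CONNECTION STRING>", "kafka",
                                  "source-perf-test", "mdb")
    print(json.dumps(definition, indent=2))
    print(topic_name("mdb", "kafka", "source-perf-test"))
```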
Generate Sample Data
There are many ways to generate sample data on MongoDB. In this blog post, we will use the doc-gen tool (GitHub repo) to quickly create sample documents based upon the user’s schema, which is defined as follows:
To generate data in your MongoDB cluster, issue the following:
Create KSQL Queries
Launch KSQL and create a stream of the clusterTime within the message.
Note: If you do not have KSQL, you can run it as part of the Confluent Platform all in Docker using the following instructions.
If using Control Center, click ksqlDB, click Editor, and then paste in the following KSQL:
The only information that we need from the message is the clusterTime. This value is provided within the change stream event. For reference, this is a sample event from change streams.
Step 3
Next, we will create a KSQL stream that calculates the difference between the cluster time (the time when the document was created on MongoDB) and the time when it was inserted on the broker.
Note that this diff value may not be completely accurate if the clocks on Kafka and MongoDB are not synchronized.
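The arithmetic behind that difference can be sketched in a few lines of Python. This is an illustration of the calculation, not the KSQL statement itself. It assumes clusterTime has already been extracted as whole seconds since the epoch (the seconds component of a BSON timestamp) and that the broker timestamp is in epoch milliseconds, as Kafka record timestamps are.

```python
def source_lag_ms(cluster_time_s: int, broker_time_ms: int) -> int:
    """Lag between the MongoDB cluster time and the Kafka broker timestamp.

    cluster_time_s: assumed to be whole seconds since the epoch.
    broker_time_ms: Kafka record timestamp in epoch milliseconds.
    """
    return broker_time_ms - cluster_time_s * 1000


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(source_lag_ms(1_644_900_000, 1_644_900_000_252))
```

Because the cluster time in this sketch has one-second resolution, individual values are coarse; trends over many events are more informative than any single measurement.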
Step 4
To see how the values change over time, we can use a window function and write the results to a table which can then be written into MongoDB via a sink connector.
Windowing lets you control how to group records that have the same key for stateful operations, such as aggregations or joins, into so-called windows. There are three ways to define time windows in ksqlDB: hopping windows, tumbling windows, and session windows. In this example, we will use a tumbling window, as it is fixed-duration, non-overlapping, and gap-less.
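To make the tumbling-window behavior concrete, here is a small Python sketch of the same kind of aggregation outside ksqlDB. Each event is assigned to exactly one fixed-size window based on its timestamp, and count, average, minimum, and maximum are computed per window. The 10-second window size and the input shape of `(event_time_ms, diff_ms)` pairs are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

WINDOW_MS = 10_000  # assumed 10-second tumbling windows


def tumbling_stats(events, window_ms: int = WINDOW_MS) -> dict:
    """Aggregate (event_time_ms, diff_ms) pairs into tumbling windows.

    Windows are aligned to the epoch, so every timestamp maps to exactly
    one window: no overlap and no gaps between consecutive windows.
    """
    buckets = defaultdict(list)
    for event_time_ms, diff in events:
        window_start = event_time_ms - (event_time_ms % window_ms)
        buckets[window_start].append(diff)
    return {
        start: {
            "count": len(diffs),
            "avg": mean(diffs),
            "min": min(diffs),
            "max": max(diffs),
        }
        for start, diffs in sorted(buckets.items())
    }


if __name__ == "__main__":
    sample = [(0, 100), (9_999, 300), (10_000, 50), (25_000, 70)]
    for start, stats in tumbling_stats(sample).items():
        print(start, stats)
```

A hopping window would differ in that one event could fall into several overlapping windows, and a session window would size itself by gaps in activity rather than by a fixed duration.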
Configure Sink Connector
The final step is to create a sink connector to insert all this aggregate data into MongoDB.
Viewing Results with MongoDB Charts
Download the MongoDB Source performance Chart from Gist.
Choose Import Dashboard from the Add Dashboard dropdown and select the downloaded file.
You will need to create a Datasource to the new sink collection, “kafka.sourceStats.”
Click on the Kafka Performance Source chart to view the statistics.
In the above example, you can see the 10-second tumbling window performance statistics for 1.5M documents. The average difference was 252s, with the maximum difference being 480s. Note that some of this delta could be due to differences in clocks between MongoDB and Kafka. Although these numbers should not be taken as absolute, this technique is good enough to determine trends and whether performance is getting worse or better.
If you have any opinions on features or functionality enhancements that you would like to see with respect to monitoring performance or monitoring the MongoDB Connector for Apache Kafka in general, please add a comment to KAFKA-64.
Have any questions? Check out our Connectors and Integrations MongoDB community forum.
