What does the oplog.rs getMore command do? Will it cause disk latency and memory to increase?

I noticed an entry for a local.oplog.rs command in one of the secondary MongoDB logs that coincided with an increase in disk I/O and memory usage and resulted in issues.

What exactly does this command do, and why was it invoked? It also seems to have been initiated with a huge batch size. I don't think anything significant happened on the primary that would have caused it to replay the oplog.

command local.oplog.rs command: getMore { getMore: 8464225076500301533, collection: "oplog.rs", batchSize: 13981010,.........

Thanks

Hi @Vinay_Manikanda and welcome to the community!!

The getMore indicates that something is requesting the next batch of results. The log snippet above shows that something is tailing the oplog: either another secondary (if chained replication is enabled, which it is by default) or an application.
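
If you want to rule chaining out, a quick check (assuming you can run the mongo shell against any member) is the chainingAllowed setting in the replica set configuration, for example:

// Check whether chained replication is allowed; it defaults to true
var cfg = rs.conf()
cfg.settings.chainingAllowed

If it returns true, the getMore entries on that secondary may simply be another member syncing through it.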

Could you help me understand how you figured out that the issue mentioned in the logs was caused by this operation?

Also, could you confirm a few more things based on the issues observed:

  1. The MongoDB version you are using.
  2. The output of rs.status() and rs.printReplicationInfo() (see the snippet after this list).
  3. As mentioned, this issue is observed in the secondary's oplog. Do you see similar log entries on the primary? Typically the full log lines would also include the connection id. Is it possible to trace the connection id requesting this getMore back to an IP address? The IP address and the connection id are logged when the client first connects to the server.
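
To gather points 1 and 2, running the following in the mongo shell on the affected member is enough (the grouping is just for convenience):

db.version()                 // MongoDB server version
rs.status()                  // replica set member states and replication progress
rs.printReplicationInfo()    // oplog size and time window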

Please help us with the above details so we can assist you further.

Thanks
Aasawari


Hi Aasawari,

Thanks for your response.

I am not exactly sure whether the disk latency and memory spike were caused by this operation, but it aligns with the exact time we had the issue on both secondaries, and not on the primary. I see this local.oplog.rs getMore call on both secondaries.

The MongoDB version is 4.2.14. I don't have the rs.status() output from the time of the issue, but the memory spike eventually triggered the OOM killer, which targeted mongod on both secondary instances, and the replica set went into a bad state. I couldn't trace the connection id back to an IP address or an initial connection in any of the logs.

Hi @Vinay_Manikanda

Could you please help me understand the issue more specifically by clarifying the concerns below:

  • As mentioned, the getMore is only seen on the secondaries and not on the primary. Do both secondaries show a similar pattern of getMore oplog entries (e.g. getMore: <some large number>, collection: "oplog.rs", batchSize: <some other large number>) at approximately the same timestamps? Could you please post them?
  • While the issue is occurring, do you see any specific patterns in your application, e.g. a batch operation, a scheduled task, etc.?
  • Are you using a framework in your application that tails the oplog, similar to Meteor? Alternatively, do any of your applications tail the oplog by design (see the sketch after this list)?
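
For context, a framework or application that tails the oplog opens a tailable cursor on local.oplog.rs and keeps issuing getMore calls on it, which is exactly the kind of log entry you posted. A minimal mongo shell sketch of such a tailer (the filter timestamp here is only illustrative) looks like this:

// Illustrative oplog tailer: open a tailable cursor on the oplog and
// iterate the entries currently available; a driver-based application
// would do the equivalent and keep the cursor open to receive new writes.
var oplog = db.getSiblingDB("local").getCollection("oplog.rs")
var cursor = oplog.find({ ts: { $gt: Timestamp(0, 0) } }).tailable()
while (cursor.hasNext()) {
    printjson(cursor.next())
}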

In terms of associating the connection source IP with a particular connection number, in the MongoDB 4.2 series the log lines would look similar to this:

2022-08-31T10:19:33.101+0530 I NETWORK [listener] connection accepted from 127.0.0.1:64149 #16 (8 connections now open)

The above line signifies a new connection from 127.0.0.1:64149, which is assigned the number 16. The client metadata for this connection is then logged:

2022-08-31T10:19:33.101+0530 I NETWORK [conn16] received client metadata from 127.0.0.1:64149 conn16: { driver: { name: "NetworkInterfaceTL", version: "4.2.14" }, os: { type: "Darwin", name: "Mac OS X", architecture: "x86_64", version: "21.6.0" } }

Subsequently, operations from this connection are marked with the string [conn16].

Could you find a similar pair of log lines that can show the originating IP of the getMore queries? This will help you identify the source of the queries.
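
Alternatively, if the getMore is still in progress while you are investigating, a quick way to see who is issuing it (a sketch, run on the affected secondary) is db.currentOp(), which reports the client address and the connection descriptor:

// List in-progress getMore operations on the oplog with their originating client
db.currentOp({ op: "getmore", ns: "local.oplog.rs" }).inprog.forEach(function(op) {
    print(op.client + "  " + op.desc + "  connectionId: " + op.connectionId)
})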

Please note that in the MongoDB 4.2 series, the latest version is 4.2.22. I would strongly recommend upgrading to the latest version for improvements and bug fixes (see the release notes for more details), to ensure that you're not encountering issues that were already fixed in newer releases.

Also, the latest MongoDB version is currently 6.0.1, which contains major improvements over the 4.2 series. Please consider upgrading to the 6.0 series as well.

If the issue still persists, could you please share the rs.status() and rs.conf() output for the deployment, along with any information that will help us reproduce what you're seeing?

Best regards
Aasawari
