High query response time for find operation in mongo 4.0.27 with mmap storage engine

Kapil_Gupta · February 4, 2022, 12:27pm

Scenario - We have application which is doing CRUD operations on database. From past 2 years we are using mongo 3.6.9 and system is working fine without any issues but when we upgraded the database from 3.6.9 to 4.0.27 and then we started getting high query response time for find operation but it is not consistent in nature ,sometimes we are getting high query response in 5 hours or 7 hours or even 11 hours.

Testing Environment:
java client driver on application side - 3.12.9
mongo version - 4.0.27
Storage Engine - mmap
Replica-set : 7 members (4 non-arbiter and 3 arbiter , all voting members)
One of the member CMD as an example:
mongod --keyFile=/mongodb.key --storageEngine mmapv1 --nojournal --noprealloc --smallfiles --ipv6 --bind_ip_all --port 27035 --dbpath=/mmapv1-tmpfs-27035 --replSet rs-app_shardAB-ipv6-7 --quiet --slowms 500 --logpath /data/db/mongo-27035.log --oplogSize 3221 --setParameter diagnosticDataCollectionEnabled=true --logappend --logRotate reopen

Messages:
As a sample, we got these kind of messages on mongo secondary logs given below:

2022-02-02T02:55:54.392+0000 I COMMAND [conn554] command drasessions_1.drasessions command: find { find: “drasessions”, filter: { _id: { sessionid: “ClpGx3:172.16.241.40:15124:1643368779:0080300316” } }, limit: 1, singleBatch: true, $db: “drasessions_1”, $clusterTime: { clusterTime: Timestamp(1643770525, 464), signature: { hash: BinData(0, A9E0739EB1E3BBA9EF776A9FCEC9342E9457D221), keyId: 7042384422720503811 } }, lsid: { id: UUID(“8b321501-be08-4fa8-ada5-367cc1eb555e”) }, $readPreference: { mode: “nearest” } } planSummary: IDHACK keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:239 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 28911648 } }, MMAPV1Journal: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { R: 1 } } } protocol:op_msg 28911ms

Troubleshooting performed so far:

We have already checked network latency/CPU/RAM/disk space on VM and blade level, so there was no issue also we tested mongo 3.6.9 in same environment and configuration but there is no issue observed.
We enabled mongostat and also attached it as a reference and found below suspect point:

For one of the secondary there are no CRUD operation for 26 sec (*0 *0 *0 *0 in mongostat observed), then suddenly high CRUD operation(*2444 110 *5781 *1816 in mongostat observed) were found on that secondary, there is no connection lost message found in mongostat for that secondary. This pattern is common whenever we are getting high response time(28 sec) on that secondary.
For Primary it is always showing *0 for insert operation for all the time but on secondary’s replication is happening for insert like given below:

host insert query update delete getmore command flushes mapped vsize res faults qrw arw net_in net_out conn set repl time
[2606:ae00:3001:8311:172:16:244:59]:27032 *245 36 *595 *155 0 65|0 0 6.63G 4.49G 0 0|0 0|0 24.9k 124k 144 rs-app_shardAB-ipv6-4 SEC Feb 3 17:06:46.063
[2606:ae00:3001:8311:172:16:244:60]:27032 *0 *0 *0 *0 0 63|0 0 6.61G 4.49G 0 0|0 0|0 11.3k 71.9k 184 rs-app_shardAB-ipv6-4 SEC Feb 3 17:06:46.068
[2606:ae00:3001:8311:172:16:244:b]:27032 *227 40 *589 *155 0 64|0 0 6.75G 4.48G 0 0|0 0|0 26.6k 157k 144 rs-app_shardAB-ipv6-4 SEC Feb 3 17:06:46.075
[2606:ae00:3001:8311:172:16:244:c]:27032 *0 39 827 150 68 119|0 0 6.79G 4.49G 0 0|0 0|0 592k 1.49m 371 rs-app_shardAB-ipv6-4 PRI Feb 3 17:06:45.627
localhost:27032 *0 41 795 162 70 120|0 0 6.79G 4.49G 0 0|0 1|0 589k 1.50m 371 rs-app_shardAB-ipv6-4 PRI Feb 3 17:06:47.584

Queries:

Why there is no CRUD on one of the secondary for 26 sec when there is no connection lost between primary and secondaries.
Why always 0 insert operation is showing in Primary in mongostat and even replication for insert is happening on secondaries.
Why this behavior is not observed in 3.6.9 where same configuration and environment was used.

Kindly reply to above queries it would really help us to process further as this is becoming a big blocker for us to use mongo in our environment.

Attachments:

We have attached mongostat output for one of the occurrence when we are getting high query response.
rs.status for one of the rs.
mogostat.txt (45.2 KB)
rs.status.txt (9.1 KB)

Udaya_Bhaskar_chimak · November 15, 2022, 6:04pm

can anyone reply to the above issue. Iam also facing the same in my setup

Udaya

Stennie_X · November 20, 2022, 12:16am

Welcome to the MongoDB community @Udaya_Bhaskar_chimak!

Unless you work on the same deployment as the @Kapil_Gupta, I recommend starting a new discussion topic with details relevant to your environment including:

version of MongoDB server
type of deployment (standalone, replica set, or sharded cluster)
relevant error messages or behaviour observed
any steps taken prior to noticing the change in behaviour (for example, was this after an upgrade).

For the original question in this topic regarding MongoDB 4.0 (which reached end of life in April 2022) and MMAP (which was deprecated and ultimately removed in MongoDB 4.2) I suggest upgrading to a supported version of MongoDB (currently 4.2 or later) to see if the problem is still reproducible.

Regards,
Stennie