2022-05-16T03:24:49.382Z : Connection check out started for server
2022-05-16T03:24:49.382Z : Connection connectionId{localValue:3902} checked out
2022-05-16T03:24:49.382Z : Sending command '{"find":.......' on connection [connectionId{localValue:3902}]
2022-05-16T03:24:49.384Z : Execution of command with request id <id> completed successfully in 1.75 ms on connection [connectionId{localValue:3902}]
2022-05-16T03:24:54.392Z : Received batch of 1 documents with cursorId 0 from server
2022-05-16T03:24:54.392Z : Checked in connection [connectionId{localValue:3902}] to server
Command execution itself was done in 1.75ms which has the single document … so what’s happening after command execution and before connection checkin which is taking 5s and sometimes 10s. … There is no GetMore command executed… first batch itself has the result.
this is sharded cluster… And we are connecting to mongos (3 instances)
it happens on a daily basis but randomly, no pattern at all.
we get this issue on any command execution e.g find, delete, create etc.
e.g This is what we are using for find by id ... and sometimes we see issue in this sometimes in other commands
BsonDocument filter = new BsonDocument()
.append("field1", new BsonString("value1))
.append("field2", new BsonString("value2));
FindIterable<CustomPojo> findIterable = myCollection.find(filter);
return findIterable.first();
i was looking at CPU usage, heap usage and direct buffers usage. everything looks normal…no spikes…no high utilization. GC pauses were also normal.
I was looking at the flow of find command execution, it looks like after the command execution event … driver releases the response buffers to the PowerOfTwoBufferPool … and there it acquires lock/permits … is it possible that release buffer is causing these intermittent pauses ?
@Pradeepp_Sahnii can you give me a sense of your Java driver upgrade history? Was this application running on previous driver releases, and if so, after what upgrade did you first start seeing this issue? I’m trying to narrow down whether this is a regression introduced by a recent (or not so recent) release of the driver.
I don’t currently have a hypothesis for a driver issue that could explain what you’re observing. The 3.11 driver is widely used, and no one else has ever reported this sort of behavior. Given that, it seems likely that the issue is somewhat unique to your particular application and deployment.
Please let us know though if you uncover any additional evidence that would point towards a driver issue.