How to debug hanging connection in springboot reactive app

Hello,

I am building a web app with webflux springboot 3.3.0 and spring-data-mongodb 4.3.1 using mongodb-driver-reactivestreams 5.1.1 and Java 17

Mongodb uses springboot autoconfiguration, no special configuration applied.
When I do a simple repository query:

public Mono<Application> findByIdWithDsl(String applicationId) {
    return reactiveMongoTemplate.findById(applicationId, Application.class)
            .log();
}

Sometimes (not always) it hangs while getting a connection - it blocks forever. Is there something I could do to get some more insight on what’s going on? It does happen also when there is no load / no other requests.
Or anybody have some hints why it might behave like this?

2024-06-17 21:40:33.941 DEBUG o.s.data.mongodb.core.ReactiveMongoTemplate#doFindOne:2330    [nioEventLoopGroup-4-3]: findOne using query: { "id" : "6643812722e3a5542b9e9735"} fields: Document{{}} for class: class org.lowcoder.domain.application.model.Application in collection: application 
2024-06-17 21:40:33.942 INFO reactor.util.Loggers$Slf4JLogger#info:279    [nioEventLoopGroup-4-3]: | onSubscribe([Fuseable] FluxOnAssembly.OnAssemblySubscriber) 
2024-06-17 21:40:33.943 INFO reactor.util.Loggers$Slf4JLogger#info:279    [nioEventLoopGroup-4-3]: | request(unbounded) 
2024-06-17 21:40:33.944 DEBUG c.mongodb.internal.diagnostics.logging.SLF4JLogger#debug:56    [nioEventLoopGroup-4-3]: Server selection started for operation with ID 44. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=STANDALONE, servers=[{address=127.0.0.1:37017, type=STANDALONE, roundTripTime=20.9 ms, state=CONNECTED}] 
2024-06-17 21:40:33.945 DEBUG c.mongodb.internal.diagnostics.logging.SLF4JLogger#debug:56    [nioEventLoopGroup-4-3]: Server selection succeeded for operation with ID 44. Selected server: 127.0.0.1:37017. Selector: ReadPreferenceServerSelector{readPreference=primary}, topology description: {type=STANDALONE, servers=[{address=127.0.0.1:37017, type=STANDALONE, roundTripTime=20.9 ms, state=CONNECTED}] 
2024-06-17 21:40:33.946 DEBUG c.mongodb.internal.diagnostics.logging.SLF4JLogger#debug:56    [nioEventLoopGroup-4-3]: Checkout started for connection to 127.0.0.1:37017

Hi @Ludovit_Mikula and welcome to the community forum!!

There could be certainly more than one reason for the issue that you are observing.

It would be helpful if you could capture the mongod logs during the issue and observe for any error or unexpected messages during the coarse of time.

Regards
Aasawari

Thank you!
So I have updated the config to get more verbose logs from mongo, I am running the official 7.0 image.

I get logs like this - it seems the connection is made, but not used further. Maybe the callback is not called/propagated?

{"t":{"$date":"2024-06-19T20:29:43.202+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn11","msg":"client metadata","attr":{"remote":"172.20.0.1:37748","client":"conn11","negotiatedCompressors":[],"doc":{"driver":{"name":"mongo-java-driver|reactive-streams|spring-boot","version":"4.11.1"},"os":{"type":"Linux","name":"Linux","architecture":"amd64","version":"5.15.0-112-generic"},"platform":"Java/Ubuntu/17.0.11+9-Ubuntu-122.04.1"}}}
{"t":{"$date":"2024-06-19T20:29:43.202+00:00"},"s":"I",  "c":"ACCESS",   "id":6788604, "ctx":"conn11","msg":"Auth metrics report","attr":{"metric":"acquireUser","micros":9}}
{"t":{"$date":"2024-06-19T20:29:43.280+00:00"},"s":"I",  "c":"ACCESS",   "id":5286306, "ctx":"conn11","msg":"Successfully authenticated","attr":{"client":"172.20.0.1:37748","isSpeculative":true,"isClusterMember":false,"mechanism":"SCRAM-SHA-256","user":"lowcoder","db":"admin","result":0,"metrics":{"conversation_duration":{"micros":77229,"summary":{"0":{"step":1,"step_total":2,"duration_micros":66},"1":{"step":2,"step_total":2,"duration_micros":28}}}},"extraInfo":{}}}
{"t":{"$date":"2024-06-19T20:29:43.281+00:00"},"s":"I",  "c":"NETWORK",  "id":6788700, "ctx":"conn11","msg":"Received first command on ingress connection since session start or auth handshake","attr":{"elapsedMillis":1}}
{"t":{"$date":"2024-06-19T20:30:04.806+00:00"},"s":"I",  "c":"WTCHKPT",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":{"ts_sec":1718829004,"ts_usec":806572,"thread":"1:0x7f4b3fe61640","session_name":"WT_SESSION.checkpoint","category":"WT_VERB_CHECKPOINT_PROGRESS","category_id":6,"verbose_level":"DEBUG_1","verbose_level_id":1,"msg":"saving checkpoint snapshot min: 7, snapshot max: 7 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 149413"}}}

When I stop the application, I can see all connections (also the “hanging ones” being closed):

{"t":{"$date":"2024-06-19T20:46:09.335+00:00"},"s":"I",  "c":"-",        "id":20883,   "ctx":"conn28","msg":"Interrupted operation as its client disconnected","attr":{"opId":16107}}
{"t":{"$date":"2024-06-19T20:46:09.336+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn27","msg":"Connection ended","attr":{"remote":"172.20.0.1:48500","uuid":{"uuid":{"$uuid":"d3a3f1f8-044a-4d27-8c51-9bd2bb4495af"}},"connectionId":27,"connectionCount":8}}
{"t":{"$date":"2024-06-19T20:46:09.336+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn28","msg":"Connection ended","attr":{"remote":"172.20.0.1:48504","uuid":{"uuid":{"$uuid":"1e70f918-1fc2-463a-b420-33e164c327d5"}},"connectionId":28,"connectionCount":7}}
{"t":{"$date":"2024-06-19T20:46:09.362+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn31","msg":"Connection ended","attr":{"remote":"172.20.0.1:41842","uuid":{"uuid":{"$uuid":"d679d083-c9c6-4822-aa76-a3bc1d30ba40"}},"connectionId":31,"connectionCount":6}}

PS: are the log messages being buffered? If yes, is there a way to turn off buffering?