Mongodb connection checkout is slow

Driver: go.mongodb.org/mongo-driver v1.13.1
Mongo Version: 7
Server language: Go1.21

I have developed a app that listens to kafka messages and after collecting a batch/context time out pushes to MongoDB. I tested this app in local with local mongo server, all worked fine with a max mongo operation latency of 5ms.

I use single client across application and after initiation using mongo.Connect() I ping the server client.Ping()

After deployement to cloud(AWS EKS+Mongo Atlas) each mongodb operation takes a additional 2s. To debug I enabled mongo log and pool monitor logs and noticed the following messages after I called Find() function

PFB for the code use to create the client

    connectionOptions.ApplyURI(c.ConnectionString)
    connectionOptions.SetConnectTimeout(time.Minute)
    connectionOptions.SetMinPoolSize(c.MinConnectionPool) //10
    connectionOptions.SetMaxPoolSize(c.MaxConnectionPool) //50
    connectionOptions.SetMaxConnIdleTime(time.Minute * 5)
    connectionOptions.SetCompressors([]string{"snappy", "zlib", "zstd"})
    connectionOptions.SetMonitor(mongotrace.NewMonitor())
    mongoLogger := &MongoLogger{log: logger.NewResourceLogger("MongoInternalLog"), ctx: log.GetContextWithCorrelation(context.Background(), log.GetDefaultCorrelationParam("MongoInternal"))}
    connectionOptions.SetLoggerOptions(&options.LoggerOptions{
        ComponentLevels: map[options.LogComponent]options.LogLevel{
            options.LogComponentAll: options.LogLevelDebug,
        },
        Sink:              mongoLogger,
        MaxDocumentLength: 1024,
    })
    connectionOptions.SetPoolMonitor(&event.PoolMonitor{
        Event: mongoLogger.PoolEvent,
    })
    opts = utils.Prepend(opts, connectionOptions)
    client, err := mongo.Connect(ctx, opts...)
    if err != nil {
        logger.Error(ctx, "error creating mongo connection", err)
        return nil, fmt.Errorf("mongo.New: error creating mongo connection: %w", err)
    }
    err = client.Ping(ctx, nil)
    if err != nil {
        logger.Error(ctx, "error pinging mongo server", err)
        return nil, fmt.Errorf("mongo.New: error pinging mongo server: %w", err)
    }

PFB for the full log


2024-02-07 13:11:20.039 {"long_message":{"type":"ConnectionCheckedIn","address":"<<>>.<<>>.mongodb.net:1025","connectionId":1,"options":null,"reason":"","serviceId":null,"error":null},"short_message":"mongo pool event"}

2024-02-07 13:11:20.039 {"long_message":["message","Connection checked in","serverHost","<<>>.<<>>.mongodb.net","driverConnectionId",1,"serverPort",1025],"short_message":"Connection checked in"}

2024-02-07 13:11:20.039 {"long_message":["commandName","find","databaseName","SOME_DB_NAME","driverConnectionId",1,"message","Command succeeded","operationId",0,"requestId",1961,"serverHost","<<>>.<<>>.mongodb.net","durationMS",2,"reply","{\"cursor\": {\"firstBatch\": [],\"id\": {\"$numberLong\":\"0\"},\"ns\": \"SOME_DB_NAME.SOME_COLL_NAME\"},\"ok\": {\"$numberDouble\":\"1.0\"},\"$clusterTime\": {\"clusterTime\": {\"$timestamp\":{\"t\":1707291680,\"i\":4}},\"signature\": {\"hash\": {\"$binary\":{\"base64\":\"6sB7qA7UlJUnvLbKSHe5RYC3fuw=\",\"subType\":\"00\"}},\"keyId\": {\"$numberLong\":\"<<>>\"}}},\"operationTime\": {\"$timestamp\":{\"t\":1707291680,\"i\":4}}}","serverPort",1025,"serverConnectionId",207521],"short_message":"Command succeeded"}

2024-02-07 13:11:20.036 {"long_message":["commandName","find","databaseName","SOME_DB_NAME","driverConnectionId",1,"message","Command started","operationId",0,"requestId",1961,"serverHost","<<>>.<<>>.mongodb.net","command","{\"find\": \"SOME_COLL_NAME\",\"filter\": {\"isActive\": true},\"lsid\": {\"id\": {\"$binary\":{\"base64\":\"<<>>\",\"subType\":\"04\"}}},\"$clusterTime\": {\"clusterTime\": {\"$timestamp\":{\"t\":1707291680,\"i\":4}},\"signature\": {\"hash\": {\"$binary\":{\"base64\":\"6sB7qA7UlJUnvLbKSHe5RYC3fuw=\",\"subType\":\"00\"}},\"keyId\": {\"$numberLong\":\"<<>>\"}}},\"$db\": \"SOME_DB_NAME\",\"$readPreference\": {\"mode\": \"secondaryPreferred\"}}","serverPort",1025,"serverConnectionId",207521],"short_message":"Command started"}

2024-02-07 13:11:20.036 {"long_message":{"type":"ConnectionCheckedOut","address":"<<>>.<<>>.mongodb.net:1025","connectionId":1,"options":null,"reason":"","serviceId":null,"error":null},"short_message":"mongo pool event"}

2024-02-07 13:11:20.035 {"long_message":["message","Connection checked out","serverHost","<<>>.<<>>.mongodb.net","driverConnectionId",1,"serverPort",1025],"short_message":"Connection checked out"}

2024-02-07 13:11:20.035 {"long_message":{"type":"ConnectionReady","address":"<<>>.<<>>.mongodb.net:1025","connectionId":1,"options":null,"reason":"","serviceId":null,"error":null},"short_message":"mongo pool event"}

2024-02-07 13:11:20.035 {"long_message":["message","Connection ready","serverHost","<<>>.<<>>.mongodb.net","driverConnectionId",1,"serverPort",1025],"short_message":"Connection ready"}

2024-02-07 13:11:19.975 {"long_message":["driverConnectionId",0,"message","Server heartbeat started","serverHost","<<>>.<<>>.mongodb.net","topologyId","65c3215b76f2f05f266bdc4a","serverConnectionId",204103,"serverPort",1025,"awaited",true],"short_message":"Server heartbeat started"}

2024-02-07 13:11:19.975 {"long_message":["driverConnectionId",0,"message","Server heartbeat started","serverHost","<<>>.<<>>.mongodb.net","topologyId","65c3215b76f2f05f266bdc4a","serverConnectionId",204103,"serverPort",1025,"awaited",true,"durationMS",10010,"reply",{"Addr":"<<>>.<<>>.mongodb.net:1025","Arbiters":null,"AverageRTT":0,"AverageRTTSet":false,"Compression":["snappy","zlib","zstd"],"CanonicalAddr":"<<>>.<<>>.mongodb.net:1025","ElectionID":"000000000000000000000000","HeartbeatInterval":0,"HelloOK":true,"Hosts":["<<>>.<<>>.mongodb.net:1025","<<>>.<<>>.mongodb.net:1026","<<>>.<<>>.mongodb.net:1024"],"IsCryptd":false,"LastError":null,"LastUpdateTime":"2024-02-07T07:41:19.975393897Z","LastWriteTime":"2024-02-07T07:41:19Z","MaxBatchCount":100000,"MaxDocumentSize":16777216,"MaxMessageSize":48000000,"Members":["<<>>.<<>>.mongodb.net:1025","<<>>.<<>>.mongodb.net:1026","<<>>.<<>>.mongodb.net:1024","<<>>.<<>>.mongodb.net:1027"],"Passives":["<<>>.<<>>.mongodb.net:1027"],"Passive":false,"Primary":"<<>>.<<>>.mongodb.net:1024","ReadOnly":false,"ServiceID":null,"SessionTimeoutMinutes":30,"SessionTimeoutMinutesPtr":30,"SetName":"atlas-6aw3ab-shard-0","SetVersion":35,"Tags":[{"Name":"region","Value":"AP_SOUTH_1"},{"Name":"workloadType","Value":"OPERATIONAL"},{"Name":"availabilityZone","Value":"aps1-az1"},{"Name":"nodeType","Value":"ELECTABLE"},{"Name":"provider","Value":"AWS"},{"Name":"diskState","Value":"READY"}],"TopologyVersion":{"ProcessID":"65beb254f66a3031addd0e11","Counter":4},"Kind":10,"WireVersion":{"Min":0,"Max":13}}],"short_message":"Server heartbeat started"}

2024-02-07 13:11:19.655 {"long_message":["driverConnectionId",0,"message","Server heartbeat started","serverHost","<<>>.<<>>.mongodb.net","topologyId","65c3215b76f2f05f266bdc4a","serverConnectionId",202936,"serverPort",1027,"awaited",true],"short_message":"Server heartbeat started"}

2024-02-07 13:11:19.654 {"long_message":["driverConnectionId",0,"message","Server heartbeat started","serverHost","<<>>.<<>>.mongodb.net","topologyId","65c3215b76f2f05f266bdc4a","serverConnectionId",202936,"serverPort",1027,"awaited",true,"durationMS",10009,"reply",{"Addr":"<<>>.<<>>.mongodb.net:1027","Arbiters":null,"AverageRTT":0,"AverageRTTSet":false,"Compression":["snappy","zlib","zstd"],"CanonicalAddr":"<<>>.<<>>.mongodb.net:1027","ElectionID":"000000000000000000000000","HeartbeatInterval":0,"HelloOK":true,"Hosts":["<<>>.<<>>.mongodb.net:1025","<<>>.<<>>.mongodb.net:1026","<<>>.<<>>.mongodb.net:1024"],"IsCryptd":false,"LastError":null,"LastUpdateTime":"2024-02-07T07:41:19.654877201Z","LastWriteTime":"2024-02-07T07:41:19Z","MaxBatchCount":100000,"MaxDocumentSize":16777216,"MaxMessageSize":48000000,"Members":["<<>>.<<>>.mongodb.net:1025","<<>>.<<>>.mongodb.net:1026","<<>>.<<>>.mongodb.net:1024","<<>>.<<>>.mongodb.net:1027"],"Passives":["<<>>.<<>>.mongodb.net:1027"],"Passive":true,"Primary":"<<>>.<<>>.mongodb.net:1024","ReadOnly":false,"ServiceID":null,"SessionTimeoutMinutes":30,"SessionTimeoutMinutesPtr":30,"SetName":"atlas-6aw3ab-shard-0","SetVersion":35,"Tags":[{"Name":"region","Value":"AP_SOUTH_1"},{"Name":"provider","Value":"AWS"},{"Name":"diskState","Value":"READY"},{"Name":"nodeType","Value":"ANALYTICS"},{"Name":"availabilityZone","Value":"aps1-az1"}],"TopologyVersion":{"ProcessID":"65beb41b8483d5517a76b09f","Counter":4},"Kind":10,"WireVersion":{"Min":0,"Max":13}}],"short_message":"Server heartbeat started"}

2024-02-07 13:11:18.034 {"long_message":{"type":"ConnectionCreated","address":"<<>>.<<>>.mongodb.net:1025","connectionId":1,"options":null,"reason":"","serviceId":null,"error":null},"short_message":"mongo pool event"}

2024-02-07 13:11:18.033 {"long_message":["message","Connection created","serverHost","<<>>.<<>>.mongodb.net","driverConnectionId",1,"serverPort",1025],"short_message":"Connection created"}

2024-02-07 13:11:18.033 {"long_message":{"type":"ConnectionCheckOutStarted","address":"<<>>.<<>>.mongodb.net:1025","connectionId":0,"options":null,"reason":"","serviceId":null,"error":null},"short_message":"mongo pool event"}

2024-02-07 13:11:18.033 {"long_message":["message","Connection checkout started","serverHost","<<>>.<<>>.mongodb.net","serverPort",1025],"short_message":"Connection checkout started"}

2024-02-07 13:11:18.033 {"long_message":["selector","","operation","find","topologyDescription","Type: ReplicaSetWithPrimary, Servers: [{ Addr: <<>>.<<>>.mongodb.net:1025, Type: RSSecondary, State: Connected, Tag sets: region=AP_SOUTH_1,workloadType=OPERATIONAL,availabilityZone=aps1-az1,nodeType=ELECTABLE,provider=AWS,diskState=READY, Average RTT: 1.323328ms, Min RTT: 1.253353ms }, { Addr: <<>>.<<>>.mongodb.net:1026, Type: RSSecondary, State: Connected, Tag sets: workloadType=OPERATIONAL,provider=AWS,availabilityZone=aps1-az3,region=AP_SOUTH_1,nodeType=ELECTABLE,diskState=READY, Average RTT: 1.041377ms, Min RTT: 971.188µs }, { Addr: <<>>.<<>>.mongodb.net:1027, Type: RSSecondary, State: Connected, Tag sets: region=AP_SOUTH_1,provider=AWS,diskState=READY,nodeType=ANALYTICS,availabilityZone=aps1-az1, Average RTT: 1.369299ms, Min RTT: 1.261464ms }, { Addr: <<>>.<<>>.mongodb.net:1024, Type: RSPrimary, State: Connected, Tag sets: nodeType=ELECTABLE,availabilityZone=aps1-az2,diskState=READY,provider=AWS,region=AP_SOUTH_1,workloadType=OPERATIONAL, Average RTT: 806.192µs, Min RTT: 743.374µs }, ]","operationId",1961,"serverHost","<<>>.<<>>.mongodb.net","serverPort",1025],"short_message":"Server selection succeeded"}

2024-02-07 13:11:18.033 {"long_message":["selector","","operation","find","topologyDescription","Type: ReplicaSetWithPrimary, Servers: [{ Addr: <<>>.<<>>.mongodb.net:1026, Type: RSSecondary, State: Connected, Tag sets: workloadType=OPERATIONAL,provider=AWS,availabilityZone=aps1-az3,region=AP_SOUTH_1,nodeType=ELECTABLE,diskState=READY, Average RTT: 1.041377ms, Min RTT: 971.188µs }, { Addr: <<>>.<<>>.mongodb.net:1027, Type: RSSecondary, State: Connected, Tag sets: region=AP_SOUTH_1,provider=AWS,diskState=READY,nodeType=ANALYTICS,availabilityZone=aps1-az1, Average RTT: 1.369299ms, Min RTT: 1.261464ms }, { Addr: <<>>.<<>>.mongodb.net:1024, Type: RSPrimary, State: Connected, Tag sets: nodeType=ELECTABLE,availabilityZone=aps1-az2,diskState=READY,provider=AWS,region=AP_SOUTH_1,workloadType=OPERATIONAL, Average RTT: 806.192µs, Min RTT: 743.374µs }, { Addr: <<>>.<<>>.mongodb.net:1025, Type: RSSecondary, State: Connected, Tag sets: region=AP_SOUTH_1,workloadType=OPERATIONAL,availabilityZone=aps1-az1,nodeType=ELECTABLE,provider=AWS,diskState=READY, Average RTT: 1.323328ms, Min RTT: 1.253353ms }, ]","operationId",1961],"short_message":"Server selection started"}

2024-02-07 13:11:18.033 {"long_message":null,"short_message":"Making DB call"}

What is the root cause of this lag of 2s between ConnectionCheckOutStarted and ConnectionCheckedOut event and how to fix it?

Hello @Sabariram_Chinnathambi, thanks for the question!

Did you observe the lag in Ping() as well? Is the logged Find() the initial operation?

I think your operations are blocked by pool initialization because connection establishment on your deployment environment is slower than your local settings.