MongoDB local session cache refresh is not working

Muhammad_Arslan1 · March 1, 2022, 5:17pm

We have this MongoDB cluster running. The number of logical sessions never goes down, and we end up hitting the 1 million limit (maxSessions: 1000000). I printed db.serverStatus().logicalSessionRecordCache and it gives me this output:

{
  activeSessionsCount: 911870,
  sessionsCollectionJobCount: 234,
  lastSessionsCollectionJobDurationMillis: 0,
  lastSessionsCollectionJobTimestamp: ISODate("2022-02-26T23:20:18.584Z"),
  lastSessionsCollectionJobEntriesRefreshed: 986207,
  lastSessionsCollectionJobEntriesEnded: 13794,
  lastSessionsCollectionJobCursorsClosed: 0,
  transactionReaperJobCount: 1448,
  lastTransactionReaperJobDurationMillis: 0,
  lastTransactionReaperJobTimestamp: ISODate("2022-03-01T00:27:35.342Z"),
  lastTransactionReaperJobEntriesCleanedUp: 0,
  sessionCatalogSize: 0
}

The docs describe lastSessionsCollectionJobTimestamp as ‘The time at which the last refresh occurred’. That timestamp is more than 2 days old, although the refresh should happen every 5 minutes according to the logicalSessionRefreshMillis: 300000 parameter. I am posting here to find out how I can get more information on why the local refresh is not happening.
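To make the gap concrete, here is a minimal sketch of the staleness check described above. The helper name refreshStaleness and the "more than two intervals late" threshold are assumptions made for illustration, not part of MongoDB; only the field name and the 300000 ms interval come from the post.

```javascript
// Hypothetical helper (not a MongoDB API): measures how overdue the last
// logical session cache refresh is, given the logicalSessionRecordCache
// section of serverStatus and the configured refresh interval.
function refreshStaleness(cacheStats, refreshMillis, now = new Date()) {
  const last = new Date(cacheStats.lastSessionsCollectionJobTimestamp);
  const ageMillis = now.getTime() - last.getTime();
  return {
    ageMillis,
    // Number of full refresh intervals that have elapsed since the last refresh.
    missedIntervals: Math.floor(ageMillis / refreshMillis),
    // Assumed threshold: treat the cache as stale once two intervals have passed.
    stale: ageMillis > 2 * refreshMillis,
  };
}

// Sample values copied from the output above. The reaper timestamp is used
// as "now" so the result is reproducible.
const sample = {
  lastSessionsCollectionJobTimestamp: "2022-02-26T23:20:18.584Z",
};
const result = refreshStaleness(
  sample,
  300000,
  new Date("2022-03-01T00:27:35.342Z")
);
console.log(result);
```

In mongosh, the same function could be called with db.serverStatus().logicalSessionRecordCache as the first argument, since the ISODate value it returns is a regular Date.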

MaBeuLux88 · March 3, 2022, 5:41am

Hi @Muhammad_Arslan1 and welcome to the MongoDB Community!

First of all, I have to say that I have no idea! But I’m interested to know the answer!

In the meantime, can you please help us with a bit more information?

What’s the configuration of your cluster? One Primary and 2 Secondaries?
How much RAM do you have and how much data are you storing in this cluster? How much space is used by your indexes?
Do you use Causal Consistency and sessions in your code? Do you close them correctly in the code? Maybe you can share the piece of code that could be responsible for this?

Cheers,
Maxime.

Muhammad_Arslan1 · March 8, 2022, 7:43pm

Hi Maxime,
Thanks for your response and my apologies for the delay.
From the blog, I found out that this ticket was added in MongoDB 4.4.8. As an immediate fix, I downgraded MongoDB on the VM to 4.4.6 and have not seen the issue since. Earlier we were on 4.4.10. I am still not sure what caused the issue.

To answer your questions above:

It’s a standalone VM and not a cluster (wrong term used in the question).
RAM: 125 GB, Data: ~8 TB
We are not using sessions or causal consistency.

It’s hard to find the piece of code that is responsible for this issue, since I think it is also related to the MongoDB version, so I can only point to recent code changes that could have caused it.

MaBeuLux88 · March 8, 2022, 10:32pm

Wow! 8 TB on a single server with only 125 GB of RAM is very unusual.

Usually we recommend having about 15 to 20% of the data in RAM. You can refer to the Atlas cluster tiers to get an idea of a recommended healthy config. So for you that would be 1.2 to 1.6 TB of RAM!
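The arithmetic behind that range can be sketched as follows. The function name recommendedRamTB is hypothetical, and converting 125 GB to TB with 1 TB = 1000 GB is an assumption made only to compare against the figures in this thread.

```javascript
// Hypothetical helper: applies the 15 to 20% rule of thumb quoted above
// to a data size given in TB.
function recommendedRamTB(dataTB, lowFraction = 0.15, highFraction = 0.20) {
  return {
    lowTB: dataTB * lowFraction,
    highTB: dataTB * highFraction,
  };
}

// Figures from the thread: ~8 TB of data, 125 GB of RAM.
const dataTB = 8;
const currentRamTB = 125 / 1000; // assumes 1 TB = 1000 GB

const ram = recommendedRamTB(dataTB);
const currentFraction = currentRamTB / dataTB;

console.log(`Recommended RAM: ${ram.lowTB.toFixed(1)} to ${ram.highTB.toFixed(1)} TB`);
console.log(`Current RAM covers ${(currentFraction * 100).toFixed(1)}% of the data`);
```

Under those assumptions the current server holds under 2% of its data size in RAM, roughly a tenth of the low end of the rule of thumb.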

Usually we also recommend starting to shard the data when it goes above 2 TB, so we can keep running on commodity hardware and avoid super expensive machines with excessive amounts of RAM.

I think it goes without saying that we usually also recommend using a Replica Set instead of a single server. That gives you some High Availability in case your Primary node burns down: you keep at least one copy of the data, plus a couple of secondaries in the starting blocks, ready to take over and continue to deliver your service.

Your problem could also come from the lack of RAM. With this amount of data, I assume your indexes are already larger than your current amount of RAM, so I guess your server is starving for more RAM to keep everything running smoothly.

Cheers,
Maxime.

PS: Also note that upgrading to 5.0.X would also solve the problem from this ticket.