Performance of MongoDB pods - sharded deployment

Hi all

Using a MongoDB 4.4.13 sharded deployment, the goal is to check performance of a storage system on c5n.18xlarge instances, in a 4-node cluster with a large storage pool.
I am using the YCSB tool to generate traffic, and I want to be sure the traffic actually hits the disk so that I get meaningful performance numbers.
(Each node has 72 CPUs and 192 GB of memory.)
Each data shard pod is configured with 72 GB of memory and 24 CPUs.

However, when I run traffic I am not sure whether I am reaching the best performance. How do I confirm this, and make sure that I/O is hitting the disk with the least latency?
How do I ensure all the pods utilize the maximum CPU and memory assigned to them, so that the I/O reaches the disk?
Any inputs or tips on tuning would be greatly appreciated!
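One reason the traffic may never reach the disk: by default, WiredTiger sizes its internal cache at the larger of 50% of (RAM − 1 GB) and 256 MB (per the MongoDB documentation), so a 72 GB pod caches a great deal before any read I/O occurs. A minimal sketch of that sizing rule (the function name is mine, not a MongoDB API):

```python
def wiredtiger_default_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger internal cache size: max(0.5 * (RAM - 1 GB), 256 MB)."""
    return max(0.5 * (ram_gb - 1), 0.25)

# For a pod with 72 GB of memory, roughly 35.5 GB is cache by default,
# so any dataset smaller than that is served almost entirely from memory.
print(wiredtiger_default_cache_gb(72))  # 35.5
```

So unless the loaded YCSB dataset is well above ~35 GB per shard, reads will mostly be cache hits rather than disk I/O.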

A well-configured and well-behaved DB should seldom hit the disk.

If it does too often, you should then find out why and try to avoid it.

Testing DB performance while hitting the disk is of little value, as you do not want to hit the disk when running live traffic: disks are an order of magnitude slower.

You should do your benchmarks with traffic that does not hit the disk, and if you do hit the disks in production you should resize your system so that you no longer do.

@steevej Thank you for the response. However, I am testing an SDS (software-defined storage) solution, so unless the application uses at least some amount of disk I won't be able to draw conclusions about the performance of that SDS solution; hence the query. I have to reach a point where the application is using its memory (cache) and then reaching the disk for at least some portion of the traffic …

Also, I am observing that the mongo-data-sharded pods at times over-allocate memory depending on the traffic. Is there a way I can limit the pod's memory so that once it reaches the limit it has to go to disk?
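For reference, pod memory can be capped with Kubernetes resource limits, and the WiredTiger cache can be capped explicitly so the two agree. An illustrative fragment (the container name, image tag, and the specific GB figures are assumptions for the sketch, not values from your Helm chart):

```yaml
# Illustrative pod spec fragment: cap container memory and keep the
# WiredTiger cache well below it so the kernel OOM killer is not triggered.
containers:
  - name: mongodb-shard              # hypothetical container name
    image: mongo:4.4.13
    args:
      - "--wiredTigerCacheSizeGB=16" # push reads beyond 16 GB of data toward disk
    resources:
      requests:
        memory: "24Gi"
        cpu: "8"
      limits:
        memory: "24Gi"               # hard cap enforced by the kubelet
        cpu: "8"
```

One caveat: a container that exceeds its memory limit is OOM-killed; MongoDB does not gracefully "spill to disk" at the limit. Shrinking `--wiredTigerCacheSizeGB` (or the equivalent Helm chart value) is the usual way to force more reads to reach the disk.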

I understand that, as a memory-oriented application, it performs best when it never reaches the disk and serves everything from memory, but my use case here is different! Any inputs in this regard would help.

@steevej and all,
I am using AWS RHEL 8.4 EC2 instances. For best performance with MongoDB, should I disable THP? Are there any other best practices to follow for a MongoDB sharded deployment installed via Helm? Please do guide or share some advice.

Hi @Shilpa_Agrawal

Regarding recommended settings for MongoDB, you should find that the Production Notes contain all the recommendations. And yes, it is recommended to disable THP, as per the Production Notes and the related Disable Transparent Huge Pages (THP) page.
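For reference, the Disable Transparent Huge Pages page does this with a oneshot systemd unit on the host; a sketch along those lines (check the current Production Notes for the exact recommended unit, and note that on Kubernetes this must run on the worker nodes, not inside the pods):

```ini
# /etc/systemd/system/disable-transparent-huge-pages.service
[Unit]
Description=Disable Transparent Huge Pages (THP)
DefaultDependencies=no
After=sysinit.target local-fs.target
Before=mongod.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'

[Install]
WantedBy=basic.target
```

Enable it with `sudo systemctl daemon-reload && sudo systemctl enable --now disable-transparent-huge-pages`, and verify with `cat /sys/kernel/mm/transparent_hugepage/enabled` (it should show `[never]`).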

Regarding your question:

how do i ensure all the pods utilize max cpu and its memory assigned so that then the IO reaches the disk

I think you can achieve this basically by ensuring that your workload exceeds the hardware's capability, e.g. maybe you can try to do a collection scan on a collection that's much larger than your provisioned RAM? Apologies for the lack of ideas, but your question is basically how to do what we tell people not to do, so it's a bit of unfamiliar territory 🙂
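One concrete way to apply this with YCSB is simply to load more records than the cache can hold. A rough sizing sketch (the helper name is mine; the 10 fields × 100 bytes figures are YCSB's stock workload defaults for `fieldcount` and `fieldlength`):

```python
def ycsb_recordcount_to_exceed(cache_gb: float,
                               fieldcount: int = 10,
                               fieldlength: int = 100) -> int:
    """Records needed so the raw YCSB payload exceeds the given cache size.

    Ignores per-document overhead, indexes, and compression, so treat the
    result as a lower bound and oversize generously (e.g. 2-3x).
    """
    record_bytes = fieldcount * fieldlength          # ~1 KB per record by default
    target_bytes = cache_gb * 1024 ** 3
    return int(target_bytes // record_bytes) + 1

# e.g. overflowing a ~35 GB cache takes at least ~37.6M default-sized records
print(ycsb_recordcount_to_exceed(35))
```

Then pass the result as `recordcount` (and a matching `operationcount`) in the YCSB workload file, and watch the cache-miss and disk metrics (e.g. `mongostat`, `iostat`) to confirm reads are actually reaching the disk.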

Best regards
Kevin
