MongoDB Shard Node Synchronization Failure

Dear Team,

We are seeing an issue while resynchronizing one of the failed nodes in our sharded cluster. The synchronization fails with the same error message every time, and we have not been able to figure out the exact cause. Initially we thought the issue was the open file limit on the OS. The server is currently configured with a limit of 1048576, but we are still facing the same error.

MongoDBVersion: 4.2.8
DBSize: 370GB
Error:
Failed to commit collection indexes dbname.tblname: Location16814: error opening file "/shardserver/data/_tmp/extsort-index.218": errno:24 Too many open files
2023-06-14T00:08:06.979+0000 E INITSYNC [replication-7] collection clone for 'dbname.tblname' failed due to Location16814: Error cloning collection 'dbname.tblname' :: caused by :: error opening file "/shardserver/data/_tmp/extsort-index.218": errno:24 Too many open files

Note: The collection is very large. Its statistics are listed below.

DocumentCount:367716433
IndexesCount:12
TotalIndexSize:43.8 GB
TotalTableSize:1.3TB
TableStorageSize:194.6 GB

Could anyone share feedback on the issue we are facing?

Best Regards,
Ashwin

Hi @ashwin_reddy1, and welcome to the MongoDB community forums!

Based on the details shared, could you help me with a few more details:

  1. Can you confirm whether the sharded cluster is deployed in a Kubernetes environment?
  2. Regarding the open file limit mentioned in the post above, could you confirm using ulimit -a that the configuration has been applied?
  3. MongoDB version 4.2 reached End of Life in April 2023, so no further updates will be made to it. Would you mind upgrading to the latest stable version, which includes bug fixes and new features, and confirming whether you still face the same issue?
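
If it helps with point 2, here is a minimal Python sketch of the same check that `ulimit -n` performs, using the standard library `resource` module (Unix only). The helper names `open_file_limits` and `describe` are hypothetical and only for illustration. Note that it reports the limits of the process that runs it, which are not necessarily the limits of the mongod process.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value):
    """Render a limit value, showing 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one enforced when the process opens files.
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

Running it as the same user and from the same kind of session that starts the database gives the closest comparison.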

Regards
Aasawari

Dear Aasawari,

Thanks for your response.

  1. No Kubernetes is involved. The MongoDB sharded cluster runs on on-prem machines.
  2. Yes, the current open files value reported by ulimit -a is 1048576.
  3. Yes, we will take it up soon. Before upgrading, we want to fix this issue and make sure that all 3 nodes in the cluster are up. Currently, only 2 nodes are active.

If you need any more information, please let me know.

Best Regards,
Ashwin

Hi @ashwin_reddy1
Thank you for the information shared.

As per the response, it seems the limit has been set. Just to make sure, and as also mentioned in the MongoDB documentation, could you confirm whether the mongod process was started using systemctl, and whether that service actually picks up the ulimit setting?
Please refer to the Linux ulimit documentation for further reference.
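
One way to check this on Linux is to read the limits of the running process directly, since a service started by systemd generally takes its limits from the unit configuration (for example a `LimitNOFILE` setting) rather than from the shell session where `ulimit -a` was run. Below is a rough Python sketch that parses the "Max open files" row of `/proc/<pid>/limits`. The function names are hypothetical, and the PID of mongod is something you would need to look up yourself.

```python
from pathlib import Path

ROW_LABEL = "Max open files"


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of a /proc/<pid>/limits file.

    Values are returned as strings because either one may be 'unlimited'.
    Returns None if the row is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith(ROW_LABEL):
            # Remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(ROW_LABEL):].split()[:2]
            return soft, hard
    return None


def read_process_limits(pid):
    """Read the open-file limits of a running process (Linux only)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Calling `read_process_limits(pid)` with the PID of the mongod process shows the limits that process is really running with. If the soft value there is much lower than 1048576, the limit set on the OS is not reaching the service.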

Regards
Aasawari

Dear Aasawari,

Yes, the service is configured with systemctl.

Best Regards,
Ashwin

Dear Aasawari,

Do you have any update on the above issue?

Best Regards,
Ashwin