Hi Everyone,
I have an aggregation pipeline where both the match criteria and the updates are generated dynamically from end-user input. Right now I convert the user input into an aggregation pipeline, and in some scenarios I use $lookup to match parents with their children. The problem is that the collection I run the aggregation on has 24M records, and the aggregation times out because of that. There is also no scope for optimizing the pipeline, because I have been asked not to make too many custom changes to the user input and to keep it simple.
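To make the shape of the problem concrete, here is a simplified, hypothetical example of the kind of pipeline that gets generated (the collection and field names are made up for illustration):

```js
// Hypothetical generated pipeline; collection and field names are invented.
db.orders.aggregate([
  // $match built from the user's dynamic criteria
  { $match: { status: "active", createdAt: { $gte: ISODate("2023-01-01") } } },
  // $lookup joining each parent document to its child documents
  {
    $lookup: {
      from: "orderItems",
      localField: "_id",
      foreignField: "orderId",
      as: "items"
    }
  },
  // keep only parents that actually have matching children
  { $match: { "items.0": { $exists: true } } }
])
```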
Now my question is: is it a best practice to increase the aggregation timeout and cursor batch size so that the pipeline doesn't time out?
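For reference, this is roughly where those knobs live when calling aggregate in mongosh; the values are placeholders rather than recommendations, and `pipeline` stands for the dynamically generated stages:

```js
// Sketch: raising the per-operation timeout and the cursor batch size.
// The numbers here are placeholders, not recommendations.
db.orders.aggregate(pipeline, {
  maxTimeMS: 600000,            // abort the operation only after 10 minutes
  allowDiskUse: true,           // let blocking stages spill to disk if needed
  cursor: { batchSize: 1000 }   // documents returned per cursor batch
})
```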
Or should I change my approach entirely and use multiple fetch and update calls instead?
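If I went that way, I imagine it would look something like this rough sketch, where `userMatch` and `userUpdate` stand for the dynamically generated criteria and update (the $lookup case is ignored for brevity):

```js
// Sketch of the multi-call approach: page through matching _ids,
// then apply the update in manageable bulkWrite chunks.
const BATCH = 5000;
let lastId = null;

while (true) {
  const filter = lastId ? { ...userMatch, _id: { $gt: lastId } } : userMatch;
  const batch = db.orders.find(filter, { _id: 1 })
                         .sort({ _id: 1 })
                         .limit(BATCH)
                         .toArray();
  if (batch.length === 0) break;

  db.orders.bulkWrite(
    batch.map(doc => ({
      updateOne: { filter: { _id: doc._id }, update: userUpdate }
    }))
  );
  lastId = batch[batch.length - 1]._id;
}
```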
Also, right now I am not using MongoDB Atlas.
Hey @yashdeep_hinge,
Welcome to the MongoDB Community forums!
Could you share the dynamically generated aggregation pipeline you are executing and a sample dataset, with any sensitive information omitted?
Could you also share the error log you've encountered and how frequently the timeouts occur, and explain how you arrived at the conclusion that the 24 million documents are responsible for them?
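One quick way to verify where the time goes is to run the pipeline under explain with execution stats; for example, using the hypothetical collection from your post, where `pipeline` is your generated pipeline:

```js
// Show execution stats for the pipeline, e.g. to spot a COLLSCAN
// feeding the $lookup stage.
db.orders.explain("executionStats").aggregate(pipeline)
```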
In my opinion, there could be a couple of causes, such as slow hardware or poor query performance. However, to get a better understanding, please share the following additional information:
- The deployment configuration, such as the MongoDB version, hardware specs of the servers, etc.
- How you are running MongoDB: directly on servers, in Docker containers, or on Kubernetes? This can impact resource usage. (The mongosh snippet below can surface most of this.)
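If it helps, most of this can be pulled straight from mongosh:

```js
// Commands that surface most of the deployment details above
db.version()                      // MongoDB server version
db.hostInfo()                     // OS, CPU, and memory details
db.serverStatus().storageEngine   // storage engine in use
rs.status()                       // replica set topology, if any
```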
Looking forward to hearing from you.
Regards,
Kushagra