Collection.find() vs collection.find().limit()

Nithin_Kumar · March 14, 2022, 8:24am

The filter in the query is property value != X (on an indexed field), and inside the forEach the code updates the document’s corresponding property to X. My program updates all the documents in various collections. The number of documents varies by collection (on the order of 10K to 500K).

I understand that a cursor is created only on the primary and can be lost if there is a primary flip.

For my use case, I want to understand whether I should use find() or find().limit(BATCH_SIZE). The tradeoffs I am evaluating are:

With find(), it's a single query. With find().limit(), my program will fire number_of_docs/batch_size queries.
I am not sure how the cursor is implemented, but if the result set is very big, find() might have a longer-lived memory footprint on the server side than find().limit(batch_size). If the cursor internally fetches the find() results in batches, this argument does not hold.
In case of a primary flip, the program can resume without touching the docs that were already updated with the new value, so find() is preferred to find().limit().
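
The two loop shapes being compared can be sketched as follows. This is only a rough illustration in Python, assuming a PyMongo-style collection object (find(), limit(), batch_size(), update_one()); the function names and the field/value parameters are made up for the example and are not from the original program. As far as I know, drivers already fetch find() results from the server in batches, and batch_size() only tunes how many documents each round trip returns.

```python
def update_in_batches(collection, field, value, batch_size):
    """Repeated find().limit(batch_size) queries (hypothetical helper).

    Each pass re-runs the `field != value` filter, so documents that were
    already updated no longer match and are never fetched again.
    Returns (documents updated, find queries issued).
    """
    flt = {field: {"$ne": value}}
    updated = 0
    queries = 0
    while True:
        docs = list(collection.find(flt).limit(batch_size))
        queries += 1
        if not docs:
            break  # nothing left that still needs the new value
        for doc in docs:
            collection.update_one({"_id": doc["_id"]}, {"$set": {field: value}})
            updated += 1
    return updated, queries


def update_with_single_cursor(collection, field, value, batch_size):
    """One find() query whose cursor is iterated to the end (hypothetical helper).

    batch_size() does not limit the result set; it only caps how many
    documents the driver requests per round trip.
    Returns the number of documents updated.
    """
    cursor = collection.find({field: {"$ne": value}}).batch_size(batch_size)
    updated = 0
    for doc in cursor:
        collection.update_one({"_id": doc["_id"]}, {"$set": {field: value}})
        updated += 1
    return updated
```

With this sketch, the limit() version issues about number_of_docs/batch_size queries plus one final empty query to detect completion. Because the filter excludes documents that already hold the new value, either function can simply be called again after an interruption and will pick up only the remaining documents.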

Can anyone validate my tradeoffs and confirm whether choosing find() over find().limit() is the right call?

MaBeuLux88_xxx · March 15, 2022, 2:38am

Hi @Nithin_Kumar,

Are you doing find() + forEach with updateOne() instead of a single updateMany()? Why not just do an updateMany() and avoid the forEach loop?
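
For illustration, the updateMany approach might look roughly like this in Python, assuming a PyMongo-style collection whose update_many() returns a result with modified_count. The helper name and the field/value parameters are hypothetical, since the actual code has not been shared.

```python
def mark_all(collection, field, value):
    """Set `field` to `value` on every document where it differs (hypothetical helper).

    A single update_many() call replaces the find() + per-document
    update_one() loop; the server applies the filter and the update itself.
    Returns the number of documents modified.
    """
    result = collection.update_many(
        {field: {"$ne": value}},   # same "!= X" filter as the find()
        {"$set": {field: value}},  # same per-document change as update_one()
    )
    return result.modified_count
```

Outside a transaction, update_many() is not all-or-nothing across documents, so an interruption can leave some documents updated and others not. Because the filter skips documents that already have the new value, calling the helper again should only touch what is left.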

Can you share the code maybe?

Cheers,
Maxime.