Best/most efficient way to update a string field across a collection with a large number of documents

I’m new to MongoDB, with data in MongoDB Atlas. I have to write a query to update a string field in a collection with a large number of documents (north of 500,000). What is the best/most efficient way to update a string field (say, change its case) in every document of the collection? Also, is there a way to flag any records that fail the operation, or do I have to resort to JS functions?

I’d appreciate some pointers towards any material that discusses tips, pitfalls, best practices, etc.

Thank you.

Hey :wave: @ibacompre,

Welcome to the MongoDB Community forums :sparkles:

There are two ways to do this: one is db.collection.updateMany() and the other is the Bulk API’s Bulk.find().update().

The updateMany() operation is generally faster because it is parsed as a single ordered bulk write operation that can reuse resources while modifying the group of documents matching the applied filter.
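For example, a case change like the one you describe can be done server-side in a single updateMany() using an aggregation pipeline update (available since MongoDB 4.2). This is a minimal sketch, assuming a hypothetical string field called name that you want to uppercase:

// "name" is a hypothetical field standing in for your string field
db.collection.updateMany(
  {},                                              // match every document
  [ { $set: { name: { $toUpper: "$name" } } } ]    // pipeline update: uppercase in place
);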

Bulk operations, by contrast, let you queue multiple different operations, as shown below; these are sent to the server in a single request but executed as separate operations:

var bulk = db.items.initializeUnorderedBulkOp();
bulk.find( { status: "D" } ).delete();                            // queue a delete
bulk.find( { status: "P" } ).update( { $set: { points: 0 } } );   // queue an update
bulk.execute();                                                   // send everything in one request

The benefit of bulk operations over separate operations is that they generate a single request to the server for all included operations instead of a new request for each one. They also give you a higher level of control over which documents are updated, minimizing the risk of conflicting or undesired updates.

I think you can use either one of them.

If you are absolutely sure that the data to be updated is clean (i.e. the operation won’t have failures), then updateMany() and bulkWrite() are both valid options. For bulkWrite(): excluding write concern errors, ordered operations stop after an error, while unordered operations continue to process any remaining write operations in the queue, unless run inside a transaction.
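To illustrate, here is a sketch of an unordered bulkWrite(), reusing the status/points fields from the earlier example; with ordered: false, later operations in the queue still run even if an earlier one fails:

try {
  const res = db.items.bulkWrite(
    [
      { updateMany: { filter: { status: "P" }, update: { $set: { points: 0 } } } },
      { deleteMany: { filter: { status: "D" } } }
    ],
    { ordered: false }        // unordered: keep processing after an error
  );
  printjson(res);
} catch (e) {
  printjson(e.writeErrors);   // inspect which queued operations failed, and why
}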

However, if you cannot be sure of the cleanliness of the data, using updateOne() in a loop with proper error handling may be more efficient in the long run. The problem documents can be recorded in a list to be looked at later, while the loop continues processing the rest of the job. This is less efficient for the server compared to the bulk method, but if you expect errors to happen, it can make the overall process more efficient.
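A sketch of that loop-with-error-handling approach, again assuming the hypothetical name field; failedIds just collects the problem documents for later review:

const failedIds = [];
db.collection.find({}, { _id: 1, name: 1 }).forEach(doc => {
  try {
    db.collection.updateOne(
      { _id: doc._id },
      { $set: { name: doc.name.toUpperCase() } }   // throws if name is missing or not a string
    );
  } catch (e) {
    failedIds.push({ _id: doc._id, error: e.message });  // record the failure and keep going
  }
});
printjson(failedIds);   // review the problem documents later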

In conclusion, the best/most efficient way depends on the use case and whether any errors are expected in the data. It also depends on what kind of “efficiency” you need: efficiency of the whole process in the face of errors, or efficiency for the server hardware.

If you have any doubts, please feel free to reach out to us.

Regards,
Kushagra Kesav