Update_many or bulk update

I have many thousands of records to upsert or adjust, and I'm looking for the fastest way to get the data into the collection. I'm using PyMongo, but I can only find the most basic examples. How should I format a list of records to perform an update_many (inserting if not already there)? If there is a bulk operation that does the job faster, I'll take it.

Each record is relatively small, but there are lots of them, and potentially lots already in the collection, so the number of matches/clashes is high.

There is the Bulk.find.update() mongo shell method and the corresponding PyMongo bulk interface.

How to format the updates with these bulk methods? A little more detail, such as a sample input document and what/how you are planning to update, would help us discuss this further.

There might be tens of thousands of these…

tick = {"date": dateObject, "price": round(close, 2), "ticker": ticker}
db[locationTarget].update_one({"date": dateObject, "ticker": ticker}, {"$set": tick}, upsert=True)

The method and class used in this bulk update operation with PyMongo are Collection.bulk_write and UpdateOne.

First, build a list of update requests:

from pymongo import UpdateOne

requests_list = [
  UpdateOne({ ... }, { ... }, upsert=True),
  UpdateOne({ ... }, { ... }, upsert=True),
  ...
]

About:

tick = { 'date': dateObject, 'price': round(close, 2), 'ticker': ticker }

  • Each UpdateOne({ ... }, { ... }, upsert=True) in requests_list will have the following format:
    UpdateOne({ 'date': dateObject, 'ticker': ticker }, { '$set': tick }, upsert=True)

  • The { '$set': tick } is not clear; I think you mean:
    { '$set': { 'date': dateObject, 'price': round(close,2), 'ticker': ticker } }

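Note that if the date and ticker field values are not changing, there is no need to specify them in the $set clause. When the upsert inserts a new document, those fields are taken from the equality conditions in the query filter.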
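To make the format above concrete, here is a minimal sketch of building the requests list from raw rows. The input shape (an iterable of (date, ticker, close) tuples) and the helper names tick_filter_and_update and build_requests are assumptions for illustration, not something from the original post. Only price goes into $set, following the note above.

```python
from datetime import datetime


def tick_filter_and_update(date_object, ticker, close):
    """Return the (filter, update) pair for one tick (hypothetical helper)."""
    query = {"date": date_object, "ticker": ticker}
    # date and ticker are not repeated in $set; on insert the upsert
    # copies them from the equality conditions in the filter.
    update = {"$set": {"price": round(close, 2)}}
    return query, update


def build_requests(rows):
    """Build a list of UpdateOne requests from (date, ticker, close) tuples."""
    # Imported here so the helper above can be used without PyMongo installed.
    from pymongo import UpdateOne

    return [
        UpdateOne(*tick_filter_and_update(date_object, ticker, close), upsert=True)
        for date_object, ticker, close in rows
    ]
```

For example, build_requests([(datetime(2021, 1, 4), "ABC", 12.3456)]) would give a one-element list that can be passed to bulk_write.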

The Upsert Option:

Since you are using the upsert=True update option, make sure that the query filter matches at most one document. This means that the date and ticker combination must be unique for each document; a unique compound index on these two fields enforces that and also makes the update operation efficient.
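As a rough sketch of the index mentioned above, something like the following could be run once before the bulk update. The names TICK_INDEX_KEYS and ensure_tick_index are made up for this example; create_index with unique=True is the PyMongo call. Creating the index will fail if the collection already holds duplicate date/ticker pairs, so those would need cleaning up first.

```python
# Compound key on (date, ticker); 1 means ascending, same as pymongo.ASCENDING.
TICK_INDEX_KEYS = [("date", 1), ("ticker", 1)]


def ensure_tick_index(collection):
    """Create a unique compound index on date + ticker (hypothetical helper).

    `collection` is expected to be a pymongo Collection, for example
    db[locationTarget]. Returns the index name reported by create_index.
    """
    return collection.create_index(TICK_INDEX_KEYS, unique=True)
```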

Next, run the bulk update operation.

result = db[locationTarget].bulk_write(requests_list, ordered=False)

  • The option ordered=False specifies that the updates do not depend on any previous individual updates in the list. The individual writes can happen in any order, and the remaining writes are still attempted when one of them fails. This also generally performs better than ordered writes.

  • The result is of type pymongo.results.BulkWriteResult. The following fields are of interest in this class: matched_count, modified_count, upserted_count, and upserted_ids.
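Putting the last step together, here is a minimal sketch that sends the requests in batches and adds up the result counters. The helper names chunked and bulk_upsert, and the batch size of 1000, are arbitrary choices for this example. Error handling (pymongo.errors.BulkWriteError) is left out to keep it short.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_upsert(collection, requests_list, batch_size=1000):
    """Run unordered bulk writes in batches and return the summed counters.

    `collection` is expected to be a pymongo Collection and `requests_list`
    a list of UpdateOne requests. batch_size is an assumed, tunable value.
    """
    totals = {"matched": 0, "modified": 0, "upserted": 0}
    for batch in chunked(requests_list, batch_size):
        result = collection.bulk_write(batch, ordered=False)
        totals["matched"] += result.matched_count
        totals["modified"] += result.modified_count
        totals["upserted"] += result.upserted_count
    return totals
```

An empty requests_list simply returns zero counts here, because bulk_write is never called with an empty batch.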
