I have many thousands of records to upsert or adjust, and I'm looking for the fastest way to get the data into the collection. I'm using PyMongo, but I can only find the most basic examples. How should I format a list of records to perform an update_many (inserting if not already there)? If there is a bulk operator that can do the job faster, I'll take it.
Each record is relatively small, but there are lots of them, and potentially lots already in the collection, so the number of matches/clashes is high.
There is the Bulk.find.update() mongo shell method and the corresponding PyMongo bulk interface.
How do I format the updates with these bulk methods?
I think a few more details, like a sample input document and what/how you are planning to update, would help to discuss this further.
There might be tens of thousands of these…
tick = {"date": dateObject, "price": round(close, 2), "ticker": ticker}
db[locationTarget].update_one({"date": dateObject, "ticker": ticker}, {"$set": tick}, upsert=True)
The bulk update operation in PyMongo uses the Collection.bulk_write method and the UpdateOne class.
First, build a list of all the update requests:
requests_list = [
    UpdateOne({ ... }, { ... }, upsert=True),
    UpdateOne({ ... }, { ... }, upsert=True),
    ...
]
About:
tick = { 'date': dateObject, 'price': round(close,2), 'ticker': ticker }
- Each UpdateOne({ ... }, { ... }, upsert=True) in the requests_list will have the following format:
UpdateOne({ 'date': dateObject, 'ticker': ticker }, { '$set': tick }, upsert=True)
- The { '$set': tick } is not clear; I think you mean:
{ '$set': { 'date': dateObject, 'price': round(close,2), 'ticker': ticker } }
Note that if the date and ticker field values are not changing, there is no need to specify them in the $set clause.
The Upsert Option:
Since you are using the upsert=True update option, be sure that the query filter identifies exactly one document. This means that the date and ticker combination must be unique for each document; an index on these two fields will make the update operation efficient.
Next, run the bulk update operation on the collection:
result = db[locationTarget].bulk_write(requests_list, ordered=False)
- The option ordered=False specifies that the updates do not depend on any previous individual updates in the list. The individual writes can happen in any order, and the remaining writes are still attempted when one of them fails. Unordered writes also generally perform better than ordered writes.
- The result is of type pymongo.results.BulkWriteResult. The following fields of this class are of interest: matched_count, modified_count, upserted_count, and upserted_ids.