MongoDB bulk_write: UpdateOne or UpdateMany?

I want to know which is faster for importing: UpdateOne or UpdateMany with bulk_write. My code for importing the data into the collection with PyMongo looks like this:

from pymongo import UpdateOne

for file in sorted_files:
    df = process_file(file)
    bulk_request = []  # start a fresh batch for each file
    for row, item in df.iterrows():
        data_dict = item.to_dict()
        # push the row into the first document that still has fewer
        # than 12 samples; create a new document if none matches
        bulk_request.append(UpdateOne(
            {"nsamples": {"$lt": 12}},
            {
                "$push": {"samples": data_dict},
                "$inc": {"nsamples": 1}
            },
            upsert=True
        ))
    result = mycol1.bulk_write(bulk_request)

When I tried UpdateMany, the only thing I changed was this:

...
...
bulk_request.append(UpdateMany(..
..
..

I didn't see any major difference in insertion time. Shouldn't UpdateMany be way faster?
Maybe I am doing something wrong. Any advice would be helpful!
Thanks in advance!

Note: My data consists of 1.2M rows. I need each document to contain 12 subdocuments.

Yes, it is supposed to be. But, as answered in your other thread, your observed time is not necessarily related to the performance of your server, since you have file access and other logic intermixed. Your hardware configuration might also be inadequate for your use case.

Yes, I will check everything you replied to me about, but I am wondering if this code is right. Maybe I am doing something wrong. Is this the right way to do it?

from pymongo import UpdateMany

for file in sorted_files:
    df = process_file(file)
    bulk_request = []  # fresh batch per file
    for row, item in df.iterrows():
        data_dict = item.to_dict()
        bulk_request.append(UpdateMany(
            {"nsamples": {"$lt": 12}},
            {
                "$push": {"samples": data_dict},
                "$inc": {"nsamples": 1}
            },
            upsert=True
        ))
    result = mycol1.bulk_write(bulk_request)

UpdateMany is useful when you want to apply the exact same update operation to multiple documents in a collection. This does not appear to be what you want to use here. It seems like your intention is to add each data_dict sample to the data set once (to a single document), is that correct? If so, then you should be using UpdateOne.

As for why you don’t see a performance difference between the two, I suspect that is because the query ({"nsamples": {"$lt": 12}}) only ever has either 0 or 1 result in which case UpdateOne and UpdateMany are identical.
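To make the distinction concrete, here is a minimal sketch (the collection and field names here are hypothetical, not from the thread):

from pymongo import MongoClient

col = MongoClient().testdb.testcol  # hypothetical database/collection
col.insert_many([{"status": "pending"}, {"status": "pending"}])

# update_one modifies at most one matching document
col.update_one({"status": "pending"}, {"$set": {"status": "done"}})

# update_many modifies every matching document
col.update_many({"status": "pending"}, {"$set": {"status": "done"}})

# If the filter only ever matches 0 or 1 documents, the two behave identically.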

I understand. Yes, you are right. What I am trying to do is add each data_dict to a single document, and when that document gets full (because of the $inc) we move on to the next document and do the same again. When we finish, every document should have 12 subdocuments inside… One thing more: should I set a loop for every 1000 updates? Will I see any difference? I ask because of this, from the docs: "Each group of operations can have at most 1000 operations. If a group exceeds this limit, MongoDB will divide the group into smaller groups of 1000 or less. For example, if the bulk operations list consists of 2000 insert operations, MongoDB creates 2 groups, each with 1000 operations."

Should I set a loop for every 1000 updates?

No, you should not batch at 1000 ops. 1000 ops used to be the bulk write batch size limit, but that limit was increased to 100,000 ops starting in MongoDB 3.6 (back in 2017). https://docs.mongodb.com/manual/reference/limits/#mongodb-limit-Write-Command-Batch-Limit-Size

It’s ideal to pass as many bulk_write operations as possible in a single call. PyMongo will automatically batch the operations together in chunks of 100,000 (or when a chunk reaches a total size of 48MB). The next real limitation is the app’s memory. It might be inefficient or impossible to materialize all the operations in a single call to bulk_write. For example, let’s say you have 12,000,000 ops and each one is 1024 bytes, then you would need at least 12GB of memory. To solve this problem the app can batch manually at 100,000 ops which gives the same MongoDB performance with lower client side memory usage.
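As a rough sketch of that manual batching (gen_operations is a hypothetical generator that yields UpdateOne operations one at a time, and the database name is made up):

from itertools import islice
from pymongo import MongoClient

client = MongoClient()
mycol1 = client.mydb.mycol1  # hypothetical database name

def chunked(iterable, size):
    # yield successive lists of at most `size` items from any iterable
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# gen_operations() yields ops lazily, so all 12,000,000 of them
# never sit in client memory at the same time
for batch in chunked(gen_operations(), 100_000):
    mycol1.bulk_write(batch)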

A further optimization would be to use multiple threads and execute multiple bulk writes in parallel using a single MongoClient shared between them.
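A sketch of that idea, parallelizing over the files from the original snippet. This assumes the order in which files are written does not matter for your data; with the {"nsamples": {"$lt": 12}} upsert filter, concurrent writers could race and leave some buckets partially filled, so treat this as an illustration of the threading pattern, not a drop-in replacement:

from concurrent.futures import ThreadPoolExecutor
from pymongo import MongoClient, UpdateOne

client = MongoClient()  # MongoClient is thread-safe; share one instance
mycol1 = client.mydb.mycol1  # hypothetical database name

def import_file(file):
    df = process_file(file)
    ops = [
        UpdateOne(
            {"nsamples": {"$lt": 12}},
            {"$push": {"samples": item.to_dict()}, "$inc": {"nsamples": 1}},
            upsert=True,
        )
        for _, item in df.iterrows()
    ]
    return mycol1.bulk_write(ops)

# each worker thread reads, builds, and writes one file's batch at a time
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(import_file, sorted_files))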


I have answered this question with code to achieve it on StackOverflow.
