insertOne vs insertMany: is one preferred over the other?

I believe I read somewhere in the docs that insertOne is generally the preferred method. I was attempting to use insertMany and got varying results, from success to bulk write errors and duplicate key errors. The array passed to insertMany could have thousands of elements. The average document size is 3.8KB.

Now I’ve switched to insertOne and am having better results. I don’t have any specific numbers but it seems that operation time to completion is about the same between the two for me.

I’m wondering whether I should redesign other logic in my application and stay with insertMany(), or whether insertOne is acceptable for an operation involving 1 to ~10k documents?

Hi @Stuart_76776,

If you have multiple documents to insert, insertMany is better, faster & recommended.

You have the choice. It’s either this:

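# collection is a pymongo Collection, docs is a list of dicts
for doc in docs:
    collection.insert_one(doc)  # one round trip per document

Or that:

collection.insert_many(docs)  # the driver batches the documents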

Let’s imagine that docs contains 1 million documents. The first algorithm with insert_one will send 1M write operations to MongoDB and acknowledge each one of them, one by one. Meaning that you will need 1M TCP exchanges (back and forth) between your back end and the MongoDB cluster.
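To make the round-trip count concrete, here is a minimal sketch of the arithmetic. The function name round_trips is hypothetical, and the batch size of 10,000 is only an illustrative figure; the real batch size is chosen by the driver within the server's limits.

```python
def round_trips(num_docs, batch_size=1):
    """Number of request/acknowledgement exchanges needed to send num_docs.

    batch_size=1 models calling insert_one in a loop; a larger value
    models documents being sent in batches.
    """
    if num_docs < 0 or batch_size < 1:
        raise ValueError("num_docs must be >= 0 and batch_size >= 1")
    # Integer ceiling division: a partial last batch still costs one exchange.
    return (num_docs + batch_size - 1) // batch_size


one_by_one = round_trips(1_000_000)                 # one exchange per document
batched = round_trips(1_000_000, batch_size=10_000)  # illustrative batch size

print(f"one by one: {one_by_one:,} exchanges")
print(f"batched:    {batched:,} exchanges")
```

Under these assumptions the batched approach needs 100 exchanges instead of 1,000,000, which is where most of the saving comes from.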

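With insert_many, the driver sends the documents in batches (thousands of documents per round trip, within the server's batch-size and message-size limits) instead of sending them one by one. It’s a LOT more efficient, but what happens on an error depends on the options you set. With ordered=True (the default), processing stops at the first document that fails. With ordered=False, the remaining documents are still attempted and the errors are reported together at the end. See the ordered option in the docs.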
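Since the original question mentions duplicate key errors, here is a hedged sketch of how the failures from an unordered insert_many could be sorted out afterwards. The helper split_write_errors and the sample dictionary are hypothetical; the sample is hand-written in the shape of pymongo's BulkWriteError.details, not real server output. The pymongo call itself is shown only in comments because it needs a running server.

```python
DUPLICATE_KEY_CODE = 11000  # MongoDB's error code for a duplicate key (E11000)


def split_write_errors(details):
    """Separate duplicate-key failures from all other failures.

    `details` is expected to look like pymongo's BulkWriteError.details:
    a dict whose "writeErrors" list has one entry per failed document,
    each carrying at least "index" (position in the input list) and "code".
    Returns two lists of input indexes: (duplicates, others).
    """
    duplicates, others = [], []
    for err in details.get("writeErrors", []):
        target = duplicates if err.get("code") == DUPLICATE_KEY_CODE else others
        target.append(err["index"])
    return duplicates, others


# Intended use with pymongo (assumed setup, shown as comments):
#
#   from pymongo.errors import BulkWriteError
#   try:
#       collection.insert_many(docs, ordered=False)  # keep going after a failure
#   except BulkWriteError as exc:
#       duplicates, others = split_write_errors(exc.details)

# Hand-written sample in the same shape, for illustration only.
sample_details = {
    "nInserted": 2,
    "writeErrors": [
        {"index": 1, "code": 11000, "errmsg": "E11000 duplicate key error"},
        {"index": 3, "code": 121, "errmsg": "Document failed validation"},
    ],
}

duplicates, others = split_write_errors(sample_details)
print("duplicate key at input positions:", duplicates)
print("other failures at input positions:", others)
```

With ordered=False, duplicates can be skipped or logged while any other failure is still surfaced, so a few bad documents among thousands do not abort the whole load.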

Just as an example: in this Python script, which runs every hour to update my COVID-19 data set, the collection global_and_us contains 2,481,755 docs as of today, and they are all inserted using a single insertMany that handles the batching for me.

Because of this, I get about 28.33K docs inserted per second. You would never reach that rate with individual insert_one calls.

Cheers,
Maxime.
