$merge in a controlled way

Kiran_Sunkari · July 11, 2023, 1:44pm

Hi team,
We have a requirement to move some documents from one collection to another. We are evaluating the $merge operator in the aggregation pipeline to solve this problem. This operator is working as expected but is causing our MongoDB CPU usage to spike.
Is there a way to control this? For example, can we run this query on a single core or otherwise limit the resources it uses?
We are OK with running this query in the background, but it must not cause any spikes or discrepancies in production environments.

John_Sewell · July 11, 2023, 1:55pm

What options do you have on the $merge? I tested locally with a simple merge within the same DB but to a different collection, and CPU usage of the mongo process rose by only a couple of percent.

/edit

Also, do you see the same CPU spike when running mongoimport or something similar, or $out?

Kiran_Sunkari · July 11, 2023, 2:09pm

I also tested writing to a different collection, which is exactly what I need. I did not find any options for the $merge operator in the documentation to control resource usage. Even so, can I use $merge to move data from one collection to another in production?
If $merge uses all the available resources to move the data, I cannot use it in production, but if it is limited to a certain share of resources, vertical scaling will work. Is there a way to limit the resources that $merge uses, or does it use a fixed percentage of resources?
Here is my merge query:

db.oldCollection.aggregate([
  {
    $match: {
      key1: "DefaultValue"
    }
  },
  {
    $merge: {
      into: "newCollection",
      on: ["key1", "key2"],
      whenMatched: "keepExisting",
      whenNotMatched: "insert"
    }
  }
])

John_Sewell · July 11, 2023, 2:19pm

To be honest, I’m not sure how to limit it or why it’s consuming so many resources.

We use $merge extensively in all environments up to and including production to export data between collections and databases or into the same collection as an update and have seen no performance issues.

For context, though, we’re currently running on Atlas rather than on-prem.

Does using $out also cause this CPU spike? Do you have indexes on key1 and key2?

Kiran_Sunkari · July 11, 2023, 2:24pm

We have a compound index {"key1": 1, "key2": 1} on the new collection.

steevej · July 11, 2023, 6:01pm

You can always add a $limit: N stage after the $match so that you merge only N documents at a time. You will need to call your aggregation more often, but at least you may reduce the CPU spikes.
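
A rough sketch of what this batching could look like in mongosh, offered as an illustration rather than a tested recipe. One thing to watch: $merge leaves the source documents in place, so a bare $limit would likely pick up the same N documents on every call. The sketch therefore pages through the source by _id, which is my assumption and not something stated above. The helper names buildBatchPipeline and mergeInBatches, the batch size, and the pause length are all hypothetical, and the loop assumes nothing else writes to oldCollection while it runs.

```javascript
// Sketch only: helper names and _id-based paging are assumptions.

// Build the pipeline for one batch: documents matching the original filter
// with _id greater than lastId, in _id order, capped at batchSize.
function buildBatchPipeline(lastId, batchSize) {
  const match = { key1: "DefaultValue" };
  if (lastId !== null) {
    match._id = { $gt: lastId }; // skip documents handled by earlier batches
  }
  return [
    { $match: match },
    { $sort: { _id: 1 } },
    { $limit: batchSize },
    {
      $merge: {
        into: "newCollection",
        on: ["key1", "key2"],
        whenMatched: "keepExisting",
        whenNotMatched: "insert"
      }
    }
  ];
}

// Hypothetical mongosh loop: merge one batch, pause, repeat until done.
// `source` is a collection handle such as db.oldCollection.
function mergeInBatches(source, batchSize, pauseMs) {
  let lastId = null;
  for (;;) {
    const pipeline = buildBatchPipeline(lastId, batchSize);
    // Same $match/$sort/$limit stages without $merge, to learn the batch's _ids.
    const ids = source
      .aggregate(pipeline.slice(0, 3).concat([{ $project: { _id: 1 } }]))
      .toArray();
    if (ids.length === 0) break;          // nothing left to merge
    source.aggregate(pipeline).toArray(); // iterate the cursor so the $merge runs
    lastId = ids[ids.length - 1]._id;     // resume after the last merged _id
    sleep(pauseMs);                       // mongosh built-in; gives the server a breather
  }
}
```

In mongosh this might be called as mergeInBatches(db.oldCollection, 1000, 500). Smaller batches and longer pauses trade total run time for a flatter CPU profile; the right values depend on your hardware and workload.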