Copy documents from one collection to another

I have a collection called scheduled_tasks in my db. The collection looks like below:

{
  "_id": {
    "$oid": "6308bbdefcc302677837d855"
  },
  "action": "xxxxxx",
  "user": "sandeep",
  "scheduled_time": "24-08-2022",
  "file_path": "c://asdhkjshd",
  "created_at": "22-08-2022",
  "updated_at": "22-08-2022"
}

{
  "_id": {
    "$oid": "6308bbdefcc302677837d855"
  },
  "action": "xxxxxx",
  "user": "akhil",
  "scheduled_time": "24-07-2022",
  "file_path": "c://asdhkjshd",
  "created_at": "22-07-2022",
  "updated_at": "22-07-2022"
}

Now, I want to copy the documents in the above collection to another collection, say planned_activities. I want to copy only the _id and user fields from the scheduled_tasks collection to my new collection, i.e. planned_activities. Also, in my new collection planned_activities, apart from _id and user (which I got from the scheduled_tasks collection), I want to add new fields such as status, old_data and new_data. I want my planned_activities collection to look like below:

_id:"6308bbdefcc302677837d855"
user:"akhil"
status:null
old_data:null
new_data:2022-08-30T17:58:18.551+00:00

How do I achieve this? I know how to copy from one collection to another, but I am unaware how to achieve it in my case.

You should be able to do this with the aggregation framework. You can $match (if necessary), then $project to get the fields from the current documents you want to use, $addFields to create the new fields and then finally $out to save the new collection.

You could also look at $merge which is similar to $out but has more functionality.
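A minimal sketch of that pipeline in Python with PyMongo, using the collection and field names from the question (scheduled_tasks, planned_activities, scheduled_time are taken from the post above; adjust to your schema):

```python
# Pipeline following the suggestion above: an optional $match, then a single
# $project that both picks the fields to keep and creates the new ones,
# then $out to write the result into the target collection.
pipeline = [
    # Optional: filter the source documents first.
    # {"$match": {"user": "akhil"}},
    {"$project": {
        "_id": 1,                       # keep the original _id
        "user": 1,                      # keep the user field
        "status": None,                 # new field, initialised to null
        "old_data": None,               # new field, initialised to null
        "new_data": "$scheduled_time",  # new field copied from scheduled_time
    }},
    {"$out": "planned_activities"},     # write results into the new collection
]

# db.scheduled_tasks.aggregate(pipeline)  # runs entirely server-side
```

Note that $out replaces the target collection if it already exists; $merge (below) is the option to use if you need to add to an existing collection instead.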


Hi @Doug_Duncan,

Thanks for sharing a speedy solution!

One further improvement: $addFields is equivalent to a $project stage specifying all fields in the input documents. Selection of existing fields and creation of new fields can be done within a $project stage, so an $addFields stage would not be needed for this use case.

Regards,
Stennie

Of course you’re right @Stennie_X. Sometimes I don’t stop and think of the solution I propose, and would have compressed those stages into a single one when writing out the query in real time.

Hi @Doug_Duncan and @Stennie_X,
Thanks for the quick reply.
I have tried the following and it worked:

scheduled_migration.aggregate([
        {"$project": {"_id": 1, "user": 1, "status": "New", "old_migrationtime": None,
                      "new_migrationtime": "$scheduled_time"}},
        {"$merge": {"into": "audit_details"}}
])

Everything looks fine, but I want the order of the fields in the new collection audit_details to be like below:

_id:"6308bbdefcc302677837d855"
user:"akhil"
status:null
old_data:null
new_data:2022-08-30T17:58:18.551+00:00

but my new collection looks like below:

{
  "_id": {
    "$oid": "6308bbdefcc302677837d855"
  },
  "new_migrationtime": "2022-09-26 16:26:00",
  "old_migrationtime": "2022-07-21 16:26:00",
  "status": "modified",
  "user": "xxxxx",
  "last_modified": {
    "$date": {
      "$numberLong": "1662039075400"
    }
  }
}

Is there any way to achieve the order I require?

If you want your fields in a certain order, then you need to $project those fields in that order. Currently you have: _id, user, status, old_migrationtime, new_migrationtime. Change that to be _id, new_migrationtime, old_migrationtime, status, user.

As for the last_modified field, not sure how that got into the documents as it’s not part of the aggregate() call you provided.
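A sketch of the reordered pipeline, assuming the same field names as the earlier reply (scheduled_migration, audit_details). The $project keys are simply listed in the order suggested above; Python dicts preserve insertion order, and PyMongo sends the keys to the server in that order:

```python
# Output field order follows the key order in $project, so list the keys
# in the order you want them to appear in the new documents.
ordered_pipeline = [
    {"$project": {
        "_id": 1,
        "new_migrationtime": "$scheduled_time",
        "old_migrationtime": None,
        "status": "New",
        "user": 1,
    }},
    {"$merge": {"into": "audit_details"}},
]

# scheduled_migration.aggregate(ordered_pipeline)
```

One caveat worth knowing: $merge's default whenMatched behaviour is "merge", so for documents that already exist in audit_details it updates fields in place and keeps any fields already present there, which can also affect the field order you see.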

Hello,
Can this be done using insertMany in batches? I am using the below script.
I thought exporting documents from one collection to another could be done easily. Could you please advise if I may be overlooking something here.

var docsToInsert = db.PMForms_local.find({ "exist": "yes" }).toArray();
var batchSize = 20;
var successfulInserts = 0;

for (var i = 0; i < docsToInsert.length; i += batchSize) {
    var batch = docsToInsert.slice(i, i + batchSize);
    try {
        var result = db.collectionB.insertMany(batch);
        successfulInserts += result.insertedCount;
    } catch (e) {
        print("Error inserting batch starting at index " + i + ": " + e.message);
    }
}

You’re pulling everything down with the .toArray() call, which is bad for large collections.

If you’re just moving between collections in the same database, or between different databases on the same server, then using the $out or $merge aggregation stages is much faster as it all runs server-side.

If you were taking this approach (and I’ve done something similar for moving data from relational servers to Mongo), then you want to tune the batch size; 20 is tiny, unless each document is near the 16MB limit.

If you need to run with lots of data, then you’d want to get the iterator from the find call and loop through that, pushing an insertMany with the current batch as you go.
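A sketch of that cursor-based approach in Python with PyMongo (the function name and batch size are illustrative, not from the thread): iterate the cursor instead of materialising everything up front, and flush an insert_many whenever the batch fills up.

```python
# Stream documents from the source cursor and insert them in batches,
# so memory use stays bounded regardless of collection size.
def copy_in_batches(source, target, query, batch_size=1000):
    """Copy documents matching query from source to target in batches."""
    inserted = 0
    batch = []
    for doc in source.find(query):   # the cursor streams documents lazily
        batch.append(doc)
        if len(batch) >= batch_size:
            inserted += len(target.insert_many(batch).inserted_ids)
            batch = []
    if batch:                        # flush the final partial batch
        inserted += len(target.insert_many(batch).inserted_ids)
    return inserted
```

Usage would be something like copy_in_batches(db.PMForms_local, db.collectionB, {"exist": "yes"}). As noted above, prefer $out/$merge when both collections live on the same server; this pattern is mainly for moving data between servers or systems.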