Parallel JS scripting

Hi,

I have a large collection with thousands of catalogues, and I want to change the schema. To do this, I have written a JS script that transforms the existing data into the new format, something like this:

db.data.find().forEach(function (catalogue) {
  // … do transform, producing the new value for d …
  db.data.updateOne({ _id: catalogue._id }, { $set: { d: /* transformed value */ } });
});

Because the transform logic is somewhat time-consuming and the collection contains so many catalogues, the migration takes about 1-2 hours.

Can anybody help me parallelize the execution?

Thank you

The first thing to try is to use https://docs.mongodb.com/drivers/node/v4.1/usage-examples/updateMany/ rather than updateOne().

Thank you for the reply.
During the migration, I have to split a string, turn it into an array, format the items, etc. Currently this is implemented as a regular JS function, but with updateMany I don't see such an option.
Could you point me to some documentation or an example?

Thank you

You did not follow the link I supplied. This is the documentation and there are examples. You should also take a look at https://docs.mongodb.com/drivers/node/current/usage-examples/bulkWrite/

I have checked the updateMany documentation and wrote a skeleton:

function transform(value) {
  print(value);
  // … do other stuff …
}

db.data.updateMany(
  {},
  [
    { $set: { d: transform("$d") } }
  ]
);

It seems that the transform() function is called at least once, but it prints "$d" as a literal string rather than the value it refers to. I also tried passing "$$d", without any success.

I need transform() to be called for every catalogue in the collection and to perform the data transformation on the field.

Could you help me see what I did wrong?

Thank you

Hi @norbert_NNN,

The JavaScript transform() function you have defined is being invoked on the client side before sending the request to the server. If you want to perform client-side transformations you need to iterate the documents and update individually (as per your original example).
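
To illustrate the client-side evaluation described above, here is a minimal sketch in plain JavaScript that needs no database. The body of transform() is a hypothetical stand-in (the real one is not shown in this thread); the point is only that the call runs while the pipeline array is being built, so the server never sees the function.

```javascript
// Hypothetical stand-in for the poster's transform(); the real body is elided in the thread.
function transform(value) {
  return String(value).split(",");
}

// JavaScript evaluates transform("$d") right here, on the client,
// while the pipeline array literal is being constructed.
const pipeline = [{ $set: { d: transform("$d") } }];

// What would be sent to the server is the already-computed result:
// the literal string "$d" split on commas, not the function itself.
console.log(JSON.stringify(pipeline));
// prints [{"$set":{"d":["$d"]}}]
```

This is why print(value) shows "$d": the function receives the string literal, and only its return value ends up in the update request.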

If you want to perform server-side transformation affecting all documents in a collection, I would look at using an aggregation pipeline with a $out destination.

If you are using MongoDB 4.2 or newer and only need to update a subset of documents in a collection, use an aggregation pipeline with a $merge destination instead.
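
As a rough sketch of the server-side approach, the pipeline below expresses a split-and-trim transform with aggregation operators instead of a JavaScript function. Everything specific here is an assumption for illustration: the collection name data, the field d holding a comma-separated string, and the exact formatting step. Writing back into the collection being aggregated with $merge also depends on the server version (to my knowledge it requires a release newer than 4.2), so please check the documentation for your version before relying on it.

```javascript
// Assumption: field "d" holds a comma-separated string such as "a, b ,c".
const pipeline = [
  // Only touch documents that still have the old string format,
  // which also makes the migration safe to re-run.
  { $match: { d: { $type: "string" } } },
  // Split the string on commas and trim whitespace from each item, on the server.
  {
    $set: {
      d: {
        $map: {
          input: { $split: ["$d", ","] },
          as: "item",
          in: { $trim: { input: "$$item" } }
        }
      }
    }
  },
  // Merge the transformed documents back by _id; never create new documents.
  { $merge: { into: "data", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } }
];

// Only runs where a `db` handle exists (for example in mongosh).
if (typeof db !== "undefined") {
  db.data.aggregate(pipeline).toArray(); // exhaust the cursor so the pipeline executes
}
```

Because the whole transform runs inside the server, no documents travel to the client, which is usually where most of the time goes in a per-document updateOne() loop.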

Regards,
Stennie

Register for M220JS from MongoDB University.

Look at the file movie-last-updated-migration.js. It does exactly what you want to do. In that case, the transform changes the type of a field from a date encoded as a string into a real Date object.

Basically, for each document they want to transform, they create an updateOne: object with a filter: field and an update: field. Once all updateOne: objects are created they use

This course is worth taking.
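
The updateOne objects with filter and update fields described above match the operation format accepted by bulkWrite, the API linked earlier in this thread. Here is a minimal sketch of that pattern, not the course's actual file. The collection name data, the field d, the transform() body, and the batch size of 1000 are all assumptions made for illustration.

```javascript
// Hypothetical transform: split a comma-separated string and trim each item.
function transform(value) {
  return String(value).split(",").map(function (item) {
    return item.trim();
  });
}

// Build one updateOne operation (filter + update) for a single document.
function toUpdateOp(doc) {
  return {
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { d: transform(doc.d) } }
    }
  };
}

// Pure-JavaScript check of the operation shape; no database required.
const sampleOp = toUpdateOp({ _id: 1, d: "a, b ,c" });
console.log(JSON.stringify(sampleOp));
// prints {"updateOne":{"filter":{"_id":1},"update":{"$set":{"d":["a","b","c"]}}}}

// Only runs where a `db` handle exists (for example in mongosh).
if (typeof db !== "undefined") {
  const BATCH_SIZE = 1000; // assumed value; tune for your data
  let ops = [];
  db.data.find({ d: { $type: "string" } }).forEach(function (doc) {
    ops.push(toUpdateOp(doc));
    if (ops.length === BATCH_SIZE) {
      db.data.bulkWrite(ops, { ordered: false }); // send one batch per round trip
      ops = [];
    }
  });
  if (ops.length > 0) {
    db.data.bulkWrite(ops, { ordered: false }); // flush the final partial batch
  }
}
```

Batching like this keeps the transform in ordinary JavaScript while replacing thousands of single-document round trips with a few bulk requests. It does not parallelize the transform itself, so the gain depends on how much of the 1-2 hours is network overhead.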