How to use aggregate functions inside of bulkWrite query?

Hi there folks. I’m working in a migration that will update (updateMany) a couple of millions of documents, with some basic evaluation based on some existing fields. To do that, I will use bulkWrite.

The problem is, for these evaluations, I would like to use $switch or if conditions, but it seems it doesn’t like it inside of the structure of bulkWrite. The query I have is the following:

const result = await mongo.collection('tickets').bulkWrite([
        {
            updateMany: {
                filter: {
                    $or: [{ 'fsm.state': null }, { 'fsm.state': { $exists: false } }]
                },
                update: {
                    $set: {
                        'fsm.state': {
                            $switch:
                            {
                                branches: [
                                    {
                                        case: { endedAt: { $ne: null } },
                                        then: "closed"
                                    },
                                    {
                                        case: { $and: [{ followUp: { $ne: null } }, { 'followUp.pending': true }] },
                                        then: "pending_follow_up"
                                    },
                                    {
                                        case: { $and: [{ review: { $ne: null } }, { 'review.pending': true }] },
                                        then: "pending_feedback"
                                    }
                                ],
                                default: "chat_in_progress"
                            }
                        }
                    },
                    $unset: {
                        state: ''
                    }
                },
                upsert: false
            }
        }
    ],
        {
            ordered: false
        }
    );

for what I get the error The dollar ($) prefixed field '$switch' in 'fsm.state.$switch' is not valid for storage . So searching a bit I found this topic (MongoError: The dollar ($) prefixed field ‘$cond’ in ‘energy.$cond’ is not valid for storage) which suggests to put the contents of update in an array, for which it throws the following error:

BSON field 'update.updates.u' is the wrong type 'array', expected type 'object'

Is there a way to achieve this?

My approach would be different.

I would have 4 different updateMany, one for each cases. Something like:

fsm_state = { $or: [{ 'fsm.state': null }, { 'fsm.state': { $exists: false } } ] }
bulk_write = [
  updateMany : {
    filter : { $and : [ fsm_state , { endedAt: { $ne: null } } ] } ,
    update : { $set : { "fsm.state" : "closed } } 
  } ,
  ...
  updateMany : {
    filter : { $and : [ fsm_state , { review: { $ne: null } }, { 'review.pending': true } ] ] ,
    update : { $set : { "fsm.state" : "pending_feedback } }
  } ,
  updateMany : {
    filter : { fsm_state } ,
    update : { "fsm.state" : "chat_in_progress" }
] 

It is a bulkWrite, nothing stops you from doing many updateMany and each updateMany handles one of your switch cases by having a more specific filter.

The three dots in the above is left as an exercise to the reader. I’m confident that you can figure it out.

1 Like

Thank you for your answer, clever approach :slight_smile: . The only thing is, as I want to use bulkWrite for performance reasons over a simple updateMany, by having separate updateMany statements in my query, would that be as performant as having a single updateMany (do they execute secuentially, or are they kind of merged into a single statement)?

Answering myself, I think I would lose some performance, as the operations would be executed sequentially (I would need to remove the ordered: false as the last updateMany would rely on the previous ones to be executed), so although it would work, it wouldn’t be as performant as having a single updateMany or at least a few updateMany without having to run in order.

You certainly can modify the last filter to $not of $and of all the previous conditions.

Without trying I am not sure

Sometimes performance evaluation is not always intuitive.

Anyway for a one time migration, performance is not always mandatory. I proposed an alternative because your $switch is apparently not working.

Yeah I can do that.

At the end I made 2 bulkWrite statements, an unordered one for the first 3 conditions plus the $unset, and the remaining one (chat_in_progress) in a separate one.
The migration that was previously created was a paginated find and then .updateOne so performance increased greatly, will try using the $not with the previous conditions to see if it improves it to put it on a single bulkWrite.

Again, thank you and really appreciate the alternative approach, it worked for me :slight_smile:

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.