$in operator array size question, verifying all products exist before updating them

RENOVATIO · February 11, 2023, 12:34am

Hello,

At our workplace, an operator needs to update 2500 products by scanning their barcodes. The scanned barcodes are then matched in a MongoDB aggregation pipeline like so:


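// products: array holding the 2500 scanned product barcodes
var products = [/* ...scanned barcodes... */];

[
  {
    '$match': {
      'productId': {
        // pass the array itself; wrapping it as [products] would nest it
        '$in': products
      }
    }
  }
]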

After matching it then performs necessary update operations.

QUESTION 1) Immediate question: Given the number of products, is this optimal? How many product barcodes are too many to pass to the $in operator?

Before performing the aforementioned step, it is necessary to find out whether all the scanned products exist (i.e. were already added to the system). As currently configured, a match query with a count stage runs before the query shown above to get the number of items matched. That count is then compared to the length of the supplied products array.
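
A minimal sketch of the existence check described above, assuming the Node.js driver and a unique productId per document. The function names and the `collection` argument are hypothetical, not taken from the original setup. Instead of only comparing counts, it asks the server which barcodes exist and reports the missing ones. It also de-duplicates the scan first, because a barcode scanned twice matches only one document and would make a plain count comparison fail.

```javascript
// Sketch only: `collection` is assumed to be a Node.js driver Collection
// (or any object exposing a compatible async `distinct` method).

// Return the scanned barcodes that are absent from `foundIds`.
function findMissingBarcodes(scanned, foundIds) {
  const found = new Set(foundIds);
  // De-duplicate the scan so a barcode scanned twice is reported once.
  return [...new Set(scanned)].filter((code) => !found.has(code));
}

// Ask the server which of the scanned barcodes exist, then diff locally.
async function verifyAllExist(collection, scanned) {
  const unique = [...new Set(scanned)];
  const foundIds = await collection.distinct('productId', {
    productId: { $in: unique },
  });
  const missing = findMissingBarcodes(unique, foundIds);
  return { ok: missing.length === 0, missing };
}
```

The barcodes have to be of the same type as the stored productId values (string versus number), otherwise every barcode will be reported as missing.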

So, in summary, the way it is now:

1. A match + count query runs first and prevents the update operation if not all products are in the system. The match query looks exactly like the one shown above.
2. If step 1 passes, another identical match query runs again, this time in the updateMany() method.

QUESTION 2) Is there a more optimal way to run this two-step operation? Maybe it is possible to achieve the same result in a single updateMany query?
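
One possible way to collapse the two steps, sketched under several assumptions: the deployment is a replica set or sharded cluster (transactions need one), productId is unique per document, and the Node.js driver is in use. The function name and the `client` and `collection` arguments are hypothetical. The idea is to run updateMany once inside a transaction and abort if `matchedCount` differs from the number of distinct barcodes, so a partial match leaves nothing changed.

```javascript
// Sketch only: `client` is assumed to be a connected MongoClient and
// `collection` one of its collections; both names are illustrative.
async function updateAllOrNothing(client, collection, scanned, update) {
  const unique = [...new Set(scanned)];
  const session = client.startSession();
  try {
    let result;
    await session.withTransaction(async () => {
      result = await collection.updateMany(
        { productId: { $in: unique } },
        update,
        { session }
      );
      // Throwing inside the callback aborts the transaction, so a partial
      // match leaves every product unchanged.
      if (result.matchedCount !== unique.length) {
        throw new Error(
          `expected ${unique.length} products, matched ${result.matchedCount}`
        );
      }
    });
    return result;
  } finally {
    await session.endSession();
  }
}
```

Whether this beats the existing two-query approach depends on the workload. It saves one round trip on the success path but pays for a transaction, so it is worth measuring both.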

Thank you very much

Satyam · February 20, 2023, 3:58am

Hey @RENOVATIO,

Can you please elaborate on this further? Why are you searching for productId in the array first instead of just searching in your collection and then updating the document that corresponds to that particular productId? It would be great if you could share sample documents along with the query you are using and its output, so that we can better understand your current process and help you.

There is no hard limit as such on how many elements you can pass to the $in operator. You can run explain() on your queries to better understand how they perform.
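
As a rough illustration of the explain() suggestion, the sketch below runs the $in filter with `executionStats` verbosity and pulls out the figures that usually matter: whether an index scan (IXSCAN) appears in the winning plan, and how many keys and documents were examined relative to the number returned. The helper names are hypothetical, and the plan walker assumes the usual `inputStage` / `inputStages` nesting, which may differ between server versions. The one practical ceiling worth keeping in mind is that the whole query document has to fit within the 16 MB BSON document limit, which 2500 barcodes are nowhere near.

```javascript
// Collect stage names from an explain plan tree. `queryPlan` is followed
// too, since some server versions nest the plan under that key.
function planStages(plan, out = []) {
  if (!plan || typeof plan !== 'object') return out;
  if (plan.stage) out.push(plan.stage);
  if (plan.queryPlan) planStages(plan.queryPlan, out);
  if (plan.inputStage) planStages(plan.inputStage, out);
  for (const child of plan.inputStages || []) planStages(child, out);
  return out;
}

// Reduce an explain document to the handful of numbers worth comparing.
function summarizeExplain(explainDoc) {
  const stats = explainDoc.executionStats || {};
  const planner = explainDoc.queryPlanner || {};
  const stages = planStages(planner.winningPlan);
  return {
    stages,
    usesIndex: stages.includes('IXSCAN'),
    nReturned: stats.nReturned,
    totalKeysExamined: stats.totalKeysExamined,
    totalDocsExamined: stats.totalDocsExamined,
  };
}

// Sketch only: `collection` is assumed to be a Node.js driver Collection.
async function explainInQuery(collection, barcodes) {
  const doc = await collection
    .find({ productId: { $in: barcodes } })
    .explain('executionStats');
  return summarizeExplain(doc);
}
```

If `usesIndex` comes back false, or `totalDocsExamined` is far larger than `nReturned`, an index on productId is the first thing to check.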

Regards,
Satyam