MongoDB query for nested array of objects with $elemMatch and $expr

emmanuel_bernard · July 20, 2022, 8:35am

Hi there,

I have a collection with large number of documents (1.2M).
Every document contains a nested array of objects.

I need to find wrong field values <-> amount ≠ quantity * price

db.Sales.find(
{
  saleRows: {
      $elemMatch: {
        $expr:{
            $ne: ['$currentAmountIncludingTax',{$multiply:['$quantity','$currentPriceIncludingTax']}]
        }
      }
  }
}
);

I get this error “$expr can only be applied to the top-level document”

  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff4645594',
    saleRows: [
      {
        rowId: '1',
        creationDate: '2021-10-06T13:20:48.163Z',
        rowType: 'PRODUCT',
        saleRowType: 'ORDER',
        quantity: 1,
        initialPriceIncludingTax: '39.95',
        currentPriceIncludingTax: '39.95',
        paidPriceIncludingTax: '39.95',
        initialAmountIncludingTax: '39.95',
        currentAmountIncludingTax: '39.95',
        paidAmountIncludingTax: '29.95',
        appliedDiscounts: [],   
      }
    ]
  }

Do you have some ideas ?

Jason_Tran · July 27, 2022, 1:00am

Hi @emmanuel_bernard,

I need to find wrong field values ↔ amount ≠ quantity * price

Just wanting to clarify something about the above. The sample document you have given has the currentAmountIncludingTax and currentPriceIncludingTax values as strings. Is this the correct format?

Additionally, assuming those fields are converted to integers in some manner, I presume you would not want the sample document returned. Is this correct? If so, would you be able to provide 3-4 sample documents and verify which ones should / shouldn’t be returned.

It may be also helpful to provide further context or use case details with regards to this command.

Looking forward to hearing from you.

Regards,
Jason

emmanuel_bernard · July 27, 2022, 3:58pm

Hi @Jason_Tran,

Context = when inserting documents in the collection, there was an error in calculating the amount.
I need to retun the documents where the amount value is wrong and then correct them.

Here is a sample of 4 documents, the 2nd document (error on rowId = 2) and the 4th document (error on rowId = 1) should be returned.

[
  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff4645594',
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 39.95,
        currentAmountIncludingTax: 39.95
      }
    ]
  },
  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff46455945,
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 24.95,
        currentAmountIncludingTax: 24.95
      },
      {
        rowId: '2',
        quantity: 1,
        currentPriceIncludingTax: 17.95,
        currentAmountIncludingTax: 29.95
      }
    ]
  },
  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff4645596',
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 14.95,
        currentAmountIncludingTax: 14.95
      }
    ]
  }
,  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff4645597',
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 20.95,
        currentAmountIncludingTax: 24.95
      }
    ]
  }
]

An aggregation pipeline with $match and $reduce (or $map or $filter) is a solution, but I’m wondering if $elemMatch applied on saleRows could do the job ?

Regards,

Emmanuel

Jason_Tran · July 28, 2022, 2:48am

Hi @emmanuel_bernard - Thanks for providing those details.

Use of the aggregation pipeline was something I was going to suggest. I used $filter to identify all the mismatches. The example aggregation provided only identifies mismatches and doesn’t modify them. Correcting them would require an extra / seperate step.

An aggregation pipeline with $match and $reduce (or $map or $filter) is a solution, but I’m wondering if $elemMatch applied on saleRows could do the job ?

I’m wondering is there a particular reason you’re after using a .find() with $expr and $elemMatch over the aggregation solution you mentioned? I have not tried this method myself yet but it may not be possible due to the sub-documents within the saleRows array.

Example aggregation:

/// variable y set to the following:

DB> y
{
  '$filter': {
    input: '$saleRows',
    as: 'saleRow',
    cond: {
      '$ne': [
        {$round:[{ '$toDouble': '$$saleRow.currentAmountIncludingTax' },2]},
        {$round:[{
          '$multiply': [
            { '$toDouble': '$$saleRow.quantity' },
            { '$toDouble': '$$saleRow.currentPriceIncludingTax' }
          ]
        },2]}
      ]
    }
  }
}

/// Aggregation using `filter` from above, projecting the mismatches. Used `$round` to try and avoid binary rounding error

DB> db.Sales.aggregate({$addFields:{mismatches:y}},{$project:{mismatches:1}})
[
  { _id: ObjectId("62e1be991bb9515b8fbd4fe7"), mismatches: [] },
  {
    _id: ObjectId("62e1be991bb9515b8fbd4fe8"),
    mismatches: [
      {
        rowId: '2',
        quantity: 1,
        currentPriceIncludingTax: 17.95,
        currentAmountIncludingTax: 29.95
      }
    ]
  },
  { _id: ObjectId("62e1be991bb9515b8fbd4fe9"), mismatches: [] },
  {
    _id: ObjectId("62e1be991bb9515b8fbd4fea"),
    mismatches: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 20.95,
        currentAmountIncludingTax: 24.95
      }
    ]
  },
  { _id: ObjectId("62e1dd7d1bb9515b8fbd4feb"), mismatches: [] }
]

Note: I added another document to my test environment which was not supposed to be returned as it is equal after the multiplication but due to binary rounding details, I had added the $round to try and avoid this.

Additional document which was added:

{
    _id: ObjectId("62e1dd7d1bb9515b8fbd4feb"),
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 24.95,
        currentAmountIncludingTax: 24.95
      },
      {
        rowId: '2',
        quantity: 3,
        currentPriceIncludingTax: 17.95,
        currentAmountIncludingTax: 53.85
      }
    ]
  }

Please note this may not be the exact desired output you want here but the above would show the mismatches and you can alter the pipeline to better suit your use case if required.

As with any of these operations / suggestions, please thoroughly test this in a test environment to verify it suits all your use cases and requirements.

In saying so, it would possibly be faster (and more straightforward) to fetch all the documents and then proceed to perform the calculations on the client side. Maybe for simplicity with expressing the operation required / getting the desired output whilst also doing the calculation and query in a singular operation, then aggregation might be better suited.

Regards,
Jason

emmanuel_bernard · July 28, 2022, 11:50am

Hi @Jason_Tran

Thanks for your answer and all the explainations !

I tested your aggregation in a test environment and it works fine

With relational databases I’m used to query using WHERE clause, but aggregation pipeline in MongDB is another way to query.

Great job !

Regards

Emmanuel

system · August 2, 2022, 11:50am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.