MongoDB query for nested array of objects with $elemMatch and $expr

Hi there,

I have a collection with large number of documents (1.2M).
Every document contains a nested array of objects.

I need to find wrong field values <-> amount ≠ quantity * price

db.Sales.find(
{
  saleRows: {
      $elemMatch: {
        $expr:{
            $ne: ['$currentAmountIncludingTax',{$multiply:['$quantity','$currentPriceIncludingTax']}]
        }
      }
  }
}
);

I get this error “$expr can only be applied to the top-level document”

  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff4645594',
    saleRows: [
      {
        rowId: '1',
        creationDate: '2021-10-06T13:20:48.163Z',
        rowType: 'PRODUCT',
        saleRowType: 'ORDER',
        quantity: 1,
        initialPriceIncludingTax: '39.95',
        currentPriceIncludingTax: '39.95',
        paidPriceIncludingTax: '39.95',
        initialAmountIncludingTax: '39.95',
        currentAmountIncludingTax: '39.95',
        paidAmountIncludingTax: '29.95',
        appliedDiscounts: [],   
      }
    ]
  }

Do you have some ideas ?

Hi @emmanuel_bernard,

I need to find wrong field values <-> amount ≠ quantity * price

Just wanting to clarify something about the above. The sample document you have given has the currentAmountIncludingTax and currentPriceIncludingTax values as strings. Is this the correct format?

Additionally, assuming those fields are converted to integers in some manner, I presume you would not want the sample document returned. Is this correct? If so, would you be able to provide 3-4 sample documents and verify which ones should / shouldn’t be returned.

It may be also helpful to provide further context or use case details with regards to this command.

Looking forward to hearing from you.

Regards,
Jason

2 Likes

Hi @Jason_Tran,

Context = when inserting documents in the collection, there was an error in calculating the amount.
I need to retun the documents where the amount value is wrong and then correct them.

Here is a sample of 4 documents, the 2nd document (error on rowId = 2) and the 4th document (error on rowId = 1) should be returned.

[
  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff4645594',
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 39.95,
        currentAmountIncludingTax: 39.95
      }
    ]
  },
  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff46455945,
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 24.95,
        currentAmountIncludingTax: 24.95
      },
      {
        rowId: '2',
        quantity: 1,
        currentPriceIncludingTax: 17.95,
        currentAmountIncludingTax: 29.95
      }
    ]
  },
  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff4645596',
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 14.95,
        currentAmountIncludingTax: 14.95
      }
    ]
  }
,  {
    _id: 'f4a3c536-9e6f-4219-9654-b92ff4645597',
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 20.95,
        currentAmountIncludingTax: 24.95
      }
    ]
  }
]

An aggregation pipeline with $match and $reduce (or $map or $filter) is a solution, but I’m wondering if $elemMatch applied on saleRows could do the job ?

Regards,

Emmanuel

1 Like

Hi @emmanuel_bernard - Thanks for providing those details.

Use of the aggregation pipeline was something I was going to suggest. I used $filter to identify all the mismatches. The example aggregation provided only identifies mismatches and doesn’t modify them. Correcting them would require an extra / seperate step.

An aggregation pipeline with $match and $reduce (or $map or $filter) is a solution, but I’m wondering if $elemMatch applied on saleRows could do the job ?

I’m wondering is there a particular reason you’re after using a .find() with $expr and $elemMatch over the aggregation solution you mentioned? I have not tried this method myself yet but it may not be possible due to the sub-documents within the saleRows array.

Example aggregation:

/// variable y set to the following:

DB> y
{
  '$filter': {
    input: '$saleRows',
    as: 'saleRow',
    cond: {
      '$ne': [
        {$round:[{ '$toDouble': '$$saleRow.currentAmountIncludingTax' },2]},
        {$round:[{
          '$multiply': [
            { '$toDouble': '$$saleRow.quantity' },
            { '$toDouble': '$$saleRow.currentPriceIncludingTax' }
          ]
        },2]}
      ]
    }
  }
}

/// Aggregation using `filter` from above, projecting the mismatches. Used `$round` to try and avoid binary rounding error

DB> db.Sales.aggregate({$addFields:{mismatches:y}},{$project:{mismatches:1}})
[
  { _id: ObjectId("62e1be991bb9515b8fbd4fe7"), mismatches: [] },
  {
    _id: ObjectId("62e1be991bb9515b8fbd4fe8"),
    mismatches: [
      {
        rowId: '2',
        quantity: 1,
        currentPriceIncludingTax: 17.95,
        currentAmountIncludingTax: 29.95
      }
    ]
  },
  { _id: ObjectId("62e1be991bb9515b8fbd4fe9"), mismatches: [] },
  {
    _id: ObjectId("62e1be991bb9515b8fbd4fea"),
    mismatches: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 20.95,
        currentAmountIncludingTax: 24.95
      }
    ]
  },
  { _id: ObjectId("62e1dd7d1bb9515b8fbd4feb"), mismatches: [] }
]

Note: I added another document to my test environment which was not supposed to be returned as it is equal after the multiplication but due to binary rounding details, I had added the $round to try and avoid this.

Additional document which was added:

{
    _id: ObjectId("62e1dd7d1bb9515b8fbd4feb"),
    saleRows: [
      {
        rowId: '1',
        quantity: 1,
        currentPriceIncludingTax: 24.95,
        currentAmountIncludingTax: 24.95
      },
      {
        rowId: '2',
        quantity: 3,
        currentPriceIncludingTax: 17.95,
        currentAmountIncludingTax: 53.85
      }
    ]
  }

Please note this may not be the exact desired output you want here but the above would show the mismatches and you can alter the pipeline to better suit your use case if required.

As with any of these operations / suggestions, please thoroughly test this in a test environment to verify it suits all your use cases and requirements.

In saying so, it would possibly be faster (and more straightforward) to fetch all the documents and then proceed to perform the calculations on the client side. Maybe for simplicity with expressing the operation required / getting the desired output whilst also doing the calculation and query in a singular operation, then aggregation might be better suited.

Regards,
Jason

2 Likes

Hi @Jason_Tran

Thanks for your answer and all the explainations !

I tested your aggregation in a test environment and it works fine :slight_smile:

With relational databases I’m used to query using WHERE clause, but aggregation pipeline in MongDB is another way to query.

Great job !

Regards

Emmanuel

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.