Need help filtering documents with criteria inside array - mongoDB

Need help with a query to filter records in MongoDB. I am using Compass to run the query. We have thousands of records/documents where each record/document contains the following array. For a few documents, the events in lifeCycleInfo are out of order, i.e. the payment.completed event comes before the payment.created event.

I need to filter those records where payment.completed event comes before payment.created event.

Sample Object:

{
  "lifeCycleInfo": [
    {
      "eventId": "9b8b6adfae",
      "eventSubType": "SendTransfer_Receipt",
      "eventType": "SendTransfer",
      "odsTimestamp": {
        "$date": "2023-02-06T14:33:42.308Z"
      },
      "payload": "{}",
      "timestamp": {
        "$date": "2023-02-06T14:33:42.271Z"
      }
    },
    {
      "eventId": "06e8d144-531b02",
      "eventSubType": "payment.created",
      "eventType": "Notification",
      "odsTimestamp": {
        "$date": "2023-02-06T14:33:45.488Z"
      },
      "payload": "{}",
      "timestamp": {
        "$date": "2023-02-06T14:33:45.479Z"
      }
    },
    {
      "eventId": "9da54454d6",
      "eventSubType": "payment.completed",
      "eventType": "Notification",
      "odsTimestamp": {
        "$date": "2023-02-06T14:33:46.698Z"
      },
      "payload": "{}",
      "timestamp": {
        "$date": "2023-02-06T14:33:46.689Z"
      }
    }
  ]
}

I tried to find them based on the array index, but it is not working.

{"lifeCycleInfo[1].eventtype":"payment.completed"}

@Suraj_Jaldu Here’s a MongoDB query that can help filter the records based on the requirement:

db.collection.find({
   "lifeCycleInfo": {
      $elemMatch: {
         "eventSubType": "payment.completed",
         "timestamp": { $lt: "$lifeCycleInfo.timestamp" }
      }
   }
})

In this query, the $elemMatch operator is used to match the first element in the lifeCycleInfo array that satisfies both the conditions:

  • The eventSubType must be equal to payment.completed.
  • The timestamp must be less than the timestamp of the element with eventSubType equal to payment.created.

This query will return the documents whose lifeCycleInfo array contains an event with eventSubType equal to payment.completed and the timestamp of this event is before the timestamp of the event with eventSubType equal to payment.created.

Hi @Suraj_Jaldu welcome to the community!

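I don’t think the query that @Sumanta_Mukhopadhyay provided is correct. I tried it using some example documents, but it returns nothing. In a find() filter, "$lifeCycleInfo.timestamp" is treated as a literal string rather than a field reference, so the $lt comparison against a date never matches.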

Here are the example documents I used. Based on your example, I created two documents: one where payment.created comes before payment.completed, and one with the order reversed. I removed the other fields for testing purposes. Based on your problem description, the query should match the document with _id: 1 but not _id: 0. I also took the liberty of using the ISODate() type for the timestamps.

> db.test.find()
[
  {
    _id: 0,
    lifeCycleInfo: [
      {
        eventSubType: 'SendTransfer_Receipt',
        timestamp: ISODate("2023-02-06T14:33:42.271Z")
      },
      {
        eventSubType: 'payment.created',
        timestamp: ISODate("2023-02-06T14:33:45.479Z")
      },
      {
        eventSubType: 'payment.completed',
        timestamp: ISODate("2023-02-06T14:33:46.689Z")
      }
    ]
  },
  {
    _id: 1,
    lifeCycleInfo: [
      {
        eventSubType: 'SendTransfer_Receipt',
        timestamp: ISODate("2023-02-06T14:33:42.271Z")
      },
      {
        eventSubType: 'payment.created',
        timestamp: ISODate("2023-02-06T14:33:45.479Z")
      },
      {
        eventSubType: 'payment.completed',
        timestamp: ISODate("2023-02-05T14:33:46.689Z")
      }
    ]
  }
]

I managed to do this using this aggregation:

db.test.aggregate([
   // Filter for only payment.created and payment.completed events
   {$addFields: {
      lifeCycleInfo: {
         $filter: {
            input: '$lifeCycleInfo',
            cond: {$or: [
               {$eq: ['$$this.eventSubType', 'payment.created']},
               {$eq: ['$$this.eventSubType', 'payment.completed']}
            ]}
         }
      }
   }},
   // Sort the lifeCycleInfo array based on timestamp
   {$addFields: {
      lifeCycleInfo: {
         $sortArray: {
            input: '$lifeCycleInfo',
            sortBy: {timestamp: 1}
         }
      }
   }},
   // Match documents where payment.completed event comes before payment.created event
   {$match: {
      'lifeCycleInfo.0.eventSubType': 'payment.completed',
      'lifeCycleInfo.1.eventSubType': 'payment.created'
   }}
])

which outputs only the document with _id: 1.

The pipeline uses:

  • $filter to keep only the events of interest, in this case payment.created and payment.completed
  • $sortArray to sort the events based on the timestamp
  • $match to pick documents where payment.completed comes before payment.created
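
To sanity-check the logic without a server, the same three steps can be mirrored in plain JavaScript. This is only a rough sketch under assumptions: hasCompletedBeforeCreated is a hypothetical helper name (not a MongoDB API), the documents are the two trimmed test documents with plain Date objects instead of ISODate(), and each document is assumed to hold at most one event of each kind.

```javascript
// Plain-JavaScript mirror of the pipeline's logic; no MongoDB connection needed.
// hasCompletedBeforeCreated is a hypothetical helper, not part of any MongoDB API.
function hasCompletedBeforeCreated(doc) {
  const wanted = new Set(['payment.created', 'payment.completed']);

  // Step 1 ($filter): keep only the two events of interest.
  const events = (doc.lifeCycleInfo || []).filter(e => wanted.has(e.eventSubType));

  // Step 2 ($sortArray): sort a copy in ascending timestamp order.
  const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);

  // Step 3 ($match): the earliest event is payment.completed and the next is payment.created.
  return sorted.length >= 2 &&
    sorted[0].eventSubType === 'payment.completed' &&
    sorted[1].eventSubType === 'payment.created';
}

// The two test documents from above, with plain Date objects (an assumption for this sketch).
const docs = [
  {
    _id: 0,
    lifeCycleInfo: [
      { eventSubType: 'SendTransfer_Receipt', timestamp: new Date('2023-02-06T14:33:42.271Z') },
      { eventSubType: 'payment.created', timestamp: new Date('2023-02-06T14:33:45.479Z') },
      { eventSubType: 'payment.completed', timestamp: new Date('2023-02-06T14:33:46.689Z') }
    ]
  },
  {
    _id: 1,
    lifeCycleInfo: [
      { eventSubType: 'SendTransfer_Receipt', timestamp: new Date('2023-02-06T14:33:42.271Z') },
      { eventSubType: 'payment.created', timestamp: new Date('2023-02-06T14:33:45.479Z') },
      { eventSubType: 'payment.completed', timestamp: new Date('2023-02-05T14:33:46.689Z') }
    ]
  }
];

const outOfOrderIds = docs.filter(hasCompletedBeforeCreated).map(d => d._id);
console.log(outOfOrderIds); // [ 1 ]
```

If real documents can contain several payment.created or payment.completed events, the position checks in step 3 would likely need adjusting, both here and in the $match stage.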

Note:

  • I assumed a lot of things in your example documents, but the general workflow should work. Please modify the example accordingly.
  • This query will do a collection scan since it needs to reach into the array to do its processing, so please be aware of performance considerations.
  • If this is not what you’re looking for, please provide some example documents that match and don’t match the criteria, along with the expected output.

Best regards
Kevin


Thank you for responding. I am getting an "Unrecognized expression '$sortArray'" error.
Note that I am trying this in Compass version 1.21.2.

Hi @Suraj_Jaldu

$sortArray was added in MongoDB 5.2, so I would recommend upgrading to the MongoDB 6 series (6.0.4 is the latest) to get this feature. Note that this is a server feature, so it’s not related to your Compass version.

Without $sortArray, you can still do this, although the workflow is more complex:

db.test.aggregate([
   // Filter for only payment.created and payment.completed events
   {$addFields: {
      lifeCycleInfo: {
         $filter: {
            input: '$lifeCycleInfo',
            cond: {$or: [
               {$eq: ['$$this.eventSubType', 'payment.created']},
               {$eq: ['$$this.eventSubType', 'payment.completed']}
            ]}
         }
      }
   }},
   // Sort the lifeCycleInfo array based on timestamp
   {$unwind: '$lifeCycleInfo'},
   {$sort: {_id: 1, 'lifeCycleInfo.timestamp': 1}},
   {$group: {
      _id: '$_id',
      lifeCycleInfo: {$push: '$lifeCycleInfo'}
   }},
   // Match documents where payment.completed event comes before payment.created event
   {$match: {
      'lifeCycleInfo.0.eventSubType': 'payment.completed',
      'lifeCycleInfo.1.eventSubType': 'payment.created'
   }}
])

so the single $sortArray stage becomes 3 stages. Note that this is not thoroughly tested and is just a general idea of what the aggregation would look like. Please test with your data and modify accordingly to suit your actual documents.

Best regards
Kevin


Thanks Kevin. It was helpful.
