Performant Mongo Query to get X users with specific criteria

I have mongo record that looks like this
and assume $$NOW is 2023-01-29T00:30:48.370+00:00

[
  {
    "signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
    "loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
    id: 1
  },
  {
    "loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
    "signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
    id: 2
  },
  {
    "loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
    "signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
    id: 3
  },
  {
    "loggedInAt": new Date("2023-01-27T00:30:48.370+00:00"),
    "signedUpAt": new Date("2023-01-01T00:30:48.370+00:00"),
    id: 4
  },
  {
    "loggedInAt": new Date("2023-01-00T00:30:48.370+00:00"),
    "signedUpAt": new Date("2023-01-01T00:30:48.370+00:00"),
    id: 5
  },
  {
    "loggedInAt": new Date("2023-01-01T00:30:48.370+00:00"),
    "signedUpAt": new Date("2023-01-01T00:30:48.370+00:00"),
    id: 6
  },
  {
    "loggedInAt": new Date("2022-01-01T00:30:48.370+00:00"),
    "signedUpAt": new Date("2022-01-01T00:30:48.370+00:00"),
    id: 7
  }
]

I want to get maximum 3(or any number) results from the above data, with following priority

  1. randomly selected Users who signed up < 24 hrs ago, maximum of 2 records

  2. randomly select remaining(3-users returned from condition 1) Users who logged in within the past 7 days and signed up >24 hours ago

  3. randomly select remaining(3-users returned from condition 1+ 4) Users who logged in within the past 8 to 15 days

  4. Any other random user(first 3 conditions still dint return 3 results)

condition 3 & 4 should only return data(or even run?) if condition 1 & 2 hasn’t returned 3 records yet.

one possible expected result

[
  {
    "signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
    "loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
    id: 1
  }, // signed up < 24hrs
  {
    "loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
    "signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
    id: 3
  }, // signed up < 24hrs
  {
    "loggedInAt": new Date("2023-01-27T00:30:48.370+00:00"),
    "signedUpAt": new Date("2023-01-01T00:30:48.370+00:00"),
    id: 4
  } // logged in last 7 days & signed up > 24 hrs ago
]

I tried below query

db.collection.aggregate([
  {
    "$addFields": {
      "randSortKey": {
        "$rand": {}
      },
      signedUpHrs: {
        $dateDiff: {
          startDate: "$signedUpAt",
          endDate: new Date("2023-01-29T00:30:48.370+00:00"),
          unit: "hour"
        }
      },
      loggedInDaysAgo: {
        $dateDiff: {
          startDate: "$loggedInAt",
          endDate: new Date("2023-01-29T00:30:48.370+00:00"),
          unit: "day"
        }
      }
    }
  },
  {
    "$addFields": {
      partition: {
        $switch: {
          branches: [
            {
              case: {
                $lte: [
                  "$signedUpHrs",
                  24
                ]
              },
              then: -1
            },
            {
              case: {
                $lte: [
                  "$loggedInDaysAgo",
                  7
                ]
              },
              then: 1
            },
            {
              case: {
                $and: [
                  {
                    $lte: [
                      "$loggedInDaysAgo",
                      15
                    ]
                  }
                ],
                
              },
              then: 2
            },
            {
              case: {
                $and: [
                  {
                    $lte: [
                      "$loggedInDaysAgo",
                      30
                    ]
                  }
                ],
                
              },
              then: 3
            },
            
          ],
          default: 4
        }
      }
    }
  },
  {
    "$setWindowFields": {
      "partitionBy": "$partition",
      "sortBy": {
        "randSortKey": 1
      },
      "output": {
        "rank": {
          "$rank": {}
        },
        total: {
          $sum: 1
        }
      }
    }
  },
  {
    "$match": {
      $expr: {
        $or: [
          {
            $ne: [
              "$partition",
              -1
            ]
          },
          {
            $and: [
              {
                partition: -1
              },
              {
                $lte: [
                  "$rank",
                  2
                ]
              }
            ]
          }
        ]
      }
    }
  },
  {
    "$limit": 4
  }
])

It gives me results.
but my question is, is there a performant way? would this query still be good if i have millions of rows? or this is the best possible way to achieve this?

mongo playground: Mongo playground

Assuming you have the signed up and logged in indexed you can make your query very performant. First put in a match that selects users who logged in between 8 and 14 days from the current date or signed up less than 24 hours. That will be the IXSCAN for the aggregation. What follows will be much faster than going over the entire collection.

1 Like