I have mongo record that looks like this
and assume $$NOW
is 2023-01-29T00:30:48.370+00:00
[
{
"signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
"loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
id: 1
},
{
"loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
"signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
id: 2
},
{
"loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
"signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
id: 3
},
{
"loggedInAt": new Date("2023-01-27T00:30:48.370+00:00"),
"signedUpAt": new Date("2023-01-01T00:30:48.370+00:00"),
id: 4
},
{
"loggedInAt": new Date("2023-01-00T00:30:48.370+00:00"),
"signedUpAt": new Date("2023-01-01T00:30:48.370+00:00"),
id: 5
},
{
"loggedInAt": new Date("2023-01-01T00:30:48.370+00:00"),
"signedUpAt": new Date("2023-01-01T00:30:48.370+00:00"),
id: 6
},
{
"loggedInAt": new Date("2022-01-01T00:30:48.370+00:00"),
"signedUpAt": new Date("2022-01-01T00:30:48.370+00:00"),
id: 7
}
]
I want to get maximum 3(or any number) results from the above data, with following priority
-
randomly selected Users who signed up < 24 hrs ago, maximum of 2 records
-
randomly select remaining(3-users returned from condition 1) Users who logged in within the past 7 days and signed up >24 hours ago
-
randomly select remaining(3-users returned from condition 1+ 4) Users who logged in within the past 8 to 15 days
-
Any other random user(first 3 conditions still dint return 3 results)
condition 3 & 4 should only return data(or even run?) if condition 1 & 2 hasn’t returned 3 records yet.
one possible expected result
[
{
"signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
"loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
id: 1
}, // signed up < 24hrs
{
"loggedInAt": new Date("2023-01-29T00:30:48.370+00:00"),
"signedUpAt": new Date("2023-01-29T00:30:48.370+00:00"),
id: 3
}, // signed up < 24hrs
{
"loggedInAt": new Date("2023-01-27T00:30:48.370+00:00"),
"signedUpAt": new Date("2023-01-01T00:30:48.370+00:00"),
id: 4
} // logged in last 7 days & signed up > 24 hrs ago
]
I tried below query
db.collection.aggregate([
{
"$addFields": {
"randSortKey": {
"$rand": {}
},
signedUpHrs: {
$dateDiff: {
startDate: "$signedUpAt",
endDate: new Date("2023-01-29T00:30:48.370+00:00"),
unit: "hour"
}
},
loggedInDaysAgo: {
$dateDiff: {
startDate: "$loggedInAt",
endDate: new Date("2023-01-29T00:30:48.370+00:00"),
unit: "day"
}
}
}
},
{
"$addFields": {
partition: {
$switch: {
branches: [
{
case: {
$lte: [
"$signedUpHrs",
24
]
},
then: -1
},
{
case: {
$lte: [
"$loggedInDaysAgo",
7
]
},
then: 1
},
{
case: {
$and: [
{
$lte: [
"$loggedInDaysAgo",
15
]
}
],
},
then: 2
},
{
case: {
$and: [
{
$lte: [
"$loggedInDaysAgo",
30
]
}
],
},
then: 3
},
],
default: 4
}
}
}
},
{
"$setWindowFields": {
"partitionBy": "$partition",
"sortBy": {
"randSortKey": 1
},
"output": {
"rank": {
"$rank": {}
},
total: {
$sum: 1
}
}
}
},
{
"$match": {
$expr: {
$or: [
{
$ne: [
"$partition",
-1
]
},
{
$and: [
{
partition: -1
},
{
$lte: [
"$rank",
2
]
}
]
}
]
}
}
},
{
"$limit": 4
}
])
It gives me results.
but my question is, is there a performant way? would this query still be good if i have millions of rows? or this is the best possible way to achieve this?
mongo playground: Mongo playground