Hello, I am struggling with finding the most optimal way to get top ten recent unique values.
I have a large collection of documents with such structure:
{
"timestamp": 1673450840197,
"request": "new laptop models",
"user": "user1"
}
I want to get the top 10 recent unique requests using such aggregation pipeline:
{
$match: {
user: "user1"
}
},
{
$sort: {timestamp: -1}
},
{
$group: {
_id: "$request",
timestamp: { $first: "$timestamp" }
}
},
{
$limit: 10
}
I have index {user: 1, timestamp: -1, request: 1}
.
This query is very slow when there are a lot of different requests by user, because the whole index is scanned… I have read about DISTINCT_SCAN
optimization, but I don’t know how to apply it to my query…
I tried to use index {user: 1, request: 1, timestamp: -1}
and change $sort
stage to $sort: {request: 1, timestamp: -1}
, then DISTINCT_SCAN
is applied, but the data is not sorted like I wanted in my first approach - the most recent request should be first in the result.
Could you please tell me whether I am missing something or maybe I should completely change they way of solving this task?