Using Motor in Python, I have a Collection that has 10M documents, with 4 fields: sender, receiver, amount and height.
I’m trying to fetch documents where either sender or receiver matches a predefined string. As I need to use pagination, I need the total count as well.
Current pipeline:
pipeline = [
{
"$match": {
"$or": [
{"sender": {"$eq": account_string}},
{"receiver": {"$eq": account_string}},
]
}
},
{"$sort": {"height": DESCENDING}},
{"$project": {"_id": 1}},
{
"$facet": {
"metadata": [{"$count": "total"}],
"data": [{"$skip": skip}, {"$limit": 20}],
}
},
{
"$project": {
"data": 1,
"total": {"$arrayElemAt": ["$metadata.total", 0]},
}
},
]
My collection has indices on sender and receiver.
I execute this pipeline using:
result = (
await db.my_collection
.aggregate(pipeline)
.to_list(20)
)
This executes at very acceptable speeds to almost all values of account_string. However, there is 1 account_string that has 9M out of 10M matches. For this account_string, execution time nears 20 sec. What’s worse, if I paginate (increase/decrease the skip amount), I again have to wait 20 sec.
I believe the slowness is due to the fact that it needs to read the entire collection to calculate the count?
My question: is there a quicker way to achieve paginated results?