I have a data set of 500k docs in a collection. I am trying to group the data based on an itemId and display data in a paginated form.
This is the pipeline I am trying.
The issue here is that the grouping stage will group all 500k docs before only returning 10 docs. Is there a way to limit the grouping to just 10 docs and returning the data.
As stated, it is not a very good solution but only a starting point, because you may end up with 1 to 10 documents in the result set. You will get only 1 document if the first 10 documents have the same itemId.
You may use $sample, rather than $limit to increase the odds of getting 10 different idemId.
Them, while I am ashamed of writing that, you may then call the aggregation multiple times until you get 10 different itemId.
Since you $lookup with itemId, you definitively want an index.
While I write I think. So here another idea that came to me that is probably much better that the one above, but since I spent time writing the above I want to keep it.
Create a materialized view that keeps a list of unique itemId. Start the aggregation on the materialized view, do the limit and group like above. The materialized view will ensure you always get 10 documents.