Grouping efficiently with limit

I have a data set of 500k docs in a collection. I am trying to group the data based on an itemId and display data in a paginated form.
This is the pipeline I am trying.

{
    "$group": {
        "_id": "$itemId",
        "products": { $push: "$ROOT" }
    }
}
{
   "$limit": 10
}

The issue here is that the grouping stage will group all 500k docs before only returning 10 docs. Is there a way to limit the grouping to just 10 docs and returning the data.

That is a very interesting problem for which I do not have a very good solution. But it is a starting point.

name_of_collection = "the_name_of_your_collection" 
pipeline = [
  { "$limit" : 10 } ,
  { "$group" : {
    "_id" : "$itemId" ,
    "itemId" : { "$first" : "$itemId" }
  } }
  { "$lookup" : {
    "from" : name_of_collection ,
    "as" : "products" ,
    "localField" : "itemId" ,
    "foreignField" : "itemId"
  } }
]
db.getCollection( name_of_collection ).aggregate( pipeline )

As stated, it is not a very good solution but only a starting point, because you may end up with 1 to 10 documents in the result set. You will get only 1 document if the first 10 documents have the same itemId.
You may use $sample, rather than $limit to increase the odds of getting 10 different idemId.
Them, while I am ashamed of writing that, you may then call the aggregation multiple times until you get 10 different itemId.

Since you $lookup with itemId, you definitively want an index.

While I write I think. So here another idea that came to me that is probably much better that the one above, but since I spent time writing the above I want to keep it.

Create a materialized view that keeps a list of unique itemId. Start the aggregation on the materialized view, do the limit and group like above. The materialized view will ensure you always get 10 documents.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.