Aggregation to get count and Y sample entries

I am trying out aggregation on the data in a collection. Example document:

    {
        "_id" : "244",
        "pubName" : "p1",
        "serviceIdRef" : "36e9c779-7865-4b74-a30b-e4d6a0cc5295",
        "serviceName" : "my-service",
        "subName" : "c1",
        "pubState" : "INVITED"
    }

I would like to:

  1. Match on some field (let’s say subName), group by serviceIdRef, and then limit the result to X entries.
  2. For each serviceIdRef, also return the count of documents in each of the ACTIVE and INVITED states, plus Y sample documents (for this example, say Y=3) that are in that state.

For example, the output would appear as (in brief):

   [
    {
        serviceIdRef: "36e9c779-7865-4b74-a30b-e4d6a0cc5295",
        serviceName:
        state:[
            {
                pubState: "INVITED"   
                count: 200
                sample: [ // Get those Y entries (here Y=3)
                    {
                        // sample1 like:
                        "_id" : "244",
                        "pubName" : "p1",
                        "serviceIdRef" : "36e9c779-7865-4b74-a30b-e4d6a0cc5295",
                        "serviceName" : "my-service",
                        "subName" : "c1",
                        "pubState" : "INVITED"

                    },
                    {
                        sample2
                    },
                    {
                        sample3
                    }
                ]
            },
            {
                pubState: "ACTIVE",
                ......
            }
        ]
    }
    {
        repeat for another service
    }
]

So far I have written the pipeline below, but I am not able to get those Y entries. Is there another (better) way?


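db.sub.aggregate([
    {
        // Keep only the requested subscribers in the INVITED or ACTIVE state.
        $match: {
            "subName": { $in: ["c1", "c2"] },
            "$or": [
                { "pubState": "INVITED" },
                { "pubState": "ACTIVE" }
            ]
        }
    },
    {
        // Collect every matching document under its serviceIdRef.
        $group: {
            _id: "$serviceIdRef",
            subs: { $push: "$$ROOT" }
        }
    },
    { $sort: { _id: -1 } },
    { $limit: 22 },
    {
        $facet: {
            facet1: [
                { $unwind: "$subs" },
                {
                    // Count documents per service and state combination.
                    $group: {
                        _id: {
                            "serviceName": "$_id",            // holds the serviceIdRef value at this point
                            "pubState": "$subs.pubState",
                            "subState": "$subs.subsState"     // field not present in the example document
                        },
                        count: { $sum: 1 }
                    }
                }
            ]
        }
    }
])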

Using MongoDB version 4.2.17.

I’m not sure why you would need $facet here. It looks like you should be doing a $lookup back into the collection to get total counts based on whatever field value you need, though I’m not sure why you wouldn’t just do the counting in the original $group: you’re already getting all the documents being grouped.

Your sample output isn’t really clear to me, since it’s not a valid document. Can you explain, or show on a simplified document, what you want the final result to look like? If you want “for each unique service+state combination, give me the count”, you should definitely do that inside the original $group, but group by both service and state. Then for “sample” I’d probably recommend adding a $lookup back into the collection with a $limit.

Asya
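
For readers following along, here is one possible sketch of the approach suggested in the reply: count inside a $group keyed on both service and state, then use a pipeline-style $lookup back into the same collection with a $limit to fetch the sample documents. It is an illustration under assumptions, not a tested answer from the thread. The helper name buildServiceStatePipeline and its parameters are hypothetical, the collection name sub is taken from the question, and serviceName is assumed to be the same for every document sharing a serviceIdRef. It uses only stages available in 4.2.

```javascript
// Hypothetical helper (not from the thread): returns the pipeline as a plain array
// so it can be inspected or passed to db.sub.aggregate().
function buildServiceStatePipeline(subNames, states, maxServices, sampleSize) {
    return [
        // Same filter as the question, using $in for pubState instead of $or.
        { $match: { subName: { $in: subNames }, pubState: { $in: states } } },
        // One group per service+state pair; the count is computed here.
        { $group: {
            _id: { serviceIdRef: "$serviceIdRef", pubState: "$pubState" },
            serviceName: { $first: "$serviceName" }, // assumes one name per service
            count: { $sum: 1 }
        } },
        // Fold the pairs into one document per service so $limit counts services.
        { $group: {
            _id: "$_id.serviceIdRef",
            serviceName: { $first: "$serviceName" },
            state: { $push: { pubState: "$_id.pubState", count: "$count" } }
        } },
        { $sort: { _id: -1 } },
        { $limit: maxServices },
        // Back to one document per service+state, only for the services kept.
        { $unwind: "$state" },
        // Self-lookup for up to sampleSize documents matching this pair.
        { $lookup: {
            from: "sub",
            let: { sid: "$_id", st: "$state.pubState" },
            pipeline: [
                { $match: { $expr: { $and: [
                    { $eq: ["$serviceIdRef", "$$sid"] },
                    { $eq: ["$pubState", "$$st"] },
                    { $in: ["$subName", subNames] }
                ] } } },
                { $limit: sampleSize }
            ],
            as: "state.sample"
        } },
        // Reassemble one document per service with its list of states.
        { $group: {
            _id: "$_id",
            serviceName: { $first: "$serviceName" },
            state: { $push: "$state" }
        } },
        { $sort: { _id: -1 } },
        { $project: { _id: 0, serviceIdRef: "$_id", serviceName: 1, state: 1 } }
    ];
}

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    const pipeline = buildServiceStatePipeline(["c1", "c2"], ["INVITED", "ACTIVE"], 22, 3);
    printjson(db.sub.aggregate(pipeline).toArray());
}
```

The sample documents are simply whichever ones the inner $limit encounters first; if a particular order matters, a $sort could be placed before it inside the $lookup pipeline. A variation that avoids the $lookup is to $push "$$ROOT" in the first $group and trim the array with $slice afterwards, at the cost of holding every matching document in memory during grouping.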