Apply `$lookup` on array documents which are output of `$group` stage

I’m having trouble applying a $lookup stage after a $group stage in my aggregation pipeline. The grouping stage produces an array($push: $$ROOT) as output, which seems to prevent me from using the $lookup stage afterwards.

Does anyone know how I can apply a $lookup stage in each document provided in the array output from $group stage? Any help or advice would be greatly appreciated. Thanks!

MongoDB Version: 4.4

Example Order Documents:

{
  _id: ObjectId("64002be8a5f7df1355bc95d0"),
  status: "New",
  products: [
    ObjectId("63183f9492a483955009c43a"),
    ObjectId("63183f9492a483955009c43b")
  ]
}
{
  _id: ObjectId("64002be8a5f7df1355bc95d1"),
  status: "In_Progress",
  products: [
    ObjectId("63183f9492a483955009c43a"),
    ObjectId("63183f9492a483955009c43c")
  ]
}
{
  _id: ObjectId("64002be8a5f7df1355bc95d2"),
  status: "Complete",
  products: [
    ObjectId("63183f9492a483955009c43b"),
    ObjectId("63183f9492a483955009c43c")
  ]
}

Example Product Documents:

{
  _id: ObjectId("63183f9492a483955009c43a"),
  name: "Brush",
  category: "Household",
  price: 100
}
{
  _id: ObjectId("63183f9492a483955009c43b"),
  name: "Pan",
  category: "Kitchen",
  price: 145
}
{
  _id: ObjectId("63183f9492a483955009c43c"),
  name: "Bag",
  category: "Household",
  price: 50
}

Trying to achieve :
Sort documents → Group them by status → Limit first n documents → Apply lookup to fetch product details → return results

Desired results:

{
    "New": [
        {
            "_id": ObjectId("64002be8a5f7df1355bc95d0"),
            "products": [
                {
                    "_id": ObjectId("63183f9492a483955009c43a"),
                    "name": "Brush",
                    "category": "Household",
                    "price": 100
                },
                {
                    "_id": ObjectId("63183f9492a483955009c43b"),
                    "name": "Pan",
                    "category": "Kitchen",
                    "price": 145
                }
            ]
        }
    ],
    "In_Progress": [{...},{...}....],
    "Complete": [{...},{...}....]
}

Please share the pipeline.

It’s too bad you’re on 4.4 and not 6.0 (latest) because the “top N” of each group would be a lot easier with new $topN accumulator.

Meanwhile, when you have an array and want to look up details of a specific field, just use "array.fieldId" as the localField and it should “just work”. Now, it will return details into the new field you specify in "as" but you can then use a trick described here to merge them.

Asya

{"$sort": {"creation_time": -1}},
{"$group": {
    "_id": "status",
    "$push": "$$ROOT"
}},
{"$project":
    {
        "new": {"$slice": ["$New", skip, limit]},
        "In_Progress": {"$slice": ["$In_Progress", skip, limit]},
        "Complete": {"$slice": ["$Complete", skip, limit]},
    }
},
{
    "$lookup": {
        "from": "product",
        "let": {"products": "$products"},
        "pipeline": [
            {"$match": {"$expr": {"$in": ["$_id", "$$products"]}}},
            *lookup_to_fetch_product_details(),
        ],
        "as": "products",
    }
}

None of your sample documents have a field named creation_time yet you $sort on it?

It is really hard to supply a real solution when we do not have real documents to work with.

The following

tells me that the status field has a finite set of values. If that is the case then you might be better off forging the $group and use $facet like:

{ "$facet" : {
   "New" : pipeline_for__New ,
   "In_Progress" : pipeline_for__In_Progress ,
   "Complete" : pipeline_for__Complete
} }

Each pipeline will have the same structure:

{ "$lookup" : {
   "from" : "Orders" ,
   "as" : "Orders" ,
   "pipeline" : [
        { "$sort" : { "creation_time" : -1 } } ,
        { "$match" : { "status" : "New" } } ,
        { "$skip" : skip } ,
        { "$limit" : limit } ,
        { "$lookup" : {
            "from" : Products" ,
            "localField" : "products" , /* As mentioned by Asya you do not need to $unwind */
            "foreignField" : "_id" ,
            "as" : "products"
        } }
   ]
} }
1 Like

I got an idea where you could have a single pipeline rather than 1 pipeline per status name.

Rather than $facet as the first stage you could have

{ "$documents" : [
    { "_id" : "New" },
    { "_id" : "In_Progress" },
    { "_id" : "Complete" }
] }

The next stage, the single $lookup will look like:

{ "$lookup" : {
   "from" : "Orders" ,
   "localField" : "_id" ,
   "foreignField" : "status" ,
   "as" : "Orders" ,
   "pipeline" : [
        { "$sort" : { "creation_time" : -1 } } ,
        { "$skip" : skip } ,
        { "$limit" : limit } ,
        { "$lookup" : {
            "from" : Products" ,
            "localField" : "products" ,
            "foreignField" : "_id" ,
            "as" : "products"
        } }
   ]
} }

Alternatively to $documents, you could use

{ "$group" : { "_id" : "status" } }

To get the list of distinct statuses dynamically. Should be fast if there is an index on “status” as it will lead to a winningPlan of PROJECTION_COVERED with a DISTINCT_SCAN on the index.

The more I think about it the more I think that a better idea would be to have a collection of possible values for your status and other fields with finite values (for example your category). You would then start the aggregation on this “dictionary” collection. Something like

{  "field" : "status" ,
   "value" : "New" 
} ,
{  "field" : "status" ,
   "value" : "Complete" 
} ,
{  "field" : "status" ,
   "value" : "In_Progress" 
} ,
{  "field" : "category" ,
   "value" : "Household" 
} ,
{  "field" : "category" ,
   "value" : "Kitchen" 
} ,

The aggregation would then start on this “dictionary” like:

db.dictionary.aggregate( [
    { "$match" : { "field" : "status" } } ,
    { "$lookup" : {
        "from" : "Orders" ,
        "localField" : "value" ,
        "foreignField" : "status" ,
        /* like above */
    } }
] )

The issue I see with the $group:{_id:status} version is that if you have no orders In_Progress, the you get no document at all. With $documents and “dictionary”, you will get 1 document with an empty array which might be easier (or not) to handle in your code.

The issue I see with the $facet and $documents version is that you need to change the code if you had a new possible value for status. Both $group:{_id:status} and “dictionary” versions are completely data driven.

2 Likes

Thank You.
We’ll try it our once we are in the latest 6.0 version.

You could combine the two methods :slight_smile: