Rolling up documents by identifiers?

I have three collections: “widgets”, “users”, and “accounts”. The documents within each look like:

accounts:

[
  {
  	"_id": 1,
  	"name": "account_1",
  	"account": "abc123"
	},
  {
    "_id": 2,
    "name": "account_2",
    "account": "def456"
  }
]

users:

[
  {
    "_id": 1,
    "username": "jsmith",
    "accounts": ["abc123", "def456"]
  }
]

widgets:

[
  {
    "_id": 1,
    "name": "widget_1",
    "user_id": 1,
    "account_id": 1,
    "expires": "2022-11-30T00:00:00.000-05:00"
  },
  {
    "_id": 2,
    "name": "widget_2",
    "user_id": 1,
    "account_id": 2,
    "expires": "2022-12-31T14:00:00.000-05:00"
  }
]

Notice that a single “user” can be a member of multiple “accounts”, and that a single user can have multiple “widgets”. My goal is to write a query such that results look like:

[
  {
    "_id": 1,
    "username": "jsmith",
    "accounts": [
      {
        "_id": 1,
        "name": "account_1",
        "account": "abc123"
      }
    ],
    "widgets": [
      {
        "_id": 1,
        "name": "widget_1",
        "expires": "2022-11-30T00:00:00.000-05:00"
      }
    ]
  }
]

In other words the query:

  1. Finds all users that have widgets expiring on a certain date (2022-11-30 in the example)
  2. Matches widgets and users by _id, but accounts by the account attribute
  3. Rolls the widgets documents into the returned user document
  4. Rolls the accounts associated with the found widgets into the returned user document

I have successfully written an aggregate query that finds all widgets expiring on a certain date and rolls the associated user and account documents into the token document. But it’s not clear to me if it will be possible to do as I have described above. I think this involves the pipeline option on the $lookup function, but I am not sure.

For reference, the query I wrote (that I am trying to flip) which returns tokens with rolled-up documents is:

db.getCollection("widgets").aggregate([
    {$match: {
        expires: {
            $lt: ISODate("2022-12-01T20:00:00.000-05:00"),
            $gt: ISODate("2022-11-29T18:59:59.999-05:00")
        }
    }},
    {$lookup: {
        from: "users",
        localField: "user_id",
        foreignField: "_id",
        as: "user"
    }},
    {$lookup: {
        from: "accounts",
        localField: "account_id",
        foreignField: "_id",
        as: "account"
    }},
    {$unwind: "$user"},
    {$unwind: "$account"}
])

Note: my target database version is 4.4.

Thanks for publishing your sample documents, expected results and pipeline.

Next time vary your _id from one collection to the others, it helps to see which _id are which in your expected results. Ex: accounts _id could be 101, 102, users _id could be 201 and widgets could be 301, 302.

It looks like your pipeline supplies all the correct information and you just want it to be reorganized. Personally, I prefer to do this reorganization on the application back end or front end rather than on the data server. Specially with $unwind as it increases the amount of data.

You may get an output close to your expected result with a $group stage that would look like the following untested code:

{ "$group" : {
  "_id" : "$user_id" ,
  "username" : { "$first" : "$user.username" } ,
  "accounts" : { "$addToSet" : "$account" } ,
  "widgets" : { "$addToSet" : { "_id" : "$_id" , "name" : "$name" , "expires" : "$expires" } }
} }

Thank you for the reply and suggestion. What I ended up doing was… a lot:

db.getCollection("users").aggregate([
    // Get all user documents.
    {$match: {
        _id: { $exists: true }
    }},
    
    // Join all widget documents on user._id => widget.user_id.
    {$lookup: {
        from: "widgets",
        localField: "_id",
        foreignField: "user_id",
        as: "widgets"
    }},
    
    // Return individual result documents for each joined widget.
    {$unwind: "$widgets"},
    
    // Replace the "widgets" array field in each document with a singlar widget field.
    {$project: {
        _id: true,
        email: true,
        username: true,
        widget: {
            _id: "$widgets.id",
            name: "$widgets.name",
            account_id: "$widgets.account_id",
            expires: "$widgets.expires"
        }
    }},
    
    // Filter results by widgets that expire on a given day.
    {$match: {
        "widget.expires": {
            $gt: ISODate('2022-11-14T23:59:59.999-05:00'),
            $lt: ISODate('2022-11-16T00:00:00.000-05:00')
        }
    }},
    
    // Get the account associated with each widget.
    {$lookup: {
        from: "accounts",
        localField: "widget.account_id",
        foreignField: "_id",
        as: "widget.account"
    }},
    
    // Each widget is for a singluar account, so use $unwind to
    // replace the array field with a singular field.
    {$unwind: "$widget.account"},
    
    {$match: {
        "widget.account.account": {$ne: null}
    }},
    
    
    // Remove unwanted fields from the result set.
    {$project: {
        widget: {
            account_id: false,
            account: {
                created: false,
                externalid: false,
                meta: false,
                plan: false
            }
        }
    }},
    
    // Undo the "$widgets" unwind by grouping all documents by the user doc fields.
    {$group: {
        _id: { _id: "$_id", email: "$email", username: "$username" },
        widgets: {$push: "$widget"}
    }},
    
    // Remap the group results into documents that make a bit more sense.
    {$project: {
        _id: "$_id._id",
        email: "$_id.email",
        username: "$_id.username",
        widgets: "$widgets"
    }}
])
1 Like

One simple improvement to try in your solution.

You may add a pipeline inside your $lookup:{from:widgets} to do your match on widget.expires before $unwind. See https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/#correlated-subqueries-using-concise-syntax

I’ve read that document several times and I’m just not getting it. I think I tried to do that, and was getting “cannot use ‘$’ paths in field names” or something. Anyway, as I have it written, it’s easy to adjust the filters in my application by working with the individual clause objects.

1 Like