In my collection, each document contains either a notificationId or a schedulerId — never both. I want to perform a $lookup based on whichever field exists to pull data from either the notifications or schedulers collection.
My concern is:
If I write an aggregation pipeline that includes both $lookup stages (one for each field), will the pipeline still run both lookups for every document — even if one of the fields doesn’t exist? Could that affect performance on large datasets?
Is there a better way to structure the aggregation so that only the necessary $lookup is performed based on the field that exists?
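For reference, a sketch of the two-$lookup pipeline the question describes (collection and field names taken from the description above; the "as" names are assumptions):

```js
// Both $lookup stages run for every document. When the localField is
// missing, $lookup treats it as null and still probes the foreign collection.
db.collection.aggregate([
  {
    $lookup: {
      from: "notifications",
      localField: "notificationId",
      foreignField: "_id",
      as: "notification"
    }
  },
  {
    $lookup: {
      from: "schedulers",
      localField: "schedulerId",
      foreignField: "_id",
      as: "schedule"
    }
  }
])
```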
Yes, it will. Even if there’s no match, the lookup has to occur to establish that there’s no match.
Whether it matters is worth measuring, and depends on how large the dataset is. If you’re looking up on the _id field, it will still be quite fast.
To improve performance, you can use $facet or $unionWith, either one combined with a $match stage on the relevant field.
If there’s a $group operation involved, $facet works well. But $facet has a 100 MB per-stage limit, and since it returns everything in a single document, the standard 16 MB document limit also applies to the final result. So if you want the full set of documents, you will need $unionWith.
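A $facet version might look something like this (a sketch only; the facet names are assumptions). Note that the entire output is one document, which is why the 16 MB limit bites here:

```js
db.collection.aggregate([
  {
    $facet: {
      // each facet pre-filters on its field, then does one lookup
      byNotification: [
        { $match: { notificationId: { $exists: true } } },
        { $lookup: { from: "notifications", localField: "notificationId",
                     foreignField: "_id", as: "notification" } }
      ],
      byScheduler: [
        { $match: { schedulerId: { $exists: true } } },
        { $lookup: { from: "schedulers", localField: "schedulerId",
                     foreignField: "_id", as: "schedule" } }
      ]
    }
  }
])
```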
It will, but you’ll need to choose which one makes more sense to have:
a) Lookup both collections
b) Pre-filter before two separate lookups.
Consider that you have 1,000 docs with a 50-50 split between the two fields. Doing both lookups directly means 2,000 lookups, which you have already said you want to avoid. The other way filters the 1,000 docs down to 500 in each branch, so it performs 500 + 500 lookups. To speed up the two $match stages, you can also add a partial index on each of those fields with $exists: true.
One field being null while the other is non-null is different from “only one field exists”. But that’s a minor change to the aggregation and the partial index: in that case you’d filter on $type: "string" or $type: "objectId", depending on what your data actually contains.
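For example, the partial indexes backing those two $match stages could be created like this (a sketch; partialFilterExpression supports both $exists: true and $type):

```js
// Index only the docs where the field actually exists...
db.collection.createIndex(
  { notificationId: 1 },
  { partialFilterExpression: { notificationId: { $exists: true } } }
)

// ...or filter on the field's type to also exclude explicit nulls
// (use "objectId" instead of "string" if that's the actual data type)
db.collection.createIndex(
  { schedulerId: 1 },
  { partialFilterExpression: { schedulerId: { $type: "string" } } }
)
```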
If you don’t have or can’t create partial indexes on those two fields, but your lookup for notificationId and schedulerId is against the _id field of the other collections, then the lookup will use that unique index. And that could make the 2,000 lookups faster than the twice “500 $match on a non-indexed field & lookup on the indexed _id field”. You’ll have to try both options to see where the cutover occurs with respect to the number of documents.
If your foreignField in the lookup for notifications and schedulers collection is not an indexed field, then it’s very likely that filtering twice before two lookups will be faster for non-trivially sized collections.
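One way to compare the two options is to look at executionStats for each pipeline, e.g.:

```js
// naivePipeline and filterFirstPipeline are placeholders for the two
// pipeline variants being compared; check total execution time and
// documents examined in the output
db.collection.explain("executionStats").aggregate(naivePipeline)
db.collection.explain("executionStats").aggregate(filterFirstPipeline)
```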
Got it, thanks for the clarification! I’ll test both approaches—using partial indexes where possible and leveraging the _id field during lookups—to see which one performs better depending on the document count. I’ll focus on filtering first if the foreign fields aren’t indexed.
Adding the twice-match-then-lookup method here, using $unionWith:
Filter on one of the fields, do the lookup
Union with:
Filter on the other field, do the lookup
Add unwind, sort, etc. as needed. I’ve put just one sort at the end.
db.collection.aggregate([
  // the `$type: "string"` implies not null in the data
  // use `$type: "objectId"` if that's the data type in the actual docs
  { $match: { notificationId: { $type: "string" } } },
  {
    $lookup: {
      from: "notifications",
      localField: "notificationId",
      foreignField: "_id",
      as: "notification"
    }
  },
  {
    $unionWith: {
      coll: "collection",
      pipeline: [
        { $match: { schedulerId: { $type: "string" } } },
        {
          $lookup: {
            from: "schedulers",
            localField: "schedulerId",
            foreignField: "_id",
            as: "schedule"
          }
        }
      ]
    }
  },
  { $sort: { _id: 1 } }
])
Mongo Playground with example data and results. By the way, I’ve added one doc which has both fields, just to show that the results would be odd if such a doc existed. (But you have specified that this won’t occur.)