How to scale and optimize this friend suggestions query?

My app has a performance intensive query, which generates a list of suggested friends for any user that logs into the App.

This query currently is slow despite indexes. The $lookup, which is a join, is slow despite being indexed.

Currently, this query is slow with a database of more than 100,000 users. I need to scale my APP to potentially 10s of millions of users.

questions:

  1. Does this look like the right approach?
  2. What is needed to speed it up? Sharding? Faster Hardware? Persist in RAM?
  3. There are appears to be no way to remove the $lookup / join. Or are there?

The business logic of the query goes like so:

const suggested_friends = await all_users.aggregate([
  {
    $match: {
      age: { $gt: 18 },
      city: { $in: ["chicago", "new york"] },
      school: { $in: ["harvard", "standford"] },
      // etc., potentially hundreds of other arbitrary filters, which are different for each query and undefined
    },
  },
  // must lookup and remove users that this user has blocked and who have blocked this user.
  {
    $lookup: {
      from: "block_users",
      let: { tar_id: "$_id", tar_tid: "$tar._id" }, // joining by _id, which is indexed
      pipeline: [
        {
          $match: {
            $expr: {
              $or: [
                {
                  $and: [
                    {
                      $eq: ["$blocker", querier_user_id],
                    },
                    { $eq: ["$blocked", "$$tar_id"] },
                  ],
                },
                {
                  $and: [
                    { $eq: ["$blocker", "$$tar_id"] },
                    {
                      $eq: ["$blocked", querier_user_id],
                    },
                  ],
                },
              ],
            },
          },
        },
        { $limit: 1 },
      ],
      as: "remove",
    },
  },
  {
    $match: {
      $expr: {
        $eq: [{ $size: "$remove" }, 0],
      },
    },
  },
  {
    $project: {
      remove: 0,
    },
  },

  // now to derive some similarity score to predict whether user will be good friend
  {
    $set: {
      similarity_score: complex_function_call(), // some complex function to compute a number
    },
  },

  // finally sorting
  {
    $sort: {
      similarity_score: -1,
      age: 1,
      grade: 1,
      distance: 1,
      // etc. the sort order, by what field, and how many fields is not predefined and can be different for each query
      // and thus, cannot use index.
      // also, similarity_score must be derived, hence another reason an index cannot be used.
      // this must come at the end because blocked users must be filtered and removed first.
    },
  },
  {
    $limit: 100,
  },
]);

Couple of thoughts…

You’re running this on every logon? What’s more often to happen, a user log on or a user sign up / change key information. If it’s logon then there is no point re-calculating the friends list every time, calculate it only as needed if it’s expensive.

Store the meta information as attributes, which would make indexing the fields easier / more efficient.

Don’t keep blocking / blocked user data in a different collection, store it with the users as that’s a place it’s being actively used, you can re-calculate a users friends suggestion list when a user blocks / is blocked.
(with the assumption that if it grows to massive you’ve the 16MB limit, but that’s a VERY disliked user, you could always have a flag to indicate a user like this and process them differently…)

Don’t calculate this in real-time, when a change is made, put the user id on a queue and have a background process calculate this at some point, but not immediately, I assume it’s not critical that friends matches are calculated the second that a user logs in.