My app has a performance intensive query, which generates a list of suggested friends for any user that logs into the App.
This query currently is slow despite indexes. The $lookup
, which is a join, is slow despite being indexed.
Currently, this query is slow with a database of more than 100,000 users. I need to scale my APP to potentially 10s of millions of users.
questions:
- Does this look like the right approach?
- What is needed to speed it up? Sharding? Faster Hardware? Persist in RAM?
- There are appears to be no way to remove the
$lookup
/ join. Or are there?
The business logic of the query goes like so:
const suggested_friends = await all_users.aggregate([
{
$match: {
age: { $gt: 18 },
city: { $in: ["chicago", "new york"] },
school: { $in: ["harvard", "standford"] },
// etc., potentially hundreds of other arbitrary filters, which are different for each query and undefined
},
},
// must lookup and remove users that this user has blocked and who have blocked this user.
{
$lookup: {
from: "block_users",
let: { tar_id: "$_id", tar_tid: "$tar._id" }, // joining by _id, which is indexed
pipeline: [
{
$match: {
$expr: {
$or: [
{
$and: [
{
$eq: ["$blocker", querier_user_id],
},
{ $eq: ["$blocked", "$$tar_id"] },
],
},
{
$and: [
{ $eq: ["$blocker", "$$tar_id"] },
{
$eq: ["$blocked", querier_user_id],
},
],
},
],
},
},
},
{ $limit: 1 },
],
as: "remove",
},
},
{
$match: {
$expr: {
$eq: [{ $size: "$remove" }, 0],
},
},
},
{
$project: {
remove: 0,
},
},
// now to derive some similarity score to predict whether user will be good friend
{
$set: {
similarity_score: complex_function_call(), // some complex function to compute a number
},
},
// finally sorting
{
$sort: {
similarity_score: -1,
age: 1,
grade: 1,
distance: 1,
// etc. the sort order, by what field, and how many fields is not predefined and can be different for each query
// and thus, cannot use index.
// also, similarity_score must be derived, hence another reason an index cannot be used.
// this must come at the end because blocked users must be filtered and removed first.
},
},
{
$limit: 100,
},
]);