Speed performance: $in vs $lookup with indexed field, _id


  • All data and indexes are in RAM.

First query:

  $in: [user_id, array_of_user_ids_with_length_1000]

Second query:

  $lookup: {
    from: "col_b",
    localField: "_id", 
    foreignField: "user_id",
    as: "joined_field"

where col_b has 10B documents and is indexed, uniquely, on "user_id".

My reasoning:

The second query should be faster because the $in operation in the first query will have worst case time complexity of O(1000), while the worst case time complexity of the second query is O(log(10B)) = O(23).

Is my reason correct? Will the second query indeed be faster than the first? If not, what did I get wrong? What about in practice, can we expect this to be case in real data, hardware, RAM, use case, etc.?

1 Like

This is an excellent question. I hope someone can provide an answer,