Speed performance: $in vs $lookup with indexed field, _id

Assumption:

  • All data and indexes are in RAM.

First query:

{
  $in: [user_id, array_of_user_ids_with_length_1000]
}

Second query:

{
  $lookup: {
    from: "col_b",
    localField: "_id", 
    foreignField: "user_id",
    as: "joined_field"
  }
}

where col_b has 10B documents and is indexed, uniquely, on "user_id".

My reasoning:

The second query should be faster because the $in operation in the first query will have worst case time complexity of O(1000), while the worst case time complexity of the second query is O(log(10B)) = O(23).

Is my reason correct? Will the second query indeed be faster than the first? If not, what did I get wrong? What about in practice, can we expect this to be case in real data, hardware, RAM, use case, etc.?

1 Like

This is an excellent question. I hope someone can provide an answer,

Since user_id is indexed and $in compares each value of array (length of 1000) to each document in the database, The time will be O(1000 * log(10B)) = O(23000).

I am not sure how $lookup exactly joins two collections, but In my knowledge and considering that localField and foreignField are index(b-tree), then both collections are sorted on index. The complexity looks like, it will be O(log(10B)).