Assumption:
- All data and indexes are in RAM.
First query:
{
$in: [user_id, array_of_user_ids_with_length_1000]
}
Second query:
{
$lookup: {
from: "col_b",
localField: "_id",
foreignField: "user_id",
as: "joined_field"
}
}
where col_b has 10B documents and is indexed, uniquely, on "user_id".
My reasoning:
The second query should be faster because the $in
operation in the first query will have worst case time complexity of O(1000), while the worst case time complexity of the second query is O(log(10B)) = O(23).
Is my reason correct? Will the second query indeed be faster than the first? If not, what did I get wrong? What about in practice, can we expect this to be case in real data, hardware, RAM, use case, etc.?