I am wondering what the best strategy would be here. We need to query a rather large collection, currently around 24M documents.
We search on 2 indexed fields, but to narrow the query down further we have an array of up to 10k user ids that can be added to it. Since the existing query is already an $or query, we have to add the array twice, resulting in a total of 20k ids being sent to Atlas.
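For context, here is roughly what that query shape looks like. The field names (`liker`, `liked`) are placeholders for our two indexed fields; the point is that the same id array ends up in the filter twice, once per $or branch:

```javascript
// Hypothetical query builder: the same candidate-id array is embedded
// in both $or branches, doubling the payload sent to Atlas.
function buildLikesQuery(userIds) {
  return {
    $or: [
      { liker: { $in: userIds } }, // ids appear once here...
      { liked: { $in: userIds } }, // ...and again here: 2x the ids on the wire
    ],
  };
}

const q = buildLikesQuery(["u1", "u2"]);
console.log(JSON.stringify(q));
```

With 10k ids, serializing this query object is what produces the 20k ids transported per request.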
Is this a bad idea? Would it be wiser to simply omit the 20k-id array and instead filter the results in code once they arrive?
It’s a query that runs often - thousands of times per day.
I also considered doing a $lookup directly on the collection, but isn’t that quite slow? From what I’ve read it works like a MySQL subquery, and thus should suffer from the N+1 problem, producing many small queries per document. Or did I miss something?
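For reference, a minimal sketch of what such a $lookup stage might look like (collection and field names are assumptions, not our actual schema). Note that an equality-match $lookup does run once per input document, but if the foreign field (`likes.liker` here) is indexed, each of those lookups is an index probe rather than a collection scan, so it is not quite the worst-case N+1 pattern:

```javascript
// Hypothetical aggregation pipeline: join each user to the likes
// they have given. Names ("likes", "liker", "active") are assumed.
const pipeline = [
  { $match: { active: true } }, // narrow the driving side first
  {
    $lookup: {
      from: "likes",
      localField: "_id",
      foreignField: "liker", // an index on likes.liker makes each probe cheap
      as: "likesGiven",
    },
  },
];

console.log(JSON.stringify(pipeline));
```

Whether this beats shipping 20k ids depends mostly on how many documents survive the $match stage.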
Basically the goal is this:
query user documents (up to a sample of 10k)
query like documents where a specific user id appears in either the LIKED or the LIKER field - this is the expensive one
Then on the server we loop through all users and check whether each user appears as LIKED or LIKER in the like documents we also fetched.
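The server-side step could be sketched like this (document shapes and field names are assumptions). Building a Set of all ids that appear in any like document makes the per-user check O(1) instead of scanning the like list for every user:

```javascript
// Collect every id that appears in a like document (as liker or liked),
// then keep only the users whose _id is in that set.
// Field names (_id, liker, liked) are assumed shapes, not a real schema.
function filterUsersWithLikes(users, likes) {
  const involved = new Set();
  for (const like of likes) {
    involved.add(like.liker);
    involved.add(like.liked);
  }
  return users.filter((u) => involved.has(u._id));
}

const users = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];
const likes = [
  { liker: "a", liked: "x" },
  { liker: "y", liked: "c" },
];
console.log(filterUsersWithLikes(users, likes).map((u) => u._id)); // → [ 'a', 'c' ]
```

The trade-off versus sending the 20k-id array is that here the likes query returns a broader result set and the narrowing happens in application memory instead of on Atlas.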