Performance of $in vs regex for several hundred values

My task is to query a rather big dataset (4M+ entries) for several hundred matching values.
The queried field is indexed and preliminary profiling on a smaller dataset (100k entries) has shown that the query is quite performant (few ms).
Currently, my query is setup like this:

db.collection.findMany({ 'QUERY_FIELD', { $in: ['a', 'b', 'c'] } })

Where QUERY_FIELD is the indexed field and a, b, c, … are the searched values. Up to 1.000 searched values are expected for a single query.

The documentation for $in says:

It is recommended that you limit the number of parameters passed to the $in operator to tens of values. Using hundreds of parameters or more can negatively impact query performance.

So now I am wondering if query performance will degrade when moving to the larger dataset (it’s not available to me, yet).

Another approach I know of is using a regex query with the values combined with |. (e.g: db.collection.findMany({ 'QUERY_FIELD', /^(a|b|c)/ })).
However, it feels awkward building a regex string with hundreds of OR conditions. Additionally, I would expect that approach to perform worse than $in, because mongodb has to parse the (very long) regex string before performing the query.

Can you suggest another approach to my task?

My gut feeling is that regex will be slower. But I am pretty sure if you have access to your data you may make a copy and test.

Are the values strings or numbers?

Can you share a little bit about your use case? I am always worry when a query requires such a big number of parameters. I feel there might be some flaw in the data schema or in the data access pattern. Except of course it if is a one time query. But for one time query you should not worry about performance.

I am pretty sure it is not a human typing the 1000 searched values. Where do they come from? Are they the result of another query? Are they coming from another system? Could you permanently tag documents with those values with a field that will tie them together so your query becomes a single value (the tag) search?

If the search values come from another query, could you use aggregation with $lookup rather than retrieving 1000 values and the use the 1000 values in the findMany? About findMany, which driver has the method findMany() rather than find()?

Querying 1000 values could result in 1000 documents and many more, what do you with so many documents? Could you use aggregation to do what ever computation you intend to do?