Hi, I need to create a query that returns documents whose array field contains every element of an input array.
I know how to do so using aggregation and the $all operator; I’m just nervous about performance.
Both my input array and the array on the documents can have up to 256 elements, each being a 6-character string (an HTML color hex code). I currently have over 2000 documents but keep adding more, and could easily reach tens of thousands some day. It seems like this could easily cost millions of operations to find a match (though I don't have much knowledge of how MongoDB works under the hood).
My questions:
Is the performance of this query something I should worry about?
Should I find another solution such as keeping all data of all my documents in memory?
If I do implement it, should I avoid doing it on page requests and only use it on the back end to cache things?
Could indexing the field help? Would that be a bad idea on a field like this?
You should always worry about performance. But you should not spend time optimizing before you have a performance issue. Make it right, then make it fast.
The server is doing that for you as it tries to keep the working set in memory.
Do not complicate your code with early optimization.
Indexing fields used in queries always helps. If you query on that field, and that is a frequent use-case, then yes, index it. You may always remove the index later if you find it detrimental to your other use-cases.
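As a rough sketch of what that could look like from Python (the `colours` field name and both helper names are assumptions for illustration, not something from the question): an index on an array field is a multikey index, and `$all` works in an ordinary find filter, so no aggregation pipeline is strictly needed. The `collection` argument is assumed to be a PyMongo `Collection`.

```python
from typing import Any, Iterable


def build_all_filter(wanted: Iterable[Any], field: str = "colours") -> dict:
    """Build a filter matching documents whose array `field` contains every wanted value."""
    # Deduplicate while keeping order; repeated values add nothing to $all.
    unique = list(dict.fromkeys(wanted))
    return {field: {"$all": unique}}


def find_matching(collection: Any, wanted: Iterable[Any], field: str = "colours"):
    """Index the array field, then run the $all query.

    `collection` is assumed to be a pymongo Collection (hypothetical setup).
    """
    # Indexing an array field creates a multikey index; asking again for an
    # index with the same specification does not build a second one.
    collection.create_index(field)
    return collection.find(build_all_filter(wanted, field))
```

Calling `.explain()` on the returned cursor should show whether the index is actually picked up for your data, which is the cheapest way to find out before worrying further.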
Strings take more space and are slower to compare (a string is compared character by character, while a number is compared in one operation). In your case I would store the colours as numbers. A hex code is a number after all.
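A minimal sketch of that conversion, assuming the colours arrive as 6-character hex strings with an optional leading `#` (the function names are made up for the example). The largest value, `ffffff`, is 16777215, which fits comfortably in a 32-bit integer.

```python
import string


def hex_to_int(code: str) -> int:
    """Convert a 6-character hex colour such as 'ff8800' (or '#ff8800') to an int."""
    cleaned = code.strip().lstrip("#")
    # Validate explicitly: int() alone would also accept forms like '0xff88'.
    if len(cleaned) != 6 or not all(ch in string.hexdigits for ch in cleaned):
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return int(cleaned, 16)


def int_to_hex(value: int) -> str:
    """Convert back to a zero-padded, lower-case 6-character hex string."""
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"colour out of range: {value}")
    return format(value, "06x")
```

You would convert once when writing documents and once on the input array before querying, so both sides of the comparison are numbers.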
One smart technique I had never thought of before might be used here. It was presented in a different context in "Atlas Search to return all the applicable filters for a given search query without specifying" (post #2 by Erik_Hatcher). You could, for example, have frequent colour schemes (the facet_attributes) represented by a single number, an _id in another collection. So you would query with $all in a much smaller collection and then $lookup using the single number in the huge collection.
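To make the idea concrete, here is a sketch of such a pipeline built as plain Python data. Everything named here is hypothetical: a small `palettes` collection holding one document per colour scheme with a `colours` array, and a large `documents` collection whose documents carry a `palette_id` pointing at a palette's `_id`.

```python
from typing import Any, Iterable, List


def build_palette_pipeline(
    wanted: Iterable[Any],
    documents_collection: str = "documents",  # hypothetical name of the large collection
) -> List[dict]:
    """Pipeline to run on the small palettes collection."""
    return [
        # Step 1: $all runs against the small collection of colour schemes.
        {"$match": {"colours": {"$all": list(wanted)}}},
        # Step 2: pull in the big-collection documents by a single number.
        {
            "$lookup": {
                "from": documents_collection,
                "localField": "_id",
                "foreignField": "palette_id",
                "as": "matching_documents",
            }
        },
    ]
```

With PyMongo this would be run as something like `db.palettes.aggregate(build_palette_pipeline(wanted))`. An index on `palette_id` in the large collection is probably worth having so the `$lookup` stage does not scan it, and whether the whole scheme pays off depends on how often documents really share a colour scheme.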