What is the most efficient way to run a “contains” query across all fields in a collection? The search may be for a partial word or a partial phrase.
I have tried using wildcard and regex queries, but they are quite slow and fall short of the performance I need.
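For reference, a wildcard “contains all fields” query of the kind being described might look like the sketch below. The collection name, search fragment, and index assumptions are all hypothetical; running it requires a cluster with an Atlas Search index, so only the pipeline shape is shown here.

```javascript
// Sketch of a "contains" search for the fragment "ort" across all indexed
// fields, assuming a dynamic default Atlas Search index on the collection.
const pipeline = [
  {
    $search: {
      wildcard: {
        query: "*ort*",            // partial word/phrase wrapped in wildcards
        path: { wildcard: "*" },   // search every indexed field
        allowAnalyzedField: true   // needed when matching analyzed fields
      }
    }
  }
];
// db.items.aggregate(pipeline)   // "items" is a hypothetical collection
```

Both the leading and trailing `*` in the query and the wildcard path contribute to the cost: the engine cannot use the term dictionary's prefix ordering and must consider many fields.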
Have you considered using autocomplete with nGram tokenization?
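As an illustration of that suggestion, an Atlas Search index mapping for an autocomplete field with nGram tokenization might look like the fragment below. The field name `title` and the gram sizes are hypothetical; note that autocomplete requires each field to be mapped explicitly, which is the limitation raised in the next reply.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "autocomplete",
        "tokenization": "nGram",
        "minGrams": 3,
        "maxGrams": 7,
        "foldDiacritics": true
      }
    }
  }
}
```

With nGram tokenization the index stores substrings of each term, so a “contains”-style lookup becomes a cheap exact-term match at query time, at the cost of a larger index.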
Yes, but it doesn’t support wildcard paths. My requirement is to be able to search for a given string in any field.
Also, I have observed that wildcard queries (contains search across all fields) are slow even for small datasets (<500 docs). It seems that other clauses in the same query that are supposed to reduce the number of documents searched have no effect when wildcards are specified.
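A compound query of the shape being described might look like the sketch below (collection and field names are hypothetical). One possible explanation for the observation, offered here as an assumption rather than a confirmed cause, is that the wildcard clause enumerates matching terms from the index dictionary regardless of how few documents the filter clause leaves in play.

```javascript
// Hypothetical compound query: a narrowing filter clause plus a
// wildcard "contains" clause across all fields.
const pipeline = [
  {
    $search: {
      compound: {
        filter: [
          // Hypothetical clause meant to shrink the candidate set
          { text: { query: "acme", path: "tenant" } }
        ],
        must: [
          {
            wildcard: {
              query: "*ort*",
              path: { wildcard: "*" },
              allowAnalyzedField: true
            }
          }
        ]
      }
    }
  }
];
```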
@Elle_Shwer any thoughts on why the wildcard (contains) search would be slow on filtered datasets? I tested it on a dataset of only 8 documents, and the query still took over 7 seconds.
In general, it is well known that wildcards are slow and computationally expensive, especially when you combine a wildcard query with a wildcard path. That is why we generally recommend that users use autocomplete if they can afford to do so.
Without seeing your exact query, it’s hard to know why it took so long. But if you’re doing a contains search on very few characters over very large documents, I am not surprised at all.
The documents are not large. I have been able to work around this for now by limiting the search to specific fields, listing them explicitly in the regex operator.
While this is not optimal, I think I can get away with it for the time being.
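The workaround described above might be sketched as follows; the field names `title` and `description` are hypothetical stand-ins for whichever fields are listed explicitly.

```javascript
// Hypothetical regex "contains" search restricted to named fields
// instead of a wildcard path.
const pipeline = [
  {
    $search: {
      regex: {
        query: ".*ort.*",                 // regex form of a contains match
        path: ["title", "description"],   // explicit fields only
        allowAnalyzedField: true
      }
    }
  }
];
```

Restricting the path bounds the number of indexed fields the regex must be evaluated against, which is likely why this performs better than the wildcard-path version.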