I am configuring a search index for one of our projects and have been creating custom analysers with different combinations of character filters, tokenizers and token filters.
I have a specific case that seems almost impossible to handle at the analyser or aggregation level with the tokenizers and token filters available on Atlas, while keeping $search as the first stage (we have no other option at the moment, because I believe Atlas doesn’t support any stages before $search).
Case:
Stored data:
There are strings stored as “xxxx xxxx 1.5 mL xxxxxxx xxxxxxx”, “xxxx xxxx 1.0 cc xxxxxxx xxxxxxx”.
User input:
“1.5mL”, “1.0cc”, “1cc” (note that the input has no space between the number and the unit, unlike the stored data)
Things tried:
- wordDelimiterGraph
wordDelimiterGraph configured to split words on case changes and on numbers, and to concatenate words, numbers and all parts
This returns results, but documents containing 15ml rank above those containing 1.5 ml
- regexSplit
Split on alphanumeric boundaries
This returns no results at all
- Custom analyser used as the search analyser only, not as the index analyser
When I look at the explain output, there is no term or phrase query for [“1.5 ml”]
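To make the goal concrete, here is a rough Python sketch of the token sequences I am hoping both sides would produce. It is only an illustration in plain Python, not Atlas analyser behaviour: the regular expression and the helper names `tokens` and `contains_phrase` are my own assumptions, not anything Atlas provides.

```python
import re

# Assumed tokenisation rule (illustration only): a decimal number stays one
# token, and a run of letters is a separate token, even with no space between.
_TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+")


def tokens(text):
    """Return lower-cased number and letter tokens found in text."""
    return [t.lower() for t in _TOKEN.findall(text)]


def contains_phrase(doc_tokens, query_tokens):
    """True if query_tokens appears as a contiguous run inside doc_tokens."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


stored = tokens("xxxx xxxx 1.5 mL xxxxxxx xxxxxxx")
typed = tokens("1.5mL")
```

With this rule, the typed input and the stored string share the token run `1.5`, `ml`, while an input such as "15ml" does not, which is the ranking behaviour I am after.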
The only way I can see right now is to generate the combinations at the driver level and pass them to the query.
I would like to handle this at the analyser or aggregation level before resorting to generating combinations at the driver level.
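For reference, the driver-level fallback I have in mind would look roughly like the sketch below. It assumes the input is a single number followed by a unit; the helper names `query_variants` and `build_search_stage`, the field name `description` and the index name `default` are placeholders I made up for illustration. The variants are passed to the `phrase` operator as an array of query strings.

```python
import re

# Matches a number directly or loosely followed by a unit, e.g. "1.5mL", "1 cc".
_NUM_UNIT = re.compile(r"^(\d+(?:\.\d+)?)\s*([A-Za-z]+)$")


def query_variants(user_input):
    """Expand input such as '1cc' into spaced and unspaced, with and without '.0'."""
    text = user_input.strip()
    match = _NUM_UNIT.match(text)
    if not match:
        return [text]  # not a number+unit input: pass it through unchanged
    number, unit = match.groups()
    numbers = [number]
    if "." not in number:
        numbers.append(number + ".0")   # "1" -> also try "1.0"
    elif number.endswith(".0"):
        numbers.append(number[:-2])     # "1.0" -> also try "1"
    variants = []
    for n in numbers:
        for candidate in (f"{n} {unit}", f"{n}{unit}"):
            if candidate not in variants:
                variants.append(candidate)
    return variants


def build_search_stage(user_input, path="description", index="default"):
    """Build a $search stage (hypothetical field and index names)."""
    return {
        "$search": {
            "index": index,
            "phrase": {"query": query_variants(user_input), "path": path},
        }
    }
```

The resulting dictionary would go first in the aggregation pipeline. Whether the variants actually match still depends on how the index analyser tokenises the stored strings.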
Any thoughts / ideas / inputs would be helpful.
If we don’t find a way and have to handle this case in the driver, I think the feedback request for allowing stages before the $search stage would really help here.