Since you are using the edgeGram tokenization strategy, Atlas Search creates tokens from your documents left-to-right, with a minimum of 2 characters and a maximum of 7 characters. For “Lionel Messi”, the token outputs would be: [li, lio, lion, lione, lionel, lionel[SPACE]]. Since the search term “lio” matches one of these tokens, the document with author = Lionel Messi is returned.
Similarly, “Delio Valdez” is tokenized left-to-right to generate the following output tokens: [de, del, deli, delio, delio[SPACE], delio V]. Since the search term “lio” does not match any of these tokens, the document with author = Delio Valdez is not returned.
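A minimal Python sketch of this edgeGram behavior may make it concrete (this is a simplification of the real analyzer: it just lowercases the input and emits left-anchored prefixes of 2–7 characters):

```python
def edge_gram(text, min_grams=2, max_grams=7):
    """Emit left-anchored prefixes of the (lowercased) input,
    from min_grams up to max_grams characters long."""
    text = text.lower()
    return [text[:n] for n in range(min_grams, min(max_grams, len(text)) + 1)]

print(edge_gram("Lionel Messi"))
# ['li', 'lio', 'lion', 'lione', 'lionel', 'lionel ']
print("lio" in edge_gram("Delio Valdez"))
# False - "lio" is not a prefix of "Delio Valdez"
```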
To achieve the experience you are describing, you can use the nGram tokenization strategy, which would create the following tokens for “Delio Valdez”: [de, del, deli, delio, delio[SPACE], delio V, el, eli, elio, elio[SPACE], elio V, elio Va, li, lio, lio[SPACE], lio V, lio Va, lio Val, io, io[SPACE], ..., va, val, vald, valde, valdez, al, ald, ..., ld, lde, ..., de, dez, ez]. As you can see, a search for “lio” would match the “lio” token generated by Atlas Search for this document, so it would be returned in the query results.
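Extending the same sketch, an nGram tokenizer emits every substring in the length range, not just prefixes, which is why the mid-word match succeeds:

```python
def n_gram(text, min_grams=2, max_grams=7):
    """Emit every substring of the (lowercased) input whose length
    is between min_grams and max_grams, starting at every position."""
    text = text.lower()
    return {text[i:i + n]
            for i in range(len(text))
            for n in range(min_grams, max_grams + 1)
            if i + n <= len(text)}

print("lio" in n_gram("Delio Valdez"))
# True - "lio" occurs mid-word in "Delio"
```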
It should be noted that using the nGram tokenization strategy significantly increases the number of tokens generated and stored, and therefore the size of your Atlas Search index.
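For reference, an autocomplete field mapping along these lines would look roughly like the following (assuming your field is named author; adjust minGrams/maxGrams to your needs):

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "author": {
        "type": "autocomplete",
        "tokenization": "nGram",
        "minGrams": 2,
        "maxGrams": 7
      }
    }
  }
}
```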