Since you are using the edgeGram tokenization strategy, Atlas Search creates tokens from your documents from left to right, with a minimum of 2 characters and a maximum of 7 characters.
For “Lionel Messi”, the output tokens would be:
[li, lio, lion, lione, lionel, lionel[SPACE]]. Since the search term “lio” matches one of these tokens, the document with author = Lionel Messi is returned.
Similarly, “Delio Valdez” is tokenized from left to right into the following output tokens:
[de, del, deli, delio, delio[SPACE], delio[SPACE]v]. Since the search term “lio” does not match any of these tokens, the document with author = Delio Valdez is not returned.
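To make the two examples above concrete, here is a minimal Python sketch that emulates the edgeGram behavior as described in this answer. It is only an illustration, not Atlas Search's actual implementation: the function name edge_grams is made up, and it assumes the value is lowercased and treated as a single string.

```python
def edge_grams(text, min_grams=2, max_grams=7):
    """Emulate left-anchored (edgeGram-style) tokens for one field value.

    Assumption: the value is lowercased and tokenized as one string,
    mirroring the description above rather than Atlas Search internals.
    """
    lowered = text.lower()
    longest = min(max_grams, len(lowered))
    # Every token starts at the first character and grows by one character.
    return [lowered[:size] for size in range(min_grams, longest + 1)]


for author in ("Lionel Messi", "Delio Valdez"):
    tokens = edge_grams(author)
    # A search term only matches if it equals one of the generated tokens.
    print(author, tokens, "lio" in tokens)
```

Because every token is anchored at the first character, "lio" can only match values that begin with "lio", which is why Lionel Messi is found and Delio Valdez is not.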
To achieve the experience you are describing, you can use the nGram tokenization strategy, which would create the following tokens for “Delio Valdez”:
[de, del, deli, delio, delio[SPACE], delio[SPACE]v, el, eli, elio, elio[SPACE], elio[SPACE]v, elio[SPACE]va, li, lio, lio[SPACE], lio[SPACE]v, lio[SPACE]va, lio[SPACE]val, io, io[SPACE], ..., va, val, vald, valde, valdez, al, ald, ..., ld, lde, ..., de, dez, ez]. As you can see, a search for “lio” matches the “lio” token that Atlas Search generates for this document, so the document is returned in the query results.
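The sliding-window behavior of nGram can be sketched in Python as follows. Again, this is a simplified emulation under the same assumptions (lowercased value, treated as one string, hypothetical function name n_grams), not the code Atlas Search runs.

```python
def n_grams(text, min_grams=2, max_grams=7):
    """Emulate nGram-style tokens: every substring of 2 to 7 characters.

    Assumption: the value is lowercased and tokenized as one string.
    """
    lowered = text.lower()
    tokens = []
    for start in range(len(lowered)):
        for size in range(min_grams, max_grams + 1):
            end = start + size
            if end > len(lowered):
                break  # the window would run past the end of the value
            tokens.append(lowered[start:end])
    return tokens


tokens = n_grams("Delio Valdez")
print(tokens)
print("lio" in tokens)  # the window starting at "l" produces "lio"
```

Since a window starts at every character position, a term such as "lio" matches wherever it occurs in the value, not only at the beginning.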
Note that the nGram tokenization strategy significantly increases the number of tokens generated and stored in your Atlas Search index, which in turn increases the size of the index.
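As a rough illustration of that growth, the sketch below counts how many tokens each strategy would produce for a value of a given length, using the same simplified model as this answer (one string, 2 to 7 characters per token). The function names are hypothetical, and real index size also depends on factors this count ignores.

```python
def count_edge_grams(length, min_grams=2, max_grams=7):
    """Number of left-anchored tokens for a value of the given length."""
    return max(0, min(max_grams, length) - min_grams + 1)


def count_n_grams(length, min_grams=2, max_grams=7):
    """Number of sliding-window tokens for a value of the given length."""
    # For each token size there is one window per valid start position.
    return sum(
        max(0, length - size + 1)
        for size in range(min_grams, max_grams + 1)
    )


length = len("Delio Valdez")  # 12 characters, including the space
print(count_edge_grams(length))  # 6
print(count_n_grams(length))     # 51
```

In this model the edgeGram count is capped by the maximum token length, while the nGram count keeps growing with the length of the value.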