I’ve a two collections one is about contact and another one is about messages…
Contact and Messages are One to many relationship
I need to do a keyword search which need to search for both contact.name & messages.text…
With 2M records on messages and close to 200k records on contacts the search takes like 6 secs without Atlas Search
Tested it with Atlas Search (Autocomplete) with nGram and it’s blazingly fast but struck with the need to search two indexes (because of two collections) and have to do union of those results…
Above method is not efficient as if I need to run two aggregate pipeline query and merge the results, affects pagination and have to write complex code to fix the pagination…
In entirety it’s not efficient.
What’s the best strategy to handle use cases like these…
I’m open to changing schema for those documents as well…
I went on to test out embeddings vs references but seems like there is a limit on embeddings, we receive 1000s of messages for a single contact.
What do you all think about making it efficient and get the search results faster?
It looks like your issue is handling 2 indexes by doing 2 aggregations on 2 different collections and then ordering the results.
To me, the obvious is to put both in the same collections. Have the searchable content in 1 collection with reference back to the original. 1 collection, 1 index and 1 aggregation.
It is not clear however how pagination work on 2 different collections. Do you present result from Contact first or from Messages first. The use-case is not really clear about the ordering.
And why search on Contact, I would imagine a Message contains the Contact involved. If it does with an _id reference, the add the name or use the name only in Message, this way you only search Message. It makes changing the name more challenging, but it is certainly a less frequent use case than suffer from performance if the more frequent UC is much faster.