Improving the performance of Aggregate Pipeline query

Vive_Siva · July 31, 2024, 12:11pm

Context:

I’ve a two collections one is about contact and another one is about messages…

Contact and Messages are One to many relationship

I need to do a keyword search which need to search for both contact.name & messages.text…

With 2M records on messages and close to 200k records on contacts the search takes like 6 secs without Atlas Search

Tested it with Atlas Search (Autocomplete) with nGram and it’s blazingly fast but struck with the need to search two indexes (because of two collections) and have to do union of those results…

Above method is not efficient as if I need to run two aggregate pipeline query and merge the results, affects pagination and have to write complex code to fix the pagination…

In entirety it’s not efficient.

What’s the best strategy to handle use cases like these…

I’m open to changing schema for those documents as well…

I went on to test out embeddings vs references but seems like there is a limit on embeddings, we receive 1000s of messages for a single contact.

What do you all think about making it efficient and get the search results faster?

Thank you

Vive_Siva · August 5, 2024, 3:26pm

Just trying to bump this topic to the top

steevej · August 6, 2024, 5:30pm

It looks like your issue is handling 2 indexes by doing 2 aggregations on 2 different collections and then ordering the results.

To me, the obvious is to put both in the same collections. Have the searchable content in 1 collection with reference back to the original. 1 collection, 1 index and 1 aggregation.

It is not clear however how pagination work on 2 different collections. Do you present result from Contact first or from Messages first. The use-case is not really clear about the ordering.

And why search on Contact, I would imagine a Message contains the Contact involved. If it does with an _id reference, the add the name or use the name only in Message, this way you only search Message. It makes changing the name more challenging, but it is certainly a less frequent use case than suffer from performance if the more frequent UC is much faster.

steevej · August 13, 2024, 1:25pm

@Vive_Siva,

The bumping worked. Now could you please followup?

Vive_Siva · August 13, 2024, 1:39pm

@steevej Appreciate you responding to it.

I figured out a solution to create a message snippet which is just one field combining all the messages for that contact into one text field…

So Atlas search index can perform a full text search on one field and it’s all in one collection rather than two different collection…

Thx

steevej · August 13, 2024, 1:41pm

You should mark your last post as the solution then.

Thanks

system · August 18, 2024, 1:42pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.