How to apply $search pipeline for multiple collections in MongoDB Atlas Search?

I understand $search can be applied to single collection in MongoDB Atlas search. What can be a design approach for performing Atlas search over multiple collections in MongoDB. Should I create a single collection which has all attributes of Search terms? I would end up having a bloated document scenario and large size of index data.

What is the best solution for this?

You are right. creating a single search collection is the best approach for predictable results. While we do plan to introduce searching across multiple collections, the recommended approach would still be to aggregate all documents in a sing me search collection. Relevance scores are calculated based on documents relationship to the entire corpus.

You can use Realm Triggers and $merge to create a materialized view or you can create your own process for moving data to a single collection.

Thanks for the quick response Marcus.

I also felt the design in similar line.

If I create a single collection for search from base collections, I will end up having another collection which may occupy more storage and additional billing.

Alternative is to go with external reference pattern for child document collection (One collection with external reference document (Search Attributes) to base document collection. In this way, I would having similar code logic to Approach 1(Single collection for Search) for keeping external reference document up to date with base collection document. But this will not introduce additional collection for Search functionality. But there would be impact to CRUD operations as $merge need to be performed for multiple documents(even if one attribute of base document gets changed - in case a base document is referred in 20K Child documents).

What will be the performance impact of REALM trigger and $merge as we will create a trigger for - INSERT, UPDATE and DELETE scenarios?

Depending on above scenarios, what would be your recommendation? We may have 25M records each in both Child document and base document collections.

I have not not tested this to a point where I had any challenges with performance. If you push the limits, please let me know. We do have customers doing so that have not complained but I don’t know how big your application is today.

I am just reopening this discussion again. As I already mentioned, single collection has been created for search purpose. But I have multiple attributes on that single collection and we have a requirement to search over a single or multiple attributes in the same collection.

We currently have one search index covering for all attributes. What is the best practice for this scenario?

If you are trying to match on multiple attributes, use should: in compound operator

Example : - create autocomplete search index on both fields: title and plot

$search: {
    compound: {
        should: [
            {
                autocomplete: {
                    query:'hammer',
                    path: 'title',
                },
            },
            {
                autocomplete: {
                    query:'hammer',
                    path: 'plot',
                },
            },
        ],
    },
},
1 Like