Search index filtering

Hi, we have a collection with documents that look like this:

{
  _id: ObjectId("123"),
  companyId: "456-678",
  description: "this is a description"
}

We want to let the companies that use our system search on the description field, so we created a full-text search index on that field.
How can we search only documents that have a specific companyId, rather than searching the whole collection?
$search must be the first stage in the aggregation pipeline, so we can't $match before it.

Will this filter use the existing companyId index?

But $search must be the first stage in the pipeline, no? When I do otherwise, I get:

ERROR	MongoServerError: $_internalSearchMongotRemote is only valid as the first stage in a pipeline

Hello,

You can use a compound query with a combination of “filter” and “must” to obtain the functionality of $match; this is explained in detail in the Atlas Search documentation for the compound operator.

I ran the example below with two records that have the same description and different companyId values.

Atlas atlas-whhaeg-shard-0 [primary] test> db.companies.find()
[
  {
    _id: ObjectId("63593dc870ca422a31030ded"),
    companyId: '456-678',
    description: 'this is a description'
  },
  {
    _id: ObjectId("635945a370ca422a31030dee"),
    companyId: '45232-678676',
    description: 'this is a description'
  }
]

The $search below uses the “must” clause to match companyId “456-678” and a “filter” clause on description that matches part of the description, in this case the word “this”.

Atlas atlas-whhaeg-shard-0 [primary] test> db.companies.aggregate([
...   {
...     "$search": {
...       index: 'companyId',
...       "compound": {
...         "filter": [{
...           "text": {
...             "query": ["this"],
...             "path": "description"
...           }
...         }],
...         "must": [{
...           "text": {
...             "query": "456-678",
...             "path": "companyId"
...           }
...         }]
...       }
...     }
...   }
... ])
[
  {
    _id: ObjectId("63593dc870ca422a31030ded"),
    companyId: '456-678',
    description: 'this is a description'
  }
]

As you can see, the “must” clause forced the search to return only documents that matched the specified companyId.

The documentation has various other examples that you can try using the options of the “compound” operator of $search.

I hope you find this helpful.

Regards,
Elshafey


@Benny_Kachanovsky1 - I wanted to expand on the previous reply. Using compound and filter is the right way to go about this, but let’s refine it a bit. First, in addition to the Atlas Search index definition you have already set up, configure companyId to use the keyword analyzer (so that each value becomes a single filterable string term in the index). Here’s the index definition I used:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "companyId": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "analyzer": "lucene.keyword",
          "searchAnalyzer": "lucene.keyword",
          "type": "string"
        }
      ]
    }
  }
}
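
To illustrate why the keyword analyzer matters for companyId, here is a rough sketch in plain JavaScript. It is only an approximation under stated assumptions: the helper names are hypothetical, the "standard-like" tokenizer simply lowercases and splits on non-alphanumeric characters, and the matching rule assumes a query matches when any of its terms is found. The real Lucene analyzers are more involved, so treat this as intuition rather than a description of Atlas Search internals.

```javascript
// Hypothetical helpers, not part of MongoDB or Lucene.

// Approximates a word-splitting analyzer: lowercase, then split on
// anything that is not a letter or a digit.
function approxStandardTokens(value) {
  return value.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Approximates the keyword analyzer: the whole string is one term.
function approxKeywordTokens(value) {
  return [value];
}

// Assumed matching rule: a document matches if any query term
// equals any indexed term.
function matchesAnyTerm(indexedTerms, queryTerms) {
  return queryTerms.some((term) => indexedTerms.includes(term));
}

const storedCompanyId = "456-678";
const otherCompanyId = "456-999"; // a different company sharing the "456" part

const standardMatch = matchesAnyTerm(
  approxStandardTokens(storedCompanyId),
  approxStandardTokens(otherCompanyId)
);
const keywordMatch = matchesAnyTerm(
  approxKeywordTokens(storedCompanyId),
  approxKeywordTokens(otherCompanyId)
);

console.log({ standardMatch, keywordMatch });
```

Under these assumptions, the word-splitting version lets a query for one company match another company's documents because they share a token, while the keyword version only matches the exact companyId string.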

Once you have saved that index configuration and your data has been added and indexed, the query below filters by companyId and uses the query term “description” for relevancy ranking among the documents with the specified companyId (sorry, the field name and the query string happen to be the same in this example):

[
  {
    '$search': {
      'index': 'default', 
      'compound': {
        'must': {
          'text': {
            'path': 'description', 
            'query': 'description'
          }
        }, 
        'filter': {
          'text': {
            'path': 'companyId', 
            'query': '123-456'
          }
        }
      }
    }
  }, {
    '$project': {
      'companyId': 1, 
      'description': 1, 
      'score': {
        '$meta': 'searchScore'
      }
    }
  }
]

Be sure to use filter for companyId so it does not affect relevancy scoring and order.
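
If the companyId and the search text come from application input, the pipeline above could be produced by a small helper. This is a minimal sketch: buildCompanySearchPipeline is a hypothetical name, and the index name "default" and the collection name companies are taken from the examples in this thread rather than from your actual setup.

```javascript
// Hypothetical helper, not part of any MongoDB driver.
// Builds a pipeline with $search as the first stage, using "filter"
// for companyId (no effect on score) and "must" for the scored text match.
function buildCompanySearchPipeline(companyId, searchText, indexName = "default") {
  return [
    {
      $search: {
        index: indexName,
        compound: {
          must: [{ text: { path: "description", query: searchText } }],
          filter: [{ text: { path: "companyId", query: companyId } }],
        },
      },
    },
    {
      $project: {
        companyId: 1,
        description: 1,
        score: { $meta: "searchScore" },
      },
    },
  ];
}

const pipeline = buildCompanySearchPipeline("123-456", "description");
console.log(JSON.stringify(pipeline, null, 2));

// In mongosh or a driver, assuming a collection named "companies":
// db.companies.aggregate(pipeline)
```

Keeping $search as the first element of the returned array avoids the "only valid as the first stage in a pipeline" error mentioned earlier in the thread.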


@Mark_Solovskiy could you elaborate?
