How to perform an $in query in a integer field inside a $search stage

Hello

I have a collection named Projects, with an integer field Details.Situation. I was performing queries using the find operator, but, since we needed to improve our search mechanisms, we’re trying to migrate to Atlas Search instead.
One of the remaining problems, which I didn’t figure out how to solve it, is how to perform the query “Details.Situation”: { “$in”: [ 0, 1, 3, 5, 6] } in the $search pipeline.

I’ve already tried to create the index as number, and string data type (with lucene.keyword analyzers)

Does anyone have the idea how to perform this query?

About converting this field, this collection contains 25M objects, so, I’d prefer to avoid it…

Thanks for your attention
Jeferson Luis Soares

Hi @Jeferson_Soares thanks for the question. Depending on the details, it could be your lucky day. To help, we may need information like your index definition and a sample document. I will do my best here to answer with limited back-and-forth.

Situation A: Details.Situation is a single numeric value, but you want to check to see if any of the integers exists as a value for Details.Situation. If that’s the case, you can use compound.(filter|should).range to match on these criteria.

Situation B: Details.Situation is a multi-value numeric field. In other words, it’s an array/list of numeric values and you want to match if one number in the query appears in documents.

We will begin rolling out the ability to index numbers in arrays this week or next at the latest (barring disaster). If you could send an email to atlas-search-feedback@mongodb.com, we will add you to the list of the first customers to get access to the capability.

Then, for your reference, you will want to use (pseudo-code) $search.compound.filter.range(options, path: "Details.Situation") to get exactly what you are looking for in terms of $in functionality.

I am curious about why casting to a string did not work as well. In any event, an index definition and sample doc could be helpful.

Here’s the docs for this issue: https://www.mongodb.com/docs/atlas/atlas-search/compound/#mongodb-data-filter

2 Likes

Hi @Marcus

First of all, thanks for your answer. it came sooner than expected!

Related to my question, the scenario is the one you described on Situation A, and here comes a sample document:

{
    "_id" : NUUID("0a2ac404-f46b-4111-a857-0106a4791233"),
    "Details" : {
        "Customer" : {
            "Name" : "Fake Customer",
            "Id" : NUUID("98561cfc-7096-48c7-aecf-7462630c007d")
        },
        "ModifiedOn" : ISODate("2020-04-14T13:27:59.738Z"),
        "Name" : "Kitchen design",
        "Responsible" : {
            "Name" : "Fake Responsible",
            "Id" : "123456789"
        },
        "Situation" : 0,
        "Visibility" : 1
    },
    "AccountId" : "0001",
}

I didn’t get how to use the range filter, since the possible operators are lt, lte, gt and gte, and we do not have the selected values in a range - sometimes some values are skipped.

Here is a sample of the search index:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "AccountId": {
        "analyzer": "lucene.keyword",
        "searchAnalyzer": "lucene.keyword",
        "type": "string"
      },
      "Details": {
        "fields": {
          "ModifiedOn": {
            "type": "date"
          },
          "Name": {
            "maxGrams": 10,
            "minGrams": 3,
            "type": "autocomplete"
          },
          "Situation": {
            "analyzer": "lucene.keyword",
            "searchAnalyzer": "lucene.keyword",
            "type": "string"
          },
          "Visibility": {
            "representation": "int64",
            "type": "number"
          }
        },
        "type": "document"
      },
    }
  }
}

I’ve already used the definition of the field Details.Situation as defined to the Details.Visibility field.

Ahh, that makes sense. The API could improve there, for sure. It will soon.

The easiest way to do equality today with range is to have a combination of lte and gte for a given value. For example, if you want to match 3, consider adding gte:3 and lte:3 as parameters to your range query. In this example, the only possible number is 3.

Here’s some code for illustrative purposes in JavaScript of how it would behave under the hood so you can test it out if you’d like:

/* never use eval in JS did they remove that yet? */
function evaluate(input){
    if( input <= 3 && input >=3){
        return true;
    } else {
        return false;
    }
}

evaluate(3)
// would return true
evaluate(1)
// would return false
evaluate(5)
// would return 

for an example from our docs and API consider:

{
    "$search": {
       "index": "faking_it", 
       "range": {
          "path": "Details.Situation",
          "gte": 3,
          "lte": 3
       }
    }
}

Let me know if this helps.

Hey @Marcus

I’ve written the query this way:

    "should": [
          { "range": {"path": "Details.Situation", "gte": 1, "lte": 1 } },
          { "range": {"path": "Details.Situation", "gte": 3, "lte": 3 } },
          { "range": {"path": "Details.Situation", "gte": 5, "lte": 5 } },
          { "range": {"path": "Details.Situation", "gte": 6, "lte": 6 } }
     ],
     "minimumShouldMatch": 1

It worked, but I was looking for something more elegant.

Is this the current way of doing this query? Or the way I’d written it could be improved?

Please let me know if there are plans of developing a better way.

Thanks again

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.