Atlas search index fields nested in fields with unknown name

Hi, how do I index the “code” and “pop” fields in the documents like the following in Atlas? Note that the field under the “cities” is dynamic and can be any city in the world. Thanks.

{
  "cities":{
    "toronto":{
      "code": "TO",
      "pop": 10000
    },
    "newyork":{
      "code": "NY",
      "pop": 234000
    },
    ......
  }
}

Hi @Andrew_Wang3 ,

The Atlas search UI and API allow you to use a “dynamic” mapping when the index is created.

So you can map dynamically anything below “cities”…

Now the problem is that the names you mentioned are field names and not values. This is not being indexed for text searching, perhaps you should use a model like this:

{
"cities" : [
              { cityName : "toronto", code : "TO" , "pop" : 10000 },
             { cityName : "newyork", code : "NY" , "pop" : 234000} ...
]}

Thanks
Pavel

Thanks, @Pavel_Duchovny.

We cannot change the model as our customers have been using the model for quite a long time. The problem with using dynamic mapping is that we have a lot of fields under each city and we only want to index a few of them. Also, there are fields under each city that are nested documents and we want to index some of the fields in those nested documents as well.

Do you see a way to resolve it?

Thanks.

@Andrew_Wang3 ,

Can you share some examples of what kind of searches you try to achieve?

Ty
Pavel

Thanks, @Pavel_Duchovny.

I tried to simplify the document so that it is easier to communicate; however, it seems that it actually led to more confusion.

The example at the bottom is a more realistic snippet of our documents. The fields like “CUSTOMER_ID” and “CONTACT_ID” in the example are dynamically extracted from the customer’s data, which is unknown to us. If I search for documents that meet the following criteria, I would expect the sample document to return.

  term_display_name: customer_id
  name (of data class): Customer Number
  column_type: string

The sample document is also expected to return with the following criteria:

  term_display_name: Contact Identifier

Thanks for looking into it!


Example Document:

{
  "entity":{
    "column_info": {
      "CUSTOMER_ID": {
        "column_terms": [
          {
            "term_id": "436fee03-ba36-4627-b5af-96af19cddce0",
            "term_display_name": "customer_id",
            "confidence": 1,
            "specification": "NAME_MATCHING"
          }
        ],
        "data_class": {
          "selected_data_class": {
            "name": "Customer Number",
            "id": "436fee03-ba36-4627-b5af-96af19cddce0",
            "setByUser": false
          }
        },
        "type": "string",
        "rejected_terms": [],
        ...
      },
      "CONTACT_ID": {
        "column_terms": [
          {
            "term_id": "436fee03-ba36-4627-b5af-96af19cddce0",
            "term_display_name": "Contact Identifier",
            "confidence": 1,
            "specification": "ML based term assignment"
          }
        ]
      }
    }
  }
}

Hi @Andrew_Wang3 ,

You can use a wild card path to force that each of the terms will be looked at a field with a specific name using the following “regex” as the path"*.<FIELD_NAME>":

[{
 $search: {
  compound: {
   must: [
    {
     text: {
      query: 'string',
      path: {
       wildcard: '*.type'
      }
     }
    },
    {
     text: {
      query: 'Customer Number',
      path: {
       wildcard: '*.name'
      }
     }
    },
    {
     text: {
      query: 'customer_id',
      path: {
       wildcard: '*.term_display_name'
      }
     }
    }
   ]
  }
 }
}]

See how the query using compound to form 3 different compound conditions and using *.type , *.name and *. term_display_name to only hit specific nested fields.

In this case my search index was completely dynamic mapping. But if you know that only specific field paths are dynamically queried then only map those with dynamic toggle/flag.

hope that helps…

Ty
Pavel

I’m having similar problem. Is it possible to define index using wildcard in the middle of the path, like this…?

{
  "fields": {
    "product.$**.name": {
      "maxGrams": 9,
      "minGrams": 1,
      "type": "autocomplete"
    }
  }
}