embeddedDocuments and "embedded" score modifier issues

I’m attempting to use the new embeddedDocuments type (still in preview as of writing this) but I’m running into a few issues. I can’t tell if these are actual issues or user error, but any advice either way would be much appreciated.

For reference, the top-level field from my model that I’m attempting to use here is called leasing and the structure of the leasing field on a typical document is as follows:

"leasing": {
    "canLease": true,
    "defaultPrice": null,
    "regionalPrices": [
        {
            "regionId": 32,
            "amountInCents": 111487,
            "expiresOn": "2022-07-01T00:00:00Z"
        },
        {
            "regionId": 581,
            "amountInCents": 111487,
            "expiresOn": "2022-07-01T00:00:00Z"
        },
        {
            "regionId": 30478,
            "amountInCents": 111487,
            "expiresOn": "2022-07-01T00:00:00Z"
        }
    ]
}

regionalPrices is the list of documents I’d like to use with the embeddedDocuments type.

Also, here is the starting index I’m working with as it relates to the leasing field:


Visual Search Index Editor Issues:

Issue 1: Can’t select the field containing embedded documents

When setting up the leasing.regionalPrices in my index with the embeddedDocuments type, I found that the dropdown doesn’t list that field:

I can at least work around this by using the JSON editor to set the field type to embeddedDocuments, but why doesn’t the field even show up in the visual editor?

Issue 2: Can’t use the Visual Editor after setting a field as embeddedDocuments

After successfully setting my field as embeddedDocuments using the JSON editor, upon loading the Visual Editor to check things out I get the following error message:

Issue 3: Sub-fields of embeddedDocuments become unavailable

After setting up the embeddedDocuments type, the fields within those documents which I previously had indexed (see the screenshot way up at the top) are apparently no longer present in the index, and the newly changed field doesn’t show any type:

Bizarrely, the JSON editor still shows the sub fields:

"leasing": {
  "fields": {
    "canLease": {
      "type": "boolean"
    },
    "defaultPrice": {
      "type": "number"
    },
    "regionalPrices": {
      "fields": {
        "amountInCents": {
          "indexDoubles": false,
          "representation": "int64",
          "type": "number"
        },
        "regionId": {
          "indexDoubles": false,
          "representation": "int64",
          "type": "number"
        }
      },
      "type": "embeddedDocuments"
    }
  },
  "type": "document"
},

Aggregation Issues:

Issue 4: The embedded score modifier on embedded sub-fields

After setting up the index, I can query leasing.regionalPrices using the embeddedDocument query, but when I use the embedded score modifier on a sub-field of the embedded documents:

{
  "embeddedDocument": {
    "path": "leasing.regionalPrices",
    "operator": {
      "range": {
        "gte": 31230,
        "lte": 31230,
        "path": "leasing.regionalPrices.regionId"
      }
    },
    "score": {
      "embedded": {
        "outerScore": {
          "boost": {
            "path": "leasing.regionalPrices.amountInCents"
          }
        }
      }
    }
  }
}

I get the following error:

As shown in the JSON index from issue 3, amountInCents should be indexed as numeric, but apparently the pipeline doesn’t agree.

Hi Lucas_Burns,

Thanks for your interest in the embeddedDocuments index type and embeddedDocument operator! We are excited to see it being used, and appreciate you taking the time to write up your post in such a detailed way.

Visual Index Builder Issues (1, 2, and 3)

Regarding the Visual Index Builder issues: we do intend to support the embeddedDocuments field type in the Visual Index Builder, but we have not implemented that support yet. The docs do note this, though it is easy to miss (emphasis is mine):

Use the embeddedDocuments type to index fields in documents that are elements of an array. Atlas Search indexes embedded documents independent of their parent document. Each indexed document contains only fields that are part of the embedded document array element. You can’t use embeddedDocuments for date or numeric faceting. You can’t use the Atlas UI Visual Index Builder to define fields of embeddedDocuments type.

We will change that note to be more prominent, sorry for the confusion.

Aggregation (Issue 4)

After setting up the index, I can query leasing.regionalPrices using the embeddedDocument query, but when I use the embedded score modifier on a sub-field of the embedded documents:

{
  "embeddedDocument": {
    "path": "leasing.regionalPrices",
    "operator": {
      "range": {
        "gte": 31230,
        "lte": 31230,
        "path": "leasing.regionalPrices.regionId"
      }
    },
    "score": {
      "embedded": {
        "outerScore": {
          "boost": {
            "path": "leasing.regionalPrices.amountInCents"
          }
        }
      }
    }
  }
}

I think the query you want to run instead is:

{
  "embeddedDocument": {
    "path": "leasing.regionalPrices",
    "operator": {
      "range": {
        "gte": 31230,
        "lte": 31230,
        "path": "leasing.regionalPrices.regionId",
        "score": {
          "boost": {
            "path": "leasing.regionalPrices.amountInCents"
          }
        }
      }
    }
  }
}
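
To check what score this produces, the operator can be wrapped in a $search stage and followed by a $project that surfaces the search score. This is only a sketch: the index name "default", the $limit of 10, and the projected fields are assumptions for illustration and are not taken from the thread.

```json
[
  {
    "$search": {
      "index": "default",
      "embeddedDocument": {
        "path": "leasing.regionalPrices",
        "operator": {
          "range": {
            "gte": 31230,
            "lte": 31230,
            "path": "leasing.regionalPrices.regionId",
            "score": {
              "boost": {
                "path": "leasing.regionalPrices.amountInCents"
              }
            }
          }
        }
      }
    }
  },
  { "$limit": 10 },
  {
    "$project": {
      "leasing.regionalPrices": 1,
      "score": { "$meta": "searchScore" }
    }
  }
]
```

Comparing the projected score with and without the inner boost should show whether amountInCents is contributing as expected.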

Knowing how the embeddedDocument operator executes helps explain how its score is computed, and which fields a function score can reference in each scope of an embedded document query.

We can think of an embeddedDocument operator as being executed in three stages:

  1. Each document in the embedded document array is evaluated independently of other embedded documents.
    • Matching documents are constrained and scored by only the predicates specified by the operator field of an embeddedDocument operator.
    • Score modifications are applied independently to each embedded document.
    • At this stage, function scores may reference values that are part of embedded documents, like leasing.regionalPrices.amountInCents.
  2. Scores of multiple matching embedded documents are combined in a configurable way.
    • The embedded score modifier option can configure how scores from multiple matching embedded documents are aggregated.
    • (e.g. summing score contributions from matching embedded documents, using the maximum score from matching embedded documents, etc.)
  3. If embeddedDocument is specified as one of several clauses (e.g. as part of a compound operator), the parent documents are then evaluated against the other clauses.
    • Constrain and score documents with other predicates, as in any other query.
    • Add the score contribution from embeddedDocument to score contributions of other query predicates.
    • At this stage, function scores may reference values that are part of the parent of embedded documents, like leasing.defaultPrice.
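
The three stages can be sketched as a single compound query. This is an illustrative assumption rather than a recommended query: the inner boost on amountInCents is stage 1, the embedded aggregate setting (here "maximum" instead of the default "sum") is stage 2, and the should clause is a hypothetical stage 3 predicate that boosts on the parent-level field leasing.defaultPrice, with "undefined" supplying a substitute value for documents where that field is missing.

```json
{
  "compound": {
    "must": [
      {
        "embeddedDocument": {
          "path": "leasing.regionalPrices",
          "operator": {
            "range": {
              "gte": 31230,
              "lte": 31230,
              "path": "leasing.regionalPrices.regionId",
              "score": {
                "boost": {
                  "path": "leasing.regionalPrices.amountInCents"
                }
              }
            }
          },
          "score": {
            "embedded": {
              "aggregate": "maximum"
            }
          }
        }
      }
    ],
    "should": [
      {
        "equals": {
          "path": "leasing.canLease",
          "value": true,
          "score": {
            "boost": {
              "path": "leasing.defaultPrice",
              "undefined": 1
            }
          }
        }
      }
    ]
  }
}
```

Note that the only place leasing.regionalPrices.amountInCents appears is inside the operator of embeddedDocument, which matches the scoping described in the list above.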

Said differently, embeddedDocument computes the score of each matching embedded document, combines those scores in a configurable way, and adds that combined score to the net relevance score of result documents.

Does that clarify why embeddedDocument can use values from embedded documents in a function score only in the first “stage” of execution (inside the operator specified by embeddedDocument), before the scores of multiple matching embedded documents are combined?

Please let me know if that is helpful/resolves your issue, and please let me know if I can clarify anything/help explain anything in more detail!

Thanks for the reply, Evan!

To your point with #1, 2, and 3:

As you noted, the documentation says

“You can’t use the Atlas UI Visual Index Builder to define fields of embeddedDocuments type”

(my bad for not reading thoroughly enough), but you indicated that there are plans to support this. Since the documentation also specifies

“You can’t use embeddedDocuments for date or numeric faceting”

it feels like I should ask, is there also a plan to support date or numeric faceting for embeddedDocuments eventually?

Also, I assume that the missing fields I noted in #3 are due to the missing support for the visual editor and not due to the fields being passed over by the search index system?

To #4, I see what you’re saying about the internal operator being the correct place to score the individual sub-documents. That boost → path adjustment also seems to do the trick if I want to boost based on some component of the matching sub-documents.


Is there currently a timeline for getting the embeddedDocuments system out of preview and into a fully supported feature?

Thanks for the reply, Evan!

Sure, my pleasure!

the documentation also specifies

“You can’t use embeddedDocuments for date or numeric faceting”

Good question! There is not a plan for that support right now.

Would that be something that you’d be interested in? I’d love to learn more about your use case - if it is something you’d be interested in, would you mind sharing a little more about the shape of the documents that you’d like to use numeric/date faceting over embedded documents on?

Also, I assume that the missing fields I noted in #3 are due to the missing support for the visual editor and not due to the fields being passed over by the search index system?

Yes, exactly - our apologies for the confusion. Of course, we aim to have both the visual index builder and the JSON index representation show the same accurate state for an index definition - but sometimes when adding new features, visual index builder support lags behind support in the JSON editor by some period of time. If there is ever a question about how an index is configured, the JSON view should be considered the “source of truth” for an index.

To #4, I see what you’re saying about the internal operator being the correct place to score the individual sub-documents. That boost → path adjustment also seems to do the trick if I want to boost based on some component of the matching sub-documents.

Great, glad this makes sense and is working for you!

Is there currently a timeline for getting the embeddedDocuments system out of preview and into a fully supported feature?

We are working to move embeddedDocuments out of preview, but we don’t have a timeline for that yet.

We do intend to get there eventually, and we really appreciate your time in writing this post and your interest in the feature!
