Aggregation Pipeline with Multi-Field Search (embeddedDocuments and knnVector)

Lazar_Nakinov · July 23, 2023, 5:47pm

Hello everyone,

I have a question about building an aggregation pipeline that searches on two different field types, namely “embeddedDocuments” and “knnVector,” in two separate stages. Specifically, I want to first narrow the collection down to the documents for a particular “company_id” and “role_id,” and then perform vector search on that subset only.

I’ve attempted various approaches using stages and operators such as $match, $search, compound, and $facet, but I’ve run into several errors along the way. Here are some of them:

MongoServerError: “compound.should[0].knnBeta” - knnBeta cannot be nested.
MongoServerError: $_internalSearchMongotRemote cannot be used within a $facet stage.
MongoServerError: “compound.must[1].search.query” must be a string.
MongoServerError: “compound.must[1].search.path” is required.

Below is a sample document to provide context for the data structure:

{
    "_id": ObjectId("64974bfc24cb7cd87c4e0359"),
    "text": "some text",
    "company_id": "abcdef-e799-4be3-94db-9f79b38fdeff",
    "role_id": "it",
    "text_embedding": [
        -0.0058572087, 0.033706117, -0.0049423487, -0.033509824,
        ... 1532 more items
    ]
}

Additionally, the search index mappings are as follows:

{
    "mappings": {
        "dynamic": true,
        "fields": {
            "text_embedding": [
                {
                    "dimensions": 1536,
                    "similarity": "cosine",
                    "type": "knnVector"
                }
            ],
            "company_id": {
                "type": "embeddedDocuments"
            },
            "role_id": {
                "type": "embeddedDocuments"
            }
        }
    }
}

My main question: is this functionality supported in MongoDB Atlas?

Thank you for your assistance and guidance.

Best regards,
Lazar

Felix_Rejmer · July 25, 2023, 2:04pm

Something like this maybe?

[
  {
    '$search': {
      'index': 'someIndex', 
      'knnBeta': {
        'vector': [
          0.3, -0.4, 0.2, 1
        ], 
        'path': 'text_embedding', 
        'k': 5
      }
    }
  }, {
    '$match': {
      'company_id': 'abcdef-e799-4be3-94db-9f79b38fdeff'
    }
  }
]
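For anyone building this pipeline from application code, here is a minimal Python sketch of the same two stages. The helper name `build_post_filter_pipeline` and the all-zero query vector are placeholders for illustration only; the index name `someIndex` is taken from the snippet above. Note that the query vector has to have the same number of dimensions as the index (1536 in the mappings posted earlier), and that `$match` runs after `knnBeta` has already picked its `k` nearest neighbours, so fewer than `k` documents may come back.

```python
from typing import Any, Dict, List


def build_post_filter_pipeline(
    query_vector: List[float],
    company_id: str,
    k: int = 5,
    index_name: str = "someIndex",  # index name from the snippet above
) -> List[Dict[str, Any]]:
    """Vector search first, then keep only one company's documents."""
    return [
        {
            "$search": {
                "index": index_name,
                "knnBeta": {
                    "vector": query_vector,
                    "path": "text_embedding",
                    "k": k,
                },
            }
        },
        # Post-filter: applied to the k results returned by $search.
        {"$match": {"company_id": company_id}},
    ]


# Placeholder vector; a real query would use an embedding of the search text.
pipeline = build_post_filter_pipeline(
    query_vector=[0.0] * 1536,
    company_id="abcdef-e799-4be3-94db-9f79b38fdeff",
)
```

With pymongo, the resulting list can be passed to `collection.aggregate(pipeline)`. If the post-filter removes too many hits, raising `k` is one way to compensate.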

Lazar_Nakinov · July 25, 2023, 4:55pm

Thank you, Felix. It is working. My mistake was that I was running the aggregation pipeline against an index that contained only the field holding the vector embeddings. That is why all my past attempts failed. After I dropped the index and re-created it with the vector field plus company_id mapped as embeddedDocuments, the standard $search followed by $match worked like a charm.
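
For readers who need the filter to be applied before the nearest-neighbour ranking, as the original question asked, a possible variant is to use the `filter` option of `knnBeta` instead of a trailing `$match`. The sketch below is an untested assumption rather than something confirmed in this thread: it assumes `company_id` and `role_id` are indexed as string fields with the `lucene.keyword` analyzer (not as embeddedDocuments), so that the `text` operator matches the whole value exactly, and it assumes a `compound` operator is accepted inside `filter`. The helper name `build_pre_filter_pipeline` is hypothetical.

```python
from typing import Any, Dict, List


def build_pre_filter_pipeline(
    query_vector: List[float],
    company_id: str,
    role_id: str,
    k: int = 5,
    index_name: str = "someIndex",  # assumed index name
) -> List[Dict[str, Any]]:
    """Restrict candidates by metadata inside knnBeta, then rank by vector."""
    # Assumption: both fields are string-indexed with the keyword analyzer,
    # so each text clause behaves like an exact match on the full value.
    metadata_filter = {
        "compound": {
            "filter": [
                {"text": {"query": company_id, "path": "company_id"}},
                {"text": {"query": role_id, "path": "role_id"}},
            ]
        }
    }
    return [
        {
            "$search": {
                "index": index_name,
                "knnBeta": {
                    "vector": query_vector,
                    "path": "text_embedding",
                    "filter": metadata_filter,
                    "k": k,
                },
            }
        }
    ]


# Placeholder vector; a real query would use an embedding of the search text.
pre_filter_pipeline = build_pre_filter_pipeline(
    query_vector=[0.0] * 1536,
    company_id="abcdef-e799-4be3-94db-9f79b38fdeff",
    role_id="it",
)
```

Here `knnBeta` stays at the top level of `$search`, which avoids the “knnBeta cannot be nested” error quoted in the first post, and the `k` results should all belong to the requested company and role if the assumptions above hold.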
