Multiple Vector Embeddings in one document?

Hi All,

MongoDB Atlas Vector Search seems to be a game changer and would make working with data in my LLM application much easier.

However, I noticed that the documentation seems to show only one vector embedding stored per document.

For example, the cheese document below stores a vector embedding for its "description" field. What if I also have "reviews" and "recipes" fields in the cheese document? Can I embed these in the same document too?

https://www.mongodb.com/docs/atlas/atlas-search/knn-beta/#std-label-knn-beta-egs

{
    _id: 3,
    name: 'feta',
    description: "Greek brined white cheese made from sheep's milk or from a mixture of sheep and goat's milk.",
    countryOfOrigin: 'Greece',
    egVector: [
      -0.015739429742097855,
      0.04937680810689926,
      -0.1067470908164978,
      0.1293928325176239,
      -0.03162907809019089
    ],
    aging: 'about 3 months',
    yearProduced: 2021,
    brined: true
}

Thx in advance!

Hi @Keith_Kwan and welcome to the MongoDB Community Forums!

The vector search documentation is written to explain the concepts as simply as possible. It shows a vector embedding for a single field, but vector embeddings can be added for multiple fields as well.
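A minimal sketch of storing more than one embedding in a document, assuming the embeddings are computed client-side before the document is inserted. `fakeEmbed` is a hypothetical placeholder for a real embedding model call, and `addEmbeddings` and the text-field-to-vector-field mapping are illustrative names, not part of any MongoDB API.

```javascript
// Hypothetical stand-in for a real embedding model call.
// It turns any text into a fixed-length numeric vector (toy values only).
function fakeEmbed(text, dims = 5) {
  const vec = new Array(dims).fill(0);
  for (let i = 0; i < text.length; i++) {
    vec[i % dims] += text.charCodeAt(i) / 1000;
  }
  return vec;
}

// Returns a copy of `doc` with one embedding field per source text field.
// `fieldMap` maps a text field name to the name of its vector field;
// text fields missing from the document are skipped.
function addEmbeddings(doc, fieldMap, embed) {
  const out = { ...doc };
  for (const [textField, vectorField] of Object.entries(fieldMap)) {
    if (typeof doc[textField] === "string") {
      out[vectorField] = embed(doc[textField]);
    }
  }
  return out;
}

const cheese = {
  _id: 1,
  name: "mozzarella",
  description: "Italian cheese typically made from buffalo's milk.",
  reviews: "The recipie turned out to be really nice.",
};

// One vector field for "description", another for "reviews".
const withVectors = addEmbeddings(
  cheese,
  { description: "egVector", reviews: "egVector2" },
  fakeEmbed
);
console.log(Object.keys(withVectors));
```

Each vector field then needs its own `knnVector` mapping in the search index.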

I updated the sample data in Atlas with some additional fields:

Atlas atlas-b8d6l3-shard-0 [primary] vertorSearch> db.textSearch.find()
[
  {
    _id: 1,
    name: 'mozzarella',
    description: "Italian cheese typically made from buffalo's milk.",
    countryOfOrigin: 'Italy',
    egVector: [
      -0.09950768947601318,
      -0.02402166835963726,
      -0.046839360147714615,
      0.06274440884590149,
      -0.0920015424489975
    ],
    aging: 'none',
    yearProduced: 2022,
    brined: false,
    reviews: 'The recipie turned out to be really nice.',
    egVector2: [
      -0.11875683814287186,
      0.027652710676193237,
      -0.0073554981499910355,
      0.030328862369060516,
      -0.04793226718902588
    ]
  },
  {
    _id: 2,
    name: 'parmesan',
    description: "Italian hard, granular cheese produced from cow's milk.",
    countryOfOrigin: 'Italy',
    egVector: [
      -0.04228218272328377,
      -0.024080513045191765,
      -0.029374264180660248,
      -0.04369240626692772,
      -0.01295427419245243
    ],
    aging: 'at least 1 year',
    yearProduced: 2021,
    brined: false,
    reviews: 'This cheeze is good for making pizza',
    egVactor2: [
      -0.029639432206749916,
      0.0437360517680645,
      0.0022121944930404425,
      0.018038751557469368,
      -0.16932083666324615
    ]
  },
  {
    _id: 3,
    name: 'feta',
    description: "Greek brined white cheese made from sheep's milk or from a mixture of sheep and goat's milk.",
    countryOfOrigin: 'Greece',
    egVector: [
      -0.015739429742097855,
      0.04937680810689926,
      -0.1067470908164978,
      0.1293928325176239,
      -0.03162907809019089
    ],
    aging: 'about 3 months',
    yearProduced: 2021,
    brined: true,
    egVector2: [
      0.04778356850147247,
      -0.027836525812745094,
      0.013962717726826668,
      0.0071579585783183575,
      -0.05239229276776314
    ],
    reviews: 'This is good for salads and authentic Italian pizza'
  }
]

I then created the following search index:


{
  "mappings": {
    "dynamic": true,
    "fields": {
      "egVector": {
        "dimensions": 5,
        "similarity": "euclidean",
        "type": "knnVector"
      },
      "egVector2": {
        "dimensions": 5,
        "similarity": "euclidean",
        "type": "knnVector"
      }
    }
  }
}
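With several embedding fields, the index definition can also be generated rather than written by hand. A small sketch; `buildKnnIndex` is a hypothetical helper that only produces the JSON, which you would then paste into the Atlas UI or pass to whichever index-creation method you use.

```javascript
// Builds an Atlas Search index definition with one knnVector mapping
// per embedding field, in the same shape as the definition above.
function buildKnnIndex(vectorFields, dimensions, similarity = "euclidean") {
  const fields = {};
  for (const name of vectorFields) {
    fields[name] = { dimensions, similarity, type: "knnVector" };
  }
  return { mappings: { dynamic: true, fields } };
}

const definition = buildKnnIndex(["egVector", "egVector2"], 5);
console.log(JSON.stringify(definition, null, 2));
```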

Finally, I ran the following query against the egVector2 field:

[
  {
    '$search': {
      'index': 'default', 
      'knnBeta': {
        'vector': [
          -0.015739429742097855, 0.04937680810689926, -0.1067470908164978, 0.1293928325176239, -0.03162907809019089
        ], 
        'path': 'egVector2', 
        'k': 5
      }
    }
  }
]

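and got two documents back. Note that in the document with _id: 2 the vector field is misspelled as egVactor2, which is likely why only documents 1 and 3 were returned for the egVector2 path.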

Please note that the query vector is arbitrary (it reuses the egVector value of the feta document); it only serves to get some output from the $search operation.
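To see why only two documents can come back for egVector2, here is an exact, in-memory nearest-neighbour ranking by Euclidean distance over the sample data above. It is a brute-force illustration only; Atlas does not compute results this way and reports its own score rather than a raw distance. `euclidean` and `bruteForceKnn` are hypothetical helpers.

```javascript
// Euclidean (L2) distance between two equal-length vectors.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Exact k-nearest-neighbour search over in-memory documents.
// Documents without an array at `path` are skipped, mirroring documents
// that have no vector indexed at that path.
function bruteForceKnn(docs, path, query, k) {
  return docs
    .filter((doc) => Array.isArray(doc[path]))
    .map((doc) => ({ _id: doc._id, distance: euclidean(doc[path], query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// egVector2 values from the sample documents; _id 2 uses the misspelled
// field name egVactor2, exactly as in the data above.
const docs = [
  { _id: 1, egVector2: [-0.11875683814287186, 0.027652710676193237, -0.0073554981499910355, 0.030328862369060516, -0.04793226718902588] },
  { _id: 2, egVactor2: [-0.029639432206749916, 0.0437360517680645, 0.0022121944930404425, 0.018038751557469368, -0.16932083666324615] },
  { _id: 3, egVector2: [0.04778356850147247, -0.027836525812745094, 0.013962717726826668, 0.0071579585783183575, -0.05239229276776314] },
];
const query = [-0.015739429742097855, 0.04937680810689926, -0.1067470908164978, 0.1293928325176239, -0.03162907809019089];

const hits = bruteForceKnn(docs, "egVector2", query, 5);
console.log(hits.map((h) => h._id)); // [ 1, 3 ]
```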

Vector embeddings are created for the text fields you want to run the $search operation against. It's important to note that adding multiple vector embeddings to one document may not be meaningful, and each one increases the document size.
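To put a number on the size cost: per the BSON specification, an array is stored as a document whose keys are "0", "1", "2", and so on, so each element costs a type byte, its key, a NUL byte and 8 bytes for a double. A sketch of that arithmetic, assuming every element is stored as a 64-bit BSON double (true for non-integer values such as these embeddings); the 1536-dimension case matches the index definition posted later in this thread. The helper names are hypothetical.

```javascript
// Estimated BSON size, in bytes, of an array holding `n` doubles:
// int32 length prefix + per element (type byte + key + NUL + 8-byte double)
// + trailing NUL terminator.
function bsonDoubleArraySize(n) {
  let size = 4 + 1; // length prefix + terminator
  for (let i = 0; i < n; i++) {
    size += 1 + String(i).length + 1 + 8;
  }
  return size;
}

// Bytes added to the parent document by one embedding field:
// type byte + field name + NUL + the array itself.
function embeddingFieldSize(fieldName, dims) {
  return 1 + fieldName.length + 1 + bsonDoubleArraySize(dims);
}

console.log(bsonDoubleArraySize(5));    // 60
console.log(bsonDoubleArraySize(1536)); // 20399
```

So every additional 1536-dimension embedding adds roughly 20 KB to each document.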

Since $search can only be used as the first stage of an aggregation pipeline, and a knnBeta query searches a single vector path, it is recommended to use a vector embedding for a single field. However, to better understand your specific requirements and use case, could you explain why you need multiple vector embeddings?
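If you do keep several embeddings, each query picks the one it searches through the `path` parameter. A sketch of a pipeline builder that reproduces the query above for any vector field; `knnPipeline` is a hypothetical helper, and running the result (for example with `db.textSearch.aggregate(...)` in mongosh) is left to you.

```javascript
// Builds a one-stage pipeline whose $search stage runs a knnBeta query
// against a single vector field (`path`).
function knnPipeline(path, vector, k, index = "default") {
  return [
    {
      $search: {
        index,
        knnBeta: { vector, path, k },
      },
    },
  ];
}

const queryVector = [
  -0.015739429742097855, 0.04937680810689926, -0.1067470908164978,
  0.1293928325176239, -0.03162907809019089,
];

// One pipeline per embedding field: reviews vs. description.
const onReviews = knnPipeline("egVector2", queryVector, 5);
const onDescription = knnPipeline("egVector", queryVector, 5);
console.log(JSON.stringify(onReviews));
```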

Please reach out if you have any further questions.

Regards
Aasawari


Hi,
My schema looks like this:

{
  groupName: "name1",
  messages: [
    {
      "messageId": 101,
      "messageText": "hi there",
      "messageEmbedded": [embedded message data]
    },
    {
      "messageId": 102,
      "messageText": "hello",
      "messageEmbedded": [embedded message data]
    }
  ]
}

How can I create a search index for the embedded message vectors?

Could you correct this search index?

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "messages.messageEmbedded": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}
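One possible restructuring, sketched under an assumption worth verifying against the knnVector documentation for your Atlas version: if vector fields nested inside an array of subdocuments are not indexable, storing one document per message puts the embedding at a top-level path that the index can map directly. `flattenMessages`, the two-value placeholder vectors, and the per-message layout are hypothetical.

```javascript
// Turns one group document with an embedded `messages` array into
// one document per message, so each embedding sits at a top-level field.
function flattenMessages(group) {
  return group.messages.map((m) => ({
    groupName: group.groupName,
    messageId: m.messageId,
    messageText: m.messageText,
    messageEmbedded: m.messageEmbedded,
  }));
}

// Placeholder vectors; real ones would have 1536 values.
const group = {
  groupName: "name1",
  messages: [
    { messageId: 101, messageText: "hi there", messageEmbedded: [0.1, 0.2] },
    { messageId: 102, messageText: "hello", messageEmbedded: [0.3, 0.4] },
  ],
};

const perMessage = flattenMessages(group);

// With one message per document, the mapping uses the top-level path.
const index = {
  mappings: {
    dynamic: true,
    fields: {
      messageEmbedded: { dimensions: 1536, similarity: "cosine", type: "knnVector" },
    },
  },
};
console.log(perMessage.length, Object.keys(index.mappings.fields));
```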

Hi @Koushik_Sherugar and welcome to the MongoDB Community Forums!

For better visibility within the community, we encourage you to create a new topic with all the details and appropriate tags.

In general it is preferable to start a new discussion to keep the details of different environments/questions separate and improve visibility of new discussions. That will also allow you to mark your topic as “Solved” when you resolve any outstanding questions.

Regards
Aasawari

Hi Koushik, did you find a solution to your problem? I currently have the same problem.

Hi Aasawari. It would be awesome to get a response to the second question asked here. I've been searching for a solution for days!

Hi @Patrick_Treppmann and welcome to the MongoDB Community Forums!

For better visibility within the forum, we encourage you to create a new topic with all the details and appropriate tags.
In the new topic, could you share your complete requirements along with a sample document and the index definition?

Regards
Aasawari