Hi @Keith_Kwan and welcome to the MongoDB community forums!
The vector search documentation is written to explain the concepts as simply as possible, so its examples add a vector embedding for a single field. Vector embeddings can, however, be added for multiple fields as well.
I updated the sample data in Atlas with some additional fields, like this:
Atlas atlas-b8d6l3-shard-0 [primary] vertorSearch> db.textSearch.find()
[
  {
    _id: 1,
    name: 'mozzarella',
    description: "Italian cheese typically made from buffalo's milk.",
    countryOfOrigin: 'Italy',
    egVector: [
      -0.09950768947601318,
      -0.02402166835963726,
      -0.046839360147714615,
      0.06274440884590149,
      -0.0920015424489975
    ],
    aging: 'none',
    yearProduced: 2022,
    brined: false,
    reviews: 'The recipie turned out to be really nice.',
    egVector2: [
      -0.11875683814287186,
      0.027652710676193237,
      -0.0073554981499910355,
      0.030328862369060516,
      -0.04793226718902588
    ]
  },
  {
    _id: 2,
    name: 'parmesan',
    description: "Italian hard, granular cheese produced from cow's milk.",
    countryOfOrigin: 'Italy',
    egVector: [
      -0.04228218272328377,
      -0.024080513045191765,
      -0.029374264180660248,
      -0.04369240626692772,
      -0.01295427419245243
    ],
    aging: 'at least 1 year',
    yearProduced: 2021,
    brined: false,
    reviews: 'This cheeze is good for making pizza',
    egVactor2: [
      -0.029639432206749916,
      0.0437360517680645,
      0.0022121944930404425,
      0.018038751557469368,
      -0.16932083666324615
    ]
  },
  {
    _id: 3,
    name: 'feta',
    description: "Greek brined white cheese made from sheep's milk or from a mixture of sheep and goat's milk.",
    countryOfOrigin: 'Greece',
    egVector: [
      -0.015739429742097855,
      0.04937680810689926,
      -0.1067470908164978,
      0.1293928325176239,
      -0.03162907809019089
    ],
    aging: 'about 3 months',
    yearProduced: 2021,
    brined: true,
    egVector2: [
      0.04778356850147247,
      -0.027836525812745094,
      0.013962717726826668,
      0.0071579585783183575,
      -0.05239229276776314
    ],
    reviews: 'This is good for salads and authentic Italian pizza'
  }
]
I then created the following search index definition on the collection:
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "egVector": {
        "dimensions": 5,
        "similarity": "euclidean",
        "type": "knnVector"
      },
      "egVector2": {
        "dimensions": 5,
        "similarity": "euclidean",
        "type": "knnVector"
      }
    }
  }
}
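A small client-side check can confirm that every document actually stores a vector at each mapped path before relying on the index. The sketch below is plain JavaScript, not an Atlas API: `findUnindexableDocs` and the trimmed `sampleDocs` are hypothetical names used only for illustration, and it assumes a document can match on a knnVector path only when it holds an array of the declared length at that exact path.

```javascript
// Path -> dimensions, copied from the index definition above.
const indexedVectorFields = { egVector: 5, egVector2: 5 };

// Hypothetical helper: list documents that lack a usable vector at a mapped path.
function findUnindexableDocs(docs, vectorFields) {
  const problems = [];
  for (const doc of docs) {
    for (const [path, dimensions] of Object.entries(vectorFields)) {
      const value = doc[path];
      if (!Array.isArray(value)) {
        problems.push({ _id: doc._id, path, issue: "missing" });
      } else if (value.length !== dimensions) {
        problems.push({ _id: doc._id, path, issue: "wrong dimensions" });
      }
    }
  }
  return problems;
}

// Trimmed stand-ins for the sample documents: field names are kept as stored,
// vector values are placeholders because only presence and length are checked.
const sampleDocs = [
  { _id: 1, egVector: [0, 0, 0, 0, 0], egVector2: [0, 0, 0, 0, 0] },
  { _id: 2, egVector: [0, 0, 0, 0, 0], egVactor2: [0, 0, 0, 0, 0] },
  { _id: 3, egVector: [0, 0, 0, 0, 0], egVector2: [0, 0, 0, 0, 0] },
];

const problems = findUnindexableDocs(sampleDocs, indexedVectorFields);
console.log(problems);
```

For the sample data this reports a single problem: the document with _id: 2 has no egVector2 field, because its second vector is stored as egVactor2.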
Finally, I ran the following query against the egVector2 field:
[
  {
    '$search': {
      'index': 'default',
      'knnBeta': {
        'vector': [
          -0.015739429742097855, 0.04937680810689926, -0.1067470908164978, 0.1293928325176239, -0.03162907809019089
        ],
        'path': 'egVector2',
        'k': 5
      }
    }
  }
]
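This query returned two documents. Only documents with an indexed egVector2 field can be returned; in the sample data above, the document with _id: 2 stores its second vector under the name egVactor2, which is most likely why it is not among the results.
Please note that the vector in the search query is arbitrary rather than a real query embedding (it reuses the egVector values of the document with _id: 3) and is there only to retrieve some output from the $search operation.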
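To make the idea of a nearest-neighbour match concrete, here is a brute-force sketch in plain JavaScript that ranks the stored egVector2 values by Euclidean distance to the query vector. It is an illustration only and does not reproduce how Atlas builds or scores its index; `euclideanDistance`, `nearestByPath`, and `cheeseVectors` are hypothetical names, and the vectors are copied from the sample data with the field names exactly as stored.

```javascript
// Euclidean distance between two equal-length numeric arrays.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Hypothetical helper: return the k documents whose vector at `path`
// is closest to `queryVector`, skipping documents without that field.
function nearestByPath(docs, path, queryVector, k) {
  return docs
    .filter((doc) => Array.isArray(doc[path]) && doc[path].length === queryVector.length)
    .map((doc) => ({ _id: doc._id, distance: euclideanDistance(queryVector, doc[path]) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Second vector of each sample document, under the field name it was stored with.
const cheeseVectors = [
  { _id: 1, egVector2: [-0.11875683814287186, 0.027652710676193237, -0.0073554981499910355, 0.030328862369060516, -0.04793226718902588] },
  { _id: 2, egVactor2: [-0.029639432206749916, 0.0437360517680645, 0.0022121944930404425, 0.018038751557469368, -0.16932083666324615] },
  { _id: 3, egVector2: [0.04778356850147247, -0.027836525812745094, 0.013962717726826668, 0.0071579585783183575, -0.05239229276776314] },
];

const queryVector = [-0.015739429742097855, 0.04937680810689926, -0.1067470908164978, 0.1293928325176239, -0.03162907809019089];

const ranked = nearestByPath(cheeseVectors, "egVector2", queryVector, 5);
console.log(ranked.map((r) => r._id));
```

Even with k set to 5, only two candidates exist on the egVector2 path, which is consistent with the two documents returned by the query.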
Vector embeddings are generated for the text fields that you want to query with the $search operation. It’s important to note that storing multiple vector embeddings in one document is often not meaningful and increases the document size.
Since $search can only be used as the first stage of an aggregation pipeline, it is recommended to use vector embeddings for a single field. However, to better understand your use case, could you explain why you need multiple vector embeddings?
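To illustrate the first-stage constraint: if a document does keep two vector fields, one straightforward option is to issue a separate pipeline per field, each starting with its own $search stage. The sketch below only builds the pipeline objects; `buildKnnPipeline` is a hypothetical helper, the index name and field paths are taken from the example above, and the $project stage is an optional addition to surface the search score.

```javascript
// Hypothetical helper: build a knnBeta pipeline for a single vector path.
// $search is placed first because it must be the first pipeline stage.
function buildKnnPipeline(path, vector, k, index = "default") {
  return [
    { $search: { index, knnBeta: { vector, path, k } } },
    { $project: { name: 1, score: { $meta: "searchScore" } } },
  ];
}

// Same arbitrary query vector as in the example query.
const exampleVector = [-0.015739429742097855, 0.04937680810689926, -0.1067470908164978, 0.1293928325176239, -0.03162907809019089];

// One pipeline per vector field; each would be run on its own, e.g. in mongosh:
//   db.textSearch.aggregate(pipelines.egVector2)
const pipelines = {
  egVector: buildKnnPipeline("egVector", exampleVector, 5),
  egVector2: buildKnnPipeline("egVector2", exampleVector, 5),
};

console.log(JSON.stringify(pipelines.egVector2, null, 2));
```

Running one pipeline per field means one query per field, which is part of the cost to weigh against keeping a single embedding.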
Please reach out if you have any further questions.
Regards
Aasawari