Hi All,
I’ve been working with the $vectorSearch stage in my MongoDB aggregation pipeline, but I’m encountering an issue where the $lookup stage always returns an empty array when paired directly with $vectorSearch. If I use $match or other stages before $lookup, it works perfectly fine.
The Problem
Here’s the aggregation pipeline I’m running:
const result = await EmbeddingModel.aggregate([
  {
    '$vectorSearch': {
      'index': 'default',
      'path': 'embeddings.text',
      'queryVector': embedding,
      'numCandidates': 10,
      'limit': 2
    }
  },
  {
    '$lookup': {
      from: "images",
      localField: "imageId",
      foreignField: "id",
      as: "variantDetails"
    }
  },
  {
    $limit: 1
  },
  {
    $project: {
      imageId: 1,
      id: 1,
      variantDetails: 1
    }
  }
]);
Expected behavior:
- The variantDetails field should be populated with matching documents from the images collection based on the imageId.
Actual behavior:
- The variantDetails field is always an empty array.
What I’ve Tried
- Isolating $lookup: If I use a $match stage before $lookup, it works as expected:
const result = await EmbeddingModel.aggregate([
  {
    $match: { imageId: "img1" } // Manually matching a known document
  },
  {
    '$lookup': {
      from: "images",
      localField: "imageId",
      foreignField: "id",
      as: "variantDetails"
    }
  }
]);
This populates variantDetails correctly.
- Inspecting $vectorSearch Output: I’ve verified that $vectorSearch returns documents with the imageId field correctly. Here’s an example of what $vectorSearch returns:
[
{ "_id": "1", "imageId": "img1", "embeddings": { "text": [...] } }
]
- Breaking the Pipeline into Steps: Running $vectorSearch and $lookup as separate queries works:
const vectorSearchResults = await EmbeddingModel.aggregate([
  {
    '$vectorSearch': {
      'index': 'default',
      'path': 'embeddings.text',
      'queryVector': embedding,
      'numCandidates': 10,
      'limit': 2
    }
  }
]);
const result = await EmbeddingModel.aggregate([
  {
    $match: {
      _id: { $in: vectorSearchResults.map(doc => doc._id) }
    }
  },
  {
    '$lookup': {
      from: "images",
      localField: "imageId",
      foreignField: "id",
      as: "variantDetails"
    }
  }
]);
However, for this specific task I need to keep everything in a single pipeline.
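One further check that stays within a single pipeline would be to project the BSON type of imageId and the search score immediately after $vectorSearch, so the join key can be compared against the type of id in the images collection. This is only a diagnostic sketch, assuming the same index and field names as above; buildDiagnosticPipeline is a hypothetical helper name, not part of Mongoose or the driver.

```javascript
// Hypothetical helper (not a library function): builds a pipeline that
// reports what $vectorSearch hands to the next stage.
function buildDiagnosticPipeline(embedding) {
  return [
    {
      $vectorSearch: {
        index: 'default',
        path: 'embeddings.text',
        queryVector: embedding,
        numCandidates: 10,
        limit: 2
      }
    },
    {
      $project: {
        imageId: 1,
        // $type returns the BSON type name of the value, e.g. "string" or "objectId"
        imageIdType: { $type: '$imageId' },
        // Relevance score attached by $vectorSearch
        score: { $meta: 'vectorSearchScore' }
      }
    }
  ];
}

// Intended usage (assumes the EmbeddingModel and embedding from the pipelines above):
// const docs = await EmbeddingModel.aggregate(buildDiagnosticPipeline(embedding));
```

If imageIdType comes back as something other than the type stored in images.id, or as "missing", that would narrow down where the join key is being lost.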
Environment
MongoDB version: 8
Deployment type: Atlas
Operating system: macOS Sequoia
Driver: NodeJS LTS
Questions
Is this expected behavior, or could this be a bug in how $vectorSearch interacts with $lookup?
Are there any known limitations or workarounds for using $lookup after $vectorSearch?
Could this behavior be tied to $vectorSearch adding metadata or altering the document structure?
I appreciate any insights or suggestions you can offer! If more information is needed, I’m happy to provide a minimal reproducible example.
Thank you!