$lookup Returns Empty Array When Used After $vectorSearch in Aggregation Pipeline

Hi All,

I’ve been working with the $vectorSearch stage in my MongoDB aggregation pipeline, but I’m encountering an issue where the $lookup stage always returns an empty array when paired directly with $vectorSearch. If I use $match or other stages before $lookup, it works perfectly fine.

The Problem
Here’s the aggregation pipeline I’m running:

const result = await EmbeddingModel.aggregate([
    {
        '$vectorSearch': {
            'index': 'default',
            'path': 'embeddings.text',
            'queryVector': embedding,
            'numCandidates': 10,
            'limit': 2
        }
    },
    {
        '$lookup': {
            from: "images",
            localField: "imageId",
            foreignField: "id",
            as: "variantDetails"
        }
    },
    {
        $limit: 1
    },
    {
        $project: {
            imageId: 1,
            id: 1,
            variantDetails: 1
        }
    }
]);

Expected behavior:

  • The variantDetails field should be populated with matching documents from the images collection based on the imageId.

Actual behavior:

  • The variantDetails field is always an empty array.

What I’ve Tried

  1. Isolating $lookup: If I use a $match stage before $lookup, it works as expected:
const result = await EmbeddingModel.aggregate([
    {
        $match: { imageId: "img1" } // Manually matching a known document
    },
    {
        '$lookup': {
            from: "images",
            localField: "imageId",
            foreignField: "id",
            as: "variantDetails"
        }
    }
]);

This populates variantDetails correctly.

  2. Inspecting $vectorSearch Output: I’ve verified that $vectorSearch returns documents with the imageId field correctly. Here’s an example of what $vectorSearch returns:
[
    { "_id": "1", "imageId": "img1", "embeddings": { "text": [...] } }
]
  3. Breaking the Pipeline into Steps: Running $vectorSearch and $lookup as separate queries works:
const vectorSearchResults = await EmbeddingModel.aggregate([
    {
        '$vectorSearch': {
            'index': 'default',
            'path': 'embeddings.text',
            'queryVector': embedding,
            'numCandidates': 10,
            'limit': 2
        }
    }
]);

const result = await EmbeddingModel.aggregate([
    {
        $match: {
            _id: { $in: vectorSearchResults.map(doc => doc._id) }
        }
    },
    {
        '$lookup': {
            from: "images",
            localField: "imageId",
            foreignField: "id",
            as: "variantDetails"
        }
    }
]);

However, for this specific task, I need to keep everything in a single pipeline.

Environment
MongoDB version: 8
Deployment type: Atlas
Operating system: macOS Sequoia
Driver: NodeJS LTS

Questions

Is this expected behavior, or could this be a bug in how $vectorSearch interacts with $lookup?

Are there any known limitations or workarounds for using $lookup after $vectorSearch?

Could this behavior be tied to $vectorSearch adding metadata or altering the document structure?


I appreciate any insights or suggestions you can offer! If more information is needed, I’m happy to provide a minimal reproducible example.

Thank you!

Hi @Ryan_Fotovat! Thank you for posting to the forums.

Is this expected behavior, or could this be a bug in how $vectorSearch interacts with $lookup?

I see that you have tried manually passing the result set from $vectorSearch into a $lookup pipeline. Could you also try removing the $project to further isolate the issue? Nothing specific to $vectorSearch is coupled with $lookup behavior; $lookup should behave as a subsequent stage just as it does after $match, as you’ve suggested and tested.
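
One way to do that kind of isolation is to run progressively longer prefixes of the pipeline and look at the output of each. The following is only a sketch: `pipelinePrefixes` is a hypothetical helper name, the query vector is a placeholder, and the commented-out loop assumes a connected `EmbeddingModel` as in the original post.

```javascript
// Hypothetical helper: returns every prefix of a pipeline, shortest first,
// so the first stage whose output looks wrong can be identified.
function pipelinePrefixes(pipeline) {
    return pipeline.map((_, i) => pipeline.slice(0, i + 1));
}

// Placeholder query vector; a real one would come from the embedding model.
const sampleVector = [0.1, 0.2, 0.3];

// The stages from the original post.
const originalPipeline = [
    { $vectorSearch: { index: 'default', path: 'embeddings.text', queryVector: sampleVector, numCandidates: 10, limit: 2 } },
    { $lookup: { from: 'images', localField: 'imageId', foreignField: 'id', as: 'variantDetails' } },
    { $limit: 1 },
    { $project: { imageId: 1, id: 1, variantDetails: 1 } }
];

const prefixes = pipelinePrefixes(originalPipeline);

// With a connected model, each prefix could be run in turn:
// for (const stages of prefixes) {
//     const lastStage = Object.keys(stages[stages.length - 1])[0];
//     console.log(lastStage, await EmbeddingModel.aggregate(stages));
// }
```

If variantDetails is populated after the $lookup prefix but missing after the full pipeline, the $project is the stage to look at.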

We have tested this in end-to-end tests and never seen any issues with interactions here, including on sharded clusters.

Another question I had was why you chose to model the images in a separate collection? I think your performance, in particular if you use binData vectors and a $project in production, should be superior with colocated embeddings and images without needing to have a $lookup.
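
To illustrate the colocated layout, here is a rough sketch rather than a definitive schema. It assumes the image fields are embedded in the same document as the vector, under a hypothetical `image` subdocument, so no $lookup is needed. `colocatedPipeline` is also a hypothetical helper name. The $project keeps the large embeddings array out of the response and exposes the similarity score through `$meta: 'vectorSearchScore'`.

```javascript
// Hypothetical helper: builds a pipeline for documents that store the image
// data next to the embedding, so the $lookup stage disappears entirely.
function colocatedPipeline(queryVector) {
    return [
        {
            $vectorSearch: {
                index: 'default',
                path: 'embeddings.text',
                queryVector,
                numCandidates: 10,
                limit: 2
            }
        },
        {
            // Inclusion projection: the embeddings field is omitted from the output.
            $project: {
                imageId: 1,
                image: 1, // assumed embedded subdocument, not in the original schema
                score: { $meta: 'vectorSearchScore' }
            }
        }
    ];
}

// Placeholder query vector for illustration only.
const colocated = colocatedPipeline([0.1, 0.2, 0.3]);
```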

Hi @Henry_Weller,

I revisited the old code today, and it seems to be working now, which is odd.
I don’t recall making any changes to my data since two days ago. Could it be that MongoDB hadn’t indexed some of the data yet, as it was recently added at the time?

I initially chose to split up the collections, thinking it might help with organization and data modification since they were commonly updated fields. However, this approach ended up complicating the query process. As a result, I decided to combine the previous collections into just “products” and “variants” collections. This setup seems to work well when combined with $vectorSearch.filter.
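
For anyone landing here later, a filtered stage might look roughly like the sketch below. The `productId` field and the `withFilter` helper are hypothetical, the query vector is a placeholder, and any field used in `filter` has to be declared with type `filter` in the vector search index definition.

```javascript
// Hypothetical helper: returns a copy of a $vectorSearch stage with a
// pre-filter added, leaving the original stage object untouched.
function withFilter(stage, filter) {
    return { $vectorSearch: { ...stage.$vectorSearch, filter } };
}

const baseStage = {
    $vectorSearch: {
        index: 'default',
        path: 'embeddings.text',
        queryVector: [0.1, 0.2, 0.3], // placeholder vector
        numCandidates: 10,
        limit: 2
    }
};

// Only documents whose (assumed) productId equals 'p1' are considered.
const filteredStage = withFilter(baseStage, { productId: { $eq: 'p1' } });
```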

Thanks!

Glad to hear it’s working! I’m also glad you chose to commingle the data in a single collection. That is our recommended approach and follows the MongoDB saying “data that is queried together should live together.”

Do let me know if you have any issues in the future, and good luck with your app!