Should I store embeddings in the same document as the data?

As the title says: the embeddings are only used for indexing and aren't part of the data as such. Should I store them in the document that contains the data, or should I avoid the bloat and keep a separate collection with just the embeddings and the ID of the document being referenced?

We recommend including them in the documents: that way you can use them together with other fields in the raw documents for pre- or post-filtering. See for example:

You might experiment with embedding models with a smaller number of dimensions to reduce the size of the data.
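To illustrate the pre/post-filtering idea above, here is a rough sketch of an aggregation pipeline using Atlas Vector Search; the index name, field names, and filter value are all hypothetical placeholders, not something from this thread:

```python
# Sketch only: index/field names and the filter value are assumptions.
query_vector = [0.1] * 1536  # illustrative stand-in for a real query embedding

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",   # hypothetical Atlas Vector Search index name
            "path": "embedding",       # vector field stored in the same document
            "queryVector": query_vector,
            "numCandidates": 200,
            "limit": 10,
            # Pre-filtering on another field of the same document:
            "filter": {"category": {"$eq": "articles"}},
        }
    },
    # Post-filtering / shaping with ordinary pipeline stages:
    {"$project": {"title": 1, "category": 1,
                  "score": {"$meta": "vectorSearchScore"}}},
]

print(pipeline[0]["$vectorSearch"]["limit"])  # 10
```

Because the vector lives alongside `category` in one document, the filter can be applied in the same `$vectorSearch` stage rather than in a second round trip.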


I was using OpenAI text-embedding-3-small, which was working fine, taking maybe 2 seconds to return the result. I use the embeddings in the result for post-processing and for displaying debug information. I switched to text-embedding-3-large and my site now takes ~25 seconds to return the result. If I remove the embeddings from the projection, it takes ~3.3 seconds. If I download the associated embeddings myself after the query, I can get the total time down to ~5 seconds.

I’m not sure why including the embeddings in the projection causes such a massive slowdown (possibly server memory and paging?), but that’s what led to my train of thought: if I’m having to retrieve data after the query anyway, why not just keep the embeddings and _id in a separate collection and retrieve what I need afterwards?
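The two-step retrieval I'm describing could be sketched roughly like this (field names are illustrative, and the actual queries would run through pymongo against the two collections):

```python
# Minimal sketch of the split-retrieval idea; "embedding" is an assumed field name.

# Step 1: the main query projects OUT the large vector field entirely.
main_projection = {"embedding": 0}

# Step 2: after the main query returns, fetch only the vectors by _id.
def embeddings_query(doc_ids):
    """Build the filter and projection for a follow-up embeddings-only fetch."""
    return (
        {"_id": {"$in": list(doc_ids)}},  # filter: just the documents we got back
        {"embedding": 1},                 # projection: _id is included by default
    )

flt, proj = embeddings_query([1, 2, 3])
print(flt)  # {'_id': {'$in': [1, 2, 3]}}
```

This matches the ~5 second path I measured: the main result set stays small, and the bulky vectors come over in a second, targeted query.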

I could try reducing the embedding size from 3072 to 1024 dimensions, but then I obviously lose some of the information the larger vectors capture.
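For reference, the text-embedding-3 models accept a `dimensions` parameter that shortens the returned vector server-side; a sketch of the request body (the input text is a placeholder, and I haven't benchmarked the quality trade-off here):

```python
# Sketch of an OpenAI embeddings request body; the input string is illustrative.
request_body = {
    "model": "text-embedding-3-large",
    "input": "some document text",
    "dimensions": 1024,  # down from the model's default of 3072
}

print(request_body["dimensions"])  # 1024
```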

@Will_Calderwood when it comes to performance variances like this, it depends…
Are you using an M0 free cluster, and how big is your data?
At 3072 dimensions the vector alone could be ~12KB per document, which is not terrible in itself, but something may be pushing performance over a cliff: the access pattern, lack of memory space, or something else.
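The ~12KB figure follows directly from the dimension count, assuming 4-byte floats:

```python
dims = 3072
bytes_per_float = 4  # assuming 32-bit floats
size_kb = dims * bytes_per_float / 1024
print(size_kb)  # 12.0
```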

Could you share a typical document and some stats about your average document size and number of documents (visible in the Collections tab in Atlas)? Sharing the aggregation pipeline would also provide more insight, if possible.

I did a bit more digging and created a separate post related to performance here:

My assumption is that it’s to do with memory limits and paging. It is indeed an M0 cluster.
