Does it matter when the Atlas Search index for vector embeddings is created?

Hello,

I am wondering whether it matters if I create the search index for the embeddings before inserting the data, after inserting it, or during the upload. The data I am inserting already include precomputed embeddings.

Thanks a lot!

Hello David, welcome to the MongoDB community!

If you are importing data and thinking about creating indexes, I would advise you to create the indexes after the import. When you insert data into a collection that already has indexes, every insert also has to update those indexes, which increases the time the import takes. Importing first makes the import faster, and afterwards you can create the indexes and let Atlas do the work for you. During the import, you need to keep checking to make sure there are no errors or failures. The teams I have worked on generally do it this way.
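A minimal sketch of that "import first, index afterwards" order in Python, assuming PyMongo 4.5 or later (where search index helpers such as `Collection.create_search_index` were added) and a `collection` connected to an Atlas cluster. The function name `import_then_index`, the batch size and the default index name are hypothetical; `index_definition` stands for the search index definition as a Python dict.

```python
from typing import Any, Iterable, Mapping


def import_then_index(
    collection: Any,
    docs: Iterable[Mapping[str, Any]],
    index_definition: Mapping[str, Any],
    index_name: str = "embedding_index",  # hypothetical index name
    batch_size: int = 1000,  # hypothetical batch size
) -> str:
    """Insert all documents first, then create the Atlas Search index.

    `collection` is expected to behave like a PyMongo (4.5+) Collection,
    i.e. to provide insert_many() and create_search_index().
    """
    batch = []
    for doc in docs:
        # Copy each document so the _id that insert_many adds does not
        # mutate the caller's dicts.
        batch.append(dict(doc))
        if len(batch) >= batch_size:
            collection.insert_many(batch)  # send a full batch
            batch = []
    if batch:
        # insert_many rejects an empty list, so only flush a non-empty remainder.
        collection.insert_many(batch)

    # Only now create the search index over the documents already stored.
    # PyMongo accepts a dict with "definition" and an optional "name",
    # and returns the name of the new index.
    return collection.create_search_index(
        {"definition": dict(index_definition), "name": index_name}
    )
```

The call returns once Atlas has accepted the index definition; the index itself is then built in the background over the documents already in the collection.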

I’m available if you have any other questions.


Hi Samuel, thanks a lot for the answer!

I probably should have made my question more specific. I don’t care about time or failures.
What I want to know is whether the order of creating the index and inserting the data influences the results of a similarity search over the embeddings.
In other words, if I first insert the embeddings into Atlas and then create the search index, can I get different similarity search results than if I first create the search index and then insert the embeddings?

If it helps, my index looks like this:

```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}
```
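Since the embeddings are computed before upload, it may be worth checking that each vector has the length declared in `dimensions` before inserting it. A minimal sketch in plain Python; the helper names `knn_index_definition` and `check_embedding` are hypothetical, and the dict it builds simply mirrors the JSON definition above (`knnVector` type, cosine similarity, 1536 dimensions).

```python
from typing import Any, Mapping


def knn_index_definition(
    field: str = "embedding", dimensions: int = 1536, similarity: str = "cosine"
) -> dict:
    """Build a dict with the same shape as the knnVector index definition above."""
    return {
        "mappings": {
            "dynamic": True,
            "fields": {
                field: {
                    "dimensions": dimensions,
                    "similarity": similarity,
                    "type": "knnVector",
                }
            },
        }
    }


def check_embedding(
    doc: Mapping[str, Any], field: str = "embedding", dimensions: int = 1536
) -> bool:
    """Return True if doc[field] is a list of numbers of the expected length."""
    vec = doc.get(field)
    return (
        isinstance(vec, list)
        and len(vec) == dimensions
        # bool is a subclass of int in Python, so exclude it explicitly.
        and all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec)
    )
```

For example, `check_embedding({"embedding": [0.1] * 1536})` is `True`, while a document whose vector has 768 entries would be flagged before upload.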

I understand now, David. Thanks for explaining!

There is no problem creating the index later. I did this in my RAG application too: I inserted some data to experiment with, and then I created the index to run similarity searches.

I know that I can create the index later.

The question is whether the results of the similarity search can differ (different embeddings being returned) if I have thousands of embeddings and “first create the index, then insert the data” versus “first insert the data, then create the index”.

I think I wasn’t very clear in my answer; I apologize. Creating the index after inserting the data, as I did in my RAG application, does not affect the final result: once the index has finished building, it contains the same embeddings either way, so the similarity search returns the same results in both cases.
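The reasoning can be illustrated with a toy example: the result of a similarity search depends on which vectors are in the index, not on when or in what order they were added. This is a minimal sketch of an exact, brute-force cosine k-NN search in plain Python; it is not how Atlas Search implements `knnVector` internally, and the class name, dimensions and data are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


class BruteForceIndex:
    """Toy exact k-NN 'index': stores (id, vector) pairs and scans them all."""

    def __init__(self):
        self.items = {}

    def add(self, doc_id, vector):
        self.items[doc_id] = vector

    def search(self, query, k):
        scored = [(cosine(query, v), doc_id) for doc_id, v in self.items.items()]
        # Highest score first; ties broken by id so the order is deterministic.
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [doc_id for _, doc_id in scored[:k]]


rng = random.Random(0)
docs = [(i, [rng.uniform(-1, 1) for _ in range(8)]) for i in range(1000)]
query = [rng.uniform(-1, 1) for _ in range(8)]

# Scenario 1: create the (empty) index first, then insert documents as they arrive.
index_first = BruteForceIndex()
for doc_id, vec in docs:
    index_first.add(doc_id, vec)

# Scenario 2: the documents already sit in the collection (here in shuffled
# order), and the index is built afterwards by scanning them.
collection = docs[:]
rng.shuffle(collection)
index_later = BruteForceIndex()
for doc_id, vec in collection:
    index_later.add(doc_id, vec)

print(index_first.search(query, 5) == index_later.search(query, 5))  # True
```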
