Is MongoDB good for my use case? Comparing with Weaviate

Hi!

I am working on a recommendation system using LLM embeddings and I’m looking for the right database for my use case.

I have put together a set of requirements, along with what I found in the docs about how to fulfill each one with this database, and I wanted to check here whether someone with more experience can tell me if this makes sense, if I’m overlooking something, etc.

I don’t expect to have to support more than 500 records and maybe 100 requests per day in the mid-term, so I don’t need great optimizations or scaling options, but of course the cheaper the better.

So far, these are my requirements and what I have found in the docs:

  • I must be able to store n>=1 vector embeddings per ID OR I must be able to store 1 very large vector embedding per ID: YES

  • I must be able to store and retrieve metadata: YES, because vectors are stored in regular documents alongside any other fields (see the sketch after this list)

  • I must be able to do pre-filtering based on metadata: YES

  • I must be able to do database migrations (i.e. add/remove fields, the document equivalent of columns): YES, and I can do that with vectors too because they are stored like any other property in my collections

  • (Highly desirable) I want a good ts (or js) client: YES. I can use the official mongodb driver, Mongoose, or Prisma

  • (Desirable) I want to do pagination after pre-filtered queries OR (Required) I must be able to retrieve every result: YES, but since I don’t expect to have that many records, I am thinking of just storing the rank of every result in a separate collection and querying that directly.
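
To make the first few points concrete, this is a minimal sketch of how I understand it would look with the official mongodb Node.js driver. The database/collection names, the index name and definition, the field names, the 1536 dimensions, and the embed() helper are all my own assumptions:

```ts
// A minimal sketch using the official mongodb Node.js driver. It assumes an Atlas
// Vector Search index named "embedding_index" on the "items" collection, defined
// roughly as:
//   { "fields": [
//       { "type": "vector", "path": "embedding", "numDimensions": 1536, "similarity": "cosine" },
//       { "type": "filter", "path": "category" } ] }
import { MongoClient } from "mongodb";

interface Item {
  _id: string;
  title: string;       // plain metadata, stored next to the vector
  category: string;    // metadata field used for pre-filtering
  embedding: number[]; // the LLM embedding is just another field in the document
}

async function main() {
  const client = new MongoClient(process.env.MONGODB_URI!);
  const items = client.db("recsys").collection<Item>("items");

  // Store a record: embedding + metadata live in the same document.
  await items.insertOne({
    _id: "item-1",
    title: "Some product",
    category: "books",
    embedding: await embed("Some product description"),
  });

  // A "migration" is just an update, e.g. dropping a field:
  // await items.updateMany({}, { $unset: { oldField: "" } });

  // Pre-filtered vector search: restrict by metadata, then rank by similarity.
  const results = await items
    .aggregate([
      {
        $vectorSearch: {
          index: "embedding_index",
          path: "embedding",
          queryVector: await embed("what the user is looking for"),
          filter: { category: { $eq: "books" } }, // pre-filtering on an indexed metadata field
          numCandidates: 100,
          limit: 10,
        },
      },
      { $project: { title: 1, category: 1, score: { $meta: "vectorSearchScore" } } },
    ])
    .toArray();

  console.log(results);
  await client.close();
}

// Hypothetical embedding helper; in practice this would call whatever model I use.
async function embed(_text: string): Promise<number[]> {
  return new Array(1536).fill(0); // placeholder vector with the indexed dimensionality
}

main().catch(console.error);
```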

To be honest, I agree with the benefits of vector search with MongoDB listed on their website, but the starting price for dedicated clusters is, imo, too high, and vector search is not available in serverless mode. Also, I find the pricing page very confusing. For instance:

  • If I start with a shared free cluster, how does the vector search node cost ($0.11/hr for an S30) relate to that?

  • Same question, but if I start with a dedicated M10 cluster.

  • What are “vector search nodes” anyway?

One other “con” is that doing things like hybrid search is considerably more complex in MongoDB than in Weaviate.
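
For reference, this is roughly the kind of manual stitching I mean: run a lexical $search and a $vectorSearch separately, then fuse the two ranked lists yourself (reciprocal rank fusion in this sketch). The index names, field names, limits, and the RRF constant are my own assumptions:

```ts
import { Collection, Document } from "mongodb";

// Manual hybrid search: combine lexical ($search) and semantic ($vectorSearch)
// results with reciprocal rank fusion (RRF) on the client.
async function hybridSearch(
  items: Collection<Document>,
  queryText: string,
  queryVector: number[],
  k = 60 // common RRF damping constant
) {
  // Lexical leg: Atlas Search full-text query over the title field.
  const lexical = await items
    .aggregate([
      { $search: { index: "default", text: { query: queryText, path: "title" } } },
      { $limit: 20 },
      { $project: { _id: 1 } },
    ])
    .toArray();

  // Semantic leg: Atlas Vector Search over the embedding field.
  const semantic = await items
    .aggregate([
      {
        $vectorSearch: {
          index: "embedding_index",
          path: "embedding",
          queryVector,
          numCandidates: 100,
          limit: 20,
        },
      },
      { $project: { _id: 1 } },
    ])
    .toArray();

  // RRF: every document gets 1 / (k + rank) from each list it appears in.
  const scores = new Map<string, number>();
  for (const list of [lexical, semantic]) {
    list.forEach((doc, rank) => {
      const id = String(doc._id);
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }

  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}
```

In Weaviate the same thing is a single hybrid query with an alpha weighting parameter, which is what I mean by “considerably more complex” here.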

Hi Carlos! First of all, thank you for taking the time to evaluate Atlas Vector Search against the requirements of your application. I should be able to answer your questions:

We are in the process of rolling out a deployment options page within the vector search documentation that should make this more clear, but essentially you have the option of deploying separate cluster nodes specifically for the vector search workload. This allows for greater resource isolation, higher availability and more cost-effective scaling compared to the default state where search and database resources are coupled on the base MongoDB cluster (referred to as “coupled architecture”).

You’ve cited the low-CPU entry point for the dedicated search nodes recommended for vector search, referred to on the pricing page as “vector search nodes.” However, when testing queries or prototyping, you don’t need to start on dedicated search nodes: you can run solely on the base MongoDB cluster, including the free M0 tier, the shared M2/M5 tiers, and the dedicated M10+ tiers, and we support zero-downtime migration from the coupled architecture later. Here is a page listing some limitations of search when running on free or shared clusters that might be helpful.

Thank you for the feedback on this example being complex. We definitely want to make sure there is an easier way of jointly considering lexical and vector search results, as we have a whole set of capabilities around lexical search that go well beyond what Weaviate supports. We have something in the works on this, and I will make sure to follow up when it’s available on this forum.


Thanks for the reply!

We are in the process of rolling out a deployment options page within the vector search documentation that should make this more clear

Nice! So, if I understand correctly, I can use any cluster without a dedicated vector search node and pay only the base price for the cluster, right? For instance, if I go with an M2, do I only have to pay $9/mo and still have access to vector search (with its limitations)?

We have something in the works on this, and I will make sure to follow up when it’s available on this forum.

Sounds exciting! Really looking forward to it!


Your understanding is entirely correct!