At how many filter conditions in the $match stage will a query become unfeasible?

My app has a user collection in which each document has 100 fields plus another field that is an array of subdocuments. Each subdocument in turn has 100 fields, and each user has up to 100 of these subdocuments.

Each time a user logs in, a query must be run against this collection. In the $match stage, potentially up to 200 filter conditions may be specified, covering any of the 200 fields of each document (including the subdocument fields). Which combination of these conditions will be used on each query is not predictable.

The fields are a mix of string, number, and boolean types.
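
To illustrate (the collection name, field names, and values below are all made up), a login query might look roughly like this in mongosh:

```javascript
// Hypothetical sketch of one such login query; "users", the field
// names, and the values are placeholders, not my real schema.
db.users.aggregate([
  {
    $match: {
      status: "active",              // top-level string field
      age: { $gte: 21 },             // top-level number field
      verified: true,                // top-level boolean field
      // Conditions on the subdocument array; $elemMatch requires all
      // three conditions to hold on the SAME subdocument.
      items: {
        $elemMatch: { category: "A", score: { $gt: 50 }, enabled: true }
      }
      // ...potentially up to ~200 conditions in total, in any combination
    }
  }
])
```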

  1. Is this app feasible?
  2. Approximately how many users can it scale to?
  3. If it is feasible, are compound indexes the ultimate solution for these queries?

These kinds of questions are impossible to answer. What you’re describing is far too vague and involves a huge number of unpredictable variables.

Ultimately, I would say anything is possible, but as an app and its user base grow, you will always have to adapt. Twitter, Facebook, etc. were not built from the start to handle the amount of data they handle today.

Do some testing with fake data to see how long your queries take to run on one type of instance. Experiment with different indexes. Try different instance types (e.g. on MongoDB Atlas). Think about when these queries really have to run. Consider running long queries ahead of time (e.g. daily) and caching the result so it is instantly available when the user logs in.
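
As a minimal sketch of that kind of test in mongosh (the collection name, field names, and value distributions are all made up, not a recommendation):

```javascript
// Generate fake data in batches, then compare query plans and timings.
for (let batch = 0; batch < 10; batch++) {
  const docs = [];
  for (let i = 0; i < 1000; i++) {
    const n = batch * 1000 + i;
    docs.push({
      status: n % 2 ? "active" : "inactive",
      age: 18 + (n % 60),
      verified: n % 3 === 0,
      items: Array.from({ length: 100 }, (_, j) => ({
        category: String.fromCharCode(65 + (j % 5)), // "A".."E"
        score: (n + j) % 100,
        enabled: j % 2 === 0
      }))
    });
  }
  db.users.insertMany(docs);
}

// Try an index and inspect the winning plan and execution stats:
db.users.createIndex({ status: 1, age: 1 });
db.users.explain("executionStats").aggregate([
  { $match: { status: "active", age: { $gte: 21 } } }
]);
```

Repeating the explain with different indexes and different condition combinations will tell you far more about feasibility than any general estimate.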

I’d say, if your data schema fits well into MongoDB, go for it. A great advantage is that you can make changes to your schema easily at any time.

Hi @Big_Cat_Public_Safety_Act,

As noted in earlier replies, this seems related to some of your other discussion topics, although you have some extra questions here.

Scalability and feasibility will depend on many factors including your schema design, application design, indexes, deployment resources, workload, performance expectations, and funding. The best way to estimate would be generating some data and workload in a representative test environment.

There are different dimensions to scaling (performance scale, cluster scale, data scale) and you can see some examples at Do Things Big with MongoDB at Scale.

As @Nick notes, Twitter and Facebook weren’t built from the start to handle the user base they have today. Both have evolved into very large application platforms and companies with 1000s of engineers and millions or billions of users.

As per #1 and #2, any estimate is going to depend on many factors, so these questions aren’t directly answerable. The estimated number of users will also vary depending on what those users are doing, and when. An application with 10,000 daily users distributed globally could mean anywhere from 10s to 100s or 1000s of concurrent users depending on session durations, time zones, and how they interact with your app.

I recommend reviewing the MongoDB Schema Design Patterns to see which might apply to your application and use cases.

For example, the Attribute Pattern would be helpful for the variety of fields you are planning, including unpredictable field names.
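
As a rough sketch of the idea (field names here are illustrative): rather than 200 separately named fields, each document stores key/value pairs in an array, so a single compound index can serve queries on any combination of attributes.

```javascript
// Attribute Pattern sketch; "attributes", "colour", "size", etc. are
// made-up names. Instead of { colour: "red", size: 42, verified: true }:
db.users.insertOne({
  _id: 1,
  attributes: [
    { k: "colour", v: "red" },
    { k: "size", v: 42 },
    { k: "verified", v: true }
  ]
});

// One multikey compound index then covers queries on ANY attribute:
db.users.createIndex({ "attributes.k": 1, "attributes.v": 1 });

// Find users where colour is "red" AND size >= 40:
db.users.find({
  $and: [
    { attributes: { $elemMatch: { k: "colour", v: "red" } } },
    { attributes: { $elemMatch: { k: "size", v: { $gte: 40 } } } }
  ]
});
```

This avoids needing a separate compound index per combination of fields, which is the main concern with 200 unpredictable filter conditions.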

If you have more ambitious search requirements, Atlas Search has a rich set of search features and operators.
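
For example, a pipeline could start with a $search stage like the following (this assumes an Atlas Search index named "default" already exists on the collection, and the field name is made up):

```javascript
// Hypothetical Atlas Search query; requires an Atlas Search index.
db.users.aggregate([
  {
    $search: {
      index: "default",
      text: {
        query: "some text",
        path: "bio"          // made-up field name
      }
    }
  },
  { $limit: 10 }
]);
```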

If you are looking to optimise some specific use cases, I suggest starting a discussion with more concrete details including example documents with your proposed schema, common queries, and any concerns or findings you have so far.

Regards,
Stennie
