Horizontal scaling strategy with 1000s of shards

My app has a User collection. Each document in the collection averages about 0.04 MB (~40 KB); in the worst case, a document may slightly exceed 0.1 MB. Needless to say, these are small documents. However, each document could potentially have over 1,000 fields.

There are 3 types of read queries on this User collection:

  1. findOne by _id
  2. findOne by email
  3. findMany by a set of filters drawn from an arbitrary combination of the 1,000 fields (see the example queries after this list).
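
For concreteness, the three query shapes would look roughly like this in mongosh (assuming MongoDB; the field names in the third query are hypothetical):

```js
// 1. Lookup by _id (targeted to a single shard once the collection is sharded on _id)
db.users.findOne({ _id: ObjectId("656a1f2e8b3c4d5e6f708090") })

// 2. Lookup by email (email is not the shard key, so this broadcasts to all shards)
db.users.findOne({ email: "alice@example.com" })

// 3. findMany with an arbitrary combination of field filters
db.users.find({ country: "DE", plan: "pro", last_active: { $gte: ISODate("2024-01-01") } })
```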

The third type of query is slow. So the plan is to divide and conquer, i.e., partition the User collection into shards, where each shard holds on average 10,000 documents. So if my app scales to 100 million users, there will be 10,000 shards, each with about 10,000 documents from this User collection. To ensure even distribution, the _id will be used as the shard key. The only time a findOne by email is issued is during user login, which happens infrequently, so the broadcast it incurs (email is not the shard key) is acceptable. The findOne by _id will be targeted.
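
Assuming MongoDB, the setup described above would look something like this in mongosh (database and collection names are placeholders). One note: if _id is a default ObjectId, a ranged shard key on _id increases monotonically and would funnel all inserts to one shard, so a hashed shard key is the usual way to get the even distribution described:

```js
// Enable sharding on the (hypothetical) "app" database
sh.enableSharding("app")

// A hashed _id spreads monotonically increasing ObjectIds evenly across shards.
// findOne({ _id }) remains targeted, because mongos can hash the equality value itself.
sh.shardCollection("app.users", { _id: "hashed" })
```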

The third type of query, findMany by an arbitrary combination of filters, will be scatter-gather. And that is precisely the intention: the only way to scale the third type of query is to divide and conquer.
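
One way to verify which queries are targeted versus broadcast is explain(), whose output for a sharded collection lists the shards that participated (a sketch, assuming MongoDB):

```js
// The "shards" section of the output lists every shard the query was sent to;
// a scatter-gather query will show all of them.
db.users.find({ country: "DE", plan: "pro" }).explain("executionStats")

// A query on the shard key should show only a single shard.
db.users.find({ _id: ObjectId("656a1f2e8b3c4d5e6f708090") }).explain("executionStats")
```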

Questions:

  1. Does this strategy seem correct? Am I on the right path or the wrong one? 10,000 is a lot of shards, but each shard only contains 10,000 small documents.
  2. Will the third type of query, findMany by an arbitrary combination of the 1,000 filterable fields, scale as described? If it won't, what am I missing?

You will run into a lot of trouble maintaining that many shards unless you automate most (if not all) admin tasks: shards come and go, machines go down, the network gets slow, and more.

I wouldn't use this approach. A quick Google search turns up the same conclusion: you can't optimize for every possible filter combination. See, e.g., "mysql - Indexes for custom filtered queries" on Stack Overflow.

Just do the best you can for the most common queries, and leave the rest as-is.
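
In MongoDB terms, "do the best you can" usually means indexing the handful of filter combinations that dominate traffic, perhaps with a wildcard index as a catch-all, and accepting that rare combinations stay slow (a sketch; the field names are hypothetical):

```js
// Compound index covering the most common filter combination
db.users.createIndex({ country: 1, plan: 1, last_active: -1 })

// Optional catch-all: a wildcard index lets single-field predicates on any of
// the ~1000 fields use an index, at the cost of extra write overhead.
db.users.createIndex({ "$**": 1 })
```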