Sharding to increase processing power

Ruben_Marques · October 27, 2021, 3:50pm

Hello, I am learning about the possibility of implementing sharding as a means to use many machines to parallelize the processing of data in a Mongo DB through a REST API.
My implementation currently uses only one server to query for data in its database, do some calculations and give an output to the user.
I have read that routers direct the queries for the correct shard containing the data we want, following the instructions of the config server. Are the routers also able to process the data as I am doing right now or this is just related to the operations of saving data to and getting data from the database shards?

kevinadi · November 11, 2021, 1:08am

Hi @Ruben_Marques welcome to the community.

Without knowing your goal or your current setup, I can answer some of your questions in general terms:

Yes this is generally correct, but depends on the shard key involved in the query.

Generally the mongos processes only pass through data.

Sharding allows you to parallelize querying for data. It was primarily designed for horizontal scaling, i.e. for increasing throughput for very large data sets. This is useful if you find that the bottleneck in your application is due to the very large data you have.

If the bottleneck of your process is in the data processing stage (e.g. not because the database is struggling to respond to the queries), then sharding would be of limited help. Parallelizing the data processing stage instead of the database would be more beneficial in this case. However, without knowing more of what you are planning, this is just speculation on my part.

Best regards,
Kevin

system · July 30, 2022, 10:16am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.