Sharding or not?

Cassiano_Cussioli_Si · June 23, 2020, 11:19pm

I need a database to handle about 3.5 million new devices positions every day, today I’m using mysql, but the having performance issues on reports with many positons, I’m considering to change for a mongodb database, should I consider use sharding or not?

michael_hoeller · June 24, 2020, 6:27am

Hello @Cassiano_Cussioli_Si welcome to the community!

If sharding is appropriate for your new deployment, can’t be answered at this point, there are many points to consider. If your dataset fits on a single server, you should begin with an unsharded deployment, while your dataset is small sharding may provides little advantage.

The story will start to think about a good schema design. In case you move the SQL normalized Data Model 1:1 to MongoDB you will not have much fun or benefit.
You can find further information on the Transitioning from Relational Databases to MongoDB in the linked blog post. Please note also the links at the bottom of this post, and the referenced migration guide.

Since you mention that the current main problem is reporting run time. One option with MongoDB is to have a replicaset with one, or more hidden secondary servers which are dedicated for special tasks e.g. analytics, searches, … the key factor with them is that they can have individual indexes to support their special tasks. This as a starter, please follow this link to find out more about hidden Replicat Set Members.

Sharding is a method to horizontal scale your load when vertical scaling gets inefficient.

Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server. The other side of the coin is increased complexity.

I’d recommend to start without sharding, find and tune your schema and work from there.

Shadring provides advantages, when your use case fits, as like:

Reads / Writes: MongoDB distributes the read and write workload across the shards in the sharded cluster, allowing each shard to process a subset of cluster operations. For queries that include the shard key or the prefix of a compound shard key, mongos (the routing server) can target the query at a specific shard. These targeted operations are generally more efficient than broadcasting to every shard in the cluster.
Storage Capacity: Sharding distributes data across the shards in the cluster, allowing each shard to contain a subset of the total cluster data. As the data set grows, additional shards increase the storage capacity of the cluster.
High Availability: A sharded cluster can continue to perform partial read / write operations even if one or more shards are unavailable. While the subset of data on the unavailable shards cannot be accessed during the downtime, reads or writes directed at the available shards can still succeed.

Hope that helps
Michael