How exactly would you fix a sharded cluster with bad distribution in a production environment

mikemachr · June 20, 2021, 11:07pm

This may be a dumb question but this is something I have been asking myself a lot. I understand that choosing a bad shard key would result in an uneven distribution on the sharded cluster, and if you get to a point where you get stuck and cannot add more data, you would need to increase the chunk size and manually split the jumbo chunks, however this is only a provisional solution that would fail in the long term since the shard key is bad anyways. Is there an actual way to give this a permanent fix within the same collection, or you would need to make a copy of the whole collection in another cluster and make sure you shard correctly this time?

Tom_Boutell · May 16, 2022, 7:13pm

I realize it’s been a while, but for those reading this question today, according to the documentation you can now change the shard key as long as you’re on at least MongoDB 5.0.

Stennie_X · July 29, 2022, 1:59am

Hi @Tom_Boutell,

Thanks for adding the extra information! Starting in MongoDB 5.0, there is the option to Reshard a Collection if a suboptimal shard key was chosen.

MongoDB 4.4 also added the option to Refine a shard key by adding new field(s) as suffixes to the existing key.

If you happen to be using an older version than the above, your only approach would have been to copy into a new collection with the desired shard key. However, since the distribution of shard key values would be known for an existing collection you could speed up the process by pre-splitting chunks in an empty sharded collection before copying the data from an existing collection.

Regards,
Stennie

system · August 3, 2022, 2:00am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.