Considerations for MongoDB Data Model and Shard Key Optimization

Hi team,
Existing cluster specification:
The collection is named "events" and its shard key is {"companyId": 1, "b": 1, "c": 1}. The collection currently holds nearly 250 GB of data on a three-shard cluster.

We have a single collection called events that stores the records for all companies. However, we have noticed that the data is distributed unevenly across the shards because the leftmost prefix of the shard key, companyId, has low cardinality. For example, all the chunks of company "a" reside on a single shard. The number of chunks is even across the shards, but each company's data resides on a single shard.
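To make the symptom concrete, here is a small Python sketch of the situation described above. The chunk layout is invented purely for illustration (the company and shard names are hypothetical, not taken from our cluster). It shows how chunk counts per shard can look balanced while each company still sits on exactly one shard.

```python
from collections import Counter, defaultdict

# Hypothetical chunk layout as (companyId, shard) pairs.
# These values are made up for illustration only.
chunks = [
    ("a", "shard0"), ("a", "shard0"),
    ("b", "shard1"), ("c", "shard1"),
    ("d", "shard2"), ("e", "shard2"),
]

def chunks_per_shard(chunks):
    """Count how many chunks each shard holds."""
    return Counter(shard for _, shard in chunks)

def shards_per_company(chunks):
    """List the shards that hold at least one chunk of each company."""
    spread = defaultdict(set)
    for company, shard in chunks:
        spread[company].add(shard)
    return {company: sorted(shards) for company, shards in spread.items()}

if __name__ == "__main__":
    print(dict(chunks_per_shard(chunks)))   # chunk count per shard
    print(shards_per_company(chunks))       # shard spread per company
```

In this toy layout every shard holds two chunks, yet no company spans more than one shard, so all reads and writes for a given company still go to a single shard.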

We currently have five distinct companies in the collection, and we do not expect more than ten in the future.

To address this issue, we are considering two potential solutions:

  1. The first option is to split the base collection into multiple collections, each dedicated to one company, and to change the shard key to {"b": 1, "c": 1} for every company collection. However, this approach would require migrating data from events to the respective company collections.

  2. The second option is zone-based sharding on the "companyId" field, with zones arranged so that the data is evenly balanced across the shards.
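To make the two options easier to compare, here is a rough Python sketch that only builds the admin command documents each option would involve, without connecting to a cluster. The `shardCollection`, `addShardToZone`, and `updateZoneKeyRange` commands are standard MongoDB admin commands. Everything else is an assumption for illustration: the database name `mydb`, the per-company collection names `events_<companyId>`, the zone names, and the company-to-shard placement. The zone sketch shows only the simplest layout, one zone per company. Spreading a single company over several shards would need extra ranges with split points on "b", which I have not worked out.

```python
def per_company_shard_commands(db, company_ids):
    """Option 1: one shardCollection command per company collection.

    The collection naming scheme events_<companyId> is hypothetical.
    """
    return [
        {"shardCollection": f"{db}.events_{cid}", "key": {"b": 1, "c": 1}}
        for cid in company_ids
    ]

def zone_commands(db, placement, min_key, max_key):
    """Option 2: pin each company's whole key range to one zone.

    placement maps companyId -> shard name (an assumed layout).
    min_key / max_key are the BSON MinKey / MaxKey sentinels.
    """
    cmds = []
    # Associate every shard in use with a zone named after it.
    for shard in sorted(set(placement.values())):
        cmds.append({"addShardToZone": shard, "zone": f"zone_{shard}"})
    # Give each company one range covering all of its "b" and "c" values.
    for cid in sorted(placement):
        cmds.append({
            "updateZoneKeyRange": f"{db}.events",
            "min": {"companyId": cid, "b": min_key, "c": min_key},
            "max": {"companyId": cid, "b": max_key, "c": max_key},
            "zone": f"zone_{placement[cid]}",
        })
    return cmds

if __name__ == "__main__":
    for cmd in per_company_shard_commands("mydb", ["a", "b"]):
        print(cmd)
    # Strings stand in for the BSON sentinels in this dry run.
    for cmd in zone_commands("mydb", {"a": "shard0", "b": "shard1"}, "MIN", "MAX"):
        print(cmd)
```

With pymongo, I would expect each document to be sent with `client.admin.command(cmd)`, passing `bson.min_key.MinKey()` and `bson.max_key.MaxKey()` as the two sentinels. I have not run this against our cluster.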

Between these two approaches, which is the better option? I am open to suggestions and would appreciate any input on this matter.