Chunks migrated but dataSize didn't reduce on previous shards

Hi,

Assume Shard A and Shard B each hold 200GB of dataSize (85/100 storage used).
I turned off the balancer, added Shard C to the cluster, and turned the balancer back on to migrate chunks.

If chunks migrate from the old shards A and B (3000 chunks in total, no jumbo chunks) so that each of the three shards ends up with 1000 chunks, can I expect Shard A's dataSize to drop from 200GB to at least ~150GB? I would then run the “compact” operation to reclaim the reusable storage.

FYI, I am on MongoDB v4.0.20

TIA

The breakdown below should explain it better:

MongoDB v4.0.20

Before [ no Jumbo Chunks verified by sh.status(true) ]:
Shard A - Chunks 1500 - dataSize 200GB
Shard B - Chunks 1500 - dataSize 200GB

After adding a shard
Shard A - Chunks 1000 - dataSize ?? (Can I expect reduced dataSize?)
Shard B - Chunks 1000 - dataSize ?? (Can I expect reduced dataSize?)
Shard C - Chunks 1000 - dataSize 130GB

TIA

Hi @Dheeraj_G welcome to the community!

Although in theory the data size, and to some extent the storage size, should be evenly distributed across the shards, in practice this is difficult to predict. Every deployment is different, and the data size would depend on (off the top of my head):

  • Whether every document is the same size or not
  • Whether the shard key has enough cardinality to allow this balance
  • Whether the chunks in the collection are operated on evenly (i.e. are there “hot chunks” that receive more reads/inserts/updates than others)

The data should be approximately evenly distributed if the collection is ideally distributed and the workload is evenly distributed as well. In practice, however, this is not always the case.
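As a rough illustration of that ideal case only, here is a small Python sketch of the arithmetic using the figures from the question. The function name is made up for this example, and it assumes every chunk holds the same amount of data, which real deployments rarely satisfy.

```python
def estimate_data_size_gb(data_size_gb, chunks_before, chunks_after):
    """Estimate dataSize after migration.

    Assumption (idealized): every chunk holds the same amount of data.
    """
    avg_chunk_gb = data_size_gb / chunks_before
    return avg_chunk_gb * chunks_after


# Figures from the question: 200 GB across 1500 chunks,
# with 1000 chunks left on the shard after balancing.
remaining_gb = estimate_data_size_gb(200, 1500, 1000)
moved_gb = 200 - remaining_gb

print(f"Shard A ideal dataSize: {remaining_gb:.1f} GB")  # 133.3 GB
print(f"Moved off Shard A:      {moved_gb:.1f} GB")      # 66.7 GB
```

Under that assumption Shard A would land around 133GB, which is in the same range as the 130GB reported for 1000 chunks on Shard C. Uneven document sizes or hot chunks would move the real number away from this estimate.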

It also gets more complicated due to how WiredTiger actually allocates the data files physically within each shard (which could be very different on each shard). Deleting documents and compacting the database may result in space being returned to the OS, but this is not guaranteed. WiredTiger’s compression features should help you conserve disk space to some extent, if that is important to you.

However, if you expect your data to grow in size, I don’t think you need to run compact. The reasoning is that WiredTiger will reuse the freed space as the data grows, so compacting now would result in no net useful work.
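If you want to see how much space is already free for reuse before deciding on compact, one option is to compare dataSize, storageSize and the WiredTiger "file bytes available for reuse" counter per shard. The sketch below only parses a collStats-style document. The helper name and the sample numbers are invented for illustration, and it assumes the per-shard `shards` layout that collStats returns when run through mongos.

```python
def summarize_shards(coll_stats):
    """Pull size figures per shard out of a collStats-style document.

    Assumes the layout returned via mongos for a sharded collection:
    a "shards" sub-document keyed by shard name.
    """
    summary = {}
    for shard, stats in coll_stats.get("shards", {}).items():
        block_manager = stats.get("wiredTiger", {}).get("block-manager", {})
        summary[shard] = {
            "dataSize": stats.get("size", 0),
            "storageSize": stats.get("storageSize", 0),
            "reusable": block_manager.get("file bytes available for reuse", 0),
        }
    return summary


# Hypothetical sample document with made-up byte counts. With pymongo,
# a real one could be fetched with db.command("collStats", "<collection>").
sample = {
    "shards": {
        "shardA": {
            "size": 140_000_000_000,
            "storageSize": 60_000_000_000,
            "wiredTiger": {
                "block-manager": {"file bytes available for reuse": 18_000_000_000}
            },
        },
    }
}

for shard, figures in summarize_shards(sample).items():
    print(shard, figures)
```

A large "reusable" value relative to storageSize suggests WiredTiger already has room to absorb new writes without growing the file.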

Best regards,
Kevin

