I’m curious to understand what happens to indexes after a chunk is moved. It appears to me that when a chunk is moved from one replica set to another a full re-index occurs for all of the source collection indexes. Is this true? If so, are there options that can be used to throttle the chunk movement, like not moving any chucks until the indexing is complete?
In my particular case it appears that indexing after moving a chunk is flooding my disk with activity and crushing the performance.
Let me explain.
I’ve been working with a standalone instance and I’m now working to move it to two sharded replica sets. I’ve successfully moved the database to a replica set of three data nodes and a single shard. I then created an empty three node replica set and added it as a second shard. I’ve enabled sharding on the database and one by one I am sharding the collections. The small collections balanced without issue but I’m running into balancing issues when sharding the larger collections.
It seems that after a chunk is moved from the source RS to the target RS, each node in the source RS performs a significant amount of indexing work, so much so that it overwhelms the disk drive which will peg at 100% activity and a disk queue of around 10. I believe it is indexing activity because it the files being written to are index files. The data disk for each node is four 10K SAS drives in a RAID 10 array. CPU utilization hovers around 40% and memory around 50%.
I expect this is resulting in excessive lag and causing the migrations to begin failing.
This issue seems to build over time. What I mean by that is that I can disable balancing on the collection and let the indexing work finish. If I then enable balancing on the collection, the indexing load after the first chunk is moved isn’t too bad, but after 6 or 7 chunks have been moved the load on the drives gets so bad that the balancing begins to throw ‘aborted’ errors.