If it works for 10k but not for 100k, I would tend to assume that the termination logic is implemented correctly.
Note that 100k collections means at least 200k files, plus one more file per index per collection. It is quite possible that the problem is caused by something simply taking too long. The following makes a lot of sense
and so does the proposed idea:
From k8s’ documentation:
Once the grace period has expired, the KILL signal is sent to any remaining processes
Receiving the KILL signal with generate
and the KILL signal is sent to processes that are still running after terminationGracePeriodSeconds, as might be the case when trying to flush and close 200k files.
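
To get a rough feel for whether flushing and closing that many files could plausibly exceed the grace period, something like the sketch below could be run on the affected node. It is only an illustration, not how the server actually shuts down: the helper names (`time_flush_and_close`, `fits_in_grace_period`), the 4 KiB payload and the sample size are all assumptions of mine, and the 30 second figure is the Kubernetes default for terminationGracePeriodSeconds, which your pod spec may override.

```python
import os
import tempfile
import time

# Kubernetes default for terminationGracePeriodSeconds; a pod spec may override it.
GRACE_PERIOD_SECONDS = 30.0


def time_flush_and_close(n_files, directory):
    """Open n_files files, then time how long flush + fsync + close takes (hypothetical helper)."""
    handles = []
    for i in range(n_files):
        f = open(os.path.join(directory, f"file_{i}.dat"), "wb")
        f.write(b"x" * 4096)  # assumed 4 KiB of unflushed data per file
        handles.append(f)

    start = time.monotonic()
    for f in handles:
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write it to disk
        f.close()
    return time.monotonic() - start


def fits_in_grace_period(per_file_seconds, n_files, grace=GRACE_PERIOD_SECONDS):
    """Linear extrapolation: would n_files finish before the KILL signal is sent?"""
    return per_file_seconds * n_files <= grace


if __name__ == "__main__":
    sample = 200  # kept small to stay below typical open-file limits
    with tempfile.TemporaryDirectory() as d:
        elapsed = time_flush_and_close(sample, d)
    per_file = elapsed / sample
    for n in (20_000, 200_000):
        verdict = "fits" if fits_in_grace_period(per_file, n) else "exceeds grace period"
        print(f"{n} files: ~{per_file * n:.1f}s estimated, {verdict}")
```

The extrapolation is linear and ignores everything else the process does on shutdown, so treat the result as an order-of-magnitude hint. If the estimate for 200k files lands near or above the grace period, that would be consistent with the 10k case working and the 100k case being killed.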