Hello everyone,
We are planning to create a new compound index on a very large collection and I would like to get some advice or hear from people with similar experience.
Environment / background:
- MongoDB version: 7.0.20
- Collection size: ~1.2 billion documents (90-day TTL)
- Index to be created: compound index on 4 fields (string + timestamp + int32 + int32); a rough sketch of the planned definition follows this list
- Cluster type: replica set (3 nodes)
- Hardware per node: 12 vCPU, 23.5 GB RAM, 2.5 TB fast SSD
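For context, here is a minimal PyMongo sketch of roughly what we plan to run. The field names, database/collection names, and connection string below are placeholders, not our real schema:

```python
from pymongo import MongoClient, ASCENDING

# Placeholder connection string and namespace -- not our real cluster.
client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")
coll = client["mydb"]["events"]

# Planned 4-field compound index (string + timestamp + int32 + int32).
# Field names here are illustrative only.
coll.create_index(
    [
        ("device_id", ASCENDING),    # string
        ("created_at", ASCENDING),   # timestamp
        ("retry_count", ASCENDING),  # int32
        ("status_code", ASCENDING),  # int32
    ],
    name="device_created_retry_status_idx",
)
```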
The largest index we have ever built was on a collection of about 2 million documents. That build went relatively smoothly: the impact was noticeable only at the very beginning, then it continued in the background with little effect on cluster performance.
My questions are:
- With 1.2B documents and a 4-field compound index, how long can the build realistically take? Hours? Days?
- What level of impact on the cluster should we expect during the build (CPU, memory, I/O, replication lag)?
- Are there any best practices for reducing the impact (e.g. raising maxIndexBuildMemoryUsageMegabytes, building on a secondary first, temporarily reducing write load)? A sketch of what we have in mind is below. Normally we create indexes through MongoDB Compass, and so far that workflow has been smooth for smaller datasets.
- Any real-world cases or metrics from anyone who has built indexes on billion+ document collections would be extremely helpful.
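For reference, this is the kind of tuning and monitoring we are considering (PyMongo sketch; the parameter and command names are from the MongoDB docs, while the connection string and the 2048 MB value are purely illustrative):

```python
from pymongo import MongoClient

# Placeholder connection string -- not our real cluster.
client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")
admin = client.admin

# Raise the per-build memory limit (default is 200 MB) before starting the build.
# 2048 is just an example value, not a recommendation.
admin.command("setParameter", 1, maxIndexBuildMemoryUsageMegabytes=2048)

# Poll replica-set status during the build to keep an eye on replication lag.
status = admin.command("replSetGetStatus")
for member in status["members"]:
    print(member["name"], member["stateStr"], member["optimeDate"])
```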
Thanks in advance for sharing your experience!