Improving create Index performance

We have a collection with about 121 million documents that results in following stats.

{
  "ns" : "my_collection",
  "size" : 499941088660,
  "count" : 121981837,
  "avgObjSize" : 4098,
  "storageSize" : 178798247936,
  "capped" : false,
  "nindexes" : 1,
  "totalIndexSize" : 1713725440,
  "indexSizes" : {
    "_id_" : 1713725440
  },
  "ok" : 1
}

We are trying to create an index on an existing field that all documents have. For trying this out, we have created an isolated mongo instance with following specs:

VM Spec: n1-highmem-8 (8 vCPUs, 52 GB memory) -- Google Cloud
Disk: 500GB SSD with random IOPS limit at 15000/15000 (read & write each) and max throughput at 240 MB/S. 

Index creation seems to running very slow (by our estimate it would take to a day or two).

We tried different values for maxIndexBuildMemoryUsageMegabytes (default 500, 800, 1000, 5000, 10000, 30000) But this didn’t seem to affect the speed. (Even with high values here, VM still had free memory apart from used + cache, so we believe it didn’t cause insufficient working set memory leading to higher disk I/O).

Checking on iotop, we noticed reads happening (avg 10MB/S) but almost no writes.

Are there any other parameters we should be looking at?

Hi @Shivaprasad_Bhat,

We recommend creating index on a large collection using the rolling maintenance manner:
https://docs.mongodb.com/manual/core/index-creation/

With this approach you should be able to have minimal impact on your production and create the index with default speed. As long as your replication window is sufficient you should be good doing this build one node at a time.

Best
Pavel

Hi @Pavel_Duchovny,

We did go through this documentation. But one complication we have is that an index with unique constraint on the same field already exists. The new index we want to create is a partial index (i.e., index only those documents where the target field exists thus allowing multiple documents to not have the field or have null value). Since we can’t have 2 indices at once, we have to drop the existing index first which would affect the production traffic. This prevents us from using background indexing on primary.

Other approach from Mongo documentation is to take one secondary out of cluster at a time, run foreground index creation and put it back into cluster. Finally change primary to secondary and do the same thing. Since we have about 8 secondaries, this will require lot of manual intervention and is cumbersome.

We have come up with an alternate approach to solve this based on following facts:

  1. Mongo allows 2 indices on same field to exist if they are of different order (1 vs -1).
  2. The order of the index does not matter to Mongo for single field indices.
  3. The sort order of the target field does not matter in our use case.

Approach:

  1. Keep the current unique index (asc order) which continues serving production traffic.
  2. Start creating a partial index with desc order in background mode on primary. Wait till this completes and secondaries are in sync.
  3. Drop the asc index.

Let me know if you see any problems with this approach.

Regards,
Shivaprasad Bhat

Hi @Shivaprasad_Bhat,

That sounds like a good plan. But be aware that index builds on secondaries start after the Primary build.

Btw, you can always add a dummy field to your index to create more “duplicate” indexes.

Best
Pavel