MongoDB Crashes on Duplicates

Hi All,

This is my first post, so apologies if I missed something or posted in the wrong place.

I use MongoDB 4.0.9 and experienced a weird issue.
I needed to create a unique index. Since this was a big collection, I removed the node from the cluster, built the index, and added it back to the cluster once the index build was complete. A few hours after it caught up with the Primary node, the server crashed with the following error:
Unique index cursor seeing multiple records for key { : “1d7e75c2-6493-4fac-9d6c-116c40426991”, : “a0a4132e-efb3-4b47-b0e1-726cf9085e1f”, : “en” } in index instanceId_1_translationId_1_language_1_unique
2021-08-04T11:16:18.610+0000 F - [conn9894] Fatal Assertion 28608 at src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp 1232
2021-08-04T11:16:18.610+0000 F - [conn9894]

***aborting after fassert() failure

I know that a unique index cannot be created if the collection already contains duplicate keys for that index. Could it be that the replica received more duplicates while it was syncing from the Primary node, and crashed when a client tried to query those keys?

Thanks!

Yep. The documentation has a warning about building unique indexes this way. tl;dr: don’t use a rolling (offline) index build for a unique index.

For building unique indexes

To create unique indexes using the following procedure, you must stop all writes to the collection during the index build. Otherwise, you may end up with inconsistent data across the replica set members.

Warning

If you cannot stop all writes to the collection, do not use the following procedure to create unique indexes.
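If it helps to see how many duplicates are already there, here is a minimal sketch of a check, not an official procedure. It assumes the indexed fields are instanceId, translationId and language (inferred from the index name in your log), and the helper names are made up for illustration. The first function only builds an aggregation pipeline; the second runs the same check on plain dicts so the logic can be tried without a server.

```python
from collections import Counter

# Assumption: field names are inferred from the index name
# instanceId_1_translationId_1_language_1_unique; adjust if they differ.
KEY_FIELDS = ["instanceId", "translationId", "language"]


def duplicate_key_pipeline(fields):
    """Build an aggregation pipeline listing key values that occur more than once."""
    return [
        # Group documents by the compound key and count them.
        {"$group": {
            "_id": {f: "$" + f for f in fields},
            "count": {"$sum": 1},
            "ids": {"$push": "$_id"},  # _id values of the colliding documents
        }},
        # Keep only keys shared by two or more documents.
        {"$match": {"count": {"$gt": 1}}},
    ]


def find_duplicates_in_memory(docs, fields):
    """Same check on plain dicts; a missing field is treated as None."""
    counts = Counter(tuple(d.get(f) for f in fields) for d in docs)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical usage with PyMongo (URI, database and collection names are placeholders):
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["mydb"]["translations"]
#   for dup in coll.aggregate(duplicate_key_pipeline(KEY_FIELDS), allowDiskUse=True):
#       print(dup["_id"], dup["count"], dup["ids"])
```

Since the members can disagree after a rolling build, it is probably worth running the pipeline against each member separately rather than only against the primary.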

Hi Chris,

Thanks a lot for the detailed information. I was hoping there might be another way besides stopping all writes or running the build online.
I guess the best method in this case is to work with the developers to prevent duplicates at the app level (hoping there won’t be any race conditions) and to create the index as quickly as possible on a dedicated Primary node. Once the indexed Primary node is up and running again, the duplication issue won’t occur.
Does this method make sense or is there a better way I might have missed?

Thanks!