Mongorestore: opLogReplay E11000 duplicate key error

Ritesh_Kumar6 · July 21, 2023, 3:51am

I was able to resolve the issue and perform a point-in-time recovery (PITR) of our 3-node MongoDB replica set.

Adding the steps in case someone else stumbles across this post.

These steps start after restoring the T-1 EBS snapshot of the /data mount point onto a new EC2 instance.
…

Get the latest oplog entry on the new EC2 instance launched from the T-1 snapshot

>use local
>db.oplog.rs.find({op:"i"}).sort({$natural: -1}).limit(1);

# Sample Output for ts value
ts: Timestamp({ t: 1689748934, i: 3 }),
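
Before using the ts value, it can help to confirm that it lines up with the snapshot time. A minimal sketch, assuming GNU date (on macOS/BSD the equivalent is `date -u -r <seconds>`); the function name ts_to_utc is made up for illustration:

```shell
#!/usr/bin/env bash
# Convert the "t" field of an oplog Timestamp (seconds since the Unix epoch)
# into a readable UTC date. Assumes GNU date.
ts_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# "t" value from the sample output above.
ts_to_utc 1689748934
```

For the sample value this prints 2023-07-19 06:42:14 UTC.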

Take an oplog dump from the existing running Mongo cluster.
Our oplog holds the last 57 hours of entries. I still do not know why we were getting duplicate key errors, since oplog entries are supposed to be idempotent and replaying them repeatedly should be safe.

mongodump -u username --authenticationDatabase=admin -h secondary-node-ip-existing-cluster -d local -c oplog.rs --query '{"ts": {"$gt": {"$timestamp": {"t": 1689748934, "i": 3}}}}' -o oplogDumpDir/
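
Typing the ts value into the --query filter by hand is easy to get wrong. A small sketch of building the same filter from the t and i values; oplog_query_after is a hypothetical helper, not part of mongodump:

```shell
#!/usr/bin/env bash
# Build the Extended JSON filter for "all oplog entries after Timestamp(t, i)".
# oplog_query_after is a hypothetical helper, not part of mongodump.
oplog_query_after() {
    local t="$1" i="$2"
    printf '{"ts": {"$gt": {"$timestamp": {"t": %d, "i": %d}}}}\n' "$t" "$i"
}

# t and i from the latest oplog entry on the restored instance.
QUERY="$(oplog_query_after 1689748934 3)"
echo "$QUERY"
# The result can then be passed as: mongodump ... --query "$QUERY" -o oplogDumpDir/
```

The printed string is identical to the filter used in the mongodump command above.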

Rename the oplog file so that mongorestore finds it as oplog.bson at the top level of the dump directory

mv oplogDumpDir/local/oplog.rs.bson oplogDumpDir/oplog.bson
rm -rf oplogDumpDir/local

Replay the oplog on the new EC2 instance up to the bad transaction

mongorestore --authenticationDatabase=admin -u root -h new-ec2-instance-ip --oplogReplay --oplogLimit 1689837570:1 oplogDumpDir/
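
The --oplogLimit value has the form seconds:ordinal. As far as I understand the mongorestore documentation, entries with a timestamp at or after the limit are not applied. A sketch of deriving the value from a UTC date-time, assuming GNU date; oplog_limit_for is a hypothetical helper, and the date-time below is simply the limit from the command above converted back to UTC, not a time taken from our incident notes:

```shell
#!/usr/bin/env bash
# Turn a UTC date-time into the <seconds>:<ordinal> form used by --oplogLimit.
# oplog_limit_for is a hypothetical helper; it assumes GNU date.
oplog_limit_for() {
    local when_utc="$1" ordinal="${2:-1}"
    printf '%s:%s\n' "$(date -u -d "$when_utc" '+%s')" "$ordinal"
}

oplog_limit_for '2023-07-20 07:19:30' 1
```

This prints 1689837570:1, the value used above.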

Wait for the completion of the process.

We verified the restoration using one document that gets updated very frequently.

Two things that we observed:
a. The transactions were replayed, but the latest oplog entry on the recovered instance did not change. I am not sure whether this is expected MongoDB server behaviour.

b. It took nearly 12 hours just to replay the 14 GB oplog on a t3.large EC2 instance whose /data EBS volume is gp2 with 1200 IOPS, with no active user connections. This is too long for a production environment.
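
To put that in perspective, here is the rough replay rate implied by those numbers. This is only back-of-the-envelope arithmetic (the 12 hours is approximate, and 1 GB is treated as 1024 MB); replay_rate is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Rough replay rate from oplog size (GB) and elapsed time (hours).
# replay_rate is a hypothetical helper; 1 GB is treated as 1024 MB here.
replay_rate() {
    awk -v gb="$1" -v hours="$2" 'BEGIN {
        printf "%.2f GB/hour, %.2f MB/s\n", gb / hours, gb * 1024 / (hours * 3600)
    }'
}

replay_rate 14 12
```

This prints 1.17 GB/hour, 0.33 MB/s, which gives a starting point for estimating how long a larger oplog window would take on similar hardware.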

While searching for solutions, we also came across the pbm tool. It looks promising, but we did not research that implementation much.