Mongorestore: opLogReplay E11000 duplicate key error

I was able to resolve the issue and perform a point-in-time recovery (PITR) of our 3-node MongoDB replica set.

Adding the steps in case someone else stumbles across this post.

These are the steps we followed after restoring the T-1 EBS snapshot of the /data mount point on a new EC2 instance:

  1. Get the latest oplog entry on the new EC2 instance launched from the T-1 snapshot.
>use local
>db.oplog.rs.find({op:"i"}).sort({$natural: -1}).limit(1);

# Sample output for the ts value
ts: Timestamp({ t: 1689748934, i: 3 }),
  2. Take an oplog dump from the existing running Mongo cluster, keeping only entries newer than the ts value from step 1.
    Our oplog contains the last 57 hours of entries. Oplog entries should be idempotent, so replaying them again and again should be fine; I still do not know why we were getting duplicate key errors.
mongodump -u username --authenticationDatabase=admin -h secondary-node-ip-existing-cluster -d local -c oplog.rs --query '{"ts": {"$gt": {"$timestamp": {"t": 1689748934, "i": 3}}}}' -o oplogDumpDir/
  3. Rename the oplog file to oplog.bson at the top level of the dump directory, which is where mongorestore --oplogReplay looks for it.
mv oplogDumpDir/local/oplog.rs.bson oplogDumpDir/oplog.bson
rm -rf oplogDumpDir/local
  4. Replay the oplog on the new EC2 instance up to the bad transaction.
mongorestore --authenticationDatabase=admin -u root -h new-ec2-instance-ip --oplogReplay --oplogLimit 1689837570:1 oplogDumpDir/
  5. Wait for the process to complete.
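
For anyone who wants to script the dump step, here is a rough sketch of the commands above as shell functions. The function names (oplog_query, oplog_window_hours, dump_oplog_tail) are my own and purely illustrative; the mongodump flags are the same ones used in the steps. It assumes mongorestore --oplogReplay should find oplog.bson at the top level of the dump directory.

```shell
# Sketch only: the helper names below are hypothetical, not part of any MongoDB tool.

# Build the extended-JSON filter for mongodump --query from a ts value (t, i).
oplog_query() {
  printf '{"ts": {"$gt": {"$timestamp": {"t": %d, "i": %d}}}}' "$1" "$2"
}

# Hours of oplog between the snapshot's last entry (t) and the --oplogLimit cutoff.
oplog_window_hours() {
  awk -v from="$1" -v to="$2" 'BEGIN { printf "%.1f\n", (to - from) / 3600 }'
}

# Dump the oplog tail from a running cluster member and stage it for replay.
# Arguments: host, t, i, dump directory.
dump_oplog_tail() {
  mongodump -u username --authenticationDatabase=admin -h "$1" \
    -d local -c oplog.rs --query "$(oplog_query "$2" "$3")" -o "$4" || return 1
  # Move the dump to <dir>/oplog.bson and drop the now-unneeded local/ folder.
  mv "$4/local/oplog.rs.bson" "$4/oplog.bson" && rm -rf "$4/local"
}
```

With the values from this post, `dump_oplog_tail secondary-node-ip-existing-cluster 1689748934 3 oplogDumpDir` runs the dump and rename, and `oplog_window_hours 1689748934 1689837570` prints 24.6, i.e. the replay covered roughly a day of oplog.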

We verified the restoration using one document that gets updated very frequently.

Two things that we observed:
a. The transactions were replayed, but the latest oplog entry on the recovered instance was not updated. I am not sure whether this is expected Mongo server behaviour.

b. It took nearly 12 hours just to replay the 14GB oplog on a t3.large EC2 instance with a gp2 EBS volume for /data (1200 IOPS) and no active user connections. This is too long for a production environment.
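
To put the replay time in perspective, a quick back-of-the-envelope calculation of the average throughput. The function name replay_mb_per_sec is made up for illustration, and it assumes 1 GB = 1024 MB.

```shell
# Average replay throughput in MB/s, given oplog size in GB and duration in hours.
# Assumption: 1 GB = 1024 MB.
replay_mb_per_sec() {
  awk -v gb="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", gb * 1024 / (hours * 3600) }'
}

replay_mb_per_sec 14 12   # prints 0.33
```

That is about a third of a megabyte per second on average for our run.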

While searching for solutions, we came across the pbm (Percona Backup for MongoDB) tool as well. It looks promising, but we did not research that implementation much.