Recover deleted documents?

Hi all,

I have a collection where some documents were deleted by accident. Although I have daily backups, the deletion happened before the oldest backup still available (more than 10 days ago).
Since the database does not change much or frequently, is it possible these deleted documents are still somewhere in the database files?
Is there any way to check/find them?


My mongod version is:

db version v4.2.8
git version: 43d25964249164d76d5e04dd6cf38f6111e21f5f
OpenSSL version: OpenSSL 1.1.1l FIPS 24 Aug 2021
allocator: tcmalloc
modules: none
build environment:
distmod: rhel80
distarch: x86_64
target_arch: x86_64

Hi @Georgios_Petasis and welcome to the MongoDB Community :muscle: !

If you have a standalone mongod, then no, it’s lost forever.
If you have a replica set (even a single-node one), then all write operations are also written to the oplog.

The oplog is a system collection with a limited size (a capped collection) that overwrites the oldest entries as new ones arrive.

You can retrieve information about your Oplog with the command:

> db.getReplicationInfo()
{
  logSizeMB: 8423,
  usedMB: 0.01,
  timeDiff: 360,
  timeDiffHours: 0.1,
  tFirst: 'Wed Sep 29 2021 14:35:23 GMT+0000 (Coordinated Universal Time)',
  tLast: 'Wed Sep 29 2021 14:41:23 GMT+0000 (Coordinated Universal Time)',
  now: 'Wed Sep 29 2021 14:41:30 GMT+0000 (Coordinated Universal Time)'
}
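To check whether an event is still inside this window, you can compare its date with tFirst and tLast. Keep in mind that to restore a deleted document you need its insert entry, so the date that matters is when the document was inserted, not when it was deleted. This is a minimal sketch in plain JavaScript (it also runs in mongosh); `oplogCovers` is a hypothetical helper, not a MongoDB API, and it assumes tFirst/tLast are date strings like the ones printed above or Date objects.

```javascript
// Hypothetical helper (not part of the MongoDB shell API): given the object
// returned by db.getReplicationInfo(), report whether an event that happened
// at `when` is still inside the oplog window [tFirst, tLast].
function oplogCovers(info, when) {
  const first = new Date(info.tFirst); // oldest entry still in the oplog
  const last = new Date(info.tLast); // newest entry in the oplog
  const t = new Date(when);
  if ([first, last, t].some((d) => Number.isNaN(d.getTime()))) {
    throw new Error("could not parse one of the dates");
  }
  return t >= first && t <= last;
}

// Usage with the values printed above:
const info = {
  tFirst: "Wed Sep 29 2021 14:35:23 GMT+0000 (Coordinated Universal Time)",
  tLast: "Wed Sep 29 2021 14:41:23 GMT+0000 (Coordinated Universal Time)",
};
console.log(oplogCovers(info, "2021-09-29T14:40:00Z")); // true
console.log(oplogCovers(info, "2021-09-19T10:00:00Z")); // false: older than tFirst
```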

Depending on how many write operations are performed on the cluster, the oplog window (the time between the oldest and the newest oplog entry, timeDiffHours above) can be large or small. It’s good practice to give the oplog a comfortable size.
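As a rough sizing aid, assuming the write rate stays about constant, you can scale the space your current window uses up to the window you want. This is a back-of-the-envelope sketch: `estimateOplogSizeMB` is hypothetical and the numbers in the usage lines are made up for illustration. The replSetResizeOplog command in the comment exists for WiredTiger and only resizes the member it runs on; check the documentation for your version before using it.

```javascript
// Hypothetical estimate (not a MongoDB API): if usedMB of oplog currently
// covers timeDiffHours, how many MB would cover targetHours at the same rate?
// Feed it usedMB and timeDiffHours from db.getReplicationInfo().
function estimateOplogSizeMB(usedMB, timeDiffHours, targetHours) {
  if (!(timeDiffHours > 0)) {
    throw new Error("timeDiffHours must be positive");
  }
  // Multiply first to limit floating-point rounding before Math.ceil.
  return Math.ceil((usedMB * targetHours) / timeDiffHours);
}

// e.g. 2,000 MB currently cover 24 hours; to keep about 10 days (240 hours):
const sizeMB = estimateOplogSizeMB(2000, 24, 240);
console.log(sizeMB); // 20000
// The oplog can then be resized on each member, for example:
//   db.adminCommand({ replSetResizeOplog: 1, size: sizeMB })
// (the command rejects sizes below 990 MB)
```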

If these documents were inserted recently enough and your oplog window is large, their insert entries are still in the oplog.

See this little example:

test [direct: primary] test> db.coll.insertMany([{name: "Max"}, {name: "Alex"}, {name: "Claire"}])
{
  acknowledged: true,
  insertedIds: {
    '0': ObjectId("61547bd83bbc8bc533a5c784"),
    '1': ObjectId("61547bd83bbc8bc533a5c785"),
    '2': ObjectId("61547bd83bbc8bc533a5c786")
  }
}
test [direct: primary] test> db.coll.deleteMany({})
{ acknowledged: true, deletedCount: 3 }
test [direct: primary] test> use local
switched to db local
test [direct: primary] local> db.oplog.rs.find({op: 'i', ns: 'test.coll'}, {o:1})
[
  { o: { _id: ObjectId("61547bd83bbc8bc533a5c784"), name: 'Max' } },
  { o: { _id: ObjectId("61547bd83bbc8bc533a5c785"), name: 'Alex' } },
  { o: { _id: ObjectId("61547bd83bbc8bc533a5c786"), name: 'Claire' } }
]

With an aggregation pipeline, I can even restore them into the original collection:

test [direct: primary] local> db.oplog.rs.aggregate([
  { $match: { op: 'i', ns: 'test.coll' } },
  { $replaceRoot: { newRoot: '$o' } },
  { $merge: { into: { db: "test", coll: "coll" }, on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
])

test [direct: primary] local> use test 
switched to db test
test [direct: primary] test> db.coll.find()
[
  { _id: ObjectId("61547bd83bbc8bc533a5c784"), name: 'Max' },
  { _id: ObjectId("61547bd83bbc8bc533a5c785"), name: 'Alex' },
  { _id: ObjectId("61547bd83bbc8bc533a5c786"), name: 'Claire' }
]
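In a real incident you would usually restore only the documents that were deleted, not every document ever inserted into the collection. Below is a minimal sketch in plain JavaScript, assuming the delete ("d") entries are still in the oplog as well; `deletedIds` and `buildRestorePipeline` are hypothetical helpers. One deliberate difference from the pipeline above: it uses whenMatched: "keepExisting" instead of "replace", so documents that still exist (and may have been updated since) are not overwritten by their originally inserted version.

```javascript
// Hypothetical helpers (plain JavaScript, usable in mongosh) for restoring
// only the documents that were actually deleted from one namespace.

// Delete ("d") oplog entries store the removed document's _id in `o._id`.
function deletedIds(ns, deleteEntries) {
  return deleteEntries
    .filter((e) => e.op === "d" && e.ns === ns)
    .map((e) => e.o._id);
}

// Same stages as the pipeline above, parameterized by namespace and
// optionally restricted to a list of _id values.
function buildRestorePipeline(ns, ids) {
  const dot = ns.indexOf(".");
  if (dot <= 0) throw new Error(`invalid namespace: ${ns}`);
  // Database names cannot contain dots, collection names can.
  const dbName = ns.slice(0, dot);
  const collName = ns.slice(dot + 1);

  const match = { op: "i", ns: ns };
  // An explicit (even empty) id list always restricts the match, so an empty
  // list restores nothing instead of everything.
  if (ids !== undefined) match["o._id"] = { $in: ids };

  return [
    { $match: match },
    { $replaceRoot: { newRoot: "$o" } }, // the document as it was inserted
    {
      $merge: {
        into: { db: dbName, coll: collName },
        on: "_id",
        whenMatched: "keepExisting", // leave documents that still exist untouched
        whenNotMatched: "insert",
      },
    },
  ];
}

// In mongosh, connected to the primary (sketch):
//   use local
//   const dels = db.oplog.rs.find({ op: "d", ns: "test.coll" }).toArray()
//   db.oplog.rs.aggregate(buildRestorePipeline("test.coll", deletedIds("test.coll", dels)))
```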

Cheers,
Maxime.


Hi Maxime,

Your solution is very nice. I tested it on my DEV servers and it worked successfully.
Next, I want to follow the same recovery steps on my production servers, and I need your help:

  1. Should the application be stopped, or is that not needed?
  2. The data size is roughly 10 GB or more.
  3. Do the steps above work on a sharded cluster? If not, please let me know how to recover deleted documents on a sharded cluster.

I am waiting for your reply…

Thanks,
Srihari

The solution I explained above is NOT something you want to use in a production environment on a regular basis. It should be considered a last-resort action, when nothing else (for example, a full restore of a daily backup) is suitable.

You can’t trust this solution to work each time because the oplog is a capped collection and old documents will disappear from it eventually.

For sharded clusters, you’ll have to apply the same method “locally” on each shard, because mongos can’t access the local system database. Each shard has its own oplog, completely independent of the other shards.
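To run the oplog queries on each shard you first need a connection string per shard. A minimal sketch, assuming the output format of db.adminCommand({ listShards: 1 }) run through mongos, where each shard's host looks like "rsName/host1:port,host2:port"; `shardConnectionStrings` is a hypothetical helper. Two caveats: connecting directly to a shard may require shard-local users, and it is probably safer to only read the documents from each shard's oplog and re-insert them through mongos, rather than running $merge directly on a shard.

```javascript
// Hypothetical helper: turn the result of db.adminCommand({ listShards: 1 })
// into one connection string per shard replica set.
function shardConnectionStrings(listShardsResult) {
  return listShardsResult.shards.map((shard) => {
    const slash = shard.host.indexOf("/");
    if (slash === -1) {
      // No replica set prefix: use the host list as-is.
      return { shard: shard._id, uri: `mongodb://${shard.host}/` };
    }
    const rsName = shard.host.slice(0, slash); // e.g. "shard01"
    const hosts = shard.host.slice(slash + 1); // e.g. "h1:27018,h2:27018"
    return { shard: shard._id, uri: `mongodb://${hosts}/?replicaSet=${rsName}` };
  });
}

// Usage (sketch): in mongosh connected to mongos,
//   shardConnectionStrings(db.adminCommand({ listShards: 1 }))
// then connect to each uri and query local.oplog.rs there.
```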

Again, to me, this is an extreme measure that should never be used. When you remove a document in MongoDB, you should consider that it’s gone for good. If recovering old docs is part of your requirements, I would use another strategy like “soft deletes” (i.e. just set a boolean {deleted:true} and use it to filter with an index).

Cheers,
Maxime.


How do I use “soft deletes” (i.e. just set a boolean {deleted:true} and use it to filter with an index)?

Can you please briefly explain with an example?

Thanks,
Srihari

I believe what Max means is that, instead of actually deleting the document(s), you would add a field called deleted with a value of true. While this might work in some cases, it could cause the collection to grow to ever larger sizes. I could, however, be misunderstanding what he is suggesting.


Nope that’s it @Doug_Duncan !

Replace the delete operation with an update: $set {deleted:true}.
Your find operations should then include a filter like {deleted: {$exists: false}} to exclude the “soft” deleted documents, unless you actually want to access these “deleted” docs, in which case you can filter on {deleted:true}.
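Here is a minimal sketch of that pattern in plain JavaScript. The helper names (`softDeleteUpdate`, `notDeleted`, `onlyDeleted`, `undeleteUpdate`) are hypothetical; they only build the update and filter documents that you pass to the standard updateOne() and find() methods, and the field name deleted follows the convention above.

```javascript
// Hypothetical helpers for the "soft delete" pattern.

// Instead of db.coll.deleteOne(filter), run:
//   db.coll.updateOne(filter, softDeleteUpdate())
function softDeleteUpdate() {
  return { $set: { deleted: true } };
}

// Regular reads exclude soft-deleted documents:
//   db.coll.find(notDeleted({ name: "Max" }))
function notDeleted(filter = {}) {
  return { ...filter, deleted: { $exists: false } };
}

// Reads that explicitly want the "deleted" documents:
//   db.coll.find(onlyDeleted())
function onlyDeleted(filter = {}) {
  return { ...filter, deleted: true };
}

// "Undelete" by removing the flag again:
//   db.coll.updateOne({ _id: someId }, undeleteUpdate())
function undeleteUpdate() {
  return { $unset: { deleted: "" } };
}
```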

But @Doug_Duncan is also correct that this can lead to collections growing indefinitely, and the additional “deleted” field has to be added to the indexes that support these queries (so more RAM).

Every now and then you will also want to actually delete the docs for real once they have been soft deleted for long enough.

For this, I would suggest adding another field, {deletedAt: new Date()}, set at the same time as the deleted field, with a TTL index on it. The TTL index then deletes the docs automatically, for real this time, X seconds after deletedAt.
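A minimal sketch of soft delete plus TTL purge, in plain JavaScript. `softDeleteWithTimestamp` and `isEligibleForTtlRemoval` are hypothetical helpers; the second one only models the TTL rule (a document becomes eligible once deletedAt + expireAfterSeconds is in the past), the real removal is done by the server's TTL background task, which runs periodically (every 60 seconds by default), so it is not instant. The 30-day value in the comment is only an example.

```javascript
// Hypothetical helper: soft delete that also records when it happened.
//   db.coll.updateOne(filter, softDeleteWithTimestamp())
// TTL index on deletedAt, e.g. purge soft-deleted docs after 30 days:
//   db.coll.createIndex({ deletedAt: 1 }, { expireAfterSeconds: 30 * 24 * 3600 })
// Documents without a deletedAt date are never removed by this index.
function softDeleteWithTimestamp(now = new Date()) {
  return { $set: { deleted: true, deletedAt: now } };
}

// Model of the TTL rule: eligible once deletedAt + expireAfterSeconds <= now.
function isEligibleForTtlRemoval(doc, expireAfterSeconds, now = new Date()) {
  if (!(doc.deletedAt instanceof Date)) return false; // not a date: never expires
  return doc.deletedAt.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}
```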

There is definitely a trade-off to consider.

Cheers,
Maxime.


Hi,

Do you know if there is any way to recover documents deleted/purged by a TTL index?
I have done some testing with the oplog, but I couldn’t find a solution and I don’t think it can be done.
Can you confirm?

Thanks !
Sally

Hi @Yook_20450,

Sorry, I’m just reading this now.
When a document is deleted from MongoDB (by a TTL index or not), the result in the oplog is the same. I provided an example earlier in this topic showing how a document could be “saved” using the oplog, but it only works if the oplog is large enough that it still contains the entry that created this doc. If that’s the case, then it will also contain all the subsequent updates that may have been applied to this doc.
Else it’s lost if you don’t have a backup. :frowning:
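To see what the oplog still holds for one document, you can query its whole history. A minimal sketch; `documentHistoryFilter` is a hypothetical helper that relies on the usual oplog entry layout: insert and delete entries keep the _id in o._id, while update entries keep it in o2._id (with the update description in o). It does not look inside multi-document transactions, whose operations are stored inside applyOps command entries.

```javascript
// Hypothetical helper: filter returning the oplog history of one document,
// i.e. its insert ("i"), its updates ("u") and its delete ("d").
function documentHistoryFilter(ns, id) {
  return {
    ns: ns,
    $or: [
      { op: { $in: ["i", "d"] }, "o._id": id }, // insert / delete entries
      { op: "u", "o2._id": id }, // update entries
    ],
  };
}

// In mongosh, from the local database (sketch, someId is a placeholder):
//   db.oplog.rs.find(documentHistoryFilter("test.coll", someId)).sort({ $natural: 1 })
// If no "i" entry comes back, the insert has already rolled out of the oplog.
```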

Cheers,
Maxime.

A single document was deleted, and we need to recover it and restore it into the existing collection.
We do not have a backup.

Could you explain step by step how to do it? The delete entry is still in the oplog collection:

db.oplog.rs.find({"ns":"empdb.emptbl","op":"d","o":{"_id" :ObjectId("64ea1ce2b1084f3d73a33001")}}).sort({$natural:1}).limit(10).pretty()
{
  "op" : "d",
  "ns" : "empdb.emptbl",
  "ui" : UUID("c3387486-31b7-4398-9688-9274fd585315"),
  "o" : {
    "_id" : ObjectId("64ea1ce2b1084f3d73a33001")
  },
  "ts" : Timestamp(1696357679, 5),
  "t" : NumberLong(3),
  "v" : NumberLong(2),
  "wall" : ISODate("2023-10-03T18:27:59.868Z")
}

Hi @Srihari_Mamidala,

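If it’s just that one document, I would just re-insert it manually. Note that the "d" entry you found only contains the _id: the document’s content is in the matching insert ("i") entry, in its "o" field. The fields ts, t, v and wall are oplog metadata, not part of your document.

use empdb
const entry = db.getSiblingDB("local").oplog.rs.findOne({ op: "i", ns: "empdb.emptbl", "o._id": ObjectId("64ea1ce2b1084f3d73a33001") })
// entry is null if the insert has already rolled out of the oplog
db.emptbl.insertOne(entry.o)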

Otherwise, the pipeline I provided above will work just fine with the right filter.

But again: DO NOT use this method to recover documents. It’s a last resort method.

Also here you are just recovering the document when it was inserted. You are not recovering the updates.

Cheers,
Maxime.