Recover deleted documents?

Hi all,

I have a collection in which some documents were deleted by accident. We have daily backups, but the deletion happened more than 10 days ago, before the oldest backup that is still available.
Since the database does not change much or frequently, is it possible these deleted documents are still somewhere in the database files?
Is there any way to check/find them?

My mongod version is:

db version v4.2.8
git version: 43d25964249164d76d5e04dd6cf38f6111e21f5f
OpenSSL version: OpenSSL 1.1.1l FIPS 24 Aug 2021
allocator: tcmalloc
modules: none
build environment:
distmod: rhel80
distarch: x86_64
target_arch: x86_64

Hi @Georgios_Petasis and welcome to the MongoDB Community :muscle:!

If you have a standalone mongod, then no, it’s lost forever.
If you have a Replica Set (even a single node), then it means all the write operations are written to the Oplog.

The Oplog is a system collection that has a limited size (capped collection) and overwrites the oldest entries as new ones arrive.

You can retrieve information about your Oplog with the command:

> db.getReplicationInfo()
{
  logSizeMB: 8423,
  usedMB: 0.01,
  timeDiff: 360,
  timeDiffHours: 0.1,
  tFirst: 'Wed Sep 29 2021 14:35:23 GMT+0000 (Coordinated Universal Time)',
  tLast: 'Wed Sep 29 2021 14:41:23 GMT+0000 (Coordinated Universal Time)',
  now: 'Wed Sep 29 2021 14:41:30 GMT+0000 (Coordinated Universal Time)'
}

Depending on how many write operations are performed on the cluster, the oplog time window can be large or small. It’s good practice to have a comfortable size.
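To make the "oplog window" idea concrete, here is a minimal sketch that checks whether a given moment still falls between the oldest and newest oplog entries. The helper name oplogCovers is hypothetical, and it assumes the input has the same tFirst/tLast string fields as the db.getReplicationInfo() output shown above.

```javascript
// Sketch: is a given moment still inside the oplog window?
// `info` is assumed to look like the output of db.getReplicationInfo():
// tFirst and tLast are date strings for the oldest and newest oplog entries.
function oplogCovers(info, when) {
  const first = new Date(info.tFirst).getTime();
  const last = new Date(info.tLast).getTime();
  const t = when.getTime();
  return t >= first && t <= last;
}

// Values copied from the sample output above.
const sampleInfo = {
  tFirst: 'Wed Sep 29 2021 14:35:23 GMT+0000 (Coordinated Universal Time)',
  tLast: 'Wed Sep 29 2021 14:41:23 GMT+0000 (Coordinated Universal Time)',
};

// A write two minutes before tLast is covered; one ten days earlier is not.
console.log(oplogCovers(sampleInfo, new Date('2021-09-29T14:39:23Z'))); // true
console.log(oplogCovers(sampleInfo, new Date('2021-09-19T14:39:23Z'))); // false
```

In mongosh the same function could be called as oplogCovers(db.getReplicationInfo(), new Date("...")) with the approximate time of the deletion.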

If you inserted these documents recently and you have a large oplog window, they are still in the oplog.

See this little example:

test [direct: primary] test> db.coll.insertMany([{name: "Max"}, {name: "Alex"}, {name: "Claire"}])
{
  acknowledged: true,
  insertedIds: {
    '0': ObjectId("61547bd83bbc8bc533a5c784"),
    '1': ObjectId("61547bd83bbc8bc533a5c785"),
    '2': ObjectId("61547bd83bbc8bc533a5c786")
  }
}
test [direct: primary] test> db.coll.deleteMany({})
{ acknowledged: true, deletedCount: 3 }
test [direct: primary] test> use local
switched to db local
test [direct: primary] local> db.oplog.rs.find({op: 'i', ns: 'test.coll'}, {o:1})
[
  { o: { _id: ObjectId("61547bd83bbc8bc533a5c784"), name: 'Max' } },
  { o: { _id: ObjectId("61547bd83bbc8bc533a5c785"), name: 'Alex' } },
  { o: { _id: ObjectId("61547bd83bbc8bc533a5c786"), name: 'Claire' } }
]
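If only some documents were deleted, it may help to narrow the search to those. The sketch below builds two oplog filters: one for the delete entries of a namespace, and one for the insert entries whose _id matches those deletes. The function names are hypothetical, and it assumes each delete entry (op: 'd') carries the removed document's _id in its o field.

```javascript
// Filter for the delete entries of one namespace, e.g. 'test.coll'.
function deleteEntriesFilter(ns) {
  return { op: 'd', ns: ns };
}

// Filter for the insert entries whose _id appears in the given delete entries.
// Assumption: each delete entry looks like { op: 'd', ns: ..., o: { _id: ... } }.
function insertEntriesFilter(ns, deleteEntries) {
  const ids = deleteEntries.map((entry) => entry.o._id);
  return { op: 'i', ns: ns, 'o._id': { $in: ids } };
}

// mongosh usage (not executed here):
//   const local = db.getSiblingDB('local');
//   const deletes = local.oplog.rs.find(deleteEntriesFilter('test.coll')).toArray();
//   local.oplog.rs.find(insertEntriesFilter('test.coll', deletes), { o: 1 });
```

This only finds documents whose insert entry is still inside the oplog window.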

With an aggregation pipeline, I can even restore them into the original collection (as they were at insert time; this pipeline does not replay later updates):

test [direct: primary] local> db.oplog.rs.aggregate([{$match: {op: 'i', ns: 'test.coll'}},{$replaceRoot: {newRoot: '$o'}}, { $merge: { into: {db: "test", coll: "coll"}, on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }])

test [direct: primary] local> use test 
switched to db test
test [direct: primary] test> db.coll.find()
[
  { _id: ObjectId("61547bd83bbc8bc533a5c784"), name: 'Max' },
  { _id: ObjectId("61547bd83bbc8bc533a5c785"), name: 'Alex' },
  { _id: ObjectId("61547bd83bbc8bc533a5c786"), name: 'Claire' }
]
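A more cautious variant of the pipeline above, sketched under my own assumptions: it writes into a separate collection (the name coll_recovered is hypothetical) and uses whenMatched: 'keepExisting', so documents already present in the target are left untouched. The helper buildRestorePipeline is just an illustration of how the stages fit together.

```javascript
// Build a pipeline that copies inserted documents from the oplog
// into a target collection without overwriting existing documents.
function buildRestorePipeline(sourceNs, targetDb, targetColl) {
  return [
    // Keep only insert entries for the source namespace.
    { $match: { op: 'i', ns: sourceNs } },
    // Promote the inserted document (field "o") to the top level.
    { $replaceRoot: { newRoot: '$o' } },
    // Insert missing documents; leave already existing ones as they are.
    {
      $merge: {
        into: { db: targetDb, coll: targetColl },
        on: '_id',
        whenMatched: 'keepExisting',
        whenNotMatched: 'insert',
      },
    },
  ];
}

// mongosh usage (not executed here):
//   db.getSiblingDB('local').oplog.rs.aggregate(
//     buildRestorePipeline('test.coll', 'test', 'coll_recovered'));
```

Restoring into a staging collection first lets you inspect the recovered documents before copying them back into the original collection.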

Cheers,
Maxime.

Hi Maxime,

Your solution is very nice. I tested it on my DEV servers and it worked successfully.
Next I need to recover documents on my production servers by following the steps you mentioned,
and I need your help:

  1. Does the application need to be stopped or not?
  2. The data size is approximately more than 10 GB.
  3. Do the steps above work on a sharded cluster? If not, please let me know how to recover deleted documents on sharded servers.

I am waiting for your reply…

Thanks,
Srihari

The solution I explained above is NOT something you want to use in a production environment on a regular basis. It must be considered a last-resort action when nothing else (for example a full restore of a daily backup) is suitable.

You can’t trust this solution to work each time because the oplog is a capped collection and old documents will disappear from it eventually.

For sharded clusters, you’ll have to apply the same method “locally” on each shard because mongos can’t access the system local database. Each shard has its own oplog completely independent from the other shards.
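As a rough illustration of "on each shard": the config.shards collection (readable through mongos) lists one document per shard with a host field of the form "replSetName/host1:port,host2:port". The sketch below turns such a document into a connection string for that shard's replica set. The function name and the host names are hypothetical, and credentials or TLS options are left out.

```javascript
// Turn a config.shards document into a replica set connection string.
// Assumption: shardDoc.host has the form "replSetName/host1:port,host2:port".
function shardUri(shardDoc) {
  const parts = shardDoc.host.split('/');
  if (parts.length !== 2) {
    throw new Error('unexpected host format: ' + shardDoc.host);
  }
  const [replSet, hosts] = parts;
  return `mongodb://${hosts}/?replicaSet=${replSet}`;
}

// Hypothetical shard document for illustration.
const exampleShard = { _id: 'shard01', host: 'shard01/host1:27018,host2:27018' };
console.log(shardUri(exampleShard));
// mongodb://host1:27018,host2:27018/?replicaSet=shard01
```

You would then connect to each of these URIs in turn and run the oplog query against that shard's own local database.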

Again, to me, this is an extreme measure that should never be used. When you remove a document in MongoDB, you should consider that it’s gone for good. If recovering old docs is part of your requirements, I would use another strategy like “soft deletes” (i.e. just set a boolean {deleted:true} and use it to filter with an index).

Cheers,
Maxime.

How do I use “soft deletes” (i.e. just set a boolean {deleted:true} and use it to filter with an index)?

Can you please briefly explain with an example?

Thanks,
Srihari

I believe what Max is talking about is that, instead of actually deleting the document(s), you would add a field called deleted with a value of true to the document. While this might work in some cases, it could lead to the collection growing much larger. I could however be misunderstanding what he is suggesting.

Nope, that’s it @Doug_Duncan!

Replace the delete operation with an update that uses $set: {deleted: true}.
Find operations should now include something like {deleted: {$exists: false}} to avoid returning “soft” deleted documents, unless you actually want to access these “deleted” docs. In that case you can find and filter on {deleted: true}.
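Here is a small sketch of that idea. The helper names are hypothetical; they only build the update and filter documents you would pass to updateOne and find.

```javascript
// Update document that marks a document as soft deleted.
function softDeleteUpdate() {
  return { $set: { deleted: true } };
}

// Extend a filter so it skips soft deleted documents.
function onlyActive(filter) {
  return { ...filter, deleted: { $exists: false } };
}

// Extend a filter so it returns only soft deleted documents.
function onlyDeleted(filter) {
  return { ...filter, deleted: true };
}

// mongosh usage (not executed here):
//   db.coll.updateOne({ name: 'Max' }, softDeleteUpdate()); // instead of deleteOne
//   db.coll.find(onlyActive({}));   // normal reads
//   db.coll.find(onlyDeleted({}));  // look at the "deleted" docs
```

Wrapping the filters in helpers like these is one way to make sure no query forgets the extra condition.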

But @Doug_Duncan is also correct that this can lead to collections growing indefinitely and to an additional “deleted” field in all the indexes that support the queries (so more RAM).

Every now and then you will also want to actually delete the docs for real once they have been soft deleted for long enough.

For this, I would suggest using a TTL index on an additional field {deletedAt: new Date()}, which would be set at the same time as the deleted field. MongoDB would then automatically delete the docs for real X seconds after that date.
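A minimal sketch of the TTL part, assuming a retention of 30 days (that number and the helper names are my own choice for illustration):

```javascript
// Assumed retention period: 30 days, expressed in seconds.
const RETENTION_SECONDS = 30 * 24 * 60 * 60;

// Update document that soft deletes and records when it happened.
function softDeleteWithTimestamp(now) {
  return { $set: { deleted: true, deletedAt: now } };
}

// Keys and options for a TTL index on deletedAt.
function ttlIndexSpec(expireAfterSeconds) {
  return {
    keys: { deletedAt: 1 },
    options: { expireAfterSeconds: expireAfterSeconds },
  };
}

// mongosh usage (not executed here):
//   const spec = ttlIndexSpec(RETENTION_SECONDS);
//   db.coll.createIndex(spec.keys, spec.options);
//   db.coll.updateOne({ name: 'Max' }, softDeleteWithTimestamp(new Date()));
```

Documents without a deletedAt date are ignored by the TTL index, so only soft deleted documents expire.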

There is a trade-off for sure to consider.

Cheers,
Maxime.
