Document count mismatch in a collection

Hi,

I followed the MongoDB documentation and took a backup of the MongoDB server using a filesystem snapshot created with lvcreate.
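For reference, the snapshot was created along these lines (the volume names and size here are placeholders, not my exact command):

    # Create an LVM snapshot of the logical volume that holds
    # the MongoDB data files.
    lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb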
After that, when I tried a restore on one of the databases, I saw a document count mismatch for 2 collections: the counts are lower than before the backup.
I used db.collection.count() to get the counts.
I am seeing a difference of almost 1k documents.
I am using MongoDB 4.2 Enterprise.
Is this expected? What is causing the data loss here?

Thanks,
Akshaya Srinivasan

Try db.collection.countDocuments({}) instead.
db.collection.count() uses collection metadata and may not return an accurate count, for example after an unclean shutdown or on a sharded cluster.
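For example, in the mongo shell (the collection name is a placeholder):

    // Metadata-based count: fast, but can be inaccurate on a
    // sharded cluster or after an unclean shutdown.
    db.mycollection.count()

    // Aggregates over the actual documents, so it is slower
    // but returns an accurate count.
    db.mycollection.countDocuments({})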

I tried that too, but I get the same results.
I exported all the documents to a CSV file before the backup and after the restore, and I see a clear difference in count there as well.
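The export was done along these lines (the database, collection, and field names here are placeholders):

    # Export to CSV for a before/after comparison.
    # CSV output requires an explicit field list.
    mongoexport --db=mydb --collection=mycoll --type=csv \
      --fields=_id,name --out=before_backup.csv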
This looks like a data-loss issue.
I also tried taking a full backup using lvcreate and then dumping the oplog collection from the timestamp the snapshot was taken. When I restored the snapshot and replayed the dump, I am still facing a mismatch in document count. Is this specific to this version?
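The oplog dump and replay followed the usual pattern, roughly like this (the timestamp and paths are placeholders):

    # Dump the oplog entries written after the snapshot timestamp.
    mongodump --db=local --collection=oplog.rs \
      --query='{"ts": {"$gte": {"$timestamp": {"t": 1600000000, "i": 1}}}}' \
      --out=/backup/oplog

    # mongorestore --oplogReplay expects a file named oplog.bson
    # in the directory passed to it.
    mkdir /backup/replay
    cp /backup/oplog/local/oplog.rs.bson /backup/replay/oplog.bson
    mongorestore --oplogReplay /backup/replay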

Hi,

I tried this with the MongoDB 4.2 Community version and I don't see this issue; I am seeing it only with the MongoDB 4.2 Enterprise version. Is this possibly a bug?

Thanks,
Akshaya Srinivasan

Hi @Akshaya_Srinivasan

We need more details here:

  1. What is the system’s topology? Standalone/replica set/sharded cluster?
  2. What’s the exact 4.2 version you’re using? Latest is 4.2.10
  3. What do you mean by “export all documents to csv before and after backup”? Do you see that the csv file after backup is shorter?
  4. Please post the actual output of the commands you use in the mongo shell to determine the document counts. Both from the original server and the restored server.
  5. Is the database in use during this process?

I would also like more details here:

I tried this with the MongoDB 4.2 Community version and I don't see this issue; I am seeing it only with the MongoDB 4.2 Enterprise version.

Did you perform the same procedure on both servers? E.g. are you following the same backup procedure, both databases are not in use during the snapshot, etc.?

Is this possibly a bug?

Not likely. The core database code is exactly the same between the two versions. The Enterprise version adds features that are not related to low-level database operations, such as LDAP support.

Best regards,
Kevin

Hi Kevin,

  1. What is the system’s topology? Standalone/replica set/sharded cluster? --> This is a sharded cluster.
  2. What’s the exact 4.2 version you’re using? Latest is 4.2.10 --> The MongoDB version is:
    #mongod --version
    db version v4.2.9
    git version: 06402114114ffc5146fd4b55402c96f1dc9ec4b5
    OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
    allocator: tcmalloc
    modules: enterprise
    build environment:
    distmod: rhel70
    distarch: x86_64
    target_arch: x86_64

[root@mongoclient /]# mongodump --version
mongodump version: r4.2.9
git version: 06402114114ffc5146fd4b55402c96f1dc9ec4b5
Go version: go1.12.17
os: linux
arch: amd64
compiler: gc

[root@mongoclient /]# mongorestore --version
mongorestore version: r4.2.9
git version: 06402114114ffc5146fd4b55402c96f1dc9ec4b5
Go version: go1.12.17
os: linux
arch: amd64
compiler: gc

  3. What do you mean by “export all documents to csv before and after backup”? Do you see that the csv file after backup is shorter? --> Yes, when I compare the CSV file from before the backup with the one from after the restore, data is missing. A few records are lost.
  4. Please post the actual output of the commands you use in the mongo shell to determine the document counts. Both from the original server and the restored server. --> db.collection_name.countDocuments({})
  5. Is the database in use during this process? --> There are no transactions happening during the backup, but the servers are up and running.
    I take a full backup using a filesystem snapshot and then dump the oplog.rs collection. During the restore I followed the documentation: the full restore from the filesystem snapshot goes fine, and then I do an oplogReplay of the dumps taken on the respective shards (roughly as sketched in the oplog dump/replay example earlier in this thread). With this, not all the data is restored and a few documents go missing. This is not the case with the MongoDB 4.2 Community version, where the data is intact after the full + oplog restore.

Did you perform the same procedure on both servers? E.g. are you following the same backup procedure, both databases are not in use during the snapshot, etc.? --> Yes

Thanks,
Akshaya Srinivasan

Hi @Akshaya_Srinivasan

Since your deployment is a sharded cluster, did you follow the procedure in Back Up a Sharded Cluster with File System Snapshots? Note that due to its nature, a sharded cluster backup needs to be taken at the same time across all parts of the cluster (individual shards, config servers); otherwise you’ll see inconsistencies.

If the cluster is not locked down during the backup process (e.g. the balancer is still active, writes are still coming in), then the backup will not be a point-in-time snapshot.
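In broad strokes, the lockdown around the snapshot looks something like this in the mongo shell (a sketch of the idea, not the full documented procedure):

    // On a mongos: stop the balancer so no chunk migrations
    // run during the backup.
    sh.stopBalancer()

    // On each shard and on the config server replica set:
    // flush pending writes and block new ones.
    db.fsyncLock()

    // ... take the filesystem snapshot on every member ...

    // Afterwards, unlock each member and restart the balancer.
    db.fsyncUnlock()
    sh.startBalancer()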

There are more resources in Backup and Restore Sharded Clusters, including restore procedures.

Best regards,
Kevin


Hi @kevinadi,

Thank you so much. I identified the issue: it was caused by the oplog collection rolling over.
The disk had little free space, so the default oplog (which is sized as a percentage of free disk space) was small relative to the number of documents being inserted, and it rolled over before the dump was taken. This caused the document count mismatch during the restore.
Once I resized the oplog, there was no document mismatch.
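For anyone hitting the same thing, checking and resizing the oplog can be done like this (the size value is just an example):

    // Check the current oplog size and the time window it covers.
    rs.printReplicationInfo()

    // Resize the oplog to 16000 MB on this member (run the command
    // against each replica set member in turn).
    db.adminCommand({ replSetResizeOplog: 1, size: 16000 })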

Thanks,
Akshaya Srinivasan
