MongoDB replica set SECONDARY node missing data

Yanfei_Wu · April 10, 2023, 3:44pm

I resynchronized a node using the MongoDB replica set's built-in sync mechanism. After the synchronization completed, I found that the document counts in some collections did not match the PRIMARY. I checked the node status with rs.status(): the node is already SECONDARY, and its optime is consistent with the PRIMARY node.

I have since synchronized another new SECONDARY node, and the document counts in its collections match those of the PRIMARY node.

But even after a long time, the node that was synchronized first still has document counts in some collections that differ from the PRIMARY node.

I haven’t found anything similar described in the official documentation.

What are the possible reasons for this problem?

tapiocaPENGUIN · April 10, 2023, 7:25pm

What is the result if you run these commands from the primary?

rs.printSecondaryReplicationInfo()

rs.printReplicationInfo()

Yanfei_Wu · April 11, 2023, 3:44am

Thank you for your reply. The image below shows the result of running the commands.

[Screenshot of the command output; image not preserved]

The node whose IP address ends in 249 is the one missing data in some collections.

Yanfei_Wu · April 11, 2023, 1:55pm

Hello, can you help me analyze what may be the cause?
Thanks so much.

chris · April 11, 2023, 9:33pm

How are you checking that data is missing in some collections?

Yanfei_Wu · April 12, 2023, 2:21am

db.stats()

db.count()

Yanfei_Wu · April 12, 2023, 2:24am

For example:

On the PRIMARY node, db.stats() shows 1000 records for collection A.

On the node that was synchronized first, db.stats() shows only 900 records for collection A.
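One thing worth ruling out at this point: as far as I know, the totals reported by db.stats() and by count() without a filter come from collection metadata rather than from scanning the documents, and the MongoDB documentation notes that these figures can be inaccurate after an unclean shutdown. A minimal sketch of comparing exact per-collection counts with countDocuments({}) follows. The helper names collectCounts and diffCounts are made up for illustration, and the sketch assumes it is run in mongosh once against each node. Nothing in this thread confirms that stale metadata is the actual cause.

```javascript
// Hypothetical helper: exact document count for every collection in `db`.
// countDocuments({}) counts the documents instead of trusting cached metadata.
function collectCounts(db) {
  const counts = {};
  for (const name of db.getCollectionNames()) {
    counts[name] = db.getCollection(name).countDocuments({});
  }
  return counts;
}

// Hypothetical helper: list the collections whose counts differ between two nodes.
// A collection missing on one side is treated as having 0 documents there.
function diffCounts(primaryCounts, secondaryCounts) {
  const names = new Set([
    ...Object.keys(primaryCounts),
    ...Object.keys(secondaryCounts),
  ]);
  const diffs = [];
  for (const name of [...names].sort()) {
    const p = primaryCounts[name] ?? 0;
    const s = secondaryCounts[name] ?? 0;
    if (p !== s) diffs.push({ collection: name, primary: p, secondary: s });
  }
  return diffs;
}
```

To use it, run collectCounts(db) in a mongosh session on the primary and again in a session connected directly to the secondary (after db.getMongo().setReadPref("secondaryPreferred") so that reads are allowed there), then pass the two results to diffCounts. Note that countDocuments({}) can be slow on large collections, and counts taken while writes are in flight will not line up exactly.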

tapiocaPENGUIN · April 12, 2023, 2:44pm

Did you do an initial sync, or was the MongoDB process down and then brought back up so that it caught up on the data?

chris · April 13, 2023, 2:11am

I notice your oplog is ~24GB but you only have 3.39 hours of headroom.

Are you doing a lot of inserts to this cluster? It seems busy.

Could a few seconds equal a discrepancy of 1000 documents?

If the write concern is {w: "majority"}, it could be that the other members are more up to date. If the write concern is {w: 1}, then both secondaries could be behind, given a correspondingly high insert rate.

If you think this is unlikely then it might be prudent to create a bug report on https://jira.mongodb.org
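To put a number on the "few seconds" question above, here is a rough sketch that derives per-secondary lag from a document shaped like the output of rs.status(), then estimates how many documents that lag could represent. The function names and the insert rate are assumptions for illustration; only the members, stateStr, name, and optimeDate fields of rs.status() are relied on.

```javascript
// Hypothetical helper: how many seconds each SECONDARY's last applied
// operation trails the PRIMARY's. `status` is assumed to have the shape
// returned by rs.status().
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY in replica set status");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds:
        (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}

// Rough estimate of how many documents a given lag could represent,
// assuming a steady insert rate (an assumption; real workloads are bursty).
function docsBehind(lagSeconds, insertsPerSecond) {
  return Math.max(0, lagSeconds) * insertsPerSecond;
}
```

In mongosh this could be called as replicationLagSeconds(rs.status()). If the lag is zero while the counts still differ, replication delay alone is unlikely to explain the gap.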

tapiocaPENGUIN · April 13, 2023, 1:51pm

I was thinking the same thing: the oplog headroom is very small. I wonder if the oplog was too small, and that’s why the node didn’t get all the documents during the sync? But
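The oplog figures quoted in this thread (about 24 GB covering 3.39 hours) can be turned into a churn rate with simple arithmetic. The sketch below is only that arithmetic; the function names and the 24-hour target are hypothetical, and it assumes the write rate stays roughly constant.

```javascript
// Average oplog churn in GB per hour: how fast new entries overwrite old ones.
function oplogChurnGBPerHour(oplogSizeGB, windowHours) {
  if (windowHours <= 0) throw new Error("windowHours must be positive");
  return oplogSizeGB / windowHours;
}

// Oplog size needed to keep `targetHours` of history at the same churn rate.
function oplogSizeForWindowGB(oplogSizeGB, windowHours, targetHours) {
  return oplogChurnGBPerHour(oplogSizeGB, windowHours) * targetHours;
}

// Figures from this thread: ~24 GB oplog, 3.39 hours of headroom.
const churn = oplogChurnGBPerHour(24, 3.39); // roughly 7 GB per hour
// Hypothetical target of a 24-hour window at the same write rate.
const needed = oplogSizeForWindowGB(24, 3.39, 24); // roughly 170 GB
```

If a sync takes longer than the available window, a larger oplog may be worth considering. As far as I know, the oplog can be resized online with the replSetResizeOplog admin command, whose size is given in megabytes.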

Yanfei_Wu · April 14, 2023, 1:42pm

Yes, our MongoDB replica set handles a lot of writes.

The application's write concern is {w: "majority"}.

Initially, this MongoDB replica set had three nodes, but two of them went down, and I removed the two down nodes from the replica set.
I then deployed a new node of the same version, added it to the original replica set, and let it synchronize data through the replica set's internal sync mechanism. Once rs.status() showed that the node's role was normal and its optime was consistent, I noticed that some collections on the newly synchronized node were missing data.
I later deployed another node and added it to the replica set. After this latest node finished synchronizing, it had the same data as the primary node. So far, of the three nodes, only the node that was synchronized first is still missing collection records.

So it is very strange that this happens.

Yanfei_Wu · April 14, 2023, 1:43pm

While the first node was synchronizing, application reads and writes were suspended.
Only the new node was syncing data from the primary node through the replica set's internal mechanism.