Replication Cluster Primary Race Condition

Hi All,

I have a situation where my MongoDB cluster sometimes ends up with two primaries. Node 1, which is the real primary, experienced high load and became unresponsive, so the secondary node was promoted to primary; at the same time, node 1 did not give up its primary status, which caused a race condition once replication returned to normal.
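Roughly, this is what we saw when checking each node directly from mongosh (a reconstruction, not exact output):

// from a mongosh session connected directly to node 1
db.hello().isWritablePrimary   // true
// from a session connected directly to node 2, at the same time
db.hello().isWritablePrimary   // also true while node 1 was unresponsive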

This has data loss implications for our applications: during the race condition, writes go to node 2, and when node 1 comes back as the real primary, the cluster is left with only the data on node 1.

I will share a screenshot of the period when there were two primaries for about 20 minutes.

This is the first time I have had this kind of issue; please share a workaround if anyone has experience with this.

MongoDB Version: Community 6.0.4
OS: Rocky Linux 9.3

Any kind of help in solving this issue is highly appreciated.

Regards,
Hendra

Hi @Hendra_Budiawan and welcome to the community!

Are we talking about a 3-node replica set?
Because you are only talking about two nodes.

Regards

Hi @Fabio_Ramohitaj,

The architecture is PSA: two data nodes and one arbiter.
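Simplified, the member list from rs.conf() looks like this (a sketch; hostnames are placeholders):

// rs.conf().members, simplified
[
  { _id: 0, host: "node1.example.local:27017", arbiterOnly: false },   // data node
  { _id: 1, host: "node2.example.local:27017", arbiterOnly: false },   // data node
  { _id: 2, host: "arbiter.example.local:27017", arbiterOnly: true }   // arbiter, holds no data
]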
Thank you for your response.

Regards

Adding more logs from log file

{"t":{"$date":"2024-01-31T16:21:07.673+07:00"},"s":"I", "c":"REPL", "id":21532, "ctx":"BackgroundSync","msg":"Incremented the rollback ID","attr":{"rbid":23}}
{"t":{"$date":"2024-01-31T16:21:07.673+07:00"},"s":"I", "c":"STORAGE", "id":20658, "ctx":"BackgroundSync","msg":"Stopping index builds before rollback"}
{"t":{"$date":"2024-01-31T16:21:07.673+07:00"},"s":"I", "c":"ROLLBACK", "id":21595, "ctx":"BackgroundSync","msg":"Waiting for all background operations to complete before starting rollback"}
{"t":{"$date":"2024-01-31T16:21:07.673+07:00"},"s":"I", "c":"ROLLBACK", "id":21597, "ctx":"BackgroundSync","msg":"Finished waiting for background operations to complete before rollback"}
{"t":{"$date":"2024-01-31T16:21:07.673+07:00"},"s":"I", "c":"ROLLBACK", "id":21604, "ctx":"BackgroundSync","msg":"Finding record store counts"}
{"t":{"$date":"2024-01-31T16:21:07.673+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.transport_orders","uuid":"ce48fafb-fc16-4a18-a822-561e3c89aac3","file":"/data/rollback/ce48fafb-fc16-4a18-a822-561e3c89aac3/removed.2024-01-31T09-21-07.0.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.678+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.schedules","uuid":"df9100df-dfee-4b9b-b040-ef445eb84215","file":"/data/rollback/df9100df-dfee-4b9b-b040-ef445eb84215/removed.2024-01-31T09-21-07.1.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.689+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.inbound_transfers","uuid":"dc7018c2-d8f4-4955-bf2c-d6e871d831bb","file":"/data/rollback/dc7018c2-d8f4-4955-bf2c-d6e871d831bb/removed.2024-01-31T09-21-07.2.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.695+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"config.system.sessions","uuid":"a6cb1c9d-4f49-43f1-b7cd-514a48088297","file":"/data/rollback/a6cb1c9d-4f49-43f1-b7cd-514a48088297/removed.2024-01-31T09-21-07.3.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.699+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.physical_inventory_documents","uuid":"d73090ac-bc84-4860-9abb-466c0115aba2","file":"/data/rollback/d73090ac-bc84-4860-9abb-466c0115aba2/removed.2024-01-31T09-21-07.4.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.700+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.transactions","uuid":"0c1e1de9-f166-40f1-b148-d0bd63f7965c","file":"/data/rollback/0c1e1de9-f166-40f1-b148-d0bd63f7965c/removed.2024-01-31T09-21-07.5.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.720+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.items","uuid":"5a744c25-ed1a-4d2a-b4ba-67a1149a536d","file":"/data/rollback/5a744c25-ed1a-4d2a-b4ba-67a1149a536d/removed.2024-01-31T09-21-07.6.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.723+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.inventory_request_batches","uuid":"e242538a-076e-4e40-b307-a93154cf7555","file":"/data/rollback/e242538a-076e-4e40-b307-a93154cf7555/removed.2024-01-31T09-21-07.7.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.725+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.inspection_receipts","uuid":"977c2456-f649-4b52-ae2c-0e19b774372e","file":"/data/rollback/977c2456-f649-4b52-ae2c-0e19b774372e/removed.2024-01-31T09-21-07.8.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.728+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.document_sequences","uuid":"12877611-0c2c-4546-8d0e-80445df758e1","file":"/data/rollback/12877611-0c2c-4546-8d0e-80445df758e1/removed.2024-01-31T09-21-07.9.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.731+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"admin.pbmLock","uuid":"20e91c30-7dcd-4f8a-b9d8-a84660388ee6","file":"/data/rollback/20e91c30-7dcd-4f8a-b9d8-a84660388ee6/removed.2024-01-31T09-21-07.10.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.733+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"admin.pbmAgents","uuid":"090809d8-18c1-4b87-b363-b431fc4eebf7","file":"/data/rollback/090809d8-18c1-4b87-b363-b431fc4eebf7/removed.2024-01-31T09-21-07.11.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.735+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.internal_work_orders","uuid":"bfc005ec-15c8-4fa8-a54e-a25c3e6b4a93","file":"/data/rollback/bfc005ec-15c8-4fa8-a54e-a25c3e6b4a93/removed.2024-01-31T09-21-07.12.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.749+07:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn23","msg":"client metadata","attr":{"remote":"127.0.0.1:51481","client":"conn23","doc":{"driver":{"name":"mongo-go-driver","version":"v1.11.4"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.19.1","application":{"name":"QAN-mongodb-profiler-/agent_id/8ce0a2d8-f409-4c28-97b1-ac2e7d10dd8b"}}}}
{"t":{"$date":"2024-01-31T16:21:07.749+07:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn24","msg":"client metadata","attr":{"remote":"127.0.0.1:51480","client":"conn24","doc":{"driver":{"name":"mongo-go-driver","version":"v1.11.4"},"os":{"type":"linux","architecture":"amd64"},"platform":"go1.19.1","application":{"name":"QAN-mongodb-profiler-/agent_id/8ce0a2d8-f409-4c28-97b1-ac2e7d10dd8b"}}}}
{"t":{"$date":"2024-01-31T16:21:07.755+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.outbound_returns","uuid":"ff0e2a37-e956-4b3e-88f6-5dffbee6ac5b","file":"/data/rollback/ff0e2a37-e956-4b3e-88f6-5dffbee6ac5b/removed.2024-01-31T09-21-07.13.bson"}}
{"t":{"$date":"2024-01-31T16:21:07.772+07:00"},"s":"I", "c":"ROLLBACK", "id":21609, "ctx":"BackgroundSync","msg":"Preparing to write deleted documents to a rollback file","attr":{"namespace":"oms_document.histor
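For reference, the removed.*.bson files mentioned above hold the documents that were rolled back; they can be inspected (and re-imported if needed) with bsondump from the MongoDB Database Tools, for example:

# dump one rollback file (path taken from the log above) into readable JSON
bsondump /data/rollback/ce48fafb-fc16-4a18-a822-561e3c89aac3/removed.2024-01-31T09-21-07.0.bson > transport_orders_removed.json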

Hi @Hendra_Budiawan,
I suspect you have increased the priority of node 1 so that it is elected primary again as soon as it becomes available, and that this is what created the inconsistency. Can you confirm?
That is one possible cause that comes to mind.
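You can check it quickly from mongosh on one of the data-bearing members, something like:

// print each member's priority and arbiter flag
rs.conf().members.forEach(m => print(m.host, "priority:", m.priority, "arbiterOnly:", !!m.arbiterOnly))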

However, in production it is strongly recommended to have a P-S-S architecture.

Regards


Since the beginning we have always set node 1 with a higher priority.

Node 2 is used for backups, so we intentionally want the applications to always connect to node 1.

We never had any issues with P-S-A until this one came up, and we want to evaluate how to avoid it happening again in the future.

Regards

Hi @Hendra_Budiawan,
It is not clear why node 1 necessarily needs to be the primary.
If you want to keep things configured this way, I suggest you increase server resources and index the queries better.
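For example, you could enable the profiler to find the slow queries and then index the fields they filter on (collection and field names below are only placeholders, adjust them to your workload):

// log operations slower than 100 ms so they show up in system.profile
db.setProfilingLevel(1, { slowms: 100 })
// look at the worst offenders
db.system.profile.find().sort({ millis: -1 }).limit(5)
// then add an index that matches their filter/sort (placeholder fields)
db.transport_orders.createIndex({ status: 1, createdAt: -1 })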
If you want professional support on this, you can contact MongoDB Technical Services (TSE).

Regards

Thank you for the suggestion, we will review our architecture again.

Regards

@Hendra_Budiawan keep me updated on the strategy you adopt and whether it solves the problem!

Regards

Hi @Fabio_Ramohitaj, for now we will set all the nodes to the same priority to avoid a primary race condition in the future, but we still don't have a clue why this happened in the first place.
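The plan is roughly this (a sketch, assuming members[0] and members[1] are the two data nodes):

// give both data nodes the same election priority
cfg = rs.conf()
cfg.members[0].priority = 1   // node 1
cfg.members[1].priority = 1   // node 2
rs.reconfig(cfg)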

Thank you for your attention.

Regards
