In MongoDB 6.0.3, multiple "Not-primary error while processing 'find' operation on 'XXX' database via fire-and-forget command execution."

Hi,
We’ve upgraded our stack of clusters to MongoDB Community 6.0.3.
During our test phases, we noticed that when a node restarts after its dbPath has been cleaned, it throws these errors during the sync phase (STARTUP2):

{"error":"NotWritablePrimary: Not-primary error while processing 'find' operation on 'rcs' database via fire-and-forget command execution."}

After some research, it appears this error is raised because recovering nodes may be part of an election (as per SERVER-70510). That ticket seems to mention there is a choice that needs to be made.

We found out that these errors were masked in 6.2.0rc0 (see SERVER-60553), but only “masked”.

We don’t see any of these errors on our previous stacks (4 and 5).
Do these errors mean there is a client somewhere that fails to read because it was instructed to read on a recovering node? Or are they just extra logs on an existing behaviour?
Maybe the same thing occurs on 4 / 5 and we just don’t see it?

Thanks!

What is the topology of your cluster? How many nodes are in the replica set, etc.? It sounds like you don’t have an active primary. Did you check all the nodes to make sure you do have a healthy primary?

Hello
The replica set has 3 nodes. At recovery time, there is a primary (we monitor rs.status() through Grafana).
The error is thrown on a secondary node while it is recovering, hence the weirdness.

Hi @MBO, great research on the error. Looking at the Jira ticket, it appears the fire-and-forget operations are due to mirrored reads.

As these exist primarily to keep a partially warm cache on primary candidates, the responses (or lack thereof, in this case) are never waited for and won’t impact your actual clients.

Mirrored reads have been around since 4.4. This comment also mentions 4.4, so I guess they just hadn’t been seen before.
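If you want to confirm that mirrored reads are what you’re seeing, a mongosh sketch along these lines should work (assuming you can connect to the primary; note that serverStatus only reports the mirroredReads section when it is requested explicitly):

```javascript
// mongosh, run on the primary: request the mirroredReads section explicitly,
// since serverStatus omits it by default.
const m = db.serverStatus({ mirroredReads: 1 }).mirroredReads;
// m.seen = eligible operations this node received,
// m.sent = operations this primary mirrored to eligible secondaries.
printjson(m);
```

If `sent` is climbing while a secondary is in STARTUP2, that would line up with the errors you’re logging.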


Hi Chris
thanks a lot. Indeed, I had misunderstood the mirrored read feature; now it makes a whole lot of sense. The (non-critical) issue aims at not considering recovering nodes as eligible, in order to avoid their being mirrored by the primary, since they are not ready for this “warm-up”.
I will try with mirrored reads off, just to check that the errors disappear, but in any case these are, as you mentioned, harmless warnings.
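For reference, this is roughly how you would turn mirrored reads off at runtime in mongosh, by setting the sampling rate to 0 (the parameter is documented as `mirrorReads` with a `samplingRate` field; this is a runtime-only change unless you also persist it via `setParameter` in the config file):

```javascript
// mongosh, run against the primary: disable mirrored reads at runtime.
db.adminCommand({ setParameter: 1, mirrorReads: { samplingRate: 0.0 } });

// Verify the current value.
db.adminCommand({ getParameter: 1, mirrorReads: 1 });
```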

PS: This cluster was previously on 4.0, and therefore had no mirrored read support. On the other hand, one of our clusters was on 5.0, but we never saw these errors (and we still don’t…). Maybe that cluster is much less active, and there is not enough time during the recovery process for the primary to send these fire-and-forget reads?


By the way, does a setup that uses secondary read preferences benefit from mirrored reads, since the secondaries should already be warming up data?

The default is to mirror reads at a sampling rate of 0.01 (1%). So there could be some benefit, but that would depend on whether the reads on the primary overlap with the ones occurring on the secondaries.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.