In MongoDB 6.0.3, multiple "Not-primary error while processing 'find' operation on 'XXX' database via fire-and-forget command execution."

Hi,
We’ve upgraded our stack of clusters to MongoDB Community 6.0.3.
During our test phases, we noticed that when a node restarts after its dbPath has been cleaned, it throws these errors during the sync phase (STARTUP2):

{"error":"NotWritablePrimary: Not-primary error while processing 'find' operation on 'rcs' database via fire-and-forget command execution."}

After some research, it appears this error is raised because recovering nodes may be part of an election (as per SERVER-70510). That ticket seems to mention there is a choice that needs to be made.

We found out that these errors were masked in 6.2.0rc0 (see SERVER-60553), but only “masked”.

We don’t see any of these errors on our previous stacks (4 and 5).
Do these errors mean there is a client somewhere that fails to read because it was instructed to read on a recovering node? Or are they just extra logs on an existing behaviour?
Maybe the same thing occurs on 4 / 5 and we just don’t see it?

Thanks!

What is the topology of your cluster? How many nodes are in the replica set, etc.? It sounds like you don’t have an active primary. Did you check all the nodes to make sure you do have a healthy primary?

Hello
The replica set has 3 nodes. At recovery time, there is a primary (we monitor rs.status() through Grafana).
The error is thrown on a secondary node while it is recovering, hence the weirdness.

Hi @MBO, great research on the error. Looking at the Jira ticket, it appears the fire-and-forget operations are due to mirrored reads.

As these exist primarily to keep a partially warm cache on primary candidates, the responses (or lack thereof, in this case) are never waited for and won’t impact your actual clients.

Mirrored reads have been around since 4.4. This comment also mentions 4.4, so I guess they just hadn’t been seen before.
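If you want to confirm that mirrored reads are what you’re seeing, a mongosh sketch along these lines should work (assuming you can connect to the primary; note that serverStatus only reports the mirroredReads section when it is requested explicitly):

```javascript
// mongosh, run on the primary: request the mirroredReads section explicitly,
// since serverStatus omits it by default.
const m = db.serverStatus({ mirroredReads: 1 }).mirroredReads;
// m.seen = eligible operations this node received,
// m.sent = operations this primary mirrored to eligible secondaries.
printjson(m);
```

If `sent` is climbing while a secondary is in STARTUP2, that would line up with the errors you’re logging.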


Hi Chris
thanks a lot. Indeed, I had misunderstood the mirrored read feature; now it makes a whole lot of sense. The (non-critical) issue aims at not considering recovering nodes as eligible, in order to avoid their being mirrored by the primary, since they are not ready for this “warm-up”.
I will try with mirrored reads off, just to check that the errors disappear, but in any case these are, as you mentioned, harmless warnings.
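For reference, this is roughly how you would turn mirrored reads off at runtime in mongosh, by setting the sampling rate to 0 (the parameter is documented as `mirrorReads` with a `samplingRate` field; this is a runtime-only change unless you also persist it via `setParameter` in the config file):

```javascript
// mongosh, run against the primary: disable mirrored reads at runtime.
db.adminCommand({ setParameter: 1, mirrorReads: { samplingRate: 0.0 } });

// Verify the current value.
db.adminCommand({ getParameter: 1, mirrorReads: 1 });
```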

PS: This cluster was previously on 4.0, and therefore had no mirrored read support. On the other hand, one of our clusters was on 5.0, but we never saw these errors (and we still don’t…). Maybe that cluster is much less active, and there is not enough time during the recovery process for the primary to send these fire-and-forget reads?


By the way, does a setup that uses secondary read preferences benefit from mirrored reads, since the secondaries should already be warming up data?

The default is to mirror reads at a sampling rate of 0.01 (1%). So there could be some benefit, but that would depend on whether the reads on the primary overlap with the ones occurring on the secondaries.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.