Increased (6X) getmore commands on primary after upgrading 3.4 -> 3.6 -> 4.0

Mika_Aleksandroff · March 26, 2024, 10:50am

Hey,

We have a very simple MongoDB replica set with 3 nodes, which serves as the backend for our custom software.

Last week I upgraded (or at least tried to upgrade) the cluster, first from 3.4 → 3.6 (Thursday) and then from 3.6 → 4.0 (Friday morning). Later on Friday we noticed an odd problem with one component of our custom software for which we couldn’t find a cause, so I had to downgrade back to 3.6, which solved the issue.

The only change besides the mongod upgrade was adding the readPreference parameter (readPreference=secondaryPreferred) to the MongoDB connection string of our software components during the 3.6 → 4.0 upgrade, and removing it again (while debugging our own software) at the last jump (bindIp 0.0.0.0), which explains the changes in the blue area.

We have very simple monitoring for this cluster using Zabbix and mongostat. Today I noticed that getmore commands on the primary node jumped during the upgrades. With logLevel 1, the log tells me that 99.9% of those currently go to the collection oplog.rs (local.oplog.rs); I have no idea what the share was before the upgrades. The graph below represents statistics of the primary node:

So my questions are:
A. Is this something adding the readPreference=secondaryPreferred option would cause?
B. Is this something a cluster upgrade would cause?
C. Some other logical explanation?

I can provide more details about upgrade/downgrade processes if that helps, but mostly I’m just wondering if this is “normal” between said versions or something that should be investigated more?

Any advice/comment would be helpful,

Thanks!

Mika_Aleksandroff · March 28, 2024, 9:00am

Hmm, I can’t seem to edit the original post anymore… After more study and closer examination, it feels almost normal.

First of all, it’s actually only 3X more getMore operations on the primary, not 6X. 80% of those are on oplog.rs, whereas a week earlier it was 0%. Either something is wrong or there is something I don’t understand.
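For anyone who wants to reproduce a per-namespace share like the one above, here is a rough sketch of how getMore entries could be tallied from logLevel 1 output. The regular expression assumes a 3.6-style COMMAND log line of the form `command <namespace> command: getMore ...` (with an optional `appName` part), and the sample lines are made up for illustration, so the pattern may need adjusting against real logs.

```python
import re
from collections import Counter

# Assumed shape of a 3.6-style COMMAND log line. The optional appName part
# and the exact spacing are assumptions; adjust the pattern to real logs.
GETMORE_RE = re.compile(r'\bcommand (\S+) (?:appName: "[^"]*" )?command: getMore\b')


def getmore_share(lines):
    """Return {namespace: fraction of all getMore log lines seen}."""
    counts = Counter()
    for line in lines:
        match = GETMORE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    return {ns: n / total for ns, n in counts.items()} if total else {}


# Made-up sample lines for illustration only (not taken from our logs).
sample_lines = [
    '2024-03-28T09:00:01.000+0200 I COMMAND  [conn11] command local.oplog.rs command: getMore { getMore: 1, collection: "oplog.rs" } 5ms',
    '2024-03-28T09:00:02.000+0200 I COMMAND  [conn12] command local.oplog.rs command: getMore { getMore: 2, collection: "oplog.rs" } 4ms',
    '2024-03-28T09:00:03.000+0200 I COMMAND  [conn11] command local.oplog.rs command: getMore { getMore: 1, collection: "oplog.rs" } 6ms',
    '2024-03-28T09:00:04.000+0200 I COMMAND  [conn12] command local.oplog.rs command: getMore { getMore: 2, collection: "oplog.rs" } 5ms',
    '2024-03-28T09:00:05.000+0200 I COMMAND  [conn20] command mydb.items appName: "app" command: getMore { getMore: 3, collection: "items" } 2ms',
    '2024-03-28T09:00:06.000+0200 I COMMAND  [conn20] command mydb.items command: find { find: "items" } 2ms',
]

share = getmore_share(sample_lines)
for namespace, fraction in sorted(share.items()):
    print(f"{namespace}: {fraction:.0%}")
```

Running the same tally over a log from before the upgrade (if one is still available) would show whether the oplog.rs share really was 0% back then, or whether those operations were simply not logged.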

Also, I need to correct another error regarding the readPreference setting. Contrary to what one would expect, there was actually a typo in the readPreference option during the period when the blue area is close to zero. That typo was corrected at the last jump. This is very weird as well, but it is most likely caused by our own software’s erroneous behaviour.
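To catch this kind of typo earlier, a small check along the following lines could be run against the connection string before deploying. This is only a sketch: the host names, database name and replica set name are hypothetical, and comparing the value with exact case is a deliberately strict choice of this example, not necessarily what a given driver does.

```python
from urllib.parse import parse_qs

# The five read preference modes defined by MongoDB.
VALID_READ_PREFERENCES = {
    "primary",
    "primaryPreferred",
    "secondary",
    "secondaryPreferred",
    "nearest",
}


def check_read_preference(uri):
    """Return (value, status) where status is 'ok', 'invalid' or 'missing'."""
    # Everything after the first '?' is the option part of the URI.
    query = uri.partition("?")[2]
    options = parse_qs(query)
    values = options.get("readPreference")
    if not values:
        # Also covers a misspelled option name, which looks like an absent option.
        return None, "missing"
    value = values[-1]
    status = "ok" if value in VALID_READ_PREFERENCES else "invalid"
    return value, status


# Hypothetical connection strings for illustration only.
base = "mongodb://node1.example,node2.example,node3.example/mydb?replicaSet=rs0"
good = base + "&readPreference=secondaryPreferred"
bad_value = base + "&readPreference=secondaryPrefered"
bad_name = base + "&readPrefrence=secondaryPreferred"

for uri in (good, bad_value, bad_name):
    print(check_read_preference(uri))
```

A misspelled option name and a misspelled value are reported differently here ("missing" versus "invalid"), which helps when working out which kind of typo is in play.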

Sorry for the confusion…