MongoDB 5.0 onwards: oplog grows beyond its configured size, causing a full disk during failover

Dear Team,

We are currently evaluating an upgrade of MongoDB from version 4.4 to 5.0. However, we have encountered a critical and unexpected issue in version 5.0 that we consider a blocker.

Our setup is a PSA (Primary, Secondary, Arbiter) replica set: two data-bearing nodes serving as primary and secondary, plus one arbiter. The problem arises when the secondary goes down: the primary's oplog grows beyond the configured limit (our oplog size is typically configured at 5 GB). Once the secondary is brought back up and catches up, the oplog shrinks back to its configured size.
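
For reference, a minimal mongosh sketch of the topology in question (host names are placeholders, not our real ones):

    // PSA replica set: two data-bearing members plus an arbiter
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "node1:27017" },                    // primary
        { _id: 1, host: "node2:27017" },                    // secondary
        { _id: 2, host: "node3:27017", arbiterOnly: true }  // arbiter
      ]
    })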

This behavior is specific to MongoDB 5.0; we never observed it on 4.4. Under heavy write workloads the oplog fills rapidly, eventually exhausting disk space on the primary.

To reproduce the issue:

Set up a PSA configuration.
Run a workload.
Stop the secondary node.
Observe the oplog.rs collection size and its .wt file in the dbPath (see the mongosh commands below).
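
For the last step, a sketch of the standard mongosh helpers we use on the primary to watch the growth; as far as we can tell, the pinned majority commit point is also visible in rs.status():

    // configured limit vs. actual size of the oplog
    var s = db.getSiblingDB("local").oplog.rs.stats();
    print("configured maxSize (MB):", s.maxSize / 1024 / 1024);
    print("data size (MB):         ", s.size / 1024 / 1024);
    print("on-disk .wt size (MB):  ", s.storageSize / 1024 / 1024);

    // summary view: configured size, used size, and oplog time window
    rs.printReplicationInfo();

    // while the secondary is down, the majority commit point stops advancing,
    // which pins the oplog truncation point
    var st = rs.status();
    printjson(st.optimes.lastCommittedOpTime); // frozen at secondary-down time
    printjson(st.optimes.appliedOpTime);       // keeps advancing on the primary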
We are seeking a native solution to prevent this oplog growth. Are there any configuration parameters we might be overlooking? The MongoDB documentation states that “the oplog can grow past its configured size limit to avoid deleting the majority commit point,” but the reasoning behind this is not clear to us. We currently have flowControl enabled (the default); disabling it only accelerates the growth.
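
For completeness, this is how we inspect the flow control parameters, along with the only documented workaround we have found so far: removing the down member's vote so that the remaining members constitute a majority and the commit point can advance again (member index 1 as the downed secondary is an assumption; adjust to your config):

    // flow control is governed by server parameters
    db.adminCommand({ getParameter: 1,
                      enableFlowControl: 1,
                      flowControlTargetLagSeconds: 1 });

    // documented PSA mitigation while the secondary is down:
    // with votes:0 the down member no longer counts toward the majority,
    // so the commit point advances and oplog truncation resumes
    var cfg = rs.conf();
    cfg.members[1].votes = 0;     // index 1 = downed secondary (assumption)
    cfg.members[1].priority = 0;  // a non-voting member must have priority 0
    rs.reconfig(cfg);

Once the secondary is healthy again, rs.reconfigForPSASet() appears to be the recommended way to restore its vote safely, but this is a reactive manual step rather than the native cap we are asking about.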

Our question is why MongoDB retains oplog entries beyond the configured size when the secondary is unavailable or unreachable. If the secondary later turns out to be too stale, it could simply perform a full initial sync.

Attached are the relevant configurations and logs from our test. (Note: the oplog size is set to 100 MB for a quick test, but the behavior persists regardless of the configured size.)