I would like to know if there is a best practice minimum value for the oplog window?
I know a warning is triggered below 1 hour.
And I’m sure the right value differs based on what each deployment is trying to achieve, but I’m honestly clueless as to whether a 24 h window is too much or not.
In most cases, the default oplog size is sufficient. For example, if an oplog is 5% of free disk space and fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming too stale to continue replicating. However, most replica sets have much lower operation volumes, and their oplogs can hold much higher numbers of operations.
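As a quick way to see what your current oplog size and window actually are, you can run the following in mongosh against a replica set member (a sketch; it only reads stats, it changes nothing):

```
// Prints the configured oplog size, space used, and the time range
// between the first and last oplog entries (i.e. your oplog "window").
rs.printReplicationInfo()

// Or pull the configured maximum size directly from the oplog collection:
const stats = db.getSiblingDB("local").oplog.rs.stats()
print(`oplog max size: ${(stats.maxSize / 1024 / 1024).toFixed(0)} MB`)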
Before mongod creates an oplog, you can specify its size with the oplogSizeMB option. Once you have started a replica set member for the first time, use the replSetResizeOplog administrative command to change the oplog size. replSetResizeOplog enables you to resize the oplog dynamically without restarting the mongod process.
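For reference, the resize itself is a one-line admin command in mongosh (a sketch; the 16000 MB value here is just an example, not a recommendation):

```
// Size is in megabytes. Run this against each member in turn
// (typically secondaries first, primary last).
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })
```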
New in version 4.4: you can specify the minimum number of hours to preserve an oplog entry. The mongod only truncates an oplog entry if:
The oplog has reached the maximum configured size, and
The oplog entry is older than the configured number of hours based on the host system clock.
By default, MongoDB does not set a minimum oplog retention period; it automatically truncates the oplog, starting with the oldest entries, to maintain the configured maximum oplog size.
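The retention period described above is set through the same `replSetResizeOplog` command (a mongosh sketch; 48 hours is just an example value):

```
// MongoDB 4.4+: keep at least 48 hours of oplog entries,
// even once the oplog reaches its configured maximum size.
db.adminCommand({ replSetResizeOplog: 1, minRetentionHours: 48 })
```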
Of course, the larger the oplog window, the better. Anecdotally, though, I would say a good value is something that can cover the weekend, in case something happens to your cluster during that period. Realistically, it should comfortably cover the length of any planned maintenance window you expect to have.
I would recommend going through the documentation and thread linked below to learn about alert conditions, common triggers, and fixes for oplog issues.
I have indeed read the documentation before posting.
I was just surprised to see that, upon autoscaling to an M30 instance (because we had a surge of new users), the oplog window kept shrinking instead of scaling up with the instance.
Before the instance upgrade (and the wave of new users), we were on the minimum instance tier (M10, I believe?) and we had around 3 days of oplog window.
Our M30 instance currently has 34 GB of free disk, but the oplog window doesn’t seem to be increasing. It’s as if the oplog size doesn’t change automatically and is still at its initial default value.
That’s why I wanted to set a value myself but I needed guidance on best practice.
Currently, according to the real-time metrics in Atlas, we write around 90 MB/h, so roughly 2 GB/day.
With 34 GB of free disk, that’s not an issue at all.
However, if I specify 2 days of oplog window, that means about 4 GB of oplog data; but what happens if the disk has less than 4 GB free? Do I receive a warning? Does the server crash, or will the oplog automatically shrink to fit the remaining disk space?
Replication Oplog alerts can be triggered when the amount of oplog data generated on a primary cluster member is larger than the cluster’s configured oplog size. You can configure the following alert conditions in the project-level alert settings page to trigger alerts.
Please go through the link below; it answers your specific questions and also provides solutions for such issues.