MongoDB very slow at startup after upgrade

Hi, I have upgraded a 3-member MongoDB replica set from 4.0.11 to 4.2.5. After the upgrade, startup takes about 5 minutes; before the upgrade it was instant. It is related to the oplog size: as a test I dropped the oplog on the new 4.2 binary and startup was instant again. The max oplog size was 25 GB; I decreased it to 5 GB and startup is still slow. MongoDB runs on AWS with standard EBS volumes, but it worked fine on the same disks until this upgrade. Do you have any idea what can cause the slow startup? Thanks.
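For reference, this is how I shrank the oplog, using the replSetResizeOplog admin command (size is in megabytes) on each member, then confirming with rs.printReplicationInfo():

db.adminCommand({ replSetResizeOplog: 1, size: 5120 })  // 5120 MB = 5 GB
rs.printReplicationInfo()  // shows the new configured oplog size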


Here is the part of the log related to the STORAGE engine:

2020-04-07T16:36:19.391+0000 I  STORAGE  [initandlisten] Detected data files in /var/lib/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2020-04-07T16:36:19.391+0000 I  STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7279M,cache_overflow=(file_max=0M),session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress],
2020-04-07T16:36:19.931+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277379:931910][29458:0x7f2d17521b40], txn-recover: Recovering log 2439 through 2440
2020-04-07T16:36:20.002+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:2298][29458:0x7f2d17521b40], txn-recover: Recovering log 2440 through 2440
2020-04-07T16:36:20.086+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:86290][29458:0x7f2d17521b40], txn-recover: Main recovery loop: starting at 2439/15997696 to 2440/256
2020-04-07T16:36:20.087+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:87145][29458:0x7f2d17521b40], txn-recover: Recovering log 2439 through 2440
2020-04-07T16:36:20.140+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:139996][29458:0x7f2d17521b40], txn-recover: Recovering log 2440 through 2440
2020-04-07T16:36:20.189+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:189056][29458:0x7f2d17521b40], txn-recover: Set global recovery timestamp: (1586277366, 1)
2020-04-07T16:36:20.197+0000 I  RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1586277366, 1)
2020-04-07T16:36:20.205+0000 I  STORAGE  [initandlisten] Starting OplogTruncaterThread local.oplog.rs
2020-04-07T16:36:20.205+0000 I  STORAGE  [initandlisten] The size storer reports that the oplog contains 260431 records totaling to 5145508040 bytes
2020-04-07T16:36:20.205+0000 I  STORAGE  [initandlisten] Sampling the oplog to determine where to place markers for truncation
2020-04-07T16:36:20.207+0000 I  STORAGE  [initandlisten] Sampling from the oplog between Apr  6 14:15:08:222 and Apr  7 16:36:06:1 to determine where to place markers for truncation
2020-04-07T16:36:20.207+0000 I  STORAGE  [initandlisten] Taking 981 samples and assuming that each section of oplog contains approximately 2654 records totaling to 52436838 bytes
2020-04-07T16:38:48.156+0000 I  STORAGE  [initandlisten] Placing a marker at optime Apr  6 14:15:16:84
…..
2020-04-07T16:38:48.156+0000 I  STORAGE  [initandlisten] Placing a marker at optime Apr  7 16:29:31:13
2020-04-07T16:38:48.156+0000 I  STORAGE  [initandlisten] WiredTiger record store oplog processing took 147951ms
2020-04-07T16:38:48.160+0000 I  STORAGE  [initandlisten] Timestamp monitor starting
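As a sanity check on those numbers (my own back-of-the-envelope arithmetic, with the values copied from the log lines above), the sampling pass accounts for essentially all of the 147951 ms, which matches the 16:36:20 to 16:38:48 gap:

print(260431 / 2654)          // ≈ 98 truncation sections in a 5 GB oplog
print(981 / (260431 / 2654))  // ≈ 10 random samples taken per section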

The slow part is when WiredTiger starts sampling the oplog to place truncation markers. I decreased the oplog size to 5 GB and startup still lags by more than 2 minutes; it was above 5 minutes when the max oplog size was 25 GB.
Before the upgrade to 4.2 the oplog size was also 25 GB, but startup was instant.

I tried changing the following three WiredTiger eviction parameters from their defaults:

storage.wiredTiger.engineConfig.configString: "eviction_dirty_target=60,eviction_dirty_trigger=80,eviction=(threads_min=4,threads_max=4)"

Now mongod starts immediately.
Is it safe to set eviction_dirty_target and eviction_dirty_trigger this high? The defaults are eviction_dirty_target=5% and eviction_dirty_trigger=20%. Thanks.
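One way to check how close the cache actually gets to those thresholds is the WiredTiger cache section of serverStatus (a quick shell sketch; the stat names are as the server reports them):

const c = db.serverStatus().wiredTiger.cache
const dirtyPct = 100 * c["tracked dirty bytes in the cache"] / c["maximum bytes configured"]
print(dirtyPct.toFixed(1) + "% of the cache is dirty")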

I have very slow startups too, but I count mine in hours, not minutes.

This is because I have thousands of collections, and mongod needs to open about 130k file handles when it starts up. Under normal usage this settles down to about 5k-10k open files, but I desperately need to reduce these insane startup times.
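(Each collection and each index is its own WiredTiger file, so a rough estimate of the files involved can be counted in the mongo shell as below; the live figure is also visible as db.serverStatus().wiredTiger.connection["files currently open"].)

let files = 0
db.adminCommand({ listDatabases: 1 }).databases.forEach(d => {
  const dbh = db.getSiblingDB(d.name)
  dbh.getCollectionInfos({ type: "collection" }).forEach(ci => {
    files += 1 + dbh.getCollection(ci.name).getIndexes().length  // 1 data file + 1 per index
  })
})
print(files + " collection/index files (plus WiredTiger internal tables)")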

I don’t see the configString option here: https://docs.mongodb.com/manual/reference/configuration-options/#storage-wiredtiger-options

Is it undocumented, or has it been removed?

Hi timw,

What is your MongoDB version? This situation should be a little better in MongoDB >=4.0.14 and >=4.2.1 due to improvements in SERVER-25025.

If your MongoDB version is above the aforementioned versions, there are further planned improvements in startup time in SERVER-43664. Please comment/upvote on the ticket if startup time is still an issue for you.

Best regards,
Kevin


I resolved the worst of my issue by using a more powerful server.

The startup time is unworkable on a single-core VPS. I was playing around with restoring backups, so I had skimped on the CPU. The solution for me (even for development and testing) is to use a bigger VPS.

As mentioned in my linked topic, even on my live servers (which are easily powerful enough to run the application) I get a 5-minute startup time. I will keep an eye on your planned improvements, thank you.

