PSA architecture with enableMajorityReadConcern: false - secondary down

Hi All,

We are doing continuous bulk inserts. With a PSA (Primary-Secondary-Arbiter) architecture and the enableMajorityReadConcern: false parameter, the secondary goes down after an hour or so and is blacklisted as a sync source. Please recommend how to solve this issue.

Regards,
Saurav

The first thing to do is to look at the logs.

Hi All,

Below are the logs and parameters. Kindly advise:

2020-01-29T17:21:47.124+0530 I REPL [replication-0] We are too stale to use 10.95.147.92:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: Timestamp(1580295549, 1) is before their earliest timestamp: Timestamp(1580297909, 30285) for 1min until: 2020-01-29T17:22:47.124+0530

replication:
  replSetName: rs1
  enableMajorityReadConcern: false

Regards,
Saurav

I think the log from the blacklisted host will be more useful. Are your machines using the same NTP servers?

What write concern are you using for your bulk inserts? If you use w:1, write operations will only wait for acknowledgement from the primary. Your secondary will eventually become stale if continuous writes can be acknowledged faster on the primary than they can be applied via replication on the secondary.

If you instead use a write concern of w:2, write operations will wait for acknowledgement from your secondary so it should not become stale. Since you only have one secondary, you should also set a wtimeout value to avoid a w:2 write concern blocking indefinitely if your secondary is down.

If you want a more resilient configuration, upgrade your replica set to a Primary-Secondary-Secondary configuration and use a w:majority write concern.
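
As a rough illustration (the connection string reuses the host from the log above, and the database name, collection name, and 5-second timeout are placeholder assumptions, not values from this thread), a bulk insert with w:2 and a wtimeout could look like this in PyMongo:

from pymongo import MongoClient
from pymongo.errors import BulkWriteError
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://10.95.147.92:27017/?replicaSet=rs1")

# w=2 waits for the primary plus one secondary; wtimeout (milliseconds)
# keeps the write from blocking indefinitely if the lone secondary is down.
# In a Primary-Secondary-Secondary deployment, WriteConcern(w="majority")
# would be the usual choice instead.
coll = client["test"]["bulkdata"].with_options(
    write_concern=WriteConcern(w=2, wtimeout=5000)
)

docs = [{"seq": i} for i in range(10000)]
try:
    coll.insert_many(docs, ordered=False)
except BulkWriteError as exc:
    # A wtimeout surfaces as a write concern error: the documents reached the
    # primary but were not acknowledged by the secondary within wtimeout.
    print(exc.details.get("writeConcernErrors"))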

Regards,
Stennie

I liked the post because we are learning so much by hanging around.

Hi Stennie,

I am using w:1 with the PSA architecture; w:1 is what MongoDB recommends for this setup, yet the secondary is still blacklisted after the secondary or primary has been down for 10 minutes. Please advise.

Regards,
Saurav

Per my earlier comment, w:1 only waits for acknowledgement from the primary so your secondary will eventually become stale if continuous writes are acknowledged faster on the primary than they can be applied via replication on the secondary.

You need to increase your write concern (and ideally upgrade to a Primary-Secondary-Secondary configuration) to avoid this issue.

If there are pauses in your write activity that might allow your secondary to catch up, you could also try to mitigate the issue by increasing your oplog sizes. However, increased write concern is a better fix to ensure writes don’t outpace replication.
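
If you do go down the oplog-sizing route, the oplog can be resized online with the replSetResizeOplog command (MongoDB 3.6+ with WiredTiger). A minimal sketch in PyMongo follows; the host is taken from the log line earlier in this thread, and the 51200 MB target is only an illustrative value:

from pymongo import MongoClient

# Connect directly to the member whose oplog you want to resize.
client = MongoClient("mongodb://10.95.147.92:27017/?directConnection=true")

# replSetResizeOplog takes the new size in megabytes; run it against each
# data-bearing member.
client.admin.command("replSetResizeOplog", 1, size=51200)

# Optionally check the resulting oplog window by comparing the first and
# last entries in local.oplog.rs.
oplog = client.local["oplog.rs"]
first = oplog.find().sort("$natural", 1).limit(1).next()
last = oplog.find().sort("$natural", -1).limit(1).next()
print("oplog window:", first["ts"], "->", last["ts"])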

Regards,
Stennie

Hello Stennie,
We’re facing a similar issue with a PSS architecture that receives a heavy influx of writes on the primary. Increasing the oplog size seems to have fixed the issue for now. Changing to w:2 was not a viable option for us due to the latency; in fact we had to switch to w:0 (although it provides no guarantee of write acknowledgement, even by the primary) since the insertion speeds were too slow. Are there any WiredTiger settings that can be adjusted to speed up the insertions?
The 3 servers host 3 TB of data, with 256 GB RAM, SSDs, and RAID 5.

RAID 5 is not a great choice for a performant database (of any type); it just does not give the throughput, and you’ll likely see this as I/O wait. You’ll want to switch that up to RAID 10.

w:0 can be a scary place to be if you care about your data.

Thank you, Chris. We’re in the process of getting a RAID 10 cluster set up. In the meantime, are there any MongoDB-specific configs that can be altered to improve write throughput?
For instance, MySQL acquires a global mutex on its key cache, so turning off the query cache and key cache gave us a big performance boost for our write-heavy apps. Does MongoDB have any similar cache settings that can be altered?

Hi Stennie,

I increased the oplog size to 200 GB. After the primary had been shut down for 1 hour, I started mongod again, but this time the node goes into RECOVERING state and stays there for 5-6 hours.
Please suggest how to recover faster.

@saurav_gupta Increased oplog sizing will buffer more write operations for replication, but if your continuous write load is outpacing how quickly writes can be replicated and applied on your secondaries, this will not address the underlying problem. You should also review your deployment metrics and consider whether your replication throughput is being limited by resources such as network or I/O bandwidth.
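
One way to quantify that lag is to compare member optimes from replSetGetStatus. A rough sketch in PyMongo, assuming a connection to any member of rs1 (the connection string is a placeholder):

from pymongo import MongoClient

client = MongoClient("mongodb://10.95.147.92:27017/?replicaSet=rs1")

# replSetGetStatus reports an optimeDate per member; comparing each secondary
# against the primary gives a rough view of replication lag.
status = client.admin.command("replSetGetStatus")
primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")

for member in status["members"]:
    if member["stateStr"] == "SECONDARY":
        lag = (primary["optimeDate"] - member["optimeDate"]).total_seconds()
        print("%s is %.0f seconds behind the primary" % (member["name"], lag))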

If you want to ensure reliable replication to secondaries, you need to throttle writes to what can reasonably be replicated to your secondaries given the buffer provided by your oplog sizing.

If you can upgrade to MongoDB 4.2, there’s a new replication flow control feature which is enabled by default and provides administrative control over the maximum secondary lag before queueing writes on the primary. This would impose a similar effect to using a majority write concern but with a bit more tolerance on acceptable lag as well as server-side admin control. This feature has an associated group of flowControl metrics in serverStatus that provides more insight into the activity on the primary.
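
For reference, the flow control target lag is exposed as the flowControlTargetLagSeconds server parameter (default 10 seconds on 4.2+), and the related metrics appear under the flowControl section of serverStatus. A small sketch of reading both with PyMongo (the connection string is a placeholder assumption):

from pymongo import MongoClient

client = MongoClient("mongodb://10.95.147.92:27017/?replicaSet=rs1")

# Read the current flow control target lag (MongoDB 4.2+).
params = client.admin.command("getParameter", 1, flowControlTargetLagSeconds=1)
print("flowControlTargetLagSeconds:", params["flowControlTargetLagSeconds"])

# The flowControl section of serverStatus on the primary shows whether the
# secondaries are considered lagged and whether ticket limiting is active.
flow = client.admin.command("serverStatus")["flowControl"]
print("enabled:", flow["enabled"], "isLagged:", flow["isLagged"])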

@Kanishka_Ramamoorthy Please create a separate thread if you’d like to discuss issues specific to your deployment and use case.

Changing the write concern to w:0 will make a replication throughput problem even worse: you’ll be writing without any acknowledgement and sending requests to the primary as fast as possible. If you have continuous writes, this will exacerbate any problems due to secondary lag (your application isn’t even waiting to confirm the primary accepted the write).

Regards,
Stennie

Thank you @Stennie_X, I’m opening a new thread.