Poor write performance with MongoDB 5.0.8 in a PSA (Primary-Secondary-Arbiter) setup

Franz_van_Betteraey · May 13, 2022, 11:45am

Hi MongoDBs,

I am struggling with write performance in MongoDB 5.0.8 in a PSA (Primary-Secondary-Arbiter) deployment when one data-bearing member goes down.

I am aware of the “Mitigate Performance Issues with PSA Replica Set” page and the procedure to temporarily work around this issue.
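For readers who have not seen that page: as far as I understand it, the workaround amounts to taking away the vote and priority of the unavailable data-bearing member until it is back. The following is only a sketch of that config change, written as a plain function so it can be tried without a server. The function name `demoteMember` and the host names are made up for illustration.

```javascript
// Sketch: return a copy of a replica set config in which one member
// no longer votes and cannot become primary (votes: 0, priority: 0).
// demoteMember is a hypothetical helper, not part of mongosh.
function demoteMember(cfg, memberIndex) {
  if (memberIndex < 0 || memberIndex >= cfg.members.length) {
    throw new RangeError(`no member at index ${memberIndex}`);
  }
  return {
    ...cfg,
    members: cfg.members.map((m, i) =>
      i === memberIndex ? { ...m, votes: 0, priority: 0 } : m
    ),
  };
}

// Hypothetical PSA config; the host names are placeholders.
const psa = {
  _id: "rs0",
  members: [
    { _id: 0, host: "primary.example:27017" },
    { _id: 1, host: "secondary.example:27017" },
    { _id: 2, host: "arbiter.example:27017", arbiterOnly: true },
  ],
};

const degraded = demoteMember(psa, 1);
console.log(degraded.members[1]);
```

In mongosh, connected to the primary, the equivalent would be roughly `rs.reconfig(demoteMember(rs.conf(), 1))`, and the change has to be reverted by hand once the secondary has caught up again.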

However, in my opinion, the manual intervention described there should not be necessary during operation. So what can I do to ensure that the system continues to run efficiently even if a node fails? In other words, I am looking for the behaviour of MongoDB 4.x with the option “enableMajorityReadConcern=false”.

As I understand it, the problem has something to do with the default read/write concern. When configuring a PSA replica set in MongoDB you are forced to set it explicitly with setDefaultRWConcern. Otherwise the following message appears when rs.addArb() is called:

MongoServerError: Reconfig attempted to install a config that would change the implicit default write concern. Use the setDefaultRWConcern command to set a cluster-wide write concern and try the reconfig again.

So I did

db.adminCommand({
    "setDefaultRWConcern": 1,
    "defaultWriteConcern": {
        "w": 1
    },
    "defaultReadConcern": {
        "level": "local"
    }
})

I would expect this configuration to cause no lag when reading from or writing to a PSA system with only one data-bearing node available.

But I observe “slow query” messages in the mongod log like this one:

{
    "t": {
        "$date": "2022-05-13T10:21:41.297+02:00"
    },
    "s": "I",
    "c": "COMMAND",
    "id": 51803,
    "ctx": "conn149",
    "msg": "Slow query",
    "attr": {
        "type": "command",
        "ns": "<db>.<col>",
        "command": {
            "insert": "<col>",
            "ordered": true,
            "txnNumber": 4889253,
            "$db": "<db>",
            "$clusterTime": {
                "clusterTime": {
                    "$timestamp": {
                        "t": 1652430100,
                        "i": 86
                    }
                },
                "signature": {
                    "hash": {
                        "$binary": {
                            "base64": "bEs41U6TJk/EDoSQwfzzerjx2E0=",
                            "subType": "0"
                        }
                    },
                    "keyId": 7096095617276968965
                }
            },
            "lsid": {
                "id": {
                    "$uuid": "25659dc5-a50a-4f9d-a197-73b3c9e6e556"
                }
            }
        },
        "ninserted": 1,
        "keysInserted": 3,
        "numYields": 0,
        "reslen": 230,
        "locks": {
            "ParallelBatchWriterMode": {
                "acquireCount": {
                    "r": 2
                }
            },
            "ReplicationStateTransition": {
                "acquireCount": {
                    "w": 3
                }
            },
            "Global": {
                "acquireCount": {
                    "w": 2
                }
            },
            "Database": {
                "acquireCount": {
                    "w": 2
                }
            },
            "Collection": {
                "acquireCount": {
                    "w": 2
                }
            },
            "Mutex": {
                "acquireCount": {
                    "r": 2
                }
            }
        },
        "flowControl": {
            "acquireCount": 1,
            "acquireWaitCount": 1,
            "timeAcquiringMicros": 982988
        },
        "readConcern": {
            "level": "local",
            "provenance": "implicitDefault"
        },
        "writeConcern": {
            "w": 1,
            "wtimeout": 0,
            "provenance": "customDefault"
        },
        "storage": {},
        "remote": "10.10.7.12:34258",
        "protocol": "op_msg",
        "durationMillis": 983
    }
}

The collection involved here is under proper load with about 1000 reads and 1000 writes per second from different (concurrent) clients.
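One detail worth pulling out of the log entry above is how much of the 983 ms was spent waiting for a flow control ticket. A small sketch that does the arithmetic on the values copied from the entry (the function name `flowControlShare` is made up for illustration):

```javascript
// Fields copied from the "attr" section of the slow query entry above.
const attr = {
  flowControl: { acquireCount: 1, acquireWaitCount: 1, timeAcquiringMicros: 982988 },
  durationMillis: 983,
};

// Fraction of the operation's duration spent acquiring a flow control ticket.
// flowControlShare is a hypothetical helper for this post only.
function flowControlShare(a) {
  const waitedMillis = (a.flowControl?.timeAcquiringMicros ?? 0) / 1000;
  return a.durationMillis > 0 ? waitedMillis / a.durationMillis : 0;
}

const share = flowControlShare(attr);
console.log(`waited ${attr.flowControl.timeAcquiringMicros / 1000} ms of ${attr.durationMillis} ms`);
```

For this entry the wait accounts for practically the whole duration, so the insert itself was not slow; it was held back before it started.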

MongoDB 4.x with “enableMajorityReadConcern=false” performed normally here, and I did not notice any loss of performance in my application. MongoDB 5.x does not manage that: data piles up in my application faster than I can write it out.

So my question is whether I can get the MongoDB 4.x behaviour back in 5.x. A write guarantee from the single data-bearing node that is still available in the failure scenario would be OK for me. But having to manually reconfigure the faulty node in a failure scenario is something that should really be avoided.

Thanks for any advice!

kevinadi · May 18, 2022, 7:42am

Hi @Franz_van_Betteraey welcome to the community!

Arbiters are useful to allow a replica set to have a primary while in a degraded state (e.g. when one secondary is down). However, they come at the expense of data integrity and more complex operation & maintenance.

The safest setup for your data is a replica set with a minimum of 3 data-bearing members and no arbiters, used with majority write concern. This way, your writes will propagate to the majority of nodes, ensuring that your data is safe once written. If you have a PSA setup, it is possible for acknowledged writes to be rolled back. As an added bonus, majority write concern will also ensure that your app cannot feed in more data than the replica set can handle safely; that is, it acts as backpressure to ensure you don’t inadvertently overload your database.

Notably, the default write concern is now “majority” since MongoDB 5.0.

MongoDB 4.x with “enableMajorityReadConcern=false” performed “normal” here

There are major changes in WiredTiger between MongoDB 4.4 series and 5.0 so they behave slightly differently under a degraded situation (such as when a secondary is down in a PSA set). However the changes are done to ensure better data integrity.

Otherwise the following message will appear when rs.addArb is called:

I believe this can be suppressed by initializing the replica set with a configuration document instead of using rs.add() and rs.addArb(), and I think this also sets up a different write concern default, since the implicit default write concern changes depending on the presence of arbiters. See Implicit Default Write Concern. If you have a PSA setup, the implicit write concern should default to w:1, and I think this should be roughly comparable to the older 4.4 setup you refer to.
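To make the arbiter rule concrete, here is a sketch of the formula as I read it from the Implicit Default Write Concern documentation: with at least one arbiter, and with no more data-bearing voting members than the voting majority, the implicit default is w:1, otherwise it is “majority”. The function name and the member documents are made up for illustration; this is not server code.

```javascript
// Sketch of the documented rule for the implicit default write concern.
// implicitDefaultWriteConcern is a hypothetical helper, not a MongoDB API.
function implicitDefaultWriteConcern(members) {
  // Members vote unless votes is explicitly set to 0.
  const voting = members.filter((m) => (m.votes ?? 1) > 0);
  const arbiters = voting.filter((m) => m.arbiterOnly === true).length;
  const dataBearingVoting = voting.length - arbiters;
  const majority = Math.floor(voting.length / 2) + 1;
  return arbiters > 0 && dataBearingVoting <= majority
    ? { w: 1 }
    : { w: "majority" };
}

const psaSet = [{ _id: 0 }, { _id: 1 }, { _id: 2, arbiterOnly: true }];
const pssSet = [{ _id: 0 }, { _id: 1 }, { _id: 2 }];

console.log("PSA:", implicitDefaultWriteConcern(psaSet));
console.log("PSS:", implicitDefaultWriteConcern(pssSet));
```

For PSA there are 2 data-bearing voting members and the voting majority is also 2, so the rule yields w:1. For PSS there is no arbiter, so it yields “majority”.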

Having said that, I would encourage you to explore a PSS setup instead of a PSA setup.

Best regards
Kevin

Franz_van_Betteraey · May 18, 2022, 3:40pm

Hi @kevinadi,

thank you very much for reaching out. The recommendation to use a PSS structure is certainly correct, but things are what they are. It has also worked well for us so far and it has at least protected us from the failure of one data-bearing node. I can’t change anything about this architecture at the moment.

As you can see in the “slow query” messages, we also configured the system to use the default write concern w:1, which then allowed us to add the arbiter. So there is no problem here anymore:

Franz_van_Betteraey:

        "readConcern": {
            "level": "local",
            "provenance": "implicitDefault"
        },
        "writeConcern": {
            "w": 1,
            "wtimeout": 0,
            "provenance": "customDefault"
        },

However, in the degraded state we still observe performance that is 10 times worse than before (measured in inserts per second). That was not the case with the 4.x version. I wonder if I should report this as an issue? But I can hardly believe that the system behaviour has changed so much, and I’m still looking for a cause that I can fix on my side.

Best regards,
Franz

kevinadi · May 20, 2022, 4:26am

Hi @Franz_van_Betteraey

Yes, unfortunately this is a side effect of architectural changes in WiredTiger. From MongoDB 5.0 onwards, enableMajorityReadConcern is no longer available as an option, so majority read concern is always on. Setting w:1 as the default does not achieve the same effect as disabling majority read concern did before 5.0, because the majority commit point of a PSA replica set will still fall behind for as long as the replica set is in a degraded state.

There are various technical reasons why this change is necessary, but the main benefits are:

During periods of extreme stress, performance degrades more gracefully in 5.0 than in earlier versions, where a degraded set could work as if nothing had happened and then suddenly fall off a cliff.
The majority commit point requires the majority of voting nodes to advance, and the oplog can now grow so that this majority commit point doesn’t get deleted. This greatly lessens the risk of a secondary falling off the oplog. In a PSA setup, since you need 2 data-bearing members to form a majority and advance the majority commit point, the S will never fall off the oplog (subject to disk space availability on the functioning primary). It will be able to catch up once it comes online again.
Majority read concern enables advanced features such as change streams and transactions, along with better data integrity (i.e. not losing majority-ack’d writes), better rollback behaviour, better degradation under extreme stress, and more.

However, it also comes with some drawbacks when a PSA set is in a degraded state for long periods. The set will still be functional, but please note that no one should be running any replica set in a degraded state for an extended period of time. Without a majority of data-bearing nodes available and the majority commit point reflecting the latest state of your database, your data isn’t safe.
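To illustrate why the majority commit point stalls in a degraded PSA set but not in a degraded PSS set, here is a deliberately simplified toy model. It assumes all members vote, uses plain numbers for optimes, and treats arbiters and unreachable members as reporting no optime. The function name is made up, and the real server logic is considerably more involved.

```javascript
// Toy model of the majority commit point; not how the server implements it.
// members: [{ optime: number | null }], null = arbiter or unreachable member.
// previous: the last commit point that was known before this evaluation.
function majorityCommitPoint(members, previous) {
  const majority = Math.floor(members.length / 2) + 1; // all members vote here
  const optimes = members
    .map((m) => m.optime)
    .filter((t) => t !== null)
    .sort((a, b) => b - a); // newest first
  // The commit point is the newest optime held by at least `majority` members.
  // With too few data-bearing members reporting, it cannot advance.
  return optimes.length >= majority
    ? Math.max(previous, optimes[majority - 1])
    : previous;
}

// Healthy PSA: primary at 100, secondary at 98, arbiter holds no data.
const healthy = majorityCommitPoint([{ optime: 100 }, { optime: 98 }, { optime: null }], 0);
// Degraded PSA: the secondary is down while the primary keeps writing.
const degradedPsa = majorityCommitPoint([{ optime: 500 }, { optime: null }, { optime: null }], healthy);
// Degraded PSS: one secondary is down, the other keeps replicating.
const degradedPss = majorityCommitPoint([{ optime: 500 }, { optime: 499 }, { optime: null }], healthy);

console.log({ healthy, degradedPsa, degradedPss });
```

In this model the degraded PSA set stays at the commit point it had when the secondary went away, while the primary's own optime keeps growing. That widening gap is the lag referred to above.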

Best regards
Kevin
