Procedure to remove usePowerOf2Sizes from an existing collection before moving to 4.2.x

Hi

We are trying to migrate from version 4.0.27 to 4.2.20, using the path below.

First, on 4.0.27, we migrated the replica set data from MMAPv1 to WiredTiger (WT) as follows:

  1. Moved each secondary to the WT storage engine and let it replicate the data from the primary (still running MMAPv1 at that point).
  2. Once all secondaries in the replica set were on WT and replication sync was successful:
    2.a) The primary was also migrated to WT, and replication sync was successful.

Everything was fine up to this point.

Now we are trying to move from 4.0.27 to 4.2.x, using the path below:

  1. Upgrade one of the secondaries to 4.2.x. However, the secondary does not come up and fails with the error message below:

    “[rsSync-0]Fatal assertion 34437 InvalidOptions: unknown option to collMod: usePowerOf2Sizes at src/mongo/db/repl/sync_tail.cpp 851”
    “[rsSync-0] \n\n***aborting after fassert() failure \n\n”

    From the collMod page of the MongoDB Manual, it is clear that these options were removed in 4.2.

But we couldn't find any documentation on how to mitigate this issue before upgrading to 4.2.x.
How do we remove the usePowerOf2Sizes option from an existing collection, without dropping or otherwise affecting the existing data, before upgrading to 4.2.x?

Could you please provide any pointers on this?

Thanks,
Navanee

Hi @Navaneethakrishnan_91112 and welcome back :smiley: !

The MMAPv1 storage engine, no longer the default since 3.2 and deprecated in 4.0, is completely removed in MongoDB 4.2.

Before you upgrade to 4.2, you need to upgrade your nodes to WiredTiger.

I would do the following to upgrade. Supposing you have a 3-node RS on 4.0.X:

  • Remove one of the secondary nodes from the RS config.
  • Shut that secondary down, wipe it clean, and bring it up to date (latest OS, security patches, etc.).
  • Install MongoDB 4.2.X
  • Add that node back into the RS so it can perform an initial sync.
  • Now you should have one node running WiredTiger on 4.2.X and 2 nodes on 4.0.X.
  • Repeat the operation for the other nodes. If you want to avoid an initial sync each time, you can db.fsyncLock() the 4.2 node for a moment, take a copy of its DB folder, and seed the 2 other nodes with it so they only need to catch up using the oplog.
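The fsyncLock shortcut in the last bullet boils down to "lock, copy, always unlock". Below is a minimal mongo shell sketch, assuming it runs while connected to the 4.2 node. `withFsyncLock` and the `copyDataFiles` callback are hypothetical names (the copy itself would normally be done with OS tools); only `db.fsyncLock()` and `db.fsyncUnlock()` are real shell helpers. It is written in ES5 style so it also loads in the legacy `mongo` shell.

```javascript
// Hypothetical helper: hold an fsync lock only for as long as the copy takes.
// `copyDataFiles` is a caller-supplied callback, not a MongoDB API.
function withFsyncLock(db, copyDataFiles) {
  // db.fsyncLock() flushes pending writes to disk and blocks new writes.
  var lockResult = db.fsyncLock();
  if (!lockResult.ok) {
    throw new Error("fsyncLock failed: " + JSON.stringify(lockResult));
  }
  try {
    return copyDataFiles();
  } finally {
    // Always release the lock, even if the copy throws, so the node
    // does not stay blocked for writes and replication.
    db.fsyncUnlock();
  }
}
```

The `try`/`finally` is the main design point: a node left fsync-locked stops applying writes, so the unlock should never depend on the copy succeeding.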

Always make sure to have 2 nodes running so they can elect a primary (or a strict majority if you have more than 3 nodes).
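The "strict majority" rule is simple arithmetic: with n voting members, electing a primary needs floor(n/2) + 1 of them up. A small illustrative sketch (the function names are made up for this example) to check how many members can be taken out at once during a rolling upgrade:

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members may be down (e.g. mid-upgrade) while an
// election can still succeed.
function maxMembersDown(votingMembers) {
  return votingMembers - majority(votingMembers);
}
```

For the 3-node set above, `maxMembersDown(3)` is 1, which is why only one node should be out of the set at any time. A 4-node set still tolerates only 1, while a 5-node set tolerates 2.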

Before you do all that, though, re-read the upgrade notes for going from version X to Y to make sure you haven't overlooked a step, such as moving off pv0 (protocol version 0) or setting the feature compatibility version.
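Two of these prerequisites can be checked from the shell before swapping any binary: the storage engine must be WiredTiger, and the featureCompatibilityVersion must be "4.0". A sketch, where `preUpgradeProblems` is a hypothetical helper; `db.serverStatus()` and the `getParameter` command for `featureCompatibilityVersion` are real, but this is not the full upgrade checklist.

```javascript
// Hypothetical pre-upgrade check for one 4.0 member, run before its
// binaries are replaced with 4.2. Returns a list of problems (empty = OK).
function preUpgradeProblems(db) {
  var problems = [];

  // MMAPv1 is removed in 4.2, so every member must already be on WiredTiger.
  var engine = db.serverStatus().storageEngine.name;
  if (engine !== "wiredTiger") {
    problems.push("storage engine is " + engine + ", expected wiredTiger");
  }

  // On 4.0 this is a document such as { version: "4.0" }; a targetVersion
  // field means an FCV change has not completed yet.
  var fcv = db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })
    .featureCompatibilityVersion;
  if (fcv.version !== "4.0" || fcv.targetVersion) {
    problems.push("featureCompatibilityVersion is " + JSON.stringify(fcv) + ", expected 4.0");
  }
  return problems;
}
```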

Just for the record, Power of 2 is not a notion that exists in WiredTiger, so a WT node shouldn't mention it.

Cheers,
Maxime.

Hi Maxime,

Thank you very much for your reply. Please see my responses to some of your points below.

<<
The MMAPv1 storage engine, no longer the default since 3.2 and deprecated in 4.0, is completely removed in MongoDB 4.2.

Yes, we are aware of that. That's why, before moving to 4.2, we migrated the storage engine from MMAPv1 to WT on 4.0.27 itself.

<<
I would do the following to upgrade. Supposing you have a 3-node RS on 4.0.X:

This is exactly what we did. The replica set was already running protocol version 1 (PV1) on 4.0, and the appropriate FCV values were set before the 4.2 upgrade.

The surprise was that, for some reason, the upgraded 4.2 node complained about Power of 2 during the initial sync and didn't come up, as reported in my original question.

Unfortunately, we couldn’t reproduce this issue in our setup again.

So my question again is: on 4.0 itself (where the Power of 2 option was set), is there any way to remove this option before upgrading to 4.2.x?

"options" : {
	"flags" : 1
},

This is the Power of 2 option set on the collection. How do we remove it from a collection running on 4.0 with the WT engine?

This is just to make sure that unsupported flags are cleaned up on 4.0 itself, before the 4.2 upgrade.

Thanks,
Navanee

Where / how do you see this Power of 2 flag / option?

First, I saw Power of 2 in the 4.2 secondary's log:

<<
“[rsSync-0]Fatal assertion 34437 InvalidOptions: unknown option to collMod: usePowerOf2Sizes at src/mongo/db/repl/sync_tail.cpp 851”
“[rsSync-0] \n\n***aborting after fassert() failure \n\n”

Second, I did a simple test.
I added Power of 2 to my collection using runCommand.

Before adding Power of 2 on 4.0 with WT, db.getCollectionInfos() returns empty options:

"options" : {

},

After adding Power of 2, db.getCollectionInfos() returns:

"options" : {
	"flags" : 1
},

So I assume flags: 1 is Power of 2.

FYI, when I add an additional flag, the flags value changes.
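The fact that `flags` changes as more options are added is consistent with it being a bitmask. A sketch that decodes it, assuming bit 0 (value 1) is usePowerOf2Sizes and bit 1 (value 2) is noPadding, the two MMAPv1-era collection flags. Treat that mapping as an assumption inferred from the behavior above, not as documented API; `decodeCollectionFlags` is a hypothetical name.

```javascript
// Assumed mapping of MMAPv1-era collection user flags (bitmask values).
var MMAPV1_USER_FLAGS = { usePowerOf2Sizes: 1, noPadding: 2 };

// Returns the names of the flags set in an `options.flags` value.
// A missing value (undefined) decodes to an empty list.
function decodeCollectionFlags(flags) {
  return Object.keys(MMAPV1_USER_FLAGS).filter(function (name) {
    return (flags & MMAPV1_USER_FLAGS[name]) !== 0;
  });
}
```

In the shell it could be used as `db.getCollectionInfos().forEach(function (c) { print(c.name + ": " + decodeCollectionFlags((c.options || {}).flags)); })`.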

How did you do that? In 4.2 the option was removed.

Can you send the steps to reproduce the problem? I can’t reproduce on my end.

Sorry if my reply was not clear earlier: I added this on the 4.0 WT primary node, with a 4.2 WT node as a secondary.

Unfortunately, I'm not able to reproduce it right now, but give me some time. I have identified a pattern in which this could be reproduced, and I will get back with a sample repro.


Following are the repro steps.

Set up a replica set; for testing purposes, I'm using a 3-member set:

  • Set up on 4.0 with WT
  • 3-member replica set
    m1 - data path (dbPath) "m1_data"
    m2 - data path "m2_data"
    m3 - data path "m3_data"

Step 1)
Create a collection named "testCollection"

Step 2)
db.runCommand( {collMod : "testCollection" , usePowerOf2Sizes : true } )
{
	"usePowerOf2Sizes_old" : true,
	"usePowerOf2Sizes_new" : true,
	"ok" : 1,
	"operationTime" : Timestamp(1656473051, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1656473051, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

Check whether Power of 2 was added:
db.getCollectionInfos()
	"name" : "testCollection",
	"type" : "collection",
	"options" : {
		"flags" : 1
	},

Step 3)
Insert a simple document into the collection.

Step 4)
Now upgrade m3 to 4.2. For simplicity, I changed m3's data path to "m3_new_data".

At this point, the replica set looks like this:

m1 - running in 4.0 as PRIMARY
m2 - running in 4.0 as SECONDARY
m3 - running in 4.2 as SECONDARY

Step 5)
Repeat Step 2) on m1.

Now m3, running 4.2 as a secondary, crashes:

2022-06-29T09:23:52.615+0530 E REPL [repl-writer-worker-0] Failed command { collMod: "testCollection", usePowerOf2Sizes: false } on test with status InvalidOptions: unknown option to collMod: usePowerOf2Sizes during oplog application
2022-06-29T09:23:52.615+0530 F REPL [repl-writer-worker-0] Error applying operation ({ op: "c", ns: "test.$cmd", ui: UUID("ed0b8b42-7fa5-49fa-8b67-2fc34c63cf65"), o: { collMod: "testCollection", usePowerOf2Sizes: false }, o2: { collectionOptions_old: { uuid: UUID("ed0b8b42-7fa5-49fa-8b67-2fc34c63cf65"), flags: 0 } }, ts: Timestamp(1656474832, 1), t: 4, h: 5926513413780485954, v: 2, wall: new Date(1656474832605) }): :: caused by :: InvalidOptions: unknown option to collMod: usePowerOf2Sizes
2022-06-29T09:23:52.618+0530 F REPL [rsSync-0] Failed to apply batch of operations. Number of operations in batch: 1. First operation: { op: "c", ns: "test.$cmd", ui: UUID("ed0b8b42-7fa5-49fa-8b67-2fc34c63cf65"), o: { collMod: "testCollection", usePowerOf2Sizes: false }, o2: { collectionOptions_old: { uuid: UUID("ed0b8b42-7fa5-49fa-8b67-2fc34c63cf65"), flags: 0 } }, ts: Timestamp(1656474832, 1), t: 4, h: 5926513413780485954, v: 2, wall: new Date(1656474832605) }. Last operation: { op: "c", ns: "test.$cmd", ui: UUID("ed0b8b42-7fa5-49fa-8b67-2fc34c63cf65"), o: { collMod: "testCollection", usePowerOf2Sizes: false }, o2: { collectionOptions_old: { uuid: UUID("ed0b8b42-7fa5-49fa-8b67-2fc34c63cf65"), flags: 0 } }, ts: Timestamp(1656474832, 1), t: 4, h: 5926513413780485954, v: 2, wall: new Date(1656474832605) }. Oplog application failed in writer thread 0: InvalidOptions: unknown option to collMod: usePowerOf2Sizes
2022-06-29T09:23:52.618+0530 F - [rsSync-0] Fatal assertion 34437 InvalidOptions: unknown option to collMod: usePowerOf2Sizes at src\mongo\db\repl\sync_tail.cpp 851
2022-06-29T09:23:52.624+0530 F - [rsSync-0] \n\n***aborting after fassert() failure\n\n
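The oplog entry in this log shows exactly what the 4.2 secondary cannot apply: a command entry (`op: "c"`) whose `o` document is a `collMod` carrying `usePowerOf2Sizes`. A sketch that recognizes such entries, useful for checking on a 4.0 member whether the application still issues these commands before a 4.2 secondary joins. `isUnappliableOn42` is a hypothetical name, and the option list (usePowerOf2Sizes, noPadding) is an assumption based on the 4.2 compatibility notes.

```javascript
// collMod options assumed to be rejected by 4.2 (MMAPv1-only).
var REMOVED_COLLMOD_OPTIONS = ["usePowerOf2Sizes", "noPadding"];

// True if an oplog entry is a collMod that a 4.2 secondary would fail to apply.
function isUnappliableOn42(entry) {
  if (entry.op !== "c" || !entry.o || entry.o.collMod === undefined) {
    return false; // not a collMod command entry
  }
  return REMOVED_COLLMOD_OPTIONS.some(function (opt) {
    return Object.prototype.hasOwnProperty.call(entry.o, opt);
  });
}

// Example use in the mongo shell, connected to a 4.0 member:
// db.getSiblingDB("local").oplog.rs
//   .find({ op: "c", "o.collMod": { $exists: true } })
//   .forEach(function (e) { if (isUnappliableOn42(e)) printjson(e); });
```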

It is happening consistently.
It looks like a bug to me. What do you think?

NOTE: If we try the same Step 2) on a 4.2 member as PRIMARY, it gracefully rejects the command and doesn't crash.
Couldn't the replication scenario behave the same way?

Thanks,
Navanee

Hi @Navaneethakrishnan_91112,

I tried to reproduce this error with a single-node RS, but I wasn't able to trigger it.

I did some digging in the "Upgrade a Replica Set to 4.2" docs, though, and the Preparedness section links to all the Compatibility Changes in MongoDB 4.2.

In this doc, the section “MMAPv1 Specific Options for Commands and Methods” mentions that the MMAPv1-specific usePowerOf2Sizes option is removed. It's also mentioned in the note at the top of the collMod doc.

During my testing, I migrated a single-node RS from 4.0.28 to 4.2.21 twice. Each time, testCollection had the usePowerOf2Sizes flag set (to true or false).

Before each upgrade I have this:

test:PRIMARY> db.getCollectionInfos()
[
	{
		"name" : "testCollection",
		"type" : "collection",
		"options" : {
			"flags" : 1
		},
		"info" : {
			"readOnly" : false,
			"uuid" : UUID("8a6af3c2-c998-4d25-91bf-78eb91e0b021")
		},
		"idIndex" : {
			"v" : 2,
			"key" : {
				"_id" : 1
			},
			"name" : "_id_",
			"ns" : "test.testCollection"
		}
	}
]

And after I have:

test:PRIMARY> db.getCollectionInfos()
[
	{
		"name" : "testCollection",
		"type" : "collection",
		"options" : {
			
		},
		"info" : {
			"readOnly" : false,
			"uuid" : UUID("8a6af3c2-c998-4d25-91bf-78eb91e0b021")
		},
		"idIndex" : {
			"v" : 2,
			"key" : {
				"_id" : 1
			},
			"name" : "_id_",
			"ns" : "test.testCollection"
		}
	}
]

So, as you can see above, the flag was removed automatically during the upgrade, and I don't see the error you mentioned in the logs.

The difference between your setup and mine could be that you are running 4.2 and 4.0 in the same RS at the same time. Question: did you disable the option before the migration, as the doc suggests, or was it still on?

db.runCommand( {collMod : "testCollection" , usePowerOf2Sizes : false } )

I guess this should be enough to avoid the problem during the migration process, as the flag doesn't exist in 4.2+ anyway.

Cheers,
Maxime.


Hi Maxime,

Thank you for the reply again. I think you missed a step in the repro. The issue does not happen during the 4.2 upgrade itself, but on the 4.2 node running as a secondary after the upgrade.

As mentioned in Step 4) of my earlier reply, keep the primary node on 4.0 and a secondary node on 4.2.

Step 4)
Now upgrade m3 to 4.2. For simplicity, I changed m3's data path to "m3_new_data".

At this point, the replica set looks like this:

m1 - running in 4.0 as PRIMARY
m2 - running in 4.0 as SECONDARY
m3 - running in 4.2 as SECONDARY

Now run db.runCommand( {collMod : "testCollection" , usePowerOf2Sizes : true} ) on the 4.0 primary, which has a 4.2 secondary.

You'll see the crash on the 4.2 secondary for sure. Please let me know if the repro steps are still not clear.

Thanks,
Navanee

Yes, I understand exactly what you mean, and the doc clearly states that this isn't supported because the option was removed.

Your 4.2 node is trying to replicate the operation that was done in 4.0 but can’t because it doesn’t exist anymore in 4.2. It makes sense.

When you are upgrading your cluster from 4.0 to 4.2, the mixed-version replica set is just a transitional state to allow the upgrade. It's not a stable position that you want to keep running for hours. The goal is to migrate all the machines to 4.2 as soon as possible in a safe manner.

As you are already in 4.0 with WiredTiger, this flag is completely useless (it’s a MMAPv1 flag). So just set it to false for all the collections in all the DBs and don’t touch it while migrating. The flags will disappear in your 4.2 RS and you won’t have the error.
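Applied to every collection, this suggestion could look like the sketch below. Assumptions: it runs in the legacy `mongo` shell on the 4.0 primary while every member is still on 4.0 (replicating this same collMod to a 4.2 secondary is exactly what triggered fatal assertion 34437 in this thread), and `flags & 1` marks usePowerOf2Sizes. `clearPowerOf2Flags` is a hypothetical name; `getDBNames()`, `getDB()`, `getCollectionInfos()` and `runCommand()` are standard shell methods.

```javascript
// Assumed bit for usePowerOf2Sizes in a collection's options.flags.
var USE_POWER_OF_2_SIZES = 1;

// Clears usePowerOf2Sizes on every collection that still has it set.
// Run on a 4.0 primary BEFORE any member is upgraded to 4.2.
// Returns the namespaces that were modified.
function clearPowerOf2Flags(conn) {
  var changed = [];
  conn.getDBNames().forEach(function (dbName) {
    if (dbName === "local") return; // not replicated; leave it alone
    var database = conn.getDB(dbName);
    database.getCollectionInfos({ type: "collection" }).forEach(function (info) {
      var flags = (info.options && info.options.flags) || 0;
      if ((flags & USE_POWER_OF_2_SIZES) === 0) return; // nothing to clear
      var res = database.runCommand({ collMod: info.name, usePowerOf2Sizes: false });
      if (!res.ok) throw new Error(dbName + "." + info.name + ": " + res.errmsg);
      changed.push(dbName + "." + info.name);
    });
  });
  return changed;
}

// Usage in the legacy shell: clearPowerOf2Flags(db.getMongo())
```

As Navanee notes further down, a leftover `flags: 1` does not by itself break the upgrade (it disappears during the 4.2 initial sync); the crash comes from the collMod being replicated while 4.0 and 4.2 members coexist. So the timing of this cleanup matters more than the cleanup itself.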

As a general guideline, avoid running “admin” operations when you are in the middle of an upgrade process.

Cheers,
Maxime.


Thank you very much for your support on this so far.

<<
Your 4.2 node is trying to replicate the operation that was done in 4.0
but can’t because it doesn’t exist anymore in 4.2. It makes sense.

If you look at my initial post, we did go through all the relevant documentation and proceeded with the upgrade based on it.

<<
As you are already in 4.0 with WiredTiger, this flag is completely useless (it’s a MMAPv1 flag). So just set it to false for all the collections in all the DBs and don’t touch it while migrating. The flags will disappear in your 4.2 RS and you won’t have the error.

JFYI, even if this flag is set to “true” when we migrate, the flag disappears during the initial sync. The problem occurs only if this option is set (true or false) during the migration.

For our use case, we removed the part of our application code that set “powerOf2size” during the migration, and that solved the problem.
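Removing the call from the application, as done here, is also what the SERVER team recommends later in the thread. Where such calls are hard to track down, a defensive alternative is to filter MMAPv1-only options out of collMod documents before they are sent. A sketch; `sanitizeCollMod` is hypothetical, and the option list (usePowerOf2Sizes, noPadding) is an assumption based on the 4.2 compatibility notes.

```javascript
// collMod options assumed to be MMAPv1-only (no-ops on WiredTiger 4.0,
// rejected by 4.2).
var MMAPV1_ONLY_COLLMOD_OPTIONS = ["usePowerOf2Sizes", "noPadding"];

// Returns a copy of the collMod command without MMAPv1-only options,
// plus the list of options that were dropped.
function sanitizeCollMod(cmd) {
  var clean = {};
  var dropped = [];
  // Object.keys keeps insertion order for string keys, so "collMod"
  // stays the first field, as a command document requires.
  Object.keys(cmd).forEach(function (key) {
    if (MMAPV1_ONLY_COLLMOD_OPTIONS.indexOf(key) !== -1) {
      dropped.push(key);
    } else {
      clean[key] = cmd[key];
    }
  });
  return { command: clean, dropped: dropped };
}
```

If nothing but `collMod` is left after sanitizing, the caller can simply skip sending the command.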

But in general, 4.2 running as primary works perfectly: if the application tries to set an unsupported MMAPv1-specific option, it just ignores it, as stated in the documentation:
"MongoDB ignores the MMAPv1 specific option async for fsync."

But the question remains the same:
shouldn't the same behavior apply to MongoDB 4.2 running as a secondary of a MongoDB 4.0 primary?

In a large deployment, for high availability purposes, there could be cases where only some members of a replica set are upgraded in one day and the rest are upgraded over the following days.

Considering this kind of use case, a MongoDB 4.2 secondary crashing over an unsupported option doesn't look good to me.
Please look at it from this perspective. Either add a documentation section on this, or give the 4.2 secondary the same behavior as the 4.2 primary, i.e. instead of crashing, let the 4.2 secondary ignore that option too.

FYI, just for comparison, I also downloaded and checked the Enterprise edition. The behavior looks the same there, i.e. a MongoDB 4.2 secondary crashes on an unsupported option coming from a 4.0 primary.

Thanks,
Navanee

Hi @Navaneethakrishnan_91112,

OK, I totally get it now and understand your point. I'm escalating this issue and I'll circle back when I have some news.

I agree that the node shouldn't crash in these circumstances and should also ignore the command.

Thanks a lot for all the explanations!
Maxime.


Hi @Navaneethakrishnan_91112,

I got feedback from the SERVER team. They were able to reproduce and confirm the problem, but as it's a deprecated option, the best course of action is to remove these commands from the code prior to the migration, as already documented.

So I'm not sure yet whether they are going to fix the problem or just add extra documentation to the 4.0 => 4.2 migration doc to make sure there is a proper warning, but at least the message has been delivered to the right people now and they are taking action.

I’ll keep you updated here if I get more news.

Cheers,
Maxime.


A bug ticket has been opened. You can track it here:

https://jira.mongodb.org/browse/SERVER-67924


Thank you very much for your support!

Thanks,
Navanee
