Improve dump/restore time for a big database with data concentrated in a few collections

Context:

  • Big database, more than 500 GB
  • 2 collections account for 50% of the data each
  • Migration from MongoDB 3.2 → MongoDB 4.4

What is the best solution to dump/restore this database?

Some ways of improvement:

  • Improve the restore: increase the number of workers per collection with the --numInsertionWorkersPerCollection option of mongorestore
  • Parallelize the dump/restore by selecting specific collections with the --collection option of mongodump
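The two ideas above could be combined along these lines. This is a hedged sketch only: the database name "mydb" and collection names "coll_a"/"coll_b" are placeholders, and the worker count is an arbitrary starting point to tune against your hardware.

```shell
#!/bin/sh
# Sketch: dump the two biggest collections in parallel (placeholder names),
# then restore each one with extra insertion workers.
DB=mydb

for COLL in coll_a coll_b; do
  # One dump process per collection, each writing to its own directory
  mongodump --db "$DB" --collection "$COLL" --out "dump_$COLL" &
done
wait  # block until all background dumps finish

for COLL in coll_a coll_b; do
  # More insertion workers per collection can speed up the restore,
  # at the cost of higher load on the target cluster
  mongorestore --numInsertionWorkersPerCollection 8 \
    --db "$DB" --collection "$COLL" \
    "dump_$COLL/$DB/$COLL.bson" &
done
wait
```

Running several dump/restore processes in parallel trades wall-clock time for CPU, disk and network load on both clusters, so it is worth measuring on a test environment first.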

Questions:

  • Do the improvement ideas above seem like a good approach? Are there any warnings to take into consideration?
  • Is there another solution to improve this process?

Hi @Vincent_Herlemont and welcome to the MongoDB Community :muscle: !

First, I don’t see how you are going to make this work. There is such a gap between 3.2 and 4.4 that a dump taken from 3.2 will probably differ in a few ways from the dump format a 4.4 cluster expects to restore.

The doc specifies that you can only mongorestore to the same major version of the MongoDB server that the files were created from.

So maybe it works by some miracle. But can you bet on it?

The doc also recommends using the same version of mongodump & mongorestore for the two operations, but 100.5.1 is only compatible with 4.0, 4.2, 4.4 and 5.0.

I do have a solution though, but it does require migrating to Atlas… which also has the advantage of guaranteeing that you will never have to do this ever again. :smiley:

The MongoDB Atlas Live Import supports a migration path from 3.2 to 4.0 and later, so definitely 3.2 to 4.4 or even 5.0. This has the benefit of securing the migration.

Once you are on 4.4 or 5.0 in Atlas, nothing prevents you from taking the snapshot back home and restoring it on your self-managed cluster. Or stay in Atlas if you prefer, and avoid further trouble in the future with some other DIY or upgrade job.

Atlas provides all this kind of automation & monitoring so you don’t have to worry about it and can focus on the real added value of your product.

Cheers,
Maxime.


Many thanks for your answer @MaBeuLux88 .

Just for information: running mongodump (v3.2) to create a dump from a MongoDB 3.2 instance and then restoring it with mongorestore (v4.4) into a MongoDB 4.4 instance works without error.

root@user:/# mongodump 
...
2022-04-01T11:49:05.967+0200	done dumping test (1109730 documents)
root@user:/# mongorestore \
	--objcheck \
	--stopOnError \
	--maintainInsertionOrder \
	--drop \
	--preserveUUID
....
2022-04-01T15:34:13.780+0000	3881921 document(s) restored successfully. 0 document(s) failed to restore.

What is your opinion on it?


Regarding MongoDB Atlas: I am in an on-premise environment and cannot reach/use external cloud services. How can I best deal with these constraints?

Hey again,

Well, it’s not something that MongoDB tests. Maybe it works, maybe not perfectly. Maybe some complex field types won’t survive the transfer.

Without Atlas, the official solution would be to migrate 3.2 => 3.4 => 3.6 => 4.0 => 4.2 => 4.4 => 5.0 but, as you can guess, it’s a bit long to do and each major release has its bundle of things that need to be done between each migration, like setFeatureCompatibilityVersion for example.
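One hop of that chain might look like the following. This is a hedged sketch for a standalone 3.2 => 3.4 step only: the binary path /opt/mongodb-3.4 and the dbpath/logpath are placeholders, and replica sets require the rolling procedure from the release notes instead.

```shell
#!/bin/sh
# Sketch of one upgrade hop (3.2 -> 3.4) on a standalone.
# Paths below are placeholders; adapt to your install layout.

# 1. Cleanly stop the old 3.2 mongod (Linux-only --shutdown flag)
mongod --shutdown --dbpath /data/db

# 2. Start the new 3.4 binary against the same data files
/opt/mongodb-3.4/bin/mongod --dbpath /data/db \
  --fork --logpath /var/log/mongod-3.4.log

# 3. Once the server is confirmed stable, enable 3.4 features.
#    This is the prerequisite for the next hop to 3.6.
mongo --eval 'db.adminCommand({ setFeatureCompatibilityVersion: "3.4" })'
```

The same three-step pattern then repeats for 3.6, 4.0, 4.2 and 4.4, with each version’s own prerequisites from the release notes checked before moving on.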

You can find all this information in the release notes for each version: https://www.mongodb.com/docs/manual/release-notes/5.0-upgrade-replica-set/.

With this solution, it would be possible to achieve a zero-downtime upgrade. If you can afford downtime, then your solution is probably better, but test it inside and out to make sure there is no data loss.

Cheers,
Maxime.

@MaBeuLux88 Thanks for your answer.

If I read the documentation for the MongoDB upgrade process, e.g. Upgrade a Standalone to 3.4,

There are 2 steps:
1 - Replace the existing 3.2 binaries with the 3.4 binaries.
2 - Enable backwards-incompatible 3.4 features.

Questions:

  • The second step, “Enable backwards-incompatible 3.4 features”, seems optional, doesn’t it?
  • How do I know that migration has been achieved?
  • Does scripting the migration procedure with MongoDB docker containers (one container per version) seem like a good idea to you?

No, it’s not optional, because it’s a prerequisite for the upgrade to 3.6, etc. See the docs: https://www.mongodb.com/docs/manual/release-notes/3.6-upgrade-replica-set/#prerequisites

All the steps are important. They are here for a reason.

You know the migration is complete when your RS is running with the new binaries and you are done following all the upgrade instructions. Also, make sure your RS is stable (rs.status()) before anything else.

No, because during the rolling upgrade procedure you need to check that the new node has successfully rejoined the RS, that it’s catching up with the primary, and that it rejoins as a Secondary. You can only upgrade the next secondary node once the previous one is done upgrading and the RS is back in a stable state.
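That stability check between nodes could be scripted roughly like this. A hedged sketch: it assumes a mongo shell connected to the replica set, and simply inspects the stateStr field that rs.status() reports for each member.

```shell
#!/bin/sh
# Sketch: report whether every replica set member is back in a
# healthy state (PRIMARY or SECONDARY) before upgrading the next node.
mongo --quiet --eval '
  var unhealthy = rs.status().members.filter(function (m) {
    return m.stateStr !== "PRIMARY" && m.stateStr !== "SECONDARY";
  });
  if (unhealthy.length > 0) {
    print("RS not stable yet, members still transitioning:");
    printjson(unhealthy.map(function (m) { return m.name + ": " + m.stateStr; }));
  } else {
    print("RS stable, safe to upgrade the next secondary");
  }
'
```

Even with such a script, a human still needs to watch replication lag and the logs during each hop, which is part of why automating the whole chain in throwaway containers is risky.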