How to move big data to a new collection without server downtime?

Mian · February 22, 2023, 4:22pm

Hello everyone,

We have a Node.js server connected to a MongoDB database.
Collection A contains about 400k documents. Each document has a field x holding a string of about 7 MB.
For various reasons, we'd like to move the big field x from collection A to a new collection B.
The Node.js server should then read and write the big data in collection B instead of collection A (collection A is still used for other purposes).
How can we do this without any downtime of the Node.js server and without losing data while the big data is transferred from A to B?

Thanks!

Kobe_W · February 22, 2023, 5:31pm

From a chosen point in time on, double-write the big string to both collections (old and new). That way the set of data to migrate is fixed: only documents that existed before that point need to be moved.
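A minimal sketch of the double-write step, assuming the Node.js `mongodb` driver, that `colA` and `colB` are driver `Collection` objects, and that B stores the big string under the same `_id` as the parent document in A. All of these names, including `saveBigData`, are hypothetical:

```javascript
// Hypothetical helper for the migration window: every write of the big field
// `x` goes to BOTH collections, so any document written from now on already
// has its B copy and never needs to be backfilled.
// Assumes colA/colB are driver Collection objects and B is keyed by A's _id.
async function saveBigData(colA, colB, id, x) {
  // Old location: keep it current, because reads still come from A for now.
  await colA.updateOne({ _id: id }, { $set: { x } });
  // New location: upsert, because the B document may not exist yet.
  await colB.updateOne({ _id: id }, { $set: { x } }, { upsert: true });
}
```

B is written with a plain `$set` upsert, so a value written by the application always replaces whatever the background copy may have put there.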

Then run a background process that copies all strings from the old collection to the new one, but only where they don't exist in the new collection yet.
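A sketch of the per-document copy under the same assumptions (Node.js driver, B keyed by A's `_id`; `backfillOne` and `colB` are hypothetical names). `$setOnInsert` only takes effect when the upsert actually inserts, which is one way to implement "provided they don't exist in new col yet" without a separate existence check:

```javascript
// Hypothetical backfill step for ONE document read from A. It creates the B
// document only if none exists for that _id; if the double-write path already
// stored a (newer) value in B, the upsert matches it and changes nothing.
async function backfillOne(colB, docFromA) {
  if (docFromA.x === undefined) return false; // no big string to copy
  const res = await colB.updateOne(
    { _id: docFromA._id },
    { $setOnInsert: { x: docFromA.x } }, // applied only when inserting
    { upsert: true }
  );
  return res.upsertedCount === 1; // true if this call created the B document
}
```

This also covers the race where the application double-writes a document after the background process has read the old value from A: the stale copy is simply not applied.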

Once this is done, you can switch reads from the old collection to the new one.
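One way to make that switch without redeploying is to route reads through a flag. A sketch, assuming the Node.js driver; `readBigData` and the `readFromB` flag are hypothetical and would come from your own config or feature-flag system:

```javascript
// Hypothetical read path with a switch. While the backfill is running,
// readFromB stays false and reads use A; once every string has been copied,
// flipping the flag moves reads to B without a code change.
async function readBigData(colA, colB, id, readFromB) {
  const source = readFromB ? colB : colA;
  const doc = await source.findOne({ _id: id }, { projection: { x: 1 } });
  return doc === null ? null : (doc.x ?? null);
}
```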

The background process needs to walk the documents in a fixed sort order (by creation time, etc.) so that it can track its progress and eventually terminate.
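A sketch of a resumable background loop that walks A in `_id` order, assuming the Node.js driver, sortable `_id` values (such as the default `ObjectId`), and hypothetical names `migrateAll`, `colA`, `colB`. The last processed `_id` acts as the checkpoint, so each batch starts strictly after the previous one and the loop stops once it finds nothing past the highest `_id`:

```javascript
// Hypothetical background migration: copy A.x into B for every document that
// has no B entry yet, in _id order, a small batch at a time.
async function migrateAll(colA, colB, batchSize = 10) {
  let lastId = null; // in-memory checkpoint; persist it to resume after a restart
  let copied = 0;
  for (;;) {
    const filter = lastId === null ? {} : { _id: { $gt: lastId } };
    const batch = await colA
      .find(filter, { projection: { x: 1 } }) // _id is included by default
      .sort({ _id: 1 })
      .limit(batchSize) // keep small: each document carries a ~7 MB string
      .toArray();
    if (batch.length === 0) return copied; // past the highest _id: done
    for (const doc of batch) {
      if (doc.x === undefined) continue; // no big string on this document
      const res = await colB.updateOne(
        { _id: doc._id },
        { $setOnInsert: { x: doc.x } }, // never overwrite a double-written value
        { upsert: true }
      );
      copied += res.upsertedCount;
    }
    lastId = batch[batch.length - 1]._id;
  }
}
```

Documents created after the double-write cut-over already have a B entry, so for them the upsert is a no-op; only the fixed set of older documents is actually copied.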

michael_hoeller · February 23, 2023, 5:13pm

Hello @Mian

I kind of second @Kobe_W. You want to migrate on read. You can use the document versioning pattern to tell whether a document's data has already been migrated (assuming that your application mainly acts on collection A).
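A sketch of the versioning idea, assuming the Node.js driver and a hypothetical `schemaVersion` field on A documents where version 2 means "x has moved to B". The field name and numbering are assumptions for illustration, as are `readVersioned`, `colA` and `colB`:

```javascript
// Hypothetical convention: schemaVersion 2 means the big string lives in B.
// Documents without the field are treated as version 1 (x still inline in A).
const MIGRATED_VERSION = 2;

async function readVersioned(colA, colB, id) {
  const docA = await colA.findOne({ _id: id }, { projection: { x: 1, schemaVersion: 1 } });
  if (docA === null) return null; // unknown document
  const version = docA.schemaVersion ?? 1;
  if (version < MIGRATED_VERSION) return docA.x; // not migrated yet
  const docB = await colB.findOne({ _id: id }, { projection: { x: 1 } });
  return docB === null ? null : docB.x;
}
```

The migration step would then set `schemaVersion: 2` and `$unset` x on the A document in a single update, after the copy to B has succeeded.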

Another approach could be:
Read collection A. If the read succeeds, write a copy to collection B and remove the big data field from collection A. If it returns “not found”, go to collection B; the document has already been migrated.
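A sketch of this migrate-on-read flow, assuming the Node.js driver, hypothetical names `readWithMigration`, `colA`, `colB`, and that during this short phase the application still writes `x` to A (so a copy found in A is the newest one):

```javascript
// Hypothetical migrate-on-read: if A still holds x, copy it to B, remove it
// from A and return it; otherwise the document was migrated earlier, so read B.
// Not atomic: a concurrent write to A.x between the read and the $unset could
// be lost.
async function readWithMigration(colA, colB, id) {
  const docA = await colA.findOne({ _id: id, x: { $exists: true } }, { projection: { x: 1 } });
  if (docA !== null) {
    await colB.updateOne({ _id: id }, { $set: { x: docA.x } }, { upsert: true });
    await colA.updateOne({ _id: id }, { $unset: { x: "" } });
    return docA.x;
  }
  // "Not found" in A: the big string already lives in B (or never existed).
  const docB = await colB.findOne({ _id: id }, { projection: { x: 1 } });
  return docB === null ? null : docB.x;
}
```

The transaction recommendation in this reply is one way to close the gap between the copy and the `$unset`.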
This variant is only meant for a short migration phase. After deploying the change, you should complete the migration with a script that you can run during off-peak hours. Keep in mind that the “not found” case becomes more frequent as the migration progresses, and each occurrence costs an unnecessary read on collection A. So it is recommended to update your application as soon as possible after the migration to use only collection B for the migrated field.
I also recommend using transactions to ensure a proper rollback in case of an error.
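A sketch of the per-document move wrapped in a multi-document transaction, assuming the Node.js driver and a deployment that supports transactions (replica set or sharded cluster). `migrateOneInTransaction`, `client`, `colA` and `colB` are hypothetical names. `session.withTransaction` commits if the callback succeeds, aborts on error, and retries the callback on transient errors such as a write conflict with a concurrent update of the same document:

```javascript
// Hypothetical: move x from A to B for one _id atomically. `client` is a
// connected MongoClient; colA/colB are collections obtained from it.
async function migrateOneInTransaction(client, colA, colB, id) {
  const session = client.startSession();
  let migrated = false;
  try {
    await session.withTransaction(async () => {
      migrated = false; // reset: the driver may run this callback more than once
      const docA = await colA.findOne(
        { _id: id, x: { $exists: true } },
        { projection: { x: 1 }, session }
      );
      if (docA === null) return; // already migrated: nothing to move
      await colB.updateOne({ _id: id }, { $set: { x: docA.x } }, { upsert: true, session });
      await colA.updateOne({ _id: id }, { $unset: { x: "" } }, { session });
      migrated = true;
    });
  } finally {
    await session.endSession(); // always release the session
  }
  return migrated;
}
```

If any step throws, neither the B write nor the `$unset` on A is committed, so a document can never end up with its big string removed from A but missing from B.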

Regards,
Michael

Mian · February 23, 2023, 6:30pm

Thanks @Kobe_W and @michael_hoeller for your time and replies!

system · February 28, 2023, 6:30pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.