Help Needed: Zero Downtime Migration of 3TB Self-Hosted MongoDB Between Cloud Providers

Hi MongoDB Community! :wave:

I need help with a critical migration and would appreciate your expertise and suggestions.

My Current Situation

  • MongoDB: Self-hosted PSA (Primary-Secondary-Arbiter) setup
  • Data Size: ~3TB
  • Current Oplog: 1GB, covering roughly a 65-minute window
  • Network: 10Gb/s interconnect link between cloud providers
  • Migration: Need to move from one cloud provider to another
  • Critical Requirement: Zero or minimal downtime (this is our core transactional database)

The Challenge

I need to migrate this MongoDB cluster between cloud providers but cannot afford downtime as our platform serves critical transactions 24/7.

My main concerns:

  • 65-minute oplog window seems too short for 3TB initial sync
  • Cross-cloud network latency and bandwidth limitations
  • How to ensure zero data loss during cutover
  • What’s the safest migration approach for this scenario

What I’m Considering

  1. Replica Set expansion - Add new nodes in target cloud, then gradually migrate
  2. mongosync - If it works with PSA architecture
  3. Backup/restore + oplog replay - More traditional approach
  4. Other approaches - Open to suggestions!

Questions for the Community

  • Oplog sizing: What’s the recommended oplog size for cross-cloud 3TB migration? What are the potential side effects of increasing from 1GB to 20GB+ (storage, performance, memory usage)?
  • Network advantage: With my 10Gb/s interconnect between clouds, what migration strategies become more viable?
  • Has anyone successfully done similar zero-downtime migrations?
  • What are the biggest pitfalls to avoid?
  • Any specific tools or strategies you’d recommend?
  • How do you handle the final cutover without downtime?

My Environment

  • Self-hosted MongoDB (not Atlas)
  • PSA architecture
  • 3TB of active transactional data
  • 10Gb/s dedicated interconnect between source and target clouds
  • Need to maintain high availability throughout

I’d really appreciate any insights, experiences, or step-by-step guidance from the community. This is a critical migration for our business and I want to make sure I do it right.

Thanks in advance for any help! :pray:

If mongosync is available and works for your MongoDB version and setup, I’d just use that. Another great open-source option is dsync (GitHub: adiom-data/dsync), a database synchronization tool.

Whatever tool you choose, do a dry run and see how long the initial data copy takes. That will help you decide on the necessary oplog window size. For 3TB, I’d expect that you’ll need around 5-6 hours.
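
To turn that dry-run number into an oplog size, the arithmetic is simple. A rough sanity check, using the numbers from your post (the churn rate is an assumption derived from the stated 1GB / 65-minute window, and the safety factor is my own habit):

```python
# Rough oplog-window arithmetic for the setup described above.
# Assumption: oplog churn stays near its current rate during the sync.

current_oplog_gb = 1.0
current_window_hours = 65 / 60          # 1GB currently covers ~65 minutes

churn_gb_per_hour = current_oplog_gb / current_window_hours  # ~0.92 GB/h

expected_sync_hours = 6.0               # from the dry-run estimate above
safety_factor = 3.0                     # headroom for retries and slowdowns

required_oplog_gb = churn_gb_per_hour * expected_sync_hours * safety_factor
print(f"churn = {churn_gb_per_hour:.2f} GB/h, "
      f"suggested oplog >= {required_oplog_gb:.0f} GB")
```

The resize itself is an online operation on a live replica set: `db.adminCommand({replSetResizeOplog: 1, size: 20480})` sets a 20GB oplog (size is in MB). The main side effect is just the extra disk it consumes; it doesn’t change working-set memory behavior meaningfully.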

A 10Gb/s link should be fine. Typically it’s the destination cluster that’s the bottleneck, since writes are expensive. If you can, it’s best to overprovision the destination for the migration and downsize later.
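
The back-of-the-envelope math shows why the link itself isn’t your problem (the effective-throughput fraction below is an assumption; your dry run will give you the real number):

```python
# Theoretical vs realistic transfer time for 3TB over a 10Gb/s link.

data_tb = 3.0
link_gbps = 10.0

data_gigabits = data_tb * 1000 * 8                   # 24,000 Gb
line_rate_hours = data_gigabits / link_gbps / 3600   # ~0.67h at line rate

# Assumption: destination write cost, index builds, and protocol overhead
# cut effective throughput to ~10-15% of line rate.
effective_fraction = 0.12
realistic_hours = line_rate_hours / effective_fraction

print(f"line rate: {line_rate_hours:.1f} h, realistic: ~{realistic_hours:.0f} h")
```

At line rate 3TB moves in about 40 minutes, which is why the realistic 5-6 hour figure is dominated by the destination’s write capacity, not the network.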

Another thing I’d consider is the impact on the source and how much load headroom you have. If your source cluster is already pegged, you’d need to throttle the migration considerably.

Write ops/sec on the source is a good metric to check to ensure that your migration process is able to catch up.
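
One way to turn that metric into a go/no-go check (a sketch with made-up rates; plug in your own deltas from `db.serverStatus().opcounters` sampled over time):

```python
# Can the sync catch up? Compare the source write rate to the migration
# tool's apply rate on the destination, and estimate backlog drain time.
# All three numbers below are hypothetical placeholders.

source_writes_per_sec = 4000      # e.g. delta of db.serverStatus().opcounters
apply_writes_per_sec = 6000       # observed apply rate on the destination
backlog_ops = 1_200_000           # ops accumulated during the initial copy

headroom = apply_writes_per_sec - source_writes_per_sec
if headroom <= 0:
    print("cannot catch up: throttle source load or scale the destination")
else:
    catch_up_minutes = backlog_ops / headroom / 60
    print(f"catch-up in ~{catch_up_minutes:.0f} minutes")
```

If the apply rate can’t consistently exceed the source write rate, the replication lag only grows and the cutover window never arrives.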

Lastly, mongosync, dsync, and other tools that combine a data sync with CDC help bring downtime to a minimum, but there’s still a short period during which you stop writes on the source, wait until the last writes make it to the destination, (hopefully) run some quick data integrity checks, and then start writing to the new cluster. That’s the typical cutover procedure.
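
That sequence is worth scripting rather than running by hand at 3am. A minimal orchestration sketch, where the four callables are placeholders for your own operational steps (feature flag, lag check, checksum comparison, connection-string flip):

```python
import time

def cutover(stop_writes, replication_lag_secs, verify, switch_traffic,
            poll_interval=1.0, timeout=300.0):
    """Run the standard cutover sequence; raise instead of proceeding on failure."""
    stop_writes()                         # 1. quiesce writes on the source
    deadline = time.monotonic() + timeout
    while replication_lag_secs() > 0:     # 2. wait for the last writes to drain
        if time.monotonic() > deadline:
            raise TimeoutError("destination never caught up; roll back")
        time.sleep(poll_interval)
    if not verify():                      # 3. quick data integrity checks
        raise RuntimeError("integrity check failed; roll back to source")
    switch_traffic()                      # 4. point the application at the new cluster
    return True
```

The important property is the hard stop before step 4: if verification fails, the source is still intact and writes can simply be re-enabled there.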
If you want true zero downtime with no interruption for upstream services, you’d need to put a lot more work into the application layer and implement dual writes and dual reads - similar to what is outlined in Middleware-assisted Zero-downtime Live Database Migration to AWS | AWS Architecture Blog. It’s really easier said than done!
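
For completeness, the dual-write idea from that post reduced to its skeleton (a hypothetical wrapper, not production code; real versions need ordering guarantees, a retry queue for failed mirror writes, and reconciliation):

```python
# Dual-write/dual-read skeleton: write to both clusters, read from whichever
# is currently authoritative. `old` and `new` are placeholders for anything
# with insert/find-style methods (e.g. pymongo collections).

class DualWriter:
    def __init__(self, old, new, read_from="old"):
        self.old, self.new, self.read_from = old, new, read_from

    def write(self, doc):
        self.old.insert(doc)              # source of truth until cutover
        try:
            self.new.insert(doc)          # best-effort mirror; a real system
        except Exception:                 # queues failures for reconciliation
            pass

    def read(self, query):
        target = self.old if self.read_from == "old" else self.new
        return target.find(query)
```

Cutover then becomes flipping `read_from` (and later dropping the old-side write) rather than a stop-the-world event - which is exactly where the extra application-layer complexity lives.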

Good luck!