Help Needed: Zero Downtime Migration of 3TB Self-Hosted MongoDB Between Cloud Providers

Hi MongoDB Community! :wave:

I need help with a critical migration and would appreciate your expertise and suggestions.

My Current Situation

  • MongoDB: Self-hosted PSA (Primary-Secondary-Arbiter) setup
  • Data Size: ~3TB
  • Current Oplog: 1GB, covering roughly a 65-minute window
  • Network: 10Gb/s interconnect link between cloud providers
  • Migration: Need to move from one cloud provider to another
  • Critical Requirement: Zero or minimal downtime (this is our core transactional database)

The Challenge

I need to migrate this MongoDB cluster between cloud providers but cannot afford downtime as our platform serves critical transactions 24/7.

My main concerns:

  • 65-minute oplog window seems too short for 3TB initial sync
  • Cross-cloud network latency and bandwidth limitations
  • How to ensure zero data loss during cutover
  • What’s the safest migration approach for this scenario

What I’m Considering

  1. Replica Set expansion - Add new nodes in target cloud, then gradually migrate
  2. mongosync - If it works with PSA architecture
  3. Backup/restore + oplog replay - More traditional approach
  4. Other approaches - Open to suggestions!

Questions for the Community

  • Oplog sizing: What’s the recommended oplog size for cross-cloud 3TB migration? What are the potential side effects of increasing from 1GB to 20GB+ (storage, performance, memory usage)?
  • Network advantage: With my 10Gb/s interconnect between clouds, what migration strategies become more viable?
  • Has anyone successfully done similar zero-downtime migrations?
  • What are the biggest pitfalls to avoid?
  • Any specific tools or strategies you’d recommend?
  • How do you handle the final cutover without downtime?

My Environment

  • Self-hosted MongoDB (not Atlas)
  • PSA architecture
  • 3TB of active transactional data
  • 10Gb/s dedicated interconnect between source and target clouds
  • Need to maintain high availability throughout

I’d really appreciate any insights, experiences, or step-by-step guidance from the community. This is a critical migration for our business and I want to make sure I do it right.

Thanks in advance for any help! :pray:

If mongosync is available and works for your MongoDB version and setup, I’d just use that. Another great open-source option is dsync (GitHub: adiom-data/dsync), a database synchronization tool.

Whatever tool you choose, do a dry run and see how long the initial data copy takes. That will help you decide on the necessary oplog window size. For 3TB, I’d expect that you’ll need around 5-6 hours.
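
To turn that dry-run number into an oplog size, the arithmetic is simple. A rough sanity check, using the numbers from your post (the churn rate is an assumption derived from the stated 1GB / 65-minute window, and the safety factor is my own habit):

```python
# Rough oplog-window arithmetic for the setup described above.
# Assumption: oplog churn stays near its current rate during the sync.

current_oplog_gb = 1.0
current_window_hours = 65 / 60          # 1GB currently covers ~65 minutes

churn_gb_per_hour = current_oplog_gb / current_window_hours  # ~0.92 GB/h

expected_sync_hours = 6.0               # from the dry-run estimate above
safety_factor = 3.0                     # headroom for retries and slowdowns

required_oplog_gb = churn_gb_per_hour * expected_sync_hours * safety_factor
print(f"churn = {churn_gb_per_hour:.2f} GB/h, "
      f"suggested oplog >= {required_oplog_gb:.0f} GB")
```

The resize itself is an online operation on a live replica set: `db.adminCommand({replSetResizeOplog: 1, size: 20480})` sets a 20GB oplog (size is in MB). The main side effect is just the extra disk it consumes; it doesn’t change working-set memory behavior meaningfully.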

A 10Gb/s link should be fine. Typically it’s the destination cluster that’s the bottleneck, since writes are expensive. If you can, it’s best to overprovision the destination for the migration and downsize later.
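
The back-of-the-envelope math shows why the link itself isn’t your problem (the effective-throughput fraction below is an assumption; your dry run will give you the real number):

```python
# Theoretical vs realistic transfer time for 3TB over a 10Gb/s link.

data_tb = 3.0
link_gbps = 10.0

data_gigabits = data_tb * 1000 * 8                   # 24,000 Gb
line_rate_hours = data_gigabits / link_gbps / 3600   # ~0.67h at line rate

# Assumption: destination write cost, index builds, and protocol overhead
# cut effective throughput to ~10-15% of line rate.
effective_fraction = 0.12
realistic_hours = line_rate_hours / effective_fraction

print(f"line rate: {line_rate_hours:.1f} h, realistic: ~{realistic_hours:.0f} h")
```

At line rate 3TB moves in about 40 minutes, which is why the realistic 5-6 hour figure is dominated by the destination’s write capacity, not the network.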

Another thing I’d consider is the impact on the source and how much load headroom you have. If your source cluster is already pegged, you’d need to throttle the migration considerably.

Write ops/sec on the source is a good metric to check to ensure that your migration process is able to catch up.
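
One way to turn that metric into a go/no-go check (a sketch with made-up rates; plug in your own deltas from `db.serverStatus().opcounters` sampled over time):

```python
# Can the sync catch up? Compare the source write rate to the migration
# tool's apply rate on the destination, and estimate backlog drain time.
# All three numbers below are hypothetical placeholders.

source_writes_per_sec = 4000      # e.g. delta of db.serverStatus().opcounters
apply_writes_per_sec = 6000       # observed apply rate on the destination
backlog_ops = 1_200_000           # ops accumulated during the initial copy

headroom = apply_writes_per_sec - source_writes_per_sec
if headroom <= 0:
    print("cannot catch up: throttle source load or scale the destination")
else:
    catch_up_minutes = backlog_ops / headroom / 60
    print(f"catch-up in ~{catch_up_minutes:.0f} minutes")
```

If the apply rate can’t consistently exceed the source write rate, the replication lag only grows and the cutover window never arrives.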

Lastly, mongosync, dsync, and other tools that combine a data sync with CDC help bring downtime to a minimum, but there’s still a short period during which you stop writes on the source, wait until the last writes make it to the destination, (hopefully) run some quick data integrity checks, and then start writing to the new cluster. That’s the typical cutover procedure.
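
That sequence is worth scripting rather than running by hand at 3am. A minimal orchestration sketch, where the four callables are placeholders for your own operational steps (feature flag, lag check, checksum comparison, connection-string flip):

```python
import time

def cutover(stop_writes, replication_lag_secs, verify, switch_traffic,
            poll_interval=1.0, timeout=300.0):
    """Run the standard cutover sequence; raise instead of proceeding on failure."""
    stop_writes()                         # 1. quiesce writes on the source
    deadline = time.monotonic() + timeout
    while replication_lag_secs() > 0:     # 2. wait for the last writes to drain
        if time.monotonic() > deadline:
            raise TimeoutError("destination never caught up; roll back")
        time.sleep(poll_interval)
    if not verify():                      # 3. quick data integrity checks
        raise RuntimeError("integrity check failed; roll back to source")
    switch_traffic()                      # 4. point the application at the new cluster
    return True
```

The important property is the hard stop before step 4: if verification fails, the source is still intact and writes can simply be re-enabled there.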
If you want true zero downtime with no interruption for upstream services, you’d need to put a lot more work into the application layer and implement dual writes and dual reads - similar to what is outlined in Middleware-assisted Zero-downtime Live Database Migration to AWS | AWS Architecture Blog. It’s really easier said than done!
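
For completeness, the dual-write idea from that post reduced to its skeleton (a hypothetical wrapper, not production code; real versions need ordering guarantees, a retry queue for failed mirror writes, and reconciliation):

```python
# Dual-write/dual-read skeleton: write to both clusters, read from whichever
# is currently authoritative. `old` and `new` are placeholders for anything
# with insert/find-style methods (e.g. pymongo collections).

class DualWriter:
    def __init__(self, old, new, read_from="old"):
        self.old, self.new, self.read_from = old, new, read_from

    def write(self, doc):
        self.old.insert(doc)              # source of truth until cutover
        try:
            self.new.insert(doc)          # best-effort mirror; a real system
        except Exception:                 # queues failures for reconciliation
            pass

    def read(self, query):
        target = self.old if self.read_from == "old" else self.new
        return target.find(query)
```

Cutover then becomes flipping `read_from` (and later dropping the old-side write) rather than a stop-the-world event - which is exactly where the extra application-layer complexity lives.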

Good luck!