Help Needed: Zero Downtime Migration of 3TB Self-Hosted MongoDB Between Cloud Providers
Hi MongoDB Community!
I need help with a critical migration and would appreciate your expertise and suggestions.
My Current Situation
- MongoDB: Self-hosted PSA (Primary-Secondary-Arbiter) setup
- Data Size: ~3TB
- Current oplog: 1GB, which covers a window of roughly 65 minutes
- Network: 10Gb/s interconnect link between cloud providers
- Migration: Need to move from one cloud provider to another
- Critical Requirement: Zero or minimal downtime (this is our core transactional database)
The Challenge
I need to migrate this MongoDB cluster between cloud providers but cannot afford downtime as our platform serves critical transactions 24/7.
My main concerns:
- The 65-minute oplog window seems too short for a 3TB initial sync
- Cross-cloud network latency and bandwidth limitations
- How to ensure zero data loss during cutover
- What’s the safest migration approach for this scenario
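To put rough numbers on the oplog concern, here is a back-of-the-envelope sketch in Python. The oplog churn rate is derived from my current 1GB / 65-minute window and assumes write volume stays roughly constant. The 200 MB/s effective initial-sync throughput and the 2x safety factor are assumptions I picked for illustration, not measurements, and index builds on the new member are not modeled at all.

```python
def oplog_churn_gb_per_hour(oplog_gb: float, window_minutes: float) -> float:
    """GB of oplog written per hour, inferred from oplog size and its time window."""
    return oplog_gb / (window_minutes / 60.0)


def initial_sync_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy data_tb (decimal TB) at a sustained throughput in MB/s."""
    data_mb = data_tb * 1_000_000
    return data_mb / throughput_mb_s / 3600.0


def required_oplog_gb(sync_hours: float, churn_gb_per_hour: float,
                      safety_factor: float = 2.0) -> float:
    """Oplog size that covers the whole sync, padded by an assumed safety factor."""
    return sync_hours * churn_gb_per_hour * safety_factor


churn = oplog_churn_gb_per_hour(1.0, 65.0)

# Theoretical best case: the full 10 Gb/s link, i.e. 1250 MB/s.
line_rate_hours = initial_sync_hours(3.0, 1250.0)

# Assumed (not measured) effective initial-sync throughput of 200 MB/s.
assumed_hours = initial_sync_hours(3.0, 200.0)
needed = required_oplog_gb(assumed_hours, churn)

print(f"oplog churn: {churn:.2f} GB/hour")
print(f"copy time at line rate: {line_rate_hours * 60:.0f} minutes")
print(f"copy time at assumed 200 MB/s: {assumed_hours:.1f} hours")
print(f"oplog needed with 2x margin: {needed:.1f} GB")
```

Even at the theoretical line rate the raw copy takes about 40 minutes, which leaves very little margin against a 65-minute window. At the assumed 200 MB/s it is a bit over 4 hours, and the estimate comes out at roughly 7.7GB of oplog.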
What I’m Considering
- Replica set expansion - Add new members in the target cloud, let them sync, then gradually move the primary over and remove the old members
- mongosync - If it works with PSA architecture
- Backup/restore + oplog replay - More traditional approach
- Other approaches - Open to suggestions!
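For the replica set expansion option, this is a minimal sketch of how I imagine preparing the new configuration: the target-cloud members are added with `priority: 0` and `votes: 0`, so they can sync without affecting elections. The replica set name, hostnames, and version number are hypothetical placeholders. The function only builds the config document; applying it would be a separate step (for example via `rs.reconfig()`), and I have not tested this against my cluster.

```python
import copy


def add_target_members(config: dict, hosts: list) -> dict:
    """Return a copy of a replica set config with `hosts` appended as
    non-voting, priority-0 members and the config version bumped by one."""
    new_config = copy.deepcopy(config)
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    for offset, host in enumerate(hosts):
        new_config["members"].append(
            {"_id": next_id + offset, "host": host, "priority": 0, "votes": 0}
        )
    new_config["version"] = config["version"] + 1
    return new_config


# Hypothetical PSA config; names and version are placeholders.
current = {
    "_id": "rs0",
    "version": 7,
    "members": [
        {"_id": 0, "host": "src-a.example.internal:27017", "priority": 1, "votes": 1},
        {"_id": 1, "host": "src-b.example.internal:27017", "priority": 1, "votes": 1},
        {"_id": 2, "host": "src-arb.example.internal:27017",
         "arbiterOnly": True, "priority": 0, "votes": 1},
    ],
}

proposed = add_target_members(
    current,
    ["tgt-a.example.internal:27017", "tgt-b.example.internal:27017"],
)

for member in proposed["members"]:
    print(member["_id"], member["host"], member.get("priority"), member.get("votes"))
```

Keeping the new members non-voting during the initial sync means the existing PSA voting majority is unchanged until I decide to promote them.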
Questions for the Community
- Oplog sizing: What’s the recommended oplog size for cross-cloud 3TB migration? What are the potential side effects of increasing from 1GB to 20GB+ (storage, performance, memory usage)?
- Network advantage: With my 10Gb/s interconnect between clouds, what migration strategies become more viable?
- Has anyone successfully done similar zero-downtime migrations?
- What are the biggest pitfalls to avoid?
- Any specific tools or strategies you’d recommend?
- How do you handle the final cutover without downtime?
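On the cutover question, the kind of pre-check I have in mind looks like the sketch below: before stepping down the old primary, confirm that every target-cloud secondary has caught up. The input dict mirrors the `members` array returned by `replSetGetStatus` (fields `name`, `stateStr`, `optimeDate`), but the sample data, hostnames, and the 1-second threshold are made up for illustration. This only checks lag; clients would still see a short election window when the primary actually changes.

```python
from datetime import datetime, timedelta


def secondary_lag_seconds(status: dict) -> dict:
    """Map each SECONDARY's name to its lag behind the PRIMARY, in seconds."""
    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in status["members"]
        if m["stateStr"] == "SECONDARY"
    }


def safe_to_cut_over(status: dict, target_hosts: list,
                     max_lag_seconds: float = 1.0) -> bool:
    """True only if every target host is a SECONDARY within the lag threshold."""
    lag = secondary_lag_seconds(status)
    return all(h in lag and lag[h] <= max_lag_seconds for h in target_hosts)


# Fabricated sample status for illustration only.
now = datetime(2024, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "src-a.example.internal:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "src-arb.example.internal:27017", "stateStr": "ARBITER"},
        {"name": "tgt-a.example.internal:27017", "stateStr": "SECONDARY", "optimeDate": now},
        {"name": "tgt-b.example.internal:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=5)},
    ]
}

targets = ["tgt-a.example.internal:27017", "tgt-b.example.internal:27017"]
lag_by_host = secondary_lag_seconds(sample_status)
ready = safe_to_cut_over(sample_status, targets)
print(lag_by_host, ready)
```

In practice I would expect to run a check like this in a loop against the live cluster and only proceed once it passes consistently, but I would welcome corrections if this is the wrong signal to gate on.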
My Environment
- Self-hosted MongoDB (not Atlas)
- PSA architecture
- 3TB of active transactional data
- 10Gb/s dedicated interconnect between source and target clouds
- Need to maintain high availability throughout
I’d really appreciate any insights, experiences, or step-by-step guidance from the community. This is a critical migration for our business and I want to make sure I do it right.
Thanks in advance for any help!