Best Practices for Schema Management, Migrations, and Scaling in MongoDB

Hi MongoDB Community,

I’m currently working on a JavaScript-based project using MongoDB as our database, and I have a few concerns regarding schema management, migrations, and cluster operations. I’d really appreciate any insights or recommendations on the following topics:

  1. Maintaining Schema Change History
    How can I effectively maintain the history of schema changes for my collections in my code repository? For example, in relational databases, we use tools like Liquibase or Flyway to manage schema versioning. Is there an equivalent tool or best practice for MongoDB that integrates well with version control (Git)?

  2. Tools for Schema Change Management
    Are there specific tools available that can help track schema changes or migrations for MongoDB collections? I’ve come across options like migrate-mongo and mongobee, but I’d love to know if there are better solutions or if these are widely recommended.

  3. Managing Property Changes
    If I need to change a field’s structure, such as converting address (an object) to addresses (an array of objects), what’s the best approach?

  • Should I run a migration to transform the existing documents and then start reading from the new property immediately?
  • This feels risky, especially for large collections. Running migrations during these scenarios might be impractical, and I’d also need to run another migration to remove the old field afterward. Is there a more efficient or safer way to handle such updates?
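One common pattern here is "expand and contract": first add the new field alongside the old one, switch readers over, and only then remove the old field. A sketch of both steps, assuming a connected `db` handle from the Node.js driver and an illustrative `customers` collection:

```javascript
// Expand step: copy the old `address` object into a new `addresses` array,
// skipping documents that already have the new field so the script is
// safe to re-run. Uses an update-with-aggregation-pipeline (MongoDB 4.2+)
// so the new value is derived from the old field server-side.
async function expandAddressField(db) {
  await db.collection('customers').updateMany(
    { address: { $exists: true }, addresses: { $exists: false } },
    [{ $set: { addresses: ['$address'] } }]
  );
}

// Contract step: run only after all readers and writers use `addresses`.
async function removeOldAddressField(db) {
  await db.collection('customers').updateMany(
    { address: { $exists: true } },
    { $unset: { address: '' } }
  );
}

module.exports = { expandAddressField, removeOldAddressField };
```

Because the expand step is idempotent and the app reads both shapes in between, neither step requires downtime; the trade-off is temporarily storing the field twice.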
  4. M20 Cluster Deadlock Issue During Index Creation
    Recently, I ran into a serious issue when adding an index to a collection. The cluster experienced a sudden spike in CPU usage, triggering scaling. However, the scaling process got stuck, and the Atlas portal became unresponsive for a while. This led to connection timeouts in my application, causing downtime in the production system.
  • At the time, the collection had only around 20,000 documents, which makes me worry about how such operations would behave as the collection grows further.
  • Is there a recommended way to handle such scenarios?
  • Are there specific best practices for adding indexes to large collections in a production cluster to minimize downtime?
  • Could I have done something differently to prevent the deadlock and scaling delay?
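For context on the index question: since MongoDB 4.2, index builds no longer hold a long-lived exclusive lock, but they still consume CPU and I/O, which can matter on a small tier like M20. On Atlas, the rolling index build option (building on one node at a time) is the usual way to reduce production impact. From the Node.js driver the call itself is simple; a sketch, assuming a connected `db` handle and illustrative names:

```javascript
// Create a named unique index on `email`. Naming the index explicitly
// makes it easier to track in migrations and drop deliberately later.
// On a busy cluster, prefer running this during a low-traffic window
// or via Atlas's rolling index build.
async function createEmailIndex(db) {
  await db.collection('users').createIndex(
    { email: 1 },
    { unique: true, name: 'email_unique' }
  );
}

module.exports = { createEmailIndex };
```

Also worth checking: keep the driver's `serverSelectionTimeoutMS` and connection pool settings tuned so the application fails fast and recovers, rather than piling up timed-out connections while the cluster is busy scaling.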

I’m looking forward to hearing your suggestions and experiences. Thanks in advance for your help!

I always do a complete validation script in Node.js and check it into git.
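Alongside a validation script, MongoDB can also enforce the expected shape server-side with a `$jsonSchema` validator, so drifting documents are rejected at write time. A sketch, assuming a connected `db` handle; the collection and field names are illustrative:

```javascript
// Expected document shape, expressed as a $jsonSchema validator.
const userSchema = {
  $jsonSchema: {
    bsonType: 'object',
    required: ['email', 'addresses'],
    properties: {
      email: { bsonType: 'string', description: 'must be a string' },
      addresses: {
        bsonType: 'array',
        items: {
          bsonType: 'object',
          required: ['city'],
          properties: { city: { bsonType: 'string' } },
        },
      },
    },
  },
};

async function applyUserValidator(db) {
  // collMod attaches/replaces the validator on an existing collection.
  // validationLevel 'moderate' only validates inserts and updates to
  // documents that already match, so legacy documents don't break writes.
  await db.command({
    collMod: 'users',
    validator: userSchema,
    validationLevel: 'moderate',
  });
}

module.exports = { userSchema, applyUserValidator };
```

Since the validator is just a JavaScript object, it can live in the repository and be applied as part of a migration, which keeps the schema definition version-controlled too.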

Can you bring down the cluster (lock out other users) while you migrate?

I would be inclined to create a new field with a new name. Once sure everything is working, delete the old field. Again, you probably need to lock other users out while making changes.

If your constraint is that there can never be any downtime, that you always have to do this on a live system, that's different. It's hard to envision a scenario where you do not have to at least lock the field you are going to modify to avoid losing new live data.

Thanks for suggesting schema validation—I’ll definitely look into it!

Regarding the property change, I tried the approach you suggested: creating a new field and updating the application logic to handle both the old and new properties. After running the migration script to copy data from the old property to the new one, I deleted the old field. While this approach works, it isn't always efficient, especially when working within larger teams.
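The dual-read logic during that transition window can be as small as one helper: prefer the new `addresses` array, fall back to wrapping the legacy `address` object, so the app keeps working before, during, and after the migration. A sketch (field names match the example earlier in the thread):

```javascript
// Return addresses for a document regardless of which schema version it has.
function getAddresses(doc) {
  if (Array.isArray(doc.addresses)) return doc.addresses; // new shape
  if (doc.address) return [doc.address]; // legacy single-object shape
  return []; // neither field present
}

module.exports = { getAddresses };
```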

The main challenge is that we need to rely on someone with production database access to run these migration scripts—both for migrating the data and deleting the old fields. This reliance can create bottlenecks and slow down development cycles.

I found this article on how such MongoDB migrations are handled.

This makes me wonder: is there a need for a dedicated migration tool within MongoDB’s ecosystem? Does something like this already exist, or do we need to build something custom, like what’s suggested in that article?

My feeling is that many new tools are needed in the MongoDB ecosystem. If you can derive a generalized design and implement it as open source, I’m sure you’ll be offered Creator or Champion status :slight_smile:


@fa5am33r you mentioned you use Liquibase and Flyway for relational databases. Both support MongoDB. Have you tried them out with MongoDB? I’d be curious to learn more about your experiences.

We’re also always open to new ideas to better support our users. I’ll DM you my scheduling link in case you’d like to chat about how we can better meet your data modeling needs for MongoDB.

Correct me if I’m wrong, but I believe they don’t support Node.js. :thinking:

It depends on what type of support you’re looking for. Liquibase has an integration with Node.js, although it’s hard to tell whether any restrictions apply when you use it with MongoDB.