Best approach for migrating data from one sharded collection to another sharded collection

Hi,

I have two sharded collections in the same database, say collection_old and collection_new. Both collections use the same shard key, and each contains ~20 million documents. I want to migrate all the documents from collection_old to collection_new and, once the migration succeeds, delete collection_old.

Since the collections are fairly large, I am unsure whether the command below will cause performance issues. Also, if the insertion fails for some documents, how can I get the ids of those documents so that I can fix the errors and retry later?

db.collection_new.insert(db.collection_old.find(), {ordered: false})

So please let me know if there is a recommended approach for migrating documents from one sharded collection to another within the same database.

Thanks in advance

Hi,

Any suggestions will help.

Thanks

Hello @Allwyn_Jesu ,

Could you please answer the questions below so I can better understand this migration?

  • Which MongoDB version are you running?
  • Is collection_old still receiving inserts & updates?
  • Could you share the output of db.collection.stats().avgObjSize and db.collection.stats().size?
  • What sort of performance issues or other general issues are you looking at?
  • How similar are collection_old & collection_new? Do they have the same shard key, indexes, document structure, etc?

I think it would be better to check for _id collisions between old and new and fix them beforehand, instead of trying to fix them after the fact. If the collections are very similar and colliding _id values can be avoided, mongodump & mongorestore is perhaps the fastest way to achieve this, since you can specify the number of insertion workers.
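
If you would rather stay in the shell, here is a rough sketch of copying in batches while recording the _id of every document that fails (for example on a duplicate _id), so those can be fixed and retried later. The function name copyInBatches and the shape of the returned object are my own, not a MongoDB API. It assumes a shell-style synchronous cursor with hasNext()/next() and a collection with insertMany(), and that a bulk write error exposes per-document failures as writeErrors (mongosh) or getWriteErrors() (legacy shell). Please verify that against your version before relying on it.

```javascript
// Sketch only: copy documents in fixed-size batches and remember which _ids failed.
// `sourceCursor` must provide hasNext()/next(); `target` must provide insertMany().
function copyInBatches(sourceCursor, target, batchSize) {
  const failed = []; // one { _id, code, message } entry per rejected document
  let copied = 0;

  function flush(batch) {
    if (batch.length === 0) return;
    try {
      // Unordered: the server keeps going after an individual document fails.
      target.insertMany(batch, { ordered: false });
      copied += batch.length;
    } catch (e) {
      // Assumption: per-document failures are exposed as `writeErrors`
      // (mongosh) or via getWriteErrors() (legacy mongo shell).
      const raw = e.writeErrors ||
        (typeof e.getWriteErrors === "function" ? e.getWriteErrors() : null);
      if (!raw) throw e; // not a per-document failure, e.g. a network error
      const writeErrors = [].concat(raw);
      for (const we of writeErrors) {
        // `index` is the position of the failed document within this batch.
        failed.push({ _id: batch[we.index]._id, code: we.code, message: we.errmsg });
      }
      copied += batch.length - writeErrors.length;
    }
  }

  let batch = [];
  while (sourceCursor.hasNext()) {
    batch.push(sourceCursor.next());
    if (batch.length >= batchSize) {
      flush(batch);
      batch = [];
    }
  }
  flush(batch); // last, possibly partial, batch
  return { copied, failed };
}
```

In the shell it could be called as copyInBatches(db.collection_old.find(), db.collection_new, 1000) and the returned failed array inspected afterwards. The batch size of 1000 is only a placeholder. Small batches keep memory use bounded, which a single insert of the whole cursor would not.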

Thanks,
Tarun
