We have a very challenging use case where we need to delete expired documents from a collection.
The collection is approximately 500 GB with more than 500M documents, and our goal is eventually to delete about 60% of them.
Following this support article: https://support.mongodb.com/article/000020114/recommendations-for-deleting-large-numbers-of-documents
we decided to go with the approach of copying the non-expired documents to a new collection and dropping the old one.
I need help understanding the solution to this problem: how can we efficiently copy such a large amount of data from one collection to another with minimal impact on the performance and uptime of our MongoDB cluster?
I don't have access to that link, but I think I would craft a bunch of aggregations with $match on a range and $merge to populate the new collection bit by bit ($out replaces the entire target collection on each run, so $merge is the stage that lets successive batches accumulate).
You could cron this over a few days and it should work just fine, I guess.
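The batched copy could be sketched like this in Python (pymongo-style). The pipeline is built as plain dicts so the shape is the point; the collection names, the `expiresAt` field, and the _id range bounds are all assumptions, not your actual schema:

```python
from datetime import datetime, timezone

def build_batch_pipeline(lo, hi, now):
    """Copy non-expired docs whose _id falls in [lo, hi) into new_coll.

    $merge (not $out) is used so each batch adds to the target instead
    of replacing it. Field and collection names here are placeholders.
    """
    return [
        # Only this _id range, and only documents that have not expired yet.
        {"$match": {"_id": {"$gte": lo, "$lt": hi},
                    "expiresAt": {"$gt": now}}},
        # Upsert into the target collection; later batches accumulate.
        {"$merge": {"into": "new_coll",
                    "whenMatched": "replace",
                    "whenNotMatched": "insert"}},
    ]

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
pipeline = build_batch_pipeline(0, 1_000_000, now)
# With pymongo you would run something like:
#   db.old_coll.aggregate(pipeline)
# and schedule successive [lo, hi) ranges from cron.
```

Iterating on an indexed field like _id keeps each batch a cheap range scan instead of a full collection scan.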
If your cluster is mostly idle at night, you could also mongodump that collection one night with a query filter to only take what you want, and mongorestore it the next night into the new collection?
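The nightly dump/restore could look roughly like this — the `--query`, `--nsFrom`, and `--nsTo` flags are real mongodump/mongorestore options, but the database, collection, field names, and paths are placeholders:

```python
import json

def dump_and_restore_cmds(db, coll, new_coll, cutoff_iso):
    # mongodump --query takes extended JSON; here we keep only
    # documents that have not expired yet (field name is an assumption).
    query = json.dumps({"expiresAt": {"$gt": {"$date": cutoff_iso}}})
    dump = ["mongodump", "--db", db, "--collection", coll,
            "--query", query, "--out", "/backups/partial"]
    # The next night, restore the dump into the new collection by
    # remapping the namespace.
    restore = ["mongorestore",
               "--nsFrom", f"{db}.{coll}",
               "--nsTo", f"{db}.{new_coll}",
               "/backups/partial"]
    return dump, restore

dump_cmd, restore_cmd = dump_and_restore_cmds(
    "appdb", "events", "events_new", "2024-01-01T00:00:00Z")
# Run each with subprocess.run(cmd, check=True) during the quiet window.
```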
Differences could then be resolved with an oplog replay or an aggregation with $merge.
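For the $merge catch-up idea, a minimal sketch — it assumes your documents carry a hypothetical `lastModified` timestamp that lets you find what changed during the initial copy:

```python
def build_catchup_pipeline(since):
    """Re-copy documents changed after `since` so the new collection
    catches up with writes that happened during the initial copy.

    The lastModified field and new_coll name are assumptions.
    """
    return [
        # Only documents touched after the initial copy started.
        {"$match": {"lastModified": {"$gt": since}}},
        # Overwrite stale copies in the target, insert anything new.
        {"$merge": {"into": "new_coll",
                    "on": "_id",
                    "whenMatched": "replace",
                    "whenNotMatched": "insert"}},
    ]

catchup = build_catchup_pipeline("2024-01-02T00:00:00Z")
# With pymongo: db.old_coll.aggregate(catchup)
```

Note this only reconciles inserts and updates; deletes that happened during the copy would still need the oplog (or a separate diff pass) to propagate.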
Just a few ideas but definitely not a definitive answer.