I would guess that deleteMany is much faster.
Both scenarios assume the deletion of the 10 million documents that matches the delete criteria. This is to simplify the understanding.
Scenario 1 - stream
- you send one query to retrieve all docs
- the server fetches 100M docs from permanent storage
- the server sends 100M docs over to client
- the client sends 10M delete commands to the server
- the server performs 10M delete
- you could improve this by streaming only _id and guildID
Scenario 2 - deleteMany
- you send one command to deleteMany with an array of 200k elements (ouch)
- the server also fetches the 100M docs but mark them as deleted right away
- the main problem is that I do not think the index on guildID, if any, can be used because the query is $nin
That’s a very interesting problem. Please do follow up on your findings.
My question is what is the source of the array of 200k elements? Is it another collection in the same database?
I would try to find a way to use $in rather than $nin by getting the guildID that needs to be deleted rather than using the guildID that needs to be kept. Specially if an index on guildID exists.