Comparing target and source datasets to check for removed data in source

In the app I am working on, we are downloading a database from a partner’s API (documents with unique IDs), parsing it into JSON format, and storing the data in our MongoDB collection. Periodically, since updates may be made in the partner’s database, we download their database again to update our collection in MongoDB. Upserting the documents from their database to our collection will handle the cases where the data entry is the same as before (we just write over the existing document with the same ID) or a new one has been added on their side (we create a new document in our collection).
But what is the best practice to handle the case where they have removed data entries on their side? How does one check the ID’s of documents in the source MongoDB collection against the target database to see what is now missing/absent on the source side compared to what we have in our target Mongo collection, so that we know which ones to remove on our side as well?

1 Like

What I would try is:

  1. Add a timestamp that correspond to your migration date time to each source documents before the upsert.
  2. Delete all documents from the target dataset that do not have the timestamp added in 1.

If unsatisfactory I will experiment with using a temporary collection in the target server to hold the new source dataset and $merge. I am still unsure if that can work. May be something like a $merge from new temporary data to target that merge each document, and then another $merge from the target into the new temporary using discard. But not sure.

But the big question is why don’t you simply wipe out your old copy to replace it with the new copy.

1 Like