Delete expired deals except latest 5

Hey guys! :slight_smile:

Is it possible to create a query that deletes all of the expired deals from the deals collection except from the latest 5 from each store?

This examples has 24 deals split between 2 stores. Basically it should only care about the deals that has any value in the dateTo field and make sure it’s expired (the date have passed already. And then also i want to keep the latest 5 expired deals from each store. Delete the rest.

  [
    {dateTo: "2023-01-09", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-10", store: ObjectId("100000000000000000000000")},
    {dateTo: null, store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("100000000000000000000000")},
    {dateTo: "2023-01-09", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-10", store: ObjectId("200000000000000000000000")},
    {dateTo: null, store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")},
    {dateTo: "2023-01-07", store: ObjectId("200000000000000000000000")}
  ]

So this example it should delete 4x deals from each of the 2 stores.

I am not sure if it can be done with a single access to the database.

It is always best to do thing with a single access but sometimes we cannot or we do not know.

My approach in a 2 steps process would be:

1 - mark the deals to delete

2 - delete the marked deals

To do 1, I would use an aggregation with a self $lookup with a final $merge to mark the deals.

unique_stores = { "$group" : {  _id : "$store" } }

lookup_expired_deals = { "$lookup" : {
    "from" : "deals" ,
    "localField" : "_id" ,
    "foreignField" : "store" ,
    "as" : "_expired_deals"  ,
    "pipeline" : [
        { "$sort" : { "dateTo" : 1 } } ,
        { "$match" : { "dateTo" : { "$lt" : "2023-01-09" } , "_to_delete" : false } } ,
        { "$project" : { "_id" : 1 } }
    ]
} }

match_stores_with_more_than_five_deals = { "$match" : {
    "_expired_deals.5" : { "$exists" : true }
} }

set_deals_to_delete = { "$set" : {
    "_deals_to_delete" : { "$slice" : [
        "$_expired_deals" , 0 , { "$subtract" : [ { "$size" : "$_expired_deals" } , 5 ] }
    ] }
} }

unwind_deals_to_delete = { "$unwind" : "$_deals_to_delete" }

project_deals_to_delete = { "$project" : { "_id" : "$_deals_to_delete._id" } }

mark_to_delete = { "$set" : { "_to_delete" : true } }

update_collection = { "$merge" : {
    "into" : "deals" ,
    "on" : "_id"
} }

/* The pipeline will add the field _to_delete:true to all the deals that are expired except the last
    5 of each store */

prepare_deletion_pipeline = [
    unique_stores ,
    lookup_expired_deals ,
    match_stores_with_more_than_five_deals ,
    set_deals_to_delete ,
    unwind_deals_to_delete ,
    project_deals_to_delete ,
    mark_to_delete ,
    update_collection
]

As an exercise to the read, I left out the final deleteMany() that uses the field _to_delete:true to determine which documents to delete.

That pipeline stage won’t match anything since nothing has _to_delete set at the beginning. Did you perhaps mean "_to_delete": {$ne: true} here? I’m not sure that this flag is necessary, as if there are two jobs running, it’s safe for them to try to delete same old deals I think.

Asya

1 Like

In fact, it is

"_to_delete":{"$exists":false}

It is there to prevent this infinite update situation.

$exists: false makes a lot more sense than "_to_delete" : false but since you are updating existing records and not adding new documents to the database, I’m not sure you really need to worry about that… Note that only aggregations with no blocking stages are subject to such infinite update - in your case, starting with $group means all the documents will get processes before we get to the $merge stage.

2 Likes

Thank you guys, however this is all 1000x over my head. I´m a complete beginner with mongodb and only need this query to run once. Would it be possible to provide me a full code i can run with the delete stuff included? :heart: :innocent:

What was left is simply:

db.getCollection( "deals" ).deleteMany( { "_to_delete" : true } )

I couldn’t get your query to work at all sadly (probably my fault). It just said acknowledged = true and deleted deals 0 or something.

I did however manage to create my own little script for it that worked perfectly as I wanted it to. It’s probably extremely ugly and slow but it did the trick hehe. And yeah was only gonna use it once either way. Here it is if someone else wants it:

const today = new Date().toISOString().slice(0, 10)

db.deals.find().forEach(function(doc) {
db.deals.find({ store: ObjectId(doc.store.toString()),
      "dateTo": {
          "$lt": today
      }
  }
).sort({"_id": -1}).skip(10).forEach(function(doc) {
  db.deals.deleteOne({_id: doc._id})
})
})