How to drop duplicates on a date field while aggregating and counting

Christina_Haig · May 2, 2025, 9:25pm

I’m looking for help constructing a mongo query that has stymied my team. Here is a sample data set:
_id: 67f9363b9588451eebf57818
created_at: 2025-04-11T15:33:15.893+00:00
image_id: “1311961”
last_scan_date: 2025-04-11T14:58:06.000+00:00
name: “boringssl”

Because of some legacy data reasons, the same component name with the same image_id can be created over and over again in this collection, each with a new ‘last_scan_date’. I would like to eliminate the duplication of last_scan_date in my query. Unfortunately the last_scan_date is not an accumulator object (so mongo complains), so we can’t group by it.

I would like to count the number of image_ids by unique ‘name’ field. In other words, how many unique image_ids per unique name, ignoring the duplication of name/image_id combos across multiple scans.

Here are a couple of queries my team has tried, but none quite seem right:

{
$group: {
_id: ‘$_id’,
imageID: {$first: ‘$image_id’},
lastScan: {$max: ‘$last_scan_date’},
comp_name: { $addToSet: {name: ‘$name’} }
}
},
{$group: {_id: ‘$comp_name’, count: {$sum:1}}},
{$sort: {count: -1}},
{
$project: {cname: ‘$_id.name’,
count: ‘$count’, _id:0}
}

{$match: {name: “asm”}},
{$group: {_id: ‘$image_id’,
maxdt: {$max: {‘lastdt’: ‘$last_scan_date’,
‘_id’: ‘$_id’},
},
// count: {$sum:1}
}
},
{$addFields: {img_id: ‘$_id’}},
{$project: {_id:0}},
{$project: {_id: ‘$maxdt._id’,
img_id: 1,
lastScan: ‘$maxdt.lastdt’
}
},
{$group: {_id: null, count: {$sum:1} } },
{$sort: {count: -1}}

steevej · May 3, 2025, 12:15pm

You sample data and code are unusable by us.

Please read Formatting code and log snippets in posts and format accordingly.

You may also consider publishing using https://mongoplayground.net/.

Christina_Haig · May 5, 2025, 8:38pm

One of my team members helped me solve the problem. If anyone else comes across this post, this may be helpful:

[
  { '$group': { _id: { image_id: '$image_id', name: '$name' } } },
  { '$group': { _id: '$_id.name', count: { '$sum': 1 } } }
]

system · May 10, 2025, 8:38pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.