I’m looking for help constructing a mongo query that has stymied my team. Here is a sample data set:
_id: 67f9363b9588451eebf57818
created_at: 2025-04-11T15:33:15.893+00:00
image_id: “1311961”
last_scan_date: 2025-04-11T14:58:06.000+00:00
name: “boringssl”
Because of some legacy data reasons, the same component name with the same image_id can be created over and over again in this collection, each with a new ‘last_scan_date’. I would like to eliminate the duplication of last_scan_date in my query. Unfortunately the last_scan_date is not an accumulator object (so mongo complains), so we can’t group by it.
I would like to count the number of image_ids by unique ‘name’ field. In other words, how many unique image_ids per unique name, ignoring the duplication of name/image_id combos across multiple scans.
Here are a couple of queries my team has tried, but none quite seem right:
{
$group: {
_id: ‘$_id’,
imageID: {$first: ‘$image_id’},
lastScan: {$max: ‘$last_scan_date’},
comp_name: { $addToSet: {name: ‘$name’} }
}
},
{$group: {_id: ‘$comp_name’, count: {$sum:1}}},
{$sort: {count: -1}},
{
$project: {cname: ‘$_id.name’,
count: ‘$count’, _id:0}
}
{$match: {name: “asm”}},
{$group: {_id: ‘$image_id’,
maxdt: {$max: {‘lastdt’: ‘$last_scan_date’,
‘_id’: ‘$_id’},
},
// count: {$sum:1}
}
},
{$addFields: {img_id: ‘$_id’}},
{$project: {_id:0}},
{$project: {_id: ‘$maxdt._id’,
img_id: 1,
lastScan: ‘$maxdt.lastdt’
}
},
{$group: {_id: null, count: {$sum:1} } },
{$sort: {count: -1}}