Aggregation pipeline - returning aggregated result and updating original pre-aggregated documents

I have two collections, A and B. The data in B is aggregated from A, and I want a field on A which stores which record in B the data was aggregated into, e.g. for an audit trail:

# A
{
_id: ObjectId("123"),
timestamp: "2023-01-01T09:00:00.000000Z",
count: 5,
imported_to: ObjectId("789")
},
{
_id: ObjectId("456"),
timestamp: "2023-01-01T13:00:00.000000Z",
count: 5,
imported_to: ObjectId("789")
}

# B
{
_id: ObjectId("789"),
date: "2023-01-01T00:00:00.000000Z",
count: 10
}

I want to:

  1. Aggregate A into B
  2. Update A with the _id of which document A has been aggregated in B

Is this possible in a single query? Or would I have to generate the aggregate to update B with, record the A _ids, then update by an array of those _ids?
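The two-pass fallback described above can be sanity-checked without a server. This is a minimal plain-JavaScript sketch (no driver involved; document shapes copied from the sample data, and the commented `updateMany` call stands in for the real server-side second pass):

```javascript
// Pass 1: build the aggregated B document client-side and record
// which A _ids contributed to it.
const docsA = [
  { _id: "649e8ba5ae40de0c0f6c56b1", timestamp: "2023-01-01T09:00:00.000000Z", count: 5 },
  { _id: "649e8bfffcfdd2a0980c8678", timestamp: "2023-01-01T13:00:00.000000Z", count: 5 },
];

const bDoc = { date: docsA[0].timestamp.substring(0, 10) + "T00:00:00.000000Z", count: 0 };
const contributorIds = [];
for (const doc of docsA) {
  bDoc.count += doc.count;
  contributorIds.push(doc._id);
}

// Pass 2 would then run server-side, something like:
// db.A.updateMany({ _id: { $in: contributorIds } },
//                 { $set: { imported_to: bDoc._id } });

console.log(bDoc.count, contributorIds.length); // 10 2
```

The obvious downside is the round trip: the aggregate and the list of contributing _ids both travel through the client, which is exactly what a single server-side pipeline would avoid.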

I tried to play with your model and I get errors with your ObjectIds:

BSONError: Argument passed in must be a string of 12 bytes or a string of 24 hex characters or an integer

Please update your sample data so that we can experiment with it directly.

Apologies, here is the corrected sample data:

# A
{
_id: ObjectId("649e8ba5ae40de0c0f6c56b1"),
timestamp: "2023-01-01T09:00:00.000000Z",
count: 5,
imported_to: ObjectId("649e8c0aac10027a16b300a4")
},
{
_id: ObjectId("649e8bfffcfdd2a0980c8678"),
timestamp: "2023-01-01T13:00:00.000000Z",
count: 5,
imported_to: ObjectId("649e8c0aac10027a16b300a4")
}

# B
{
_id: ObjectId("649e8c0aac10027a16b300a4"),
date: "2023-01-01T00:00:00.000000Z",
cumulative_count: 10
}

Ignore what I had written… it would involve an array of lookups, which would be nasty. See what others come up with.

No probs @John_Sewell, thanks anyway. For context, there will be ~3k A documents being aggregated into a single B document, so I didn’t feel like an array of that size linking back from B to multiple As was the right way to go (amongst other considerations).

You could do it in a two stage process:

  • In collection A, group the data and write a grouping ID to the records
  • Group the data in A by that grouping ID and write it to B, carrying the grouping field

I can’t imagine this is an unusual use-case but just something I’ve not needed to do before!

This is the kind of thing I was thinking of:

db.getCollection("A").updateMany(
// Only stamp documents that don't have a grouping ID yet
{
    'groupingID':{$exists:false}
},
// Pipeline update: derive the grouping ID from the date part of the timestamp
[
    {
        $set:{
            groupingID:{$substrCP:['$timestamp', 0, 10]}
        }
    }
])

db.getCollection("A").aggregate([
{
    // Only documents that have been stamped with a grouping ID
    $match:{
        'groupingID':{$exists:true}
    }
},
{
    // One output document per grouping ID, summing the counts
    $group:{
        _id:'$groupingID',
        total:{$sum:'$count'}
    }
},
{
    // Upsert the results into B, matching on _id
    $merge:{
        into:'B'
    }
}
])
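For anyone following along without a database to hand, the grouping logic in that pipeline can be checked in plain JavaScript (illustrative only, not driver code):

```javascript
// Mirror the $group stage: bucket documents by the first 10
// characters of timestamp (the date) and sum their counts.
const docs = [
  { timestamp: "2023-01-01T09:00:00.000000Z", count: 5 },
  { timestamp: "2023-01-01T13:00:00.000000Z", count: 5 },
];

const totals = {};
for (const doc of docs) {
  const groupingID = doc.timestamp.substring(0, 10); // same cut as $substrCP
  totals[groupingID] = (totals[groupingID] || 0) + doc.count;
}

console.log(totals); // { '2023-01-01': 10 }
```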

Ending up with:
[screenshot: collection A after the update, each document stamped with a groupingID]

[screenshot: collection B after the merge, one document per date with the summed total]

Obviously you end up with a non-ObjectId as the _id in the second collection in my example, which may not be to your taste.

I’d be very interested if there is a more elegant solution to this.

Thanks John, I’ll give that one a think - the current structure uses ObjectIds and that is my preference in general.

Btw - what app is shown in the screenshots?

Studio3T. There is a free edition but quite a lot of the good stuff is paywalled - things like import and export as well as schema analysis etc.
I use the import and export constantly through my day.
The screenshots were the output window, which can run in table, tree or JSON view.

It’s not cheap, but it’s paid for by the company, so excellent value for me!

Thanks, sounds good - seems like Studio3T is the go-to solution. I’m currently using MongoDB Compass, which is a bit lacking in features!