Aggregation pipeline - returning aggregated result and updating original pre-aggregated documents

I have two collections, A and B. The data in B is aggregated from A, and I want a field on A which stores which record in B the data was aggregated into, e.g. for an audit trail:

# A
{
_id: ObjectId("123"),
timestamp: "2023-01-01T09:00:00.000000Z",
count: 5,
imported_to: ObjectId("789")
},
{
_id: ObjectId("456"),
timestamp: "2023-01-01T13:00:00.000000Z",
count: 5,
imported_to: ObjectId("789")
}

# B
{
_id: ObjectId("789"),
date: "2023-01-01T00:00:00.000000Z",
count: 10
}

I want to:

  1. Aggregate A into B
  2. Update A with the _id of which document A has been aggregated in B

Is this possible in a single query? Or would I have to generate the aggregate to update B with, record the A _ids, then update by an array of those _ids?
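The two-pass fallback described above can be sanity-checked without a server. This is a minimal plain-JavaScript sketch (no driver involved; document shapes copied from the sample data, and the commented `updateMany` call stands in for the real server-side second pass):

```javascript
// Pass 1: build the aggregated B document client-side and record
// which A _ids contributed to it.
const docsA = [
  { _id: "649e8ba5ae40de0c0f6c56b1", timestamp: "2023-01-01T09:00:00.000000Z", count: 5 },
  { _id: "649e8bfffcfdd2a0980c8678", timestamp: "2023-01-01T13:00:00.000000Z", count: 5 },
];

const bDoc = { date: docsA[0].timestamp.substring(0, 10) + "T00:00:00.000000Z", count: 0 };
const contributorIds = [];
for (const doc of docsA) {
  bDoc.count += doc.count;
  contributorIds.push(doc._id);
}

// Pass 2 would then run server-side, something like:
// db.A.updateMany({ _id: { $in: contributorIds } },
//                 { $set: { imported_to: bDoc._id } });

console.log(bDoc.count, contributorIds.length); // 10 2
```

The obvious downside is the round trip: the aggregate and the list of contributing _ids both travel through the client, which is exactly what a single server-side pipeline would avoid.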

I tried to play with your model and I get errors with your ObjectIds:

BSONError: Argument passed in must be a string of 12 bytes or a string of 24 hex characters or an integer

Please update your sample data so that we can experiment with it directly.

Apologies, here is the corrected sample data:

# A
{
_id: ObjectId("649e8ba5ae40de0c0f6c56b1"),
timestamp: "2023-01-01T09:00:00.000000Z",
count: 5,
imported_to: ObjectId("649e8c0aac10027a16b300a4")
},
{
_id: ObjectId("649e8bfffcfdd2a0980c8678"),
timestamp: "2023-01-01T13:00:00.000000Z",
count: 5,
imported_to: ObjectId("649e8c0aac10027a16b300a4")
}

# B
{
_id: ObjectId("649e8c0aac10027a16b300a4"),
date: "2023-01-01T00:00:00.000000Z",
cumulative_count: 10
}

Ignore what I had written… it would involve an array of lookups, which would be nasty. See what others come up with.

No probs @John_Sewell, thanks anyway. For context, there will be ~3k A documents being aggregated into a single B document, so I didn’t feel like an array of that size linking back from B to multiple As was the right way to go (amongst other considerations).

You could do it in a two stage process:

  • In collection A, group the data and write a grouping ID to the records
  • Group the data in A by that grouping ID and write it to B, carrying the grouping field

I can’t imagine this is an unusual use-case but just something I’ve not needed to do before!

This is the kind of thing I was thinking of:

db.getCollection("A").updateMany(
// Only stamp documents that don't have a grouping ID yet
{
    'groupingID':{$exists:false}
},
// Pipeline update: derive the grouping ID from the date part of the timestamp
[
    {
        $set:{
            groupingID:{$substrCP:['$timestamp', 0, 10]}
        }
    }
])

db.getCollection("A").aggregate([
{
    // Only documents that have been stamped with a grouping ID
    $match:{
        'groupingID':{$exists:true}
    }
},
{
    // One output document per grouping ID, summing the counts
    $group:{
        _id:'$groupingID',
        total:{$sum:'$count'}
    }
},
{
    // Upsert the results into B, matching on _id
    $merge:{
        into:'B'
    }
}
])
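For anyone following along without a database to hand, the grouping logic in that pipeline can be checked in plain JavaScript (illustrative only, not driver code):

```javascript
// Mirror the $group stage: bucket documents by the first 10
// characters of timestamp (the date) and sum their counts.
const docs = [
  { timestamp: "2023-01-01T09:00:00.000000Z", count: 5 },
  { timestamp: "2023-01-01T13:00:00.000000Z", count: 5 },
];

const totals = {};
for (const doc of docs) {
  const groupingID = doc.timestamp.substring(0, 10); // same cut as $substrCP
  totals[groupingID] = (totals[groupingID] || 0) + doc.count;
}

console.log(totals); // { '2023-01-01': 10 }
```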

Ending up with:
[screenshot: collection A after the update, each document stamped with a groupingID]

[screenshot: collection B after the merge, one document per date with the summed total]

Obviously you end up with a non-ObjectId as the _id in the second collection in my example, which may not be to your taste.

I’d be very interested if there is a more elegant solution to this.

Thanks John, I’ll give that one a think - the current structure uses ObjectIds and that is my preference in general.

Btw - what app is shown in the screenshots?

Studio3T. There is a free edition but quite a lot of the good stuff is paywalled - things like import and export as well as schema analysis etc.
I use the import and export constantly through my day.
The screenshots were the output window, which can run in table, tree or JSON view.

It’s not cheap, but it’s paid for by the company, so excellent value for me!

Thanks, sounds good - seems like Studio3T is the go-to solution. I’m currently using MongoDB Compass, which is a bit lacking in features!