$merge aggregation: return the saved 'object'

Brian_Marting · December 7, 2021, 1:27pm

Hello,

Our current use case is that we store a lot of objects in Collection X; these objects do not yet contain enough data to be saved to the DB. Once we eventually have enough data, we take all objects from Collection X by their unique special identifier and merge them into Collection Y.
Since the $merge is the last stage in our aggregation, we were wondering whether it is able to return the newly created object, as we cannot add another stage after it. Our current flow would be to then do another round trip to the DB in order to match and retrieve this object from Collection Y, but we don’t really want to do the round trip again if it’s all doable in one aggregation or ‘trip’.

steevej · December 7, 2021, 2:15pm

I do not understand.

Where do you store the objects of Cx?

How can you $merge objects from Cx into Cy if Cx is not saved in the DB?

I don’t think you can get the result of $merge or $out. It might be a nice feature to have.

I am not clear about your use-case, but I would be wary of a use-case that needs the output of a $merge or $out right away, especially in your case, since you already have Cx and Cx contains the documents inserted or the modifications made to Cy. If you want to do further processing, do not $merge into Cy; do a $lookup and perform the other processing before the final $merge.

Brian_Marting · December 7, 2021, 2:24pm

Seems like I missed a part there. We receive data from somewhere, and we always want to save that data. The problem is that they can also deliver only half of the data in some cases. In that case, we do not want to lose that data, so we save it to collection X, or TestField (let's call it TestField from now on).
As soon as we receive new data that has all required data, we match it to the data in collection TestField and merge the data (merging all objects that have the same unique identifier) into object Test.
After we have merged the objects together, we $merge the combined object into collection Test (Collection Y in the previous post). Our code looks like this:

    public Mono<Test> someMethod(String param) {
        return reactiveMongoTemplate.aggregate(Aggregation.newAggregation(TestField.class,
                        Aggregation.match(Criteria.where("test").is(param)),
                        Aggregation.group("field")
                                .first("value").as("value")
                                .first("test").as("test"),
                        Aggregation.group()
                                .first("test").as("test")
                                .push(new BasicDBObject()
                                        .append("k", "$_id")
                                        .append("v", "$value")
                                )
                                .as("array"),
                        Aggregation.replaceRoot(
                                MergeObjects.merge(
                                        new BasicDBObject().append("_id", "$test"),
                                        ArrayToObject.arrayValueOfToObject("array")
                                )
                        ),
                        Aggregation.project(getFields()),
                        Aggregation.merge()
                                .into(MergeOperationTarget.collection("test"))
                                .on("id")
                                .build()
                ).withOptions(AggregationOptions.builder().allowDiskUse(true).build()), Test.class)
                .single();
    }
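To make the reshaping in that pipeline easier to follow, here is a rough plain-Java sketch of what the match, the two group stages and the replaceRoot compute, run over an in-memory list instead of MongoDB. The class name CombineSketch and the use of plain maps as documents are assumptions for illustration only; the project(getFields()) and $merge stages are left out, and "first" here simply means the first matching document in list order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory illustration; not part of the real code base.
final class CombineSketch {

    // Each input map stands in for one TestField document with the keys
    // "test", "field" and "value" (the names used in the pipeline above).
    public static Optional<Map<String, Object>> combine(
            List<Map<String, Object>> testField, String param) {

        // match(test == param) + group("field").first("value"):
        // keep the first value seen for each field name.
        Map<String, Object> firstValuePerField = new LinkedHashMap<>();
        for (Map<String, Object> doc : testField) {
            if (param.equals(doc.get("test"))) {
                firstValuePerField.putIfAbsent(
                        (String) doc.get("field"), doc.get("value"));
            }
        }
        if (firstValuePerField.isEmpty()) {
            // Nothing matched, so no document would reach the $merge stage.
            return Optional.empty();
        }

        // group() + replaceRoot(mergeObjects({_id: $test}, arrayToObject(array))):
        // one document whose _id is the test value and whose other keys are
        // the collected field/value pairs.
        Map<String, Object> combined = new LinkedHashMap<>();
        combined.put("_id", param);
        combined.putAll(firstValuePerField);
        return Optional.of(combined);
    }
}
```

With three partial documents for the same test value, two of them carrying the same field, combine(...) yields a single map holding _id plus one entry per distinct field, which is the shape that is then handed to $merge.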

The issue we are currently having is that we have two options, and neither gives us what we want:

Either we add skipResult, in which case nothing is returned,
or we don’t add skipResult, in which case it currently returns everything from collection Test, which is also not what we want, as we only want the merged object returned.

Brian_Hnat2 · January 17, 2022, 8:41pm

I just stumbled across the same behavior and wound up here seeking the same answer. Our use case was around event sourcing. We insert an event document, and then run an aggregation pipeline to recalculate the state of the entity, merging into a materialized view (like this - https://docs.mongodb.com/manual/core/materialized-views/). The first time we ran our code, all the documents in the materialized view were returned (using the JVM drivers, 4.4).

Brian_Marting · February 2, 2022, 2:22pm

Thanks for the suggestion. I eventually just skipped the output, as it returned the entire collection (for our case, a huge number of documents), and added another find of the document afterwards. If you do not skip the output, though, be careful to check what it is returning.
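For anyone landing here later, a minimal sketch of that two-step flow, with a hypothetical in-memory map standing in for the target collection so it runs without a database. The class and method names (MergeThenFindSketch, mergeSkippingOutput, findById, mergeAndFetch) are made up for illustration; the point is only the sequencing: the write returns no documents, and the find is chained so it starts after the write has completed. In the Reactor-based code above the analogous chaining would presumably use Mono's then(...).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the real collection and template calls.
final class MergeThenFindSketch {

    // In-memory "collection", keyed by _id.
    private final Map<String, Map<String, Object>> target = new ConcurrentHashMap<>();

    // Stand-in for the aggregation ending in $merge with its output skipped:
    // it upserts the combined document and returns no documents at all.
    public CompletableFuture<Void> mergeSkippingOutput(Map<String, Object> combined) {
        return CompletableFuture.runAsync(() -> {
            String id = (String) combined.get("_id");
            // Insert when missing, otherwise merge fields into the existing
            // document (modelled on $merge's default behaviour).
            target.computeIfAbsent(id, k -> new LinkedHashMap<>()).putAll(combined);
        });
    }

    // Stand-in for the follow-up find by identifier.
    public CompletableFuture<Optional<Map<String, Object>>> findById(String id) {
        return CompletableFuture.supplyAsync(() -> Optional.ofNullable(target.get(id)));
    }

    // Step 1: write without reading anything back. Step 2: fetch one document.
    public CompletableFuture<Optional<Map<String, Object>>> mergeAndFetch(
            Map<String, Object> combined) {
        String id = (String) combined.get("_id");
        return mergeSkippingOutput(combined).thenCompose(ignored -> findById(id));
    }
}
```

This costs the second round trip the original question hoped to avoid, but it reads back exactly one document instead of the whole target collection.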