Un-unwind an aggregation

Hi all,
I have the aggregation below:

List< RunInfoUnwindClxNames_UnwindClxParameterList> runDataList = RunDataRepository.GetCollectionObject().Aggregate()
	.Match(filter)
	.Lookup(
		SpecSheetRepository.GetCollectionObject(),
		m => m.RunDefinition,
		c => c.RunDefinition,
		(RunFull m) => m.SpecSheet
		)
	.Lookup(
		IndexGroupRepository.GetCollectionObject(),
		m => m.RunDefinition,
		c => c.RunDefinition,
		(RunFull m) => m.IndexGroupList
		)
	.Unwind<RunFull, RunFullUnwindClxNames>(m => m.ClxNames)
	.Lookup(
		BikeClxRepository.GetCollectionObject(),
		m => m.ClxNames,
		c => c.Name,
		(RunFullUnwindClxNames m) => m.ClxParameterList
		)
	.Unwind<RunFullUnwindClxNames, RunInfoUnwindClxNames_UnwindClxParameterList>(m => m.ClxParameterList)
	.Lookup(
		BikeClxRepository.GetCollectionObject(),
		m => m.ClxParameterList.BaseClxName,
		c => c.Name,
		(RunInfoUnwindClxNames_UnwindClxParameterList m) => m.ClxParameterList.BaseParamaterValues
		)
	.ToList();

As you can see I have done 2 $unwind in order to be able to run the following $lookup.
At the end of the aggregation, when I have all the values I want, I would like to un-unwind: an example is below.

The agregation returns me the following list

{ f1: 5 , f2 : 4 , f3 : 10}
{ f1 : 5, f2 : 6 , f3 : 10}
{ f1 : 5, f2 : 9 , f3 : 10}

and I would like to get:

{ f1: 5 , f2 : [ 4 6 9 ] , f3 : 10}

Is this possible?

Regards,
Andrea

I’m having a bit of an issue reading your code without formatting but you don’t need to unwind to use $lookup. If you pass an array of values to $lookup (via localField) it will be the equivalent to running $lookup on each of the values in the array separately (except it’ll be faster). So it sounds like maybe you need only one $unwind and not two.

It would be easier if you provided a sample input document as well as the output you desire.

Asya
P.S. you might also want to tag your question with aggregation tag.

1 Like

Hi @Asya_Kamsky ,
thank you very much for your quick answer.
Thank you for the tip about the array; that is very helpful!
About some example data, I have prepared the ones below. The original data are very large in dimension, so I have simplified them for just this matter.
I have 2 collection, RunInfo and ClxParameters; example data are below.

  • RunInfo
{ year: 2020 , runID : "QT_OUT01" , clxNames : [ "file1.clx" "file2.clx" ] }
{ year: 2020 , runID : "QT_OUT02" , clxNames : [ "file1.clx" "file3.clx" ] }
{ year: 2021 , runID : "QT_OUT01" , clxNames : [ "file4.clx" "file5.clx" ] }
{ year: 2021 , runID : "QT_OUT02" , clxNames : [ "file1.clx" "file5.clx" ] }
  • ClxParameterList
{ clxName : "file1.clx" , clxBase : "fileB1.clx" , values: [ { param1Val = 5 } { param2Val = 15 } ] }
{ clxName : "file2.clx" , clxBase : null , values: [ { param5Val = 34 } { param6Val = 33 } ] }
{ clxName : "file3.clx" , clxBase : null , values: [ { param5Val = 36 } { param6Val = 100 } ] }
{ clxName : "file4.clx" , clxBase : "fileB2.clx" , values: [ { param1Val = 66 } { param2Val = 75 } ] }
{ clxName : "file5.clx" , clxBase : null , values: [ { param5Val = 3 } { param6Val = 67 } ] }
{ clxName : "fileB1.clx" , clxBase : null , values: [ { param3Val = 56 } { param4Val = 99 } ] }
{ clxName : "fileB2.clx" , clxBase : null , values:  [ { param3Val = 77 } { param4Val = 88} ] }

If I searched for “{ year: 2020 , runID : “QT_OUT01” }” I would like to get:

{ year: 2020 , runID : "QT_OUT01" , clxNames : [ "file1.clx" "file2.clx" ] ,
clxParameterFiles: [

{ clxName : "file1.clx" , clxBase : "fileB1.clx" , values: [ { param1Val = 5 } { param2Val = 15 } ] ,
	clxBaseParameters: { clxName : "fileB1.clx" , clxBase : null , values: [ { param3Val = 56 } { param4Val = 99 } ] }
}

{ clxName : "file2.clx" , clxBase : null , values: [ { param5Val = 34 } { param6Val = 33 } ] 
	clxBaseParameters: null
}

]
}

Basically I add one field “clxParameterFiles” that contains the matching document of ClxParameterList; on the top of this, at each ClxParameterList document I have added, I add a field “clxBaseParameters” that contain again the matching document based on clxBase.

clxBase can be null or have just 1 string value (no array of string).

Expected result:
Capture

I know that this complex but it is the best compromise that I have found between the current usage and a decent size of the database.

Regards,
Andrea

Ok so all you need to do is aggregation where first stage is {$match:{...}} to get the year and runID you want. The next stage would be {$lookup:{from:"ClxParameterList", localField:"clxNames", foreignField:"clxName", as:"clxParameterFiles"}} make sure that the field “clxName” in “ClxParameterList” collection has an index on it.

Asya

Hello @Asya_Kamsky ,
thank you for your answer. This is very clear, thank you.
I just miss the last bit: in my understanding your suggested aggregation will give me the “light blue” and “green” boxes (in the expcted result picture), but I still miss the red one. How can I modify the aggregation in order to add the red one as well, please?

Regards,
Andrea

Can you give some sample document? Is this the field that’s embedded in an array itself inside another array?

Hi @Asya_Kamsky ,
you can find some examples in the documents above; anyway I copy them here for convenience:

  • ClxParameterList
    { clxName : “file1.clx” , clxBase : “fileB1.clx” , values: [ { param1Val = 5 } { param2Val = 15 } ] }
    { clxName : “file2.clx” , clxBase : null , values: [ { param5Val = 34 } { param6Val = 33 } ] }
    { clxName : “file3.clx” , clxBase : null , values: [ { param5Val = 36 } { param6Val = 100 } ] }
    { clxName : “file4.clx” , clxBase : “fileB2.clx” , values: [ { param1Val = 66 } { param2Val = 75 } ] }
    { clxName : “file5.clx” , clxBase : null , values: [ { param5Val = 3 } { param6Val = 67 } ] }
    { clxName : “fileB1.clx” , clxBase : null , values: [ { param3Val = 56 } { param4Val = 99 } ] }
    { clxName : “fileB2.clx” , clxBase : null , values: [ { param3Val = 77 } { param4Val = 88} ] }

As you can see there is a field called “baseClx”; this field can be one string or NULL. If it is a string it matches the “clxName” of another document in the same collection.

What I need is taking the output of the aggregation that you suggested, go inside the “clxParameterFiles” field that the agregation has created and for each element, if “baseClx” is not null, fill its “clxBaseParameters” field with the matching document of the collection.

Basicaly it is like to write:

{$lookup:{from:“ClxParameterList”, localField:“clxParameterFiles.baseClx”, foreignField:“clxName”, as:“clxParameterFiles.clxBaseParameters”}}

but “clxParameterFiles” is an array, so I cannot do this. I can $unwind, but then I need to undone the unwind because I need everything in just 1 document.

I hope to have been clear; it is tricky.

Please use backticks `` or indentation to make your code/documents format as code - makes it easier to read and cut/paste.

Why don’t you do just that? It should work since "clxParameterFiles.baseClx" again, the right thing will happen - if there’s a value there, matching values from from collection will be fetched.

The only catch is that you cannot use "as" field inside of existing array without blowing the rest of it away - this limitation is described in this enhancement request, I do describe one possible workaround in a comment there.

Are you using version 5.0 or later? If so, you can add additional stages to the pipeline when using localField/foreignField and then you can maybe solve this by doing something like this:

db.RunInfo.aggregate([
    {$match:{year:2020}},
    {$lookup:{
        from:"ClxParameterList", 
        localField:"clxNames", 
        foreignField:"clxName", 
        as:"clxParameterFiles", 
        pipeline:[ {$lookup:{from:"ClxParameterList", localField:"clxBase", foreignField:"clxName", as:"clxBaseParameters"}}]
    }}
]).pretty()
{
"_id" : ObjectId("613b66d0be41cfec6a195265"),
"year" : 2020,
"runID" : "QT_OUT01",
"clxNames" : [
	"file1.clx",
	"file2.clx"
],
"clxParameterFiles" : [
	{
		"_id" : ObjectId("613b6772be41cfec6a195267"),
		"clxName" : "file1.clx",
		"clxBase" : "fileB1.clx",
		"values" : [
			{
				"param1Val" : 5
			},
			{
				"param2Val" : 15
			}
		],
		"clxBaseParameters" : [
			{
				"_id" : ObjectId("613b6990be41cfec6a195269"),
				"clxName" : "fileB1.clx",
				"clxBase" : null,
				"values" : [
					{
						"param1Val" : 34
					},
					{
						"param2Val" : 33
					}
				]
			}
		]
	},
	{
		"_id" : ObjectId("613b6790be41cfec6a195268"),
		"clxName" : "file2.clx",
		"clxBase" : null,
		"values" : [
			{
				"param1Val" : 34
			},
			{
				"param2Val" : 33
			}
		],
		"clxBaseParameters" : [ ]
	}
]
}

(If you just upgraded to 5.0 you’ll need to make sure your FCV is set to “5.0” to be able to use the new $lookup syntax option).

Note that if you wanted to traverse the entire “chain” of such dependencies (i.e. arbitrary number of levels deep) you could just use $graphLookup.

Asya

Hi @Asya_Kamsky ,
everything clear, thank you very much for your incredible support.

Regards,
Andrea

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.