Getting the value of the most occurrences

Rishi_uttam · May 18, 2022, 1:06pm

Been trying to find the most occurences of a string.

I wanted to do this in a $addFields

Here is the playground, showing demo data, and my query so far.

This is my playground so far, but i feel its very messy.
I am using $map and $filter. I was told this was better then always using $unwind.

My expected output is

`{`
`  name : 'steve',`
`  TopCountry : HKG`
`  }`

But presently it is showing me the total count.

{
name : 'steve',
TopCountry : 1 }`

I would also like to do a sort in the add fields, so it sorts it by top most occurences of country.
Maybe use of a $reduce would be better so i could reduce to the number of times a country was listed.
HKG : 4
USA : 3
CHN : 5

thanks.

steevej · May 19, 2022, 1:22pm

Thanks for providing sample input documents, expected results and what you have tried.

Here is what I come up:

unwind =  { "$unwind" : "$downloads" }
group = { "$group" : {
  _id : { "name" : "$name" , "country" : "$downloads.country" } ,
  count : { "$sum" : 1 }
} }
sort = { "$sort" : { "count" : -1 } }
group_by_name = { "$group" : {
  "_id" : "$_id.name" ,
  "country" : { "$first" : "$_id.country" } ,
  "count" : { "$first" : "$count" }
} }
pipeline = [ unwind , group , sort , group_by_name ]
c.aggregate( pipeline )

What you get is:

{ _id: 'elon', country: 'JPN', count: 2 }
{ _id: 'mark', country: 'USA', count: 5 }
{ _id: 'steve', country: 'HKG', count: 4 }

Not quite in the format you want, but a simple $project should get you to your expected results. I usually prefer not to do this last $project and perform the final formatting at the application layer.

Rishi_uttam · May 19, 2022, 2:53pm

Thanks for this, i will have a look .

I too though i needed to $unwind the downloads array, however the above is a trival example, and my real data has lots of other steps, including 2 other unwinds for nested array. Doesn’t this create a huge amount of workload and documents for mongodb atlas to process before sending data back? Granted i am not looking for best performance, but i did read this, pls scroll to the last comment made by a Mongo Employee (i think) [Asya_Kamsky]

Asha says that i should NOT, and she emphasized in capitals NOT. " You should never use $unwind and then $group when you just need to transform a single document! "

so thats why i went down the rabbit hole of .map and filter but still could not get the data i needed.

pls comment when you have a moment.

are so many unwinds bad
how far down can i do a projection expression i…e array.object.nestedArray.object… something like this wont work right? i would need to unwind it first?
Since my downloads array isn’t nested, and i only want to reach in to downloads.country, why do i need a unwind here?

thanks.

steevej · May 19, 2022, 3:05pm

Your requirement was not to transform a single document. The unwind is bad when you unwind then group using the _id to recreate the original document with the modification. This is not what you are doing.

Please publish real documents so that we do not lose time working on trivial examples that you cannot adapt to your use case.

Because if you don’t you don’t get the correct result. With the code I share, it is very easy to remove the $unwind. Simply do:

pipeline = [ group , sort , group_by_name ]

Rishi_uttam · May 20, 2022, 11:59am

Thank you, this makes sense, so basically any array which i need to do a expression on which has nested objects , i would always need to unwind first before applying any expression.

What happens if downloads.country does not exist, in this case it will still register with a 1 count, instead of 0. count : { "$sum" : 1 }

steevej · May 20, 2022, 12:07pm

NO. The post you shared above is an example where you do not have to unwind. The operations $filter, $reduce and $map are quite powerful.

This is something you can easily try. You can always $match out edge cases in an earlier stage.

Rishi_uttam · May 21, 2022, 4:27pm

Thanks… could you help me do this without unwind and with a filter, reduce / map? i did try that initially in the playground, but could never get it working.

tks.

steevej · May 21, 2022, 7:45pm

Share what you tried and explain us how it failed so that we do not lose time investigating a direction you already know does not work.

steevej · May 25, 2022, 2:18am

I finally was able to look at your playground.

The good news is that the $addFields correctly sums up the occurrences.

Then to get the document that has the max count you will have to do something with $filter, $reduce or $arrayElemAt like explained at mongodb - Find max element inside an array - Stack Overflow

system · October 16, 2024, 3:14am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.