Performant way to count unique subdocuments for each document

Hi, I’m looking for the most performant way to count unique subdocuments for each document returned.

My abbreviated data models is in the shape:

const myCollection = [
  {
    _id: "123",
    animals: [{ name: "cat" }, { name: "cat" }, { name: "cat" }],
    fruits: [{ name: "apple" }, { name: "orange" }, { name: "orange" }],
  },
  {
    _id: "456",
    animals: [{ name: "cat" }, { name: "dog" }, { name: "bird" }],
    fruits: [{ name: "apple" }, { name: "pear" }, { name: "mango" }],
  },
];

My desired outcome is:

const desiredResult = [
  {
    _id: "123",
    animals: [{ name: "cat" }, { name: "cat" }, { name: "cat" }],
    fruits: [{ name: "apple" }, { name: "orange" }, { name: "orange" }],
    unique_animals_count: 1,
    unique_fruits_count: 2,
  },
  {
    _id: "456",
    animals: [{ name: "cat" }, { name: "dog" }, { name: "bird" }],
    fruits: [{ name: "apple" }, { name: "pear" }, { name: "mango" }],
    unique_animals_count: 3,
    unique_fruits_count: 3,
  },
];

I have tried a vast array of $unwind, $group and the like, but hours later and a ton of Googling I’ve still not found the answer. And, part of me feels it’s not overly tricky to do … I’m just not landing on it!

Any help would be greatly appreciated.

$group performance is highly affected if you have index on the field or not, I suggest apply a multi key index on “$animals.name” and “$fruits.name”, you should be able to see a huge improvement in the query performance.

Hello,

Please see an attempt to produce the desired result, still not sure this is the performant way as I am new to this.

Explanation:
Step 1 : Perform unwind on both arrays, the first two $unwind steps do the same.
Step 2: Group the unwound result based on record Id.
Step 3: Accumulate the result into two separate sets or unique arrays, $addToSet eliminates the duplicates.
Step 4: Project the size of newly created sets or unique arrays , the $size operator does it.

test> db.test.aggregate(
[
{ $unwind: “$animals” },
{ $unwind: “$fruits” },
{ $group: { _id: “$_id”,
uniqueanimals: { $addToSet: “$animals” },
uniquefruits: { $addToSet: “$fruits” } } },
{ $project: { originalanimals: 1, uniqueanimals: 1,
uniqueanimalscount: { $size: “$uniqueanimals” },
uniquefruits: 1, uniquefruitscount:
{ $size: “$uniquefruits” }
}
}
])

OUTPUT:
[
{
_id: ‘456’,
uniqueanimals: [ { name: ‘cat’ }, { name: ‘bird’ }, { name: ‘dog’ } ],
uniquefruits: [ { name: ‘apple’ }, { name: ‘mango’ }, { name: ‘pear’ } ],
uniqueanimalscount: 3,
uniquefruitscount: 3
},
{
_id: ‘123’,
uniqueanimals: [ { name: ‘cat’ } ],
uniquefruits: [ { name: ‘apple’ }, { name: ‘orange’ } ],
uniqueanimalscount: 1,
uniquefruitscount: 2
}
]

Thanks
WeDoTheBest4Your

1 Like

Could you please read Formatting code and log snippets in posts and then update your post?

It would make it easier to read, understand and we could cut-n-paste it to experiment with. An alternative is make it a playground at https://mongoplayground.net/

Ad Thanks Vance

Based on the feedback from @steevej I’ve created a Mongo Playground, see:

MongoPlayground

… I’ve used the example solution described by @wedothebest_We_do_the_Best (Thank you!)

The solution returns the desired result set.

Additionally, I’ll look to apply the indexing described by Askhat_Gupta3

1 Like

Thank you @steevej.
I went through it and found it is very useful.
Shall follow it.

1 Like

Thank you @Hayden_Pitout for formatting the code and made it quite readable.