Is there a way to query on array of objects such that after the query it will hold only those objects that are fulfilled by the query and also to have distinct on that field

How data structure looks like :
{
“created_at” : ISODate ( “2022-02-12T13:09:01.860+0000” ),
“size_chart” : [
{
“Inventory” : NumberInt ( -1),
“Name” : “L” ,
“Waist” : NumberInt ( 32 ),
“Hips”: NumberInt (42 )
},
{
“Inventory” : NumberInt ( 1),
“Name” : “XL” ,
“Waist” : NumberInt ( 32 ),
“Hips”: NumberInt (42 )
},
{
“Inventory” : NumberInt ( 3),
“Name” : “M” ,
“Waist” : NumberInt ( 32),
“Hips”: NumberInt (42 )
},
],
“status” : “active” ,
“original_price”: 300.0 ,
}

So in this case i want to get only those distinct size names which have inventory greater than 0.
**There are so many documents same as this.

Hi there @Zoro-OP,

Please, surround the code between backticks like so:

```
{
  key1: “hello”,
  key2:“world”
}
```

Ideally you’d also paste the code in the browser terminal and prettify it :wink:

And please include the expected output, also formatted.

Expected Output:

{
"XL"
"M"
}

Prettified Data -

{
    “created_at” : "Date",
    “size_chart” : [
        {
        “Inventory” : NumberInt ( -1),
        “Name” : “L” ,
        “Waist” : NumberInt ( 32 ),
        “Hips”: NumberInt (42 )
        },
        {
        “Inventory” : NumberInt ( 1),
        “Name” : “XL” ,
        “Waist” : NumberInt ( 32 ),
        “Hips”: NumberInt (42 )
        },
        {
        “Inventory” : NumberInt ( 3),
        “Name” : “M” ,
        “Waist” : NumberInt ( 32),
        “Hips”: NumberInt (42 )
        },
    ],
    “status” : “active” ,
    “original_price”: 300.0
}
1 Like

@Zoro-OP hi

Thanks a lot for that formatting and description. If I undestand correctly X,XL, etc will be repeated (not in the code sample that you include). I will assume that.

  1. We can filter a documents collection using a condition to an array.

To get the documents in a collection where an array, including array of objects, hold a key equal to a value, you can use:

db.collection.aggregate({
  "$match": {
    "size_chart.Inventory": {
      "$gt": 0
    }
  }
})

And the array query automatically traverses the array, and looks inside each document, and applies an expression to a field. That’s MongoDB magic…

  1. To turn on and off fields in docs there are $project and $addFields. We want to project only name afaik, you may want the _id too. So we’d add to the pipeline:

{$project:{size_chart.Name:1, _id:1}}

But we need a condition in name, such that there are no repeated values.

We can use $addToSet.

$addToSet returns an array of all unique values that results from applying an expression to each document in a group.

The problem is this operator only work in a $group or related stages, so we will group instead of $project. And afaik $addToSet can’t look inside a document, so we need to unwind the array. This is what I tried, then:

db.collection.aggregate([
  {
    "$match": {
      "size_chart.Inventory": {
        "$gt": 0
      }
    }
  },
  {
    "$project": {
      "Names": "$size_chart.Name"
    }
  },
  {
    $unwind: "$Names"
  },
  {
    "$group": {
      "_id": "$_id",
      "NamesSet": {
        "$addToSet": "$Names"
      }
    }
  }
])

Play with it here, to find any bug and probably will need performance improvements. If the array key that we $match is indexed (i.e size_chart.Inventory), this will be quite fast.


I first wrote this reply, but noticed it was wrong, you may want to read it anyways.
If I understand correctly, we need to first filter documents on a condition, this could be just a $match.

When an operation needs a comparison between documents you normally need some bold approach.

I am not an expert, but I would would $match, $group, $unwind.

1 Like

Excelente aporte. Very good information.