How to group an array using reduce, without unwind+group?

Hello

Is it possible to group an array using reduce (without unwind + group), in the general case?

For a simple example (the members could be documents instead of numbers):
[1 2 3 1 2 10 2] should become {"1" [1 1], "2" [2 2 2], "3" [3], "10" [10]}

Also, is it possible to reference a field by constructing the reference inside the pipeline,
like concatStrings("$", "myfield")?
If I could do the second, I could check whether the member is a key of the reduced map and update it,
but I can't construct the reference from what I see inside the reduce.

I can use the $function operator and JavaScript, but is there a way to do it with MQL?

Thank you

Hi @Takis,

I want to help but don’t understand the use case from your example.

Can you provide a source document and show what the desired output documents should look like?

Best,
Pavel

Hello @Pavel_Duchovny, thank you for your response.
I want to do what this simple JavaScript code does, but with MongoDB:
reduce over an array and group, without using the unwind + group stages.
The problem is that I can't do this:
let key = obj[property];
acc[key];
In MongoDB, when you reference a field you must provide its name
before the pipeline runs
(you can't say something like $$value = "myfield"; toReference("$" + $$value)).
In JavaScript I find the needed key while reducing.

An example would be something like the JavaScript code below; it is simple code that just reduces
over an array and groups.

let people = [
  { name: 'Alice', age: 21 },
  { name: 'Max', age: 20 },
  { name: 'Jane', age: 20 }
];

function groupBy(objectArray, property) {
  return objectArray.reduce(function (acc, obj) {
    let key = obj[property]
    if (!acc[key]) {
      acc[key] = []
    }
    acc[key].push(obj)
    return acc
  }, {})
}

let groupedPeople = groupBy(people, 'age')
// groupedPeople is:
// { 
//   20: [
//     { name: 'Max', age: 20 }, 
//     { name: 'Jane', age: 20 }
//   ], 
//   21: [{ name: 'Alice', age: 21 }] 
// }

I hope there is a simple and fast way to do it, like the JavaScript code does.
I think not being able to construct references inside the pipeline causes many problems;
not being able to group this way is one of them.

Thank you

Hi @Takis,

OK, I did not find a good way to do this without using $arrayToObject, as I need values to become keys and this is the only way I know how to do that.

Here is my example:

db.people.insertMany([{ name: 'Alice', age: 21 }, { name: 'Max', age: 20 }, { name: 'Jane', age: 20 }])
{ acknowleged: 1,
  insertedIds: 
   { '0': '5f508cb9773ad5bb561ae1e2',
     '1': '5f508cb9773ad5bb561ae1e3',
     '2': '5f508cb9773ad5bb561ae1e4' } }

Now my pipeline is:

// Optional do not project _id
[{$project: {
  _id : 0
}},
// Grouping all documents by age and pushing them into "ages" array
 {$group: {
  _id: "$age",
  "ages": {
     $push : "$$ROOT"
  }
}},
// Building an array of { k: <age>, v: <docs with that age> } pairs
 {$group: {
    "_id": null,
    "data": {
      "$push": { "k": {$toString: "$_id"}, "v": "$ages" }
    }
  }},
 // Replacing the new root to be the age as a field and people under that age as array of docs
 {$replaceRoot: {
  newRoot: { "$arrayToObject": "$data" }
}}]

The aggregation:

db.people.aggregate(// Optional do not project _id
[{$project: {
  _id : 0
}},
// Grouping all documents by age and pushing them into "ages" array
 {$group: {
  _id: "$age",
  "ages": {
     $push : "$$ROOT"
  }
}},
// Building an array of { k: <age>, v: <docs with that age> } pairs
 {$group: {
    "_id": null,
    "data": {
      "$push": { "k": {$toString: "$_id"}, "v": "$ages" }
    }
  }},
 // Replacing the new root to be the age as a field and people under that age as array of docs
 {$replaceRoot: {
  newRoot: { "$arrayToObject": "$data" }
}}])
[ { '20': [ { name: 'Max', age: 20 }, { name: 'Jane', age: 20 } ],
    '21': [ { name: 'Alice', age: 21 } ] } ]
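To make the role of each stage easier to follow, here is a rough plain-JavaScript emulation of the pipeline above. It is only a sketch of the logic, not MongoDB code; the variable names `byAge`, `data` and `result` are illustrative, and `Object.fromEntries` stands in for `$arrayToObject`.

```javascript
// Plain-JavaScript emulation of the three pipeline stages (not MongoDB code).
const people = [
  { name: 'Alice', age: 21 },
  { name: 'Max', age: 20 },
  { name: 'Jane', age: 20 }
];

// Analogue of {$group: {_id: "$age", ages: {$push: "$$ROOT"}}}
const byAge = new Map();
for (const person of people) {
  if (!byAge.has(person.age)) {
    byAge.set(person.age, []);
  }
  byAge.get(person.age).push(person);
}

// Analogue of the second $group: one {k, v} pair per age, with the age as a string
const data = [...byAge].map(([age, docs]) => ({ k: String(age), v: docs }));

// Analogue of $arrayToObject: every {k, v} pair becomes a field of one document
const result = Object.fromEntries(data.map(({ k, v }) => [k, v]));

console.log(JSON.stringify(result));
// {"20":[{"name":"Max","age":20},{"name":"Jane","age":20}],"21":[{"name":"Alice","age":21}]}
```

The point of the middle step is that the age is carried as a value (`k`) until the very end, which is what lets a computed value become a field name.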

Let me know if this is what you are looking for.

Best regards,
Pavel

Hello

Thank you for trying to help me, but this isn't what I want to do.
I just think that MongoDB doesn't support it yet (referring to a field whose name is computed during the pipeline).
I am a Clojure programmer, and in Clojure reducing an array to a map is very easy.

(def doc {"myarray" [
                    { "name" "Alice"  "age"  21 }
                    { "name" "Max"  "age" 20 }
                    { "name" "Jane" "age" 20 }
                    ]})

(prn (reduce (fn [grouped-array doc]
                   (let [cur-key (str (get doc "age"))] ;;the key,here is the age value,i make it string,to be like valid json
                     (if (contains? grouped-array cur-key) ;;if the reduced map so far contains the key
                       (update grouped-array cur-key conj doc) ;;update to add the doc to the already array {key [.. doc] ...}
                       (assoc grouped-array cur-key [doc]))))  ;;else  {key [doc]}  ,its the first doc found with that key
                 {}
                 (get doc "myarray")))

;;prints {"21" [{"name" "Alice", "age" 21}], "20" [{"name" "Max", "age" 20} {"name" "Jane", "age" 20}]}

MongoDB doesn't run Clojure code, so one solution is to use the $function operator with JavaScript code.
I did it using the following command, after inserting the single document above ({"myarray" …}) into my collection.

"pipeline": [
    {
      "$project": {
        "_id": 0,
        "mygroupedarray": {
          "$function": {
            "args": [
              "$myarray"
            ],
            "lang": "js",
            "body": "function groupBy(objectArray) {\n  return objectArray.reduce(function (acc, obj) {\n    let key = obj[\"age\"]+\"\";\n    if (!acc[key]) {\n      acc[key] = [];\n    }\n    acc[key].push(obj);\n    return acc;\n  }, {})\n}"
          }
        }
      }
    }
  ]

It worked fine; I got the same result:

"mygroupedarray": {
          "20": [
            {
              "name": "Max",
              "age": 20
            },
            {
              "name": "Jane",
              "age": 20
            }
          ],
          "21": [
            {
              "name": "Alice",
              "age": 21
            }
          ]
        }

But I want to do this using the MongoDB query language's reduce, without JavaScript.
I want array -> reduce -> document (that represents the groups), without unwind and without group.

It's not that I have an application that really needs it, but I don't know a way to make MongoDB
refer to a field whose name is found while the pipeline runs. For example, in the JavaScript code,
if(!acc[key]) means "if the calculated key is not contained in the object (document) acc".
In MongoDB you can't do this: referring to a field whose name you don't know in advance is not
possible (objectToArray is a workaround that treats the key as a value, but can it help here,
and can it work in a simple and fast way?).

This problem can come up any time I want to refer to a field whose name is calculated while
the pipeline runs.

For example, for the document {"myarray" [1,2,3]}, this doesn't work (it yields the string "$myarray", not the field's value):
[{"$project":{"myarray1":{"$concat":[{"$literal":"$"},"myarray"]}}}]
This works, because I already know the name of the field reference before the pipeline starts:
[{"$project":{"myarray1":"$myarray"}}]

If the first one worked, I think I could also group using only reduce.
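For readers following along, the objectToArray workaround mentioned above can be sketched roughly as follows: turn the document into k/v pairs, search the pairs for the computed name, and take the matching value. This is only a sketch under assumptions; `getByComputedKey` is a hypothetical helper name, and the lookup is a serial search over the fields rather than a direct reference.

```javascript
// Hypothetical helper: builds an aggregation expression that reads, from the
// document produced by docExpr, the field whose name is produced by keyExpr.
function getByComputedKey(docExpr, keyExpr) {
  return {
    $let: {
      vars: {
        // Keep only the {k, v} pairs whose key equals the computed name.
        hits: {
          $filter: {
            input: { $objectToArray: docExpr },
            as: "kv",
            cond: { $eq: ["$$kv.k", keyExpr] }
          }
        }
      },
      // Value of the first match; the field is simply absent if nothing matched.
      in: { $arrayElemAt: ["$$hits.v", 0] }
    }
  };
}

// The field name "myarray" is assembled inside the pipeline with $concat.
const pipeline = [
  {
    $project: {
      myarray1: getByComputedKey("$$ROOT", { $concat: ["my", "array"] })
    }
  }
];

console.log(JSON.stringify(pipeline, null, 2));
// In the shell: db.collection.aggregate(pipeline)
```

The cost is proportional to the number of fields in the document, which is the performance concern raised in this thread.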

Thank you.

Hi @Takis,

The query I gave you produces the following output:


"mygroupedarray": {
          "20": [
            {
              "name": "Max",
              "age": 20
            },
            {
              "name": "Jane",
              "age": 20
            }
          ],
          "21": [
            {
              "name": "Alice",
              "age": 21
            }
          ]
        }

The other outputs you show are not valid JSON, so MongoDB will never output them like that; your application would have to make the adjustments.

Playing with arrayToObject and objectToArray is the only way I know to take a value and turn it into a field name…

Best
Pavel

Thank you for trying to help me. I think it's not possible to group an array using only reduce in MQL.
I will open a new topic asking about the root of the problem.

Thanks a lot

@Takis

It’s definitely possible to do what you are asking about. Starting with a single document with your array of people, here’s what it looks like:

    db.people.find()
    { "_id" : 0, "myarray" : [ { "name" : "Alice", "age" : 21 }, { "name" : "Max", "age" : 20 }, { "name" : "Jane", "age" : 20 } ] } 

    db.people.aggregate({$project:{ ages: {$map:{
        input:{$setUnion:"$myarray.age"}, 
        as: "a", 
        in: {
             age: "$$a", 
             people: { $filter:{ input:"$myarray", cond:{$eq:["$$a", "$$this.age"]}}}
         }
    }}}})
    { "_id" : 0, "ages" : [ 
                { "age" : 20, "people" : [ { "name" : "Max", "age" : 20 }, { "name" : "Jane", "age" : 20 } ] }, 
                { "age" : 21, "people" : [ { "name" : "Alice", "age" : 21 } ] } 
   ] }
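If the keyed document from the original question is the goal, the same search can be wrapped in `$arrayToObject`. The stage below is a sketch under assumptions: it targets the single-document collection shown above, `groupedByAgeStage` is an illustrative name, and `$toString` requires MongoDB 4.0 or later.

```javascript
// Sketch: same $setUnion + $filter search, but emitting {k, v} pairs so that
// $arrayToObject can turn each distinct age into a field name.
const groupedByAgeStage = {
  $project: {
    _id: 0,
    mygroupedarray: {
      $arrayToObject: {
        $map: {
          input: { $setUnion: ["$myarray.age"] }, // distinct ages
          as: "a",
          in: {
            k: { $toString: "$$a" },              // field names must be strings
            v: {
              $filter: {
                input: "$myarray",
                as: "p",
                cond: { $eq: ["$$p.age", "$$a"] }
              }
            }
          }
        }
      }
    }
  }
};

console.log(JSON.stringify(groupedByAgeStage, null, 2));
// In the shell: db.people.aggregate([groupedByAgeStage])
```

Note that the array is scanned once per distinct age, so this is still a serial search and not a single-pass reduce.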

Hello

Thank you for the reply. I know it's possible with a serial search, but on large arrays it is slower than JavaScript.
Using the $function operator I wrote a reduce in JavaScript (with an O(1) contains(doc,key)) and compared it with the query above; I got the results below.

function (objectArray,property) {
  return objectArray.reduce(function (acc, obj) {
    var key = obj[property].valueOf().toString();
    if (!acc[key]) {
      acc[key] = [];
    }
    acc[key].push(obj);
    return acc;
  }, {})
}
10 members     => js 5x slower
100 members    => js 3x slower
1000 members   => same
10000 members  => js 2x faster
100000 members => js 3x faster

To do these fast we need document operators that take variables as arguments.

put(doc,$$k,$$v)     O(1)
contains?(doc,$$k)   O(1)   // in our case it was O(n), because there is no other way
get(doc,$$k)         O(1)
remove(doc,$$k)      O(1)
keys(doc)
values(doc)
.....

I implemented put with variables using $arrayToObject and $mergeObjects, but it's slow.
It seems that $mergeObjects is O(n) even if you add only 1 key/value.

With k and v as variables:
{$mergeObjects: ["$$doc", {$arrayToObject: [[{k: "$$k", v: "$$v"}]]}]}

That's OK, I will wait for document operators that accept variables to be added to MQL.
Also, I think we don't have $push(array,$$v) in aggregation expressions (only as a $group accumulator),
and $concatArrays(array,[$$v]) is slow, not O(1).
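For completeness, here is a rough sketch of how the put and get emulations described above might be combined into a single `$reduce` that groups `myarray` by age. It is not the code that was benchmarked in this thread; `existingFor` and `groupWithReduceStage` are hypothetical names, and the sketch assumes MongoDB 4.0 or later for `$toString`.

```javascript
// "get" with a computed key: a serial search over the accumulated document
// ($$value), so its cost grows with the number of keys.
function existingFor(keyExpr) {
  return {
    $let: {
      vars: {
        hits: {
          $filter: {
            input: { $objectToArray: "$$value" },
            as: "kv",
            cond: { $eq: ["$$kv.k", keyExpr] }
          }
        }
      },
      // No match: $arrayElemAt yields no value and $ifNull falls back to [].
      in: { $ifNull: [{ $arrayElemAt: ["$$hits.v", 0] }, []] }
    }
  };
}

const groupWithReduceStage = {
  $project: {
    _id: 0,
    mygroupedarray: {
      $reduce: {
        input: "$myarray",
        initialValue: {},
        in: {
          $let: {
            vars: { key: { $toString: "$$this.age" } },
            in: {
              // "put": merge a one-field document {<key>: <old members + this>}
              $mergeObjects: [
                "$$value",
                {
                  $arrayToObject: [[{
                    k: "$$key",
                    v: { $concatArrays: [existingFor("$$key"), ["$$this"]] }
                  }]]
                }
              ]
            }
          }
        }
      }
    }
  }
};

console.log(JSON.stringify(groupWithReduceStage, null, 2));
// In the shell: db.people.aggregate([groupWithReduceStage])
```

Every step rebuilds the accumulated document and re-scans its keys, which matches the complaint above that these emulations are O(n) per operation rather than O(1).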