Hi, I have an aggregation pipeline that runs over a potentially large amount of data, and I was looking for ways to make it faster.
The format of the documents is this:
{
"array": [
{ "k": "A", "v": 1},
{ "k": "B", "v": 2},
{ "k": "C", "v": 3}
]
},
{
"array": [
{ "k": "A", "v": 3},
{ "k": "B", "v": 2},
{ "k": "C", "v": 1}
]
}
By running the pipeline I would like to obtain the sum of the “v” values for each “k”:
{ "k": "A", "v": 4}
{ "k": "B", "v": 4}
{ "k": "C", "v": 4}
I wrote a classic pipeline with $unwind + $group; could I do better?
{"$unwind": "$array"},
{"$group": {
    "_id": "$array.k",
    "v": {
        "$sum": "$array.v"
    }
}}
Could I use a $group + $project ($reduce) to do the same thing?
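For the record, a $unwind-free sketch along those lines might look like the following (untested; it pushes every document's array into a single $group with _id: null, so it is bounded by the 16 MB document and $group memory limits, and the per-key $filter pass is quadratic, whereas $unwind + $group is usually well optimized by the server):

```javascript
coll.aggregate([
  // Collect every per-document array into one document.
  { "$group": { "_id": null, "all": { "$push": "$array" } } },
  // Flatten the array of arrays with $reduce + $concatArrays.
  { "$project": {
      "flat": {
        "$reduce": {
          "input": "$all",
          "initialValue": [],
          "in": { "$concatArrays": ["$$value", "$$this"] }
        }
      }
  } },
  // For each distinct k, sum the matching v values.
  { "$project": {
      "_id": 0,
      "sums": {
        "$map": {
          "input": { "$setUnion": ["$flat.k", []] },
          "as": "key",
          "in": {
            "k": "$$key",
            "v": {
              "$sum": {
                "$map": {
                  "input": {
                    "$filter": {
                      "input": "$flat",
                      "as": "e",
                      "cond": { "$eq": ["$$e.k", "$$key"] }
                    }
                  },
                  "as": "e",
                  "in": "$$e.v"
                }
              }
            }
          }
        }
      }
  } }
])
```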
Thanks
Can you tell me more specifically what you are trying to do?
I find this aggregation pipeline a little tough to optimize; one thing I would consider is keeping a separate document with an array field.
Thanks for the reply.
I was imagining using $reduce, but I don’t know how to write it.
An example (wrong, but it shows the intended pipeline output) would be an $accumulator:
coll.aggregate([
{
$match: {
...
}
},
{
$group: {
_id: null,
total: {
$accumulator: {
  init: function () {
    return { array: [] };
  },
  accumulate: function (state, array) {
    var result = {};
    // Merge the incoming array into the state, summing v per k.
    state.array.concat(array).forEach(obj => {
      if (result.hasOwnProperty(obj.k)) {
        result[obj.k] += obj.v;
      } else {
        result[obj.k] = obj.v;
      }
    });
    state.array = Object.entries(result).map(([k, v]) => ({ k, v }));
    return state;
  },
  accumulateArgs: ["$array"],
  merge: function (state1, state2) {
    var result = {};
    state1.array.concat(state2.array).forEach(obj => {
      if (result.hasOwnProperty(obj.k)) {
        result[obj.k] += obj.v;
      } else {
        result[obj.k] = obj.v;
      }
    });
    state1.array = Object.entries(result).map(([k, v]) => ({ k, v }));
    return state1;
  },
  lang: "js"
}
}
}
}
])
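As a sanity check outside the server, the merge-by-key body shared by accumulate and merge can be run as plain JavaScript (the mergeByKey helper name is mine, not a MongoDB API):

```javascript
// Sum v per k across two arrays of { k, v } pairs, the same fold
// the $accumulator performs on its state.
function mergeByKey(a, b) {
  var result = {};
  a.concat(b).forEach(obj => {
    if (result.hasOwnProperty(obj.k)) {
      result[obj.k] += obj.v;
    } else {
      result[obj.k] = obj.v;
    }
  });
  return Object.entries(result).map(([k, v]) => ({ k, v }));
}

// The two sample documents from the question:
const merged = mergeByKey(
  [{ k: "A", v: 1 }, { k: "B", v: 2 }, { k: "C", v: 3 }],
  [{ k: "A", v: 3 }, { k: "B", v: 2 }, { k: "C", v: 1 }]
);
console.log(merged);
// → [ { k: 'A', v: 4 }, { k: 'B', v: 4 }, { k: 'C', v: 4 } ]
```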