How can I get the maximum value of a document with many different fields through aggregation? The sub-field names are the same

I have the following document:
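(Simplified sketch of the structure; the values are illustrative.)

{
  "layouts" : {
    "nodes" : {
      "node0" : { "x" : 0 , "y" : 1 } ,
      "node1" : { "x" : 100 , "y" : 60 }
    }
  }
}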

Under layouts.nodes there are many node fields (with no fixed limit), and I want to find the maximum x value among these nodes. How can I achieve this? (This specific document format is the configuration of a frontend library.)

The major problem with your issue is that nodes is an object rather than an array; its field names look like array indexes, but they are not.

The following:

"nodes" : [
 { x : 0 , y : 1 } ,
 { x : 100 , y : 60 }
]

would be more efficient in terms of both space and processing. It would also make your use-case simple to implement, as you could use "$max" : "$nodes.x" in a projection.
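For example (a sketch, assuming a collection named nodes_array holding the array schema above):

mongosh> db.nodes_array.aggregate( [
  { "$project" : { "maxX" : { "$max" : "$nodes.x" } } }
] )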

With your schema you can still use "$max", but you first have to perform an extra step to transform your nodes object into an array with $objectToArray.
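A sketch of that pipeline, assuming the nodes live under layouts.nodes (the collection name configs is an assumption):

mongosh> db.configs.aggregate( [
  { "$project" : {
      "maxX" : { "$max" : {
        "$map" : {
          "input" : { "$objectToArray" : "$layouts.nodes" } ,
          "in" : "$$this.v.x"
        }
      } }
  } }
] )

The $objectToArray operator turns { node0 : … , node1 : … } into [ { k : "node0" , v : … } , … ], the $map extracts each v.x, and $max takes the maximum of the resulting array.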

Yeah, you are right. If nodes were an array, then everything would be fine. Is the performance difference between storing an array and storing the nodes as separate object fields large? Should I store them as an array and transform them into objects when the front-end requests them?

Let's do a little test, with 2 collections of 1 document with 12 nodes:

mongosh> db.nodes.find()
{ _id: ObjectId("62d802a681fed875f28c70ec"),
  node0: { x: 123, y: 456 },
  node1: { x: 123, y: 456 },
  node2: { x: 123, y: 456 },
  node3: { x: 123, y: 456 },
  node4: { x: 123, y: 456 },
  node5: { x: 123, y: 456 },
  node6: { x: 123, y: 456 },
  node7: { x: 123, y: 456 },
  node8: { x: 123, y: 456 },
  node9: { x: 123, y: 456 },
  node10: { x: 123, y: 456 },
  node11: { x: 123, y: 456 },
  node12: { x: 123, y: 456 } }
mongosh> db.nodes.stats().avgObjSize
375 /* bytes */
mongosh> db.nodes_array.find()
{ _id: ObjectId("62d8040181fed875f28c70ee"),
  nodes: 
   [ { x: 123, y: 456 },
     { x: 123, y: 456 },
     { x: 123, y: 456 },
     { x: 123, y: 456 },
     { x: 123, y: 456 },
     { x: 123, y: 456 },
     { x: 123, y: 456 },
     { x: 123, y: 456 },
     { x: 123, y: 456 },
     { x: 123, y: 456 },    
     { x: 123, y: 456 },
     { x: 123, y: 456 } ] }
mongosh> db.nodes_array.stats().avgObjSize
323 /* bytes */

A difference of 52 bytes does not seem like a lot, but that is 1 document with only 12 nodes. It all adds up: you need bigger permanent storage, bigger RAM to hold your working set in cache, and more bandwidth to download. Plus you need extra processing on the server to run $objectToArray in order to implement your use-case. It is easier to distribute this extra processing (which might not be needed at all) if you do the conversion on the front-end, because then only the front-end that requests it is impacted by the conversion; when it is done on the server, everyone is.
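If the front-end does need the keyed-object form, the conversion from the array is a one-liner; a sketch in plain JavaScript (nodes names the array field):

const nodesObject = Object.fromEntries(
  // rebuild the node0, node1, ... keys from the array index
  nodes.map( ( node , i ) => [ "node" + i , node ] )
);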

I suspect that your Blue Train has many more than 12 nodes, so 52 bytes is very low compared to your real data.

I also suspect that your other top-level objects like nodes, edges, and paths have the same structure. I even suspect that paths might refer to layouts.nodes by name, like:

paths : {
  path0 : [ "node0" , "node3" , "node4" ]  ,
  path1 : [ "node2" , "node12" , "node11" ]
}

Having the following instead would have an even bigger positive impact:

paths : [
  [ 0 , 3 , 4 ]  ,
  [ 2 , 12 , 11 ]
]
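With integer indexes, resolving a path back to its nodes becomes direct array access; a sketch, assuming both fields use the array schema (the collection name graph is an assumption):

mongosh> db.graph.aggregate( [
  { "$project" : {
      "firstPath" : {
        "$map" : {
          "input" : { "$arrayElemAt" : [ "$paths" , 0 ] } ,
          "in" : { "$arrayElemAt" : [ "$nodes" , "$$this" ] }
        }
      }
  } }
] )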

I will ask the library author whether we can change its data structure design. Thank you for the detailed explanation and experiments.

