Modeling huge binary tree

Hi,
I need to model a huge binary tree considering:

*Need to know if a node is left child or right child
*The binary tree can grow indefinitely
*Updating structure is not possible, It is only allowed to add a new nodes
*Find specific descendant node by numer and direction (left or right)
*Find last descendant by direction (left or right)

I was reviewing the proposed pattenrs by mongodb:
°Materialized Paths : https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
°Child References: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-child-references/

I was thinking for my case is better use Materialized Paths but I was worried about the storage of the path because of its lenght (grow indefinitely). In the other hand there is Child References pattern, it is good but I was worried about finding the whole path of a node talking about the performance of the traversal operation.

Thats it, which one would you recommendme?, What other ways do you think there are to model this situation?.
Thanks

1 Like

Hey Andres, you are on the right path. This actually gets a little tricky as you have the need for the full path, and your tree can grow indefinite. If this really is a very big (very long path to leaf), you might want to consider a type of mix approach between the two. So you can have like a materialised path of say upto 5-10 levels up from the node in mind (or more, you need to check the performance, don’t go more than 500 i guess). So rather than the full path as a string, your memory consumption of path would be less (per doc), and then you can go the next step if you still are not on the root node. It’s a bit like bucketing / batch paths / pagination.

Example graph below ->


and assuming we are storing a path of 2 nodes up, the node of F might look like ->

{
    "node": "F",
    "path": ["B", "D"],
    "left": "G",
    "right": "H"
}

Now, if you want the whole path, you can find use $graphLookup to iterate all the way back to node A, using “first” element of path (top most if batch is 2). You need to find the optimal batch number for your “long” paths.

Also, I have used an array of parent path and not a string as per mongo docs, as if we use simple regex on paths, it might lead to collection scan, whereas if we use array, we can index the path field (multikey index) and make use of indexed queries and lookup.

More suggestions are welcome…! Do let me know how it works…!

2 Likes