Query to filter by filed in array of deeply nested documents

Hello, I have this data structure:

{
  "categories": [
    {
      "fieldToFilterBy": "1",
      "child": {
        "fieldToFilterBy": "2",
        "child": {
          "fieldToFilterBy": "3"
        }
      }
    },
    {
      "fieldToFilterBy": "5"
    }
  ]
}

categories is an array of unknown length. Each entry in categories is tree, where parent has only one or zero child. This “entries” can have different height.

Problem:
I need to filter all documents that have specific value of fieldToFilterBy.

What I tried:

  1. { $or: [{ "categories.fieldToFilterBy": "1"}, {"categories.child.fieldToFilterBy" : "1" }]} , but this hardly an option, because the exact height is unknown.
  2. Create new field in document that contains list of fieldToFilterBy:
{
  "categories": [],
  "fieldsToFilterBy": [
    {
      "fields": ["1", "2", "3"]
    },
    {
      "fields": ["5"]
    }
  ]
}

this one doesn’t fit, because I don’t want to clutter document with fields that have no business value.

Is there a clean way to achieve that kind of filtering?

Hi, @jowitty, you can use a model like this:

{
  categories: [<Node>]
}

Node:
{
  "fieldToFilterBy": <String>
  "child": <Node>
}

You’ll set child’s value to null if it’s a leaf.
Then you’ll be able to solve your task by performing $graphLookup.

Edit: On second thought I realize that your data structure is more an array of arrays than a tree. So maybe the simple solution is like you suggested:

{
  "categories": [],
  "fieldsToFilterBy": [
    {
      "listPath": [{fieldsToFilterBy :"1"}, {fieldsToFilterBy :"2"}, {fieldsToFilterBy :"3"}]
    },
    {
      "listPath": [{fieldsToFilterBy :"5"}]
    }
  ]
}

Both options have cons and pros.
Good luck,
Rafael,

Hi, @Rafael_Green, thanks for your answer!
Can you, please, explain it a little bit?
I set child, if it is leaf, to null like this:
image
and try to perform this query:

{
from: ‘documents’,
startWith: ‘$categories’,
connectFromField: ‘child’,
connectToField: ‘child’,
as: ‘some_name’
}

but this result in empty some_name. I’m not completely understand how $graphLookup can recursively search not whole collection, but inside document

Hi,
Currently you save all the data in one document.
I meant to say that you can create a new model “Node” with child as a reference (ObjectId) to another node.
Then you’ll be able to perform $lookup from yourModel.categories.clild to node._id and then $graphLookup from node.child to node._id (connectFromField
= “child” connectToField = “_id”)
Thanks,
Rafael,

@Rafael_Green, thanks for explanations, I’ll need to check what suites best my use-case.

For those, who looks into search inside deeply nested documents, checkout this ticket (sadly, at the time of writing - opened):
https://jira.mongodb.org/browse/SERVER-267

I’m not sure if fieldToFilterBy represents a specific field or if you have multiple possible fields you want to filter on at each level . The key thing here is to look at how you can index these searches - it’s possible to construct a query to find what you want using the $or method you show but it may be much harder to index depending on if we are talking just a single field at 3 levels or something more complicated. If you have far more levels (I think 16 is still a limit) then it just makes a larger query.

Your approach of pulling out the searchable fields into an array and indexing that is what is referred to a as a payload and processing schema. The schema is not designed to optimise the use of the database but to server some external data shape requirement. By pulling out the searchable data to a simple array the ease of querying and efficiency of indexing it easily outweighs the small extra storage costs.

Any solution using $lookup or $graphLookup is likely to me an order of magnitude or two slower / less efficient.

Hi, @John_Page, really appreciate your answer!

fieldToFilterBy currently is the only field that participates in filtering, but in future there will be more (not much, just 2-4 new). Those fields will be represented on each level and in filtering their values not necessary have to be on the same level.

You mean that creating of several indexes or one compound index on such array with deeply nested documents (lets call it array of paths, cause there is no branching just a path inside element of an array) will poorly result on update speed?

What do you mean by “large query”: just it’s length or performance impact?
The problem with using $or is that I don’t know the max size. As for now db has limit of 100, but standards may change and ideally we all don’t want to stick to some constants. That’s why I really hope that mongodb will have something like recursive search inside nested structure in future)

Yes checking and strictly speaking MongoDB has a limit of 100 levels - I thought it was 16 as that’s a limit in some parts and about 12 more than should ever be used in an actual schema (versus arbitrary data) if you have more then 3 or 4 levels then I would not recommend the $or approach and there would be no way to practically index it . You cannot have individual indexes for each level as you hit the limit on number of indexes (and update performance) and a compound index would really not be much more efficient even if you used something like a $sort to force it’s use. The only practical option is to pull out the fields you need to search in to a more searchable data structure. The usual recommendation is never to have more than a dozen or so indexes in total on a MongoDB collection.

I feel your example may be missing some information but if each level has either 1 or no children why model as a hierarchy, why not as an array ? that will be more indexable and slightly smaller and faster to process. You have an array of arrays rather than an array of objects the ‘Depth’ equates to the position in the array. You can reshape at the client (fast) or with an aggregation based projection if you really want it put back into nested objects by the server.

Thanks! I really like this approach to migrate to array of arrays.
I think my colleagues and I need to take some time to discuss current state of data structure.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.