Is this a good practice? $facet

Jgam_N_A · September 16, 2022, 1:07am

I have a question about an interesting use that i found for the $facet pipe and since i haven’t found something similar on any place i wanted to ask if this may have any problems or if there is a better way to achieve the same goal.

To give some context to the example lets imagine we have two collections with a many-to-many relationship at a mongodb database:

1 a collection of persons with name, email and an array of courses to keep it simple.
2 a collection of courses with tags and an array of persons

What i need to do is to create this many-to-many relationship between many documents at the same time without having the _id, i only have two arrays of objects (working with javascript) each element of these arrays is a filter (email filter for persons and tags for courses) that i must use to match the document (or documents) that will need to stablish a link with the ones matching the other array.

of course this example is simplified and in the real problem some fields in the filters can change in the future and cant be used as ids to stablish a relationship.

The “easy” and “step-by-step” way to achieve this is:

db.getCollection("persons").find({ personFilter0 });
db.getCollection("courses").find({ courseFilter0 });

and then use the results to create the relationships with a bulkWrite.

The “problem” is that i need to do this for many documents at the same time, the arrays may have a size of 100k+ so i dont see it so “useful” to make so many trips to the database just to find which documents should be matched.

So instead i asked myself if there is a way to do something like a “bulkFind” with many find operations returning each set of matched documents or an empty array if none match. (if something like that can be done with something different than $facet please show me an example).

Since i’m currently addicted to use aggregate (not a joke) i think that every problem can be solved with it so i came with an idea of using $facet, where each stage will have the index of the array as key and make use of $match as shown below:

theAggregatePipeline = [
  {
    $facet: {
      0: [
        {  $match:  personFilters0 },
        {  $group: { _id: null, { persons: { $push: "$_id" } } } },
        {
          $lookup: {
            from: "courses",
            pipeline: [
              { $match: courseFilters0 },
              { $project: { _id: 1 } }
            ],
            as: "courses"
          }
        }
      ],
      1: index 1 pipeline...,
      n: index n pipeline...
    }
  }
]

This will give me the _id (or more fields if i need for the relationship) and keep them in the same dictionary key so it will be easy to build the bulkWrite operations later (this can also be done in each collection without the $lookup since it is expensive and to keep the work of each search to a single repository).

So… is this a good way to use $facet? since the aggregate returns a single document and this is mainly used to create “categories” or “individual results” (maybe of $group operations with counts) I don’t know if it will be easier to hit a cap in memory with this approach.

And, is there anything else that would give the same result?

I’m also open to answers like “Man do it the simple way since it will be easier to implement X and Y in your backend in the future.” and “don’t worry to much about it backend developers shouldn’t care about bandwidth usage”.

Thanks in advance for reading and if you find this useful for something you have to do you’re welcome.