Assessing the optimality of using the Aggregation Framework to populate references

Motivation

  • reference population is commonplace in web development
  • traditional MongoDB ODMs use a “query each one of the references, then build the final object using JS” approach
  • it is possible to deep populate references by sending a single aggregation pipeline payload to the MongoDB server
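
To make the single-pipeline idea concrete, here is a minimal sketch of how nested references could be turned into `$lookup` stages. It is not Aeria's actual implementation; the `Ref` type, the `buildPopulateStages` helper, and the collection and field names (`users`, `organizations`, `author`, `organization`) are all hypothetical. It relies on the `$lookup` form that combines `localField`/`foreignField` with a sub-`pipeline`, which requires MongoDB 5.0 or later.

```typescript
// Hypothetical schema: posts.author -> users._id, users.organization -> organizations._id
type Stage = Record<string, unknown>;

interface Ref {
  field: string; // field that holds the referenced _id
  from: string;  // collection the reference points to
  refs?: Ref[];  // references nested inside the referenced document
}

// Recursively turn a reference tree into $lookup/$unwind stages.
function buildPopulateStages(refs: Ref[]): Stage[] {
  const stages: Stage[] = [];
  for (const ref of refs) {
    stages.push({
      $lookup: {
        from: ref.from,
        localField: ref.field,
        foreignField: "_id",
        // the sub-pipeline populates references nested in the joined document
        pipeline: buildPopulateStages(ref.refs || []),
        as: ref.field,
      },
    });
    // $lookup yields an array; unwind it back into a single embedded document
    stages.push({
      $unwind: { path: "$" + ref.field, preserveNullAndEmptyArrays: true },
    });
  }
  return stages;
}

const pipeline = buildPopulateStages([
  {
    field: "author",
    from: "users",
    refs: [{ field: "organization", from: "organizations" }],
  },
]);
```

With the official Node.js driver, such a pipeline would be sent in one round trip, for example `db.collection("posts").aggregate(pipeline).toArray()`, instead of one query per reference level.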

Methods

  • A GitHub CI runner was used for each test
  • The MongoDB version used was 8.0.3 (latest)
  • Documents containing no references, a single reference, and a nested reference were first inserted, then the time to retrieve the documents N times was recorded
  • no caching mechanisms except those of MongoDB were used
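
The measurement step can be sketched roughly as follows. This is only an illustration of "retrieve N times and record the time", not the code used in the benchmark repository; the helper name `timeRetrievals` is hypothetical, and it assumes sequential runs measured with wall-clock time.

```typescript
// Hypothetical helper: times N sequential calls of an async retrieval function.
async function timeRetrievals(
  n: number,
  retrieve: () => Promise<unknown>,
): Promise<number> {
  const start = Date.now();
  for (let i = 0; i < n; i++) {
    // await each call so the runs are sequential and do not overlap
    await retrieve();
  }
  return Date.now() - start; // elapsed wall-clock milliseconds
}
```

Each library under test would supply its own `retrieve` callback (a populated find for Mongoose, an aggregation for the pipeline approach, and so on), so that the loop and the timer stay identical across candidates.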

Results

  • Prisma was the slowest
  • Mongoose with mongoose-autopopulate was slightly slower to retrieve plain and nested references than Mongoose with the .populate() method
  • Aeria (which leverages the Aggregation Framework) was the fastest: on deep references, Mongoose took about 213% of Aeria's time and Prisma about 272% (roughly 2.1x and 2.7x as long)

Conclusion

The Aggregation Framework is the optimal way to deep populate references.

{
  "mongodbVersion": "8.0.3",
  "results": {
    "aeria (bypassSecurity)": {
      "norefs": 1722,
      "plain": 2226,
      "deep": 2683
    },
    "aeria (default)": {
      "norefs": 1996,
      "plain": 2327,
      "deep": 2699
    },
    "mongoose (autopopulate)": {
      "norefs": 1672,
      "plain": 3871,
      "deep": 5702
    },
    "mongoose (populate method)": {
      "norefs": 1924,
      "plain": 3720,
      "deep": 5612
    },
    "prisma": {
      "norefs": 3106,
      "plain": 5208,
      "deep": 7296
    }
  }
}
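
The quoted 213% and 272% can be cross-checked against the "deep" figures above. The sketch below just divides the recorded times; it assumes lower numbers are better and that the percentages compare against "aeria (bypassSecurity)", which is my reading of the data rather than something stated explicitly.

```typescript
// "deep" figures copied from the results posted above (lower is better)
const deep: Record<string, number> = {
  "aeria (bypassSecurity)": 2683,
  "aeria (default)": 2699,
  "mongoose (autopopulate)": 5702,
  "mongoose (populate method)": 5612,
  "prisma": 7296,
};

// How many times longer `other` took than `baseline`, rounded to 2 decimals.
function ratio(other: string, baseline: string): number {
  return Math.round((deep[other] / deep[baseline]) * 100) / 100;
}

const vsAutopopulate = ratio("mongoose (autopopulate)", "aeria (bypassSecurity)"); // 2.13
const vsPopulate = ratio("mongoose (populate method)", "aeria (bypassSecurity)"); // 2.09
const vsPrisma = ratio("prisma", "aeria (bypassSecurity)"); // 2.72

console.log({ vsAutopopulate, vsPopulate, vsPrisma });
```

Under that reading, 213% matches the mongoose-autopopulate run and 272% matches Prisma; against the .populate() run the ratio is slightly lower.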

Link to the repository (contains example aggregation pipeline): GitHub - aeria-org/benchmark

Opinions will be highly appreciated.

PS: I have yet to see whether there are any slowdowns when the references are nested too deeply, or whether the Aggregation Framework stays linearly faster than multiple queries and JS hooks.