Performance Question: `$lookup` vs. Multiple `find` Queries in MongoDB

I’m currently working on optimizing some MongoDB queries and would love to get some insights from the community.

Scenario:

I need to perform complex queries involving multiple collections, and I’m considering two different approaches:

  1. Using $lookup to join multiple collections within a single query. This operation would be executed thousands to tens of thousands of times.
  2. Performing multiple find queries on various collections and combining the results with a not-in ($nin) filter, also thousands to tens of thousands of times.
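To make the comparison concrete, here is a minimal in-memory sketch in Java of what each approach computes. It assumes a hypothetical `orders` collection whose `customerId` references `customers._id`, and a goal of finding orders whose customer is not in a given region. The record types, method names and data are illustrative only; the comments show the MongoDB stages each method stands in for, and nothing here talks to a database.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical document shapes: orders.customerId references customers._id.
record Order(String id, String customerId) {}
record Customer(String id, String region) {}

class JoinStrategies {
    // Approach 1, evaluated on the server in one round-trip:
    //   { $lookup: { from: "customers", localField: "customerId", foreignField: "_id",
    //                pipeline: [ { $match: { region: <region> } } ], as: "matched" } },
    //   { $match: { matched: { $size: 0 } } }
    static List<Order> lookupAntiJoin(List<Order> orders, List<Customer> customers, String region) {
        // Group the filtered foreign side by key, as $lookup matches per local document.
        Map<String, List<Customer>> matchesById = customers.stream()
                .filter(c -> c.region().equals(region))
                .collect(Collectors.groupingBy(Customer::id));
        // Keep only orders whose "matched" array would be empty.
        return orders.stream()
                .filter(o -> matchesById.getOrDefault(o.customerId(), List.of()).isEmpty())
                .collect(Collectors.toList());
    }

    // Approach 2, two round-trips from the client:
    //   customers.find({ region: <region> }, { _id: 1 })  -> ids
    //   orders.find({ customerId: { $nin: ids } })
    static List<Order> findThenNin(List<Order> orders, List<Customer> customers, String region) {
        // Round-trip 1: the matching ids come back to the client.
        Set<String> ids = customers.stream()
                .filter(c -> c.region().equals(region))
                .map(Customer::id)
                .collect(Collectors.toSet());
        // Round-trip 2: the whole id list travels inside the second query.
        return orders.stream()
                .filter(o -> !ids.contains(o.customerId()))
                .collect(Collectors.toList());
    }
}
```

Both methods return the same orders. The practical difference is where the work happens: the `$lookup` version stays on the server, while the `$nin` version ships every matching id to the client and back inside the second query, which is bounded by MongoDB's 16 MB BSON document size limit.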

Question:

Which approach is generally more efficient in terms of performance and design?

Considerations:

  • $lookup:

    • Pros: Joins data in a single query, reducing network round-trips.
    • Cons: Can result in high memory usage and slower performance if the datasets involved are large, especially if the joined collection has no index on the foreign field.
  • Multiple find queries:

    • Pros: Each query is relatively lightweight, and parallel or asynchronous processing can be utilized to optimize performance.
    • Cons: Increases the number of network round-trips, which could affect overall performance.
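The round-trip argument can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a benchmark: it assumes each query costs exactly one network round-trip, that concurrent round-trips overlap perfectly, and it ignores server execution time entirely. The 10,000 executions and 2 ms round-trip time are hypothetical figures.

```java
class RoundTripEstimate {
    // Best-case seconds spent waiting on the network, assuming every query costs
    // exactly one round-trip of rttMillis and that `concurrency` round-trips
    // overlap perfectly. Server-side execution time is deliberately ignored.
    static double networkSeconds(long executions, int queriesPerExecution,
                                 double rttMillis, int concurrency) {
        double roundTrips = (double) executions * queriesPerExecution;
        return roundTrips * rttMillis / concurrency / 1000.0;
    }

    public static void main(String[] args) {
        // Hypothetical figures: 10,000 executions, 2 ms round-trip time.
        System.out.println(networkSeconds(10_000, 1, 2.0, 1)); // one $lookup, sequential: prints 20.0
        System.out.println(networkSeconds(10_000, 3, 2.0, 1)); // three finds, sequential: prints 60.0
        System.out.println(networkSeconds(10_000, 3, 2.0, 8)); // three finds, 8 in flight: prints 7.5
    }
}
```

Under these assumptions, three sequential `find` calls spend three times as long on the network as one aggregation, and running several at a time can win that back. Real results depend heavily on server-side cost, which this model leaves out and which often decides the question when a `$lookup` cannot use an index.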

Key Factors:

  1. Indexing: Ensuring proper indexing is crucial for improving performance in both approaches.
  2. Parallel Processing: Leveraging parallel and asynchronous processing can significantly boost performance when using multiple find queries.
  3. Data Modeling: Optimizing the data model to minimize the need for joins might provide substantial performance benefits.
  4. Aggregation Framework Optimization: For $lookup, fine-tuning the aggregation pipeline can help reduce memory usage and enhance performance.
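For the parallel-processing point, a minimal sketch of running independent `find` calls concurrently with `CompletableFuture` and a bounded thread pool might look like this. The `find` argument is a hypothetical stand-in for a blocking call such as a Spring Data repository method or a `MongoTemplate` query; here it is just a `Function`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class ParallelFinds {
    // Issues one blocking "find" per key on a bounded thread pool so that their
    // network round-trips overlap, then returns the results in key order.
    // `find` is a hypothetical stand-in for a repository or MongoTemplate call.
    static <K, R> List<R> findAll(List<K> keys, Function<K, R> find, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Start every query before waiting on any of them.
            List<CompletableFuture<R>> futures = keys.stream()
                    .map(k -> CompletableFuture.supplyAsync(() -> find.apply(k), pool))
                    .collect(Collectors.toList());
            // join() waits for each result and rethrows a failure as CompletionException.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            // Let the worker threads exit once the submitted queries finish.
            pool.shutdown();
        }
    }
}
```

Keeping the pool bounded matters because each in-flight blocking query holds a driver connection, so the pool size should stay within the MongoDB driver's connection pool limit. Note also that the two steps of a find-then-$nin sequence depend on each other, so parallelism helps across the many independent executions rather than within one. On a reactive stack, Spring Data's `ReactiveMongoTemplate` is an alternative to managing threads yourself.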

Context:

  • If the datasets being joined are manageable in size, $lookup might be more efficient.
  • For larger datasets, multiple find queries with parallel processing could potentially offer better performance.

Additional Information:

  • MongoDB Atlas version: 6
  • Spring Data MongoDB version: 4.2.5

I’m looking for advice based on your experiences and any best practices you can share. Which approach has proven to be more effective in similar scenarios?

Hi @how,

Welcome to the MongoDB Community forums :sparkles:

There isn’t a single right answer to this question because it depends on the specifics of your use case. Generally, we recommend avoiding lookups whenever possible and storing data in a single collection if it is often accessed together.

However, the best approach depends on your schema design, the structure of your documents, and the data you frequently access. Considering all these factors, there could be a design approach that allows you to use MongoDB to its fullest potential.
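As an illustration of keeping frequently co-accessed data in a single collection, here is a minimal sketch assuming a hypothetical order document that embeds a copy of the customer fields it is filtered on (this resembles what MongoDB's schema-design material calls the Extended Reference pattern). With that shape, "orders whose customer is not in a given region" becomes one query on one collection, with no `$lookup` and no second round-trip. The types and field names are illustrative, and the code filters an in-memory list rather than querying MongoDB.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical denormalized shape: each order embeds a copy of the customer
// fields it is filtered on, so the read needs no second collection.
record CustomerSummary(String id, String region) {}
record EmbeddedOrder(String id, CustomerSummary customer) {}

class EmbeddedQuery {
    // In-memory equivalent of a single query on one collection:
    //   orders.find({ "customer.region": { $ne: <region> } })
    static List<EmbeddedOrder> notInRegion(List<EmbeddedOrder> orders, String region) {
        return orders.stream()
                .filter(o -> !region.equals(o.customer().region()))
                .collect(Collectors.toList());
    }
}
```

The trade-off moves to the write side: when a customer's region changes, every order holding a copy has to be updated, so this fits best when the embedded fields change rarely compared with how often they are read.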

If you need further assistance, please share sample documents from your collection(s), the query you plan to execute multiple times, the expected output, the use case, and the size of the collection. This information will give the community more insight into your specific use case so they can offer more targeted suggestions.

Here are some resources you can refer to for more details:

Also, refer to the MongoDB Dev Center for articles on schema design.

Best regards,
Kushagra
