I need to fetch data from multiple collections, and each collection contains millions of documents. I’m evaluating two different approaches to retrieve this data:
Context:
- Each collection has millions of documents.
- Proper indexes are in place for the queried fields.
*$unionWithprovides a unified result, but I’m concerned about its efficiency under load. - The use case involves APIs that may be hit tens of thousands of times per day.
Option A: Multiple find() Queries
const result1 = await db.collection("CollectionA").find(queryA).toArray();
const result2 = await db.collection("CollectionB").find(queryB).toArray();
const result3 = await db.collection("CollectionC").find(queryC).toArray();
These queries are executed either sequentially or in parallel using Promise.all.
Option B: Single Aggregation with $unionWith
db.CollectionA.aggregate([
{ $match: { ...queryA } },
{ $addFields: { source: "A" } },
{
$unionWith: {
coll: "CollectionB",
pipeline: [
{ $match: { ...queryB } },
{ $addFields: { source: "B" } }
]
}
},
{
$unionWith: {
coll: "CollectionC",
pipeline: [
{ $match: { ...queryC } },
{ $addFields: { source: "C" } }
]
}
},
]);
I want to understand:
- Which option is faster?
- Which is more performant overall (in terms of resource usage like memory and CPU)?
- Which is more suitable for high-throughput APIs or large-scale production systems?
Any recommendations or best practices would be appreciated.