Often questions like that are better answered by yourself using your own dataset since there is usually not a good answer for all cases.
And this is something very easy to test. Writing a $unionWith version should take only a few hours.
But in my opinion, the $unionWith would be faster because you do not do anything before the $facet.
But in my opinion, you should not trust anyone who does not back his claims with data.
But in my opinion too, $facet might be faster depending on your indexes. Without a proper index, $facet might do a single collection scan and $unionWith will do two. With proper indexes you might end up with 2 index scans, which are probably faster than 1 collection scan.
$facet runs multiple sub-pipelines in one pass over the input, so it’s usually more efficient in terms of I/O, latency, and memory than running two separate pipelines and combining them with $unionWith.
$unionWith means executing two full queries and merging the results: higher CPU and memory use, more round-trips, and slower execution.
A $facet stage will process the documents as they are currently in your pipeline. A $unionWith stage will re-read documents from the collection you specify, even if it’s the same one. So you can think of it as an additional ‘read from disk’ for that collection.
Input documents are passed to the $facet stage only once. $facet enables various aggregations on the same set of input documents, without needing to retrieve the input documents multiple times.
This also means that if you have 20 stages (for example) in your aggregation which transform your data before $facet/$unionWith, then:
With $facet, you can just add the two pipelines for the two fields; as you have done.
But for $unionWith, you’ll need to repeat those 20 stages before you can add the second field.
(If you are adding more fields using unionWith, all those stages will need to be repeated every time for each union. For facet, it’s just once.)
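To make the repetition concrete, here is a rough sketch in plain JavaScript that only builds the two pipeline shapes. The collection name `orders`, the fields `fieldA`/`fieldB`, and the two "shared" stages are made-up placeholders standing in for the 20 stages; none of them come from the original question.

```javascript
// Hypothetical stand-ins for the "20 stages" that transform the data first.
const sharedStages = [
  { $match: { status: "active" } },
  { $addFields: { total: { $multiply: ["$price", "$qty"] } } },
];

// Hypothetical per-field sub-pipelines.
const byFieldA = [{ $match: { fieldA: { $exists: true } } }];
const byFieldB = [{ $match: { fieldB: { $exists: true } } }];

// $facet: the shared stages appear once and both sub-pipelines consume their output.
function buildFacetPipeline(shared, a, b) {
  return [...shared, { $facet: { a, b } }];
}

// $unionWith: its sub-pipeline re-reads the named collection from scratch,
// so the shared stages have to be repeated inside it.
function buildUnionPipeline(collName, shared, a, b) {
  return [
    ...shared,
    ...a,
    { $unionWith: { coll: collName, pipeline: [...shared, ...b] } },
  ];
}

const facetPipeline = buildFacetPipeline(sharedStages, byFieldA, byFieldB);
const unionPipeline = buildUnionPipeline("orders", sharedStages, byFieldA, byFieldB);

// In mongosh, either one could then be run as db.orders.aggregate(pipeline).
```

Keep in mind the output shapes differ: $facet returns a single document with one array per sub-pipeline, while $unionWith returns the documents of both pipelines as a stream.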
Exactly this. And make sure your dataset/collection is large enough to show a difference. Like 1-10 million documents.
That is right, it is usually more efficient, but I am not sure it is in this case. Since the pipeline does nothing before the $facet, you most likely incur a collection scan, which might fetch all documents. But we know nothing about the dataset and the indexes. My claim is that with proper indexes you may end up with 2 index scans rather than 1 collection scan. In some cases, 2 index scans might be better than 1 collection scan. Especially in this case, since the datasets of the 2 facets are mutually exclusive.
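One way to check which kind of scan a pipeline actually incurs is to look at its explain output, for example from `db.collection.explain("executionStats").aggregate(pipeline)` in mongosh. The helper below is only a sketch: `collectPlanStages` is a made-up name, the sample document is hand-written rather than real server output, and the exact explain layout varies between server versions, which is why the walk is deliberately generic.

```javascript
// Recursively collect every string-valued "stage" field in an explain document.
// Rejected plans are skipped so that only the chosen plan is reported.
function collectPlanStages(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectPlanStages(item, found);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.push(node.stage);
    for (const [key, value] of Object.entries(node)) {
      if (key !== "rejectedPlans") collectPlanStages(value, found);
    }
  }
  return found;
}

// Hand-written sample shaped like a simplified winning plan (assumption, not real output).
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "start_time_1" },
    },
  },
};

const stages = collectPlanStages(sampleExplain);
const usesCollectionScan = stages.includes("COLLSCAN");
```

If `usesCollectionScan` comes back true for the $facet version and false for each branch of the alternative, that is a hint the index-based variant is worth timing on the real data.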
That is why testing should be done by the original poster with their own dataset.
And testing should also try dropping the $unionWith and doing 2 completely separate aggregations (yes, 2 round trips to the server) that would run in parallel on the server. You combine the output in the application. All this could be done because nothing is done before and after the $facet.
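The two-aggregation variant could be sketched roughly like this. It assumes a `collection` object exposing `aggregate(pipeline).toArray()`, as the official Node.js driver's collections do; `runBothAndMerge` and the in-memory `fakeCollection` are made up here so the sketch runs without a server.

```javascript
// Issue both aggregations at once and concatenate the results in the application.
async function runBothAndMerge(collection, pipelineA, pipelineB) {
  const started = Date.now();
  const [resultsA, resultsB] = await Promise.all([
    collection.aggregate(pipelineA).toArray(),
    collection.aggregate(pipelineB).toArray(),
  ]);
  return { docs: [...resultsA, ...resultsB], elapsedMs: Date.now() - started };
}

// Tiny in-memory stand-in (assumption): it only understands a leading
// { $match: { kind: ... } } stage, just enough to exercise the function.
const fakeCollection = {
  aggregate(pipeline) {
    const wanted = pipeline[0].$match.kind;
    const data = [
      { kind: "a", n: 1 },
      { kind: "b", n: 2 },
      { kind: "a", n: 3 },
    ];
    return { toArray: async () => data.filter((d) => d.kind === wanted) };
  },
};
```

With a real collection, comparing `elapsedMs` against the timing of the single $facet or $unionWith aggregation on the same data gives the kind of evidence asked for above.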
It might even be possible to improve the $facet.
I would try to move $sort of category order and advance_option… before the $facet.
Then I would have 4 facets rather than 2, one for each CategoryOrder; this would limit the risk of busting the BSON limit. Each facet would only $sort on start_time, which could improve the in-memory sort if the indexes cannot be used.
Thanks for pitching in, we all get better when we oppose opinions and ideas.
This was your first post. Keep it going, that is how we like it.
Yes, but in this case the documents are mutually exclusive and doing 2 index scans might avoid reading the documents from disk, while a collection scan might need to fetch all documents. I do not know enough about the dataset and its indexes to recommend anything other than testing it.
Valid point too.
Completely agree, hence
Thanks again! I really hope the original author will test and report here.