I have a table with 500k rows. Each row may join 1:many with 20 other sources of data. Until now, I had all this information stored in a single table – it makes it very easy to search.
Despite an on-disk size of 6GB, even with 32GB of ram, it can take 220+seconds to do a tablescan.
I realized that I can opportunistically $lookup each data source, so that table scans are faster.
I usually only query on 1-3 pieces of data, not all 20. So for matching, I can do a $lookup, apply my $matches, then do the next $lookup, etc.
Can I do that automatically with a View? It does “automatic pipeline optimization” where it hoists the pre-$lookup queries to before the join, but then it just does all the $lookups before applying the $match, even though it should run that one $lookup and then the $match, which will remove most of the $rows from requiring $lookups.
Is there any way to give hints? To shape my query better? I’m afraid I just have to build the query manually.
Example:
Pipeline as a view:
db.createCollection(
"propertiesView",
{
"viewOn" : "propertiesPointers",
"pipeline" : [
{ $lookup: { from: "DataTaxes", localField: 'data.taxes', foreignField:'_id', as: 'taxes' } }
,{ $lookup: { from: "DataBusinesses", localField: 'data.businesses', foreignField:'_id', as: 'businesses' } }
,{ $lookup: { from: "DataParking", localField: 'data.parking', foreignField:'_id', as: 'parking' } }
,{ $lookup: { from: "DataRestaurants", localField: 'data.restaurants', foreignField:'_id', as: 'restaurants' } }
]
}
)
If I do a match on taxes
it should automatically be placed after the lookup on taxes, not at the end of the entire pipeline, which is what happens now on mongodb version v4.2.1 and latest v4.2.7
Oddly, it works on normal queries, hoisting it to right after the $lookup
db.getCollection('propertiesView').find({"Year":2018, "data.price":4}).explain()
But not on size - it puts it at the end, which ironically, this one it could figure out before doing the lookup (“how large is the array of IDs to lookup”)
db.getCollection('propertiesView').find({"Year":2018, "data.parking":{$size:4}}).explain()
But regardless, it should arrange the actual $lookup order based on which matches I’ll be doing.
Is there a way to tell mongodb that?