I’m trying to get the files from the collection sensorsDataHistoric with idSensor: 3 whose idDevice is related to group.$id: 11 in collection devices. An example of both kind of files:
Usually if a query does not end it means that it is probably ineffecient…
With expressive lookups like this using indexes can be challenging it was mainly improved in 5.0+.
Now what you should verify is weather you have indexes on { idSensor : 1, idDevice} on sensors collection. The device collection should have { _id : 1, “group.$id” : 1} indexed.
But again if you use server 5.0+.
Also try to limit the results to a small amount before running the unwind to check how fast do first results retrieved
My initial query works really fast, but I need to find an alternative syntax for "idDevice": {"$in": db.devices.distinct("_id", {"group.$id": 11})} that it is JSON-compatible.
Yes, I have tried something similar before but I need to filter both idSensor and idDevice before the $lookup stage cause otherwise it spends a lot of time, some minutes.
Rather than aggregating on sensorsDataHistoric and $lookup:{from:devices,…}, aggregate on devices and $lookup:{from:sensorsDataHistoric,…}. Something along the following.
You have been told in your other thread that distinct is a command not an aggregation operator.
We understand that, but your model does not support that query easily because your use case start with a group rather than a device. That is why, you should consider adding the group to the historical data. This way you can do what you want to do without even looking at the device collection.
When queries get too complex or too slow it is often an indication that there is some thing about your model that does not support your use case.
And distinct is not really needed as _id are unique anyway. So a match on group:11 will return a list without duplicate value.
If the query is slower in one direction compare to the other it might be related to missing indexes. I second
but for the added reason that $lookup has some improvement that might makes the inverted lookup faster.
New in version 5.0 .
Starting in MongoDB 5.0, you can use a concise syntax for a correlated subquery. Correlated subqueries reference document fields from a joined “foreign” collection and the “local” collection on which the aggregate() method was run.
The following new concise syntax removes the requirement for an equality match on the foreign and local fields inside of an $expr operator:
You are currently storing idDevice in the sensorsDataHistoric collection. But in additional, you can also add one new field which will be groupDevice. It will duplicate the date from devices collection, but that will be big optimization since you can query everything that you need without the $lookup to the devices collection.
Obviously, you will need to implement the logic that would update groupDevice property in all documents from sensorsDataHistoric collection, when user updates the group property of the device document.
I got lost spinning around in the previous thread, sorry.
That’s what I wanted to know: I can’t substitute db.collection.distinct() for distinct, right?
Well, I might tell the DBA about updating the server to 5.0 or even including group in sensorsData, but I guess we have a devices collection for some reason.
the suggestions presented by @NeNaD and @steevej does not meant to eliminate the devices collection but to add needed information to sensorsDataHistoric to allow a query to be based just by it, in your case this “group.id” .
We call this the extended reference pattern when you embed additional reference keys to avoid lookups and joins to speedup queries:
You cannot call the distinct command from within the aggregate command. If you add distinct in an aggregation pipeline, it is just a value:
You can perform the equivalent processing of a distinct command in the aggregation framework using a $group aggregation stage, but I would consider the recommendations you’ve been given such as adjusting the data model to support your common queries or upgrading to MongoDB 5.0+ to take advantage of performance improvements.
When you campare distinct values using a distinct query you assume that the values coming from that query might be duplicated, but since you compare to _id, which is a unique index by definition it is equivalent to just query those _id, no need for distinct…