Extended $lookup

Hi @Javier_Blanco ,

Usually if a query does not end it means that it is probably ineffecient…

With expressive lookups like this using indexes can be challenging it was mainly improved in 5.0+.

Now what you should verify is weather you have indexes on { idSensor : 1, idDevice} on sensors collection. The device collection should have { _id : 1, “group.$id” : 1} indexed.

But again if you use server 5.0+.

Also try to limit the results to a small amount before running the unwind to check how fast do first results retrieved

Thanks
Pavel

Hi, Pavel, thanks for your answer.

I’m working with 4.4.6. version of MongoDB.

Hi @Javier_Blanco ,

I really recommend considering a 5.0 upgrade .

It also has the timeseries collections which looks like might fit your data design…

2 Likes

To be honest, I got here from here:

My initial query works really fast, but I need to find an alternative syntax for "idDevice": {"$in": db.devices.distinct("_id", {"group.$id": 11})} that it is JSON-compatible.

Hi @Javier_Blanco ,

So why won’t you do it in 2 queries.

  1. Will get all disitinct device ids in an array
  2. will run a $in query on that array?

I need to solve everything in just one query.

Hi @Javier_Blanco ,

You can do it with Aggregation framework:

  • $match - To filter documents by idSensor and idDevice.group.$id fields.
  • $lookup - To populate idDevice field with data from devices collection.
  • $project - To return only fields that you need.
db.sensorsDataHistoric.aggregate([
  {
    "$match": {
      "idSensor": 3
    }
  },
  {
    "$lookup": {
      "from": "devices",
      "localField": "idDevice",
      "foreignField": "_id",
      "as": "idDevice"
    }
  },
  {
    "$match": {
      "idDevice.group.$id": 11
    }
  },
  {
    "$project": {
      "idSensor": 1,
      "idDevice": 1
    }
  }
])

Working example

1 Like

Hi, @NeNaD, thanks for your answer.

Yes, I have tried something similar before but I need to filter both idSensor and idDevice before the $lookup stage cause otherwise it spends a lot of time, some minutes.

Doing this initial $match:

{
    $match: 
    {
        idSensor: 3,
        idDevice: {$in: db.devices.distinct(_id, {group.$id: 11})}
    }
},

I get the files in around 5 seconds.

Could you inverse the lookup flow?

Rather than aggregating on sensorsDataHistoric and $lookup:{from:devices,…}, aggregate on devices and $lookup:{from:sensorsDataHistoric,…}. Something along the following.

[
  { '$match': { group: 11 } },
  {
    '$lookup': {
      from: 'history',
      let: { device: '$_id' },
      pipeline: [
        {
          '$match': {
            '$expr': { '$and': [ { '$eq': [ '$device', '$$device' ] }, { '$eq': [ '$sensor', 3 ] } ] }
          }
        }
      ],
      as: 'history'
    }
  }
]

But if it is a frequent use-case you might consider adding the group field to the data historic.

From my experience, aggregating on devices makes the queries go slower than doing it on sensorsDataHistoric.

Anyway, the key is to filter both idSensor and idDevice before anything else.

I have tried to change:

"idDevice": {"$in": db.devices.distinct("_id", {"group.$id": 11})}

By:

"idDevice": {"$in": ["distinct": "devices", "key": "_id", "query": {"group.$id": 11}]}

According to this:

The db.collection.distinct() method provides a wrapper around the distinct command.

But it doesn’t work.

You have been told in your other thread that distinct is a command not an aggregation operator.

We understand that, but your model does not support that query easily because your use case start with a group rather than a device. That is why, you should consider adding the group to the historical data. This way you can do what you want to do without even looking at the device collection.

When queries get too complex or too slow it is often an indication that there is some thing about your model that does not support your use case.

And distinct is not really needed as _id are unique anyway. So a match on group:11 will return a list without duplicate value.

If the query is slower in one direction compare to the other it might be related to missing indexes. I second

but for the added reason that $lookup has some improvement that might makes the inverted lookup faster.

New in version 5.0 .

Starting in MongoDB 5.0, you can use a concise syntax for a correlated subquery. Correlated subqueries reference document fields from a joined “foreign” collection and the “local” collection on which the aggregate() method was run.

The following new concise syntax removes the requirement for an equality match on the foreign and local fields inside of an $expr operator:

1 Like

Hi @Javier_Blanco,

You can consider the following approach:

You are currently storing idDevice in the sensorsDataHistoric collection. But in additional, you can also add one new field which will be groupDevice. It will duplicate the date from devices collection, but that will be big optimization since you can query everything that you need without the $lookup to the devices collection.

Obviously, you will need to implement the logic that would update groupDevice property in all documents from sensorsDataHistoric collection, when user updates the group property of the device document.

1 Like

Both worded differently but meaning the same.

2 Likes

I got lost spinning around in the previous thread, sorry.

That’s what I wanted to know: I can’t substitute db.collection.distinct() for distinct, right?

Well, I might tell the DBA about updating the server to 5.0 or even including group in sensorsData, but I guess we have a devices collection for some reason.

Hi @Javier_Blanco ,

the suggestions presented by @NeNaD and @steevej does not meant to eliminate the devices collection but to add needed information to sensorsDataHistoric to allow a query to be based just by it, in your case this “group.id” .

We call this the extended reference pattern when you embed additional reference keys to avoid lookups and joins to speedup queries:

Ty

4 Likes

Hi @Javier_Blanco,

To try to clarify this: distinct is a Database Command. The aggregation framework uses an aggregate database command, which supports Aggregation Pipeline Stages which use Aggregation Pipeline Operators. Aggregation stages and operators are prefixed with $ (for example: $group).

You cannot call the distinct command from within the aggregate command. If you add distinct in an aggregation pipeline, it is just a value:

You can perform the equivalent processing of a distinct command in the aggregation framework using a $group aggregation stage, but I would consider the recommendations you’ve been given such as adjusting the data model to support your common queries or upgrading to MongoDB 5.0+ to take advantage of performance improvements.

Regards,
Stennie

3 Likes

By the way, how could you take off “distinct” from the expression?

Thanks in advance.

When you campare distinct values using a distinct query you assume that the values coming from that query might be duplicated, but since you compare to _id, which is a unique index by definition it is equivalent to just query those _id, no need for distinct…

2 Likes

And how would that expression be?

I need to compare to _id, but to those _id related to group.$id #11.

Simply

db.devices.find( { "group.$id" : 11 } ).projection( { _id : 1 } )

will produce a list of distinct _id (because _id is always unique within a collection as already mentioned) of the devices of the group 11.

2 Likes