Merge first record into the final result of $or query

Yinhua_Zhao · January 11, 2023, 1:05pm

Hi experts,

I want to have “$or” query to combine multiple clauses. Each clause returns a recordset, and for each record set, only the first record should be returned and merged into final result.

Is it possible to do it in mql?

Best regards,
Jennifer

alexbevi · January 11, 2023, 1:13pm

You can likely do this using the Aggregation Pipeline. If you can share a couple of sample documents and an example of what the expected output would be it would be easier to provide further assistance.

Yinhua_Zhao · January 13, 2023, 1:15pm

Hi ,
Here is the fake mql I need. But it has grammar error.

`db["mydata"].find({
                    $or:[
                        {   "device": "device1",
                            "reader": "x", 
                            "measurement": "temperature", 
                            "SourceTimeUtc":  { "$lte":ISODate("2023-01-11T06:07:47.280Z") },
                            $sort: { SourceTimeUtc:-1},
                            $limit: 1
                        },
                        {   "device": "device1",
                            "reader": "y", 
                            "measurement": "temperature", 
                            "SourceTimeUtc":  { "$lte":ISODate("2023-01-11T06:07:47.280Z") },
                            $sort: { SourceTimeUtc:-1},
                            $limit: 1
                        },
                        {   "device": "device1",
                            "reader": "x", 
                            "measurement": "humidity", 
                            "SourceTimeUtc":  { "$lte":ISODate("2023-01-11T06:07:47.280Z") },
                            $sort: { SourceTimeUtc:-1},
                            $limit: 1
                        }
                    ]
                }
                )`

alexbevi · January 13, 2023, 3:40pm

Hi @Yinhua_Zhao,

It looks like what you’re looking to do is effectively merge the results of 3 different filters into a single result set. As it appears you are targeting a single collection you could do something like the following using $facet.

First, we’re just going to setup a collection (name test.foo) with some sample data and create an index the pipeline can use to more efficiently retrieve the results.

db.foo.drop();
db.foo.insertMany([
  { device: "device1", reader: "x", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "x", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-12T06:07:47.280Z") },
  { device: "device2", reader: "x", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "y", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "y", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-12T06:07:47.280Z") },
  { device: "device2", reader: "y", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "x", measurement: "humidity", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "x", measurement: "humidity", sourceTimeUtc: ISODate("2023-01-12T06:07:47.280Z") },
  { device: "device2", reader: "x", measurement: "humidity", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") }
]);
db.foo.createIndex({ device: 1, sourceTimeUtc: -1 });

Next we’ll filter the collection for common documents that all filter permutations can use ({ device: "device1", sourceTimeUtc: { $lte: ISODate("2023-01-11T06:07:47.280Z") }). The $facet stage allows you to define 3 new filters that can be applied to the results from the previous stage, which we’ll then combine into a single array (using $setUnion) and return as the result of the pipeline by unwinding the resulting array and replacing the pipeline’s output (using $replaceRoot).

db.foo.aggregate([
  { $match: { device: "device1", sourceTimeUtc: { $lte: ISODate("2023-01-11T06:07:47.280Z") } } },
  { $facet: {
    "results1": [
      { $match: { reader: "x", measurement: "temperature"} },
      { $sort: { sourceTimeUtc: -1 } },
      { $limit: 1 }
    ],
    "results2": [
      { $match: { reader: "y", measurement: "temperature"} },
      { $sort: { sourceTimeUtc: -1 } },
      { $limit: 1 }
    ],
    "results3": [
      { $match: { reader: "x", measurement: "humidity"} },
      { $sort: { sourceTimeUtc: -1 } },
      { $limit: 1 }
    ],
  }},
  { $project: { results: { $setUnion: [ "$results1", "$results2", "$results3" ] } } },
  { $unwind: "$results" },
  { $replaceRoot: { newRoot: "$results" } }
])

For the sample documents above the result should be:

[
  {
    "_id": {
      "$oid": "63c17b47415a20047f425a07"
    },
    "device": "device1",
    "reader": "x",
    "measurement": "temperature",
    "sourceTimeUtc": {
      "$date": "2023-01-11T06:07:47.280Z"
    }
  },
  {
    "_id": {
      "$oid": "63c17b47415a20047f425a0a"
    },
    "device": "device1",
    "reader": "y",
    "measurement": "temperature",
    "sourceTimeUtc": {
      "$date": "2023-01-11T06:07:47.280Z"
    }
  },
  {
    "_id": {
      "$oid": "63c17b47415a20047f425a0d"
    },
    "device": "device1",
    "reader": "x",
    "measurement": "humidity",
    "sourceTimeUtc": {
      "$date": "2023-01-11T06:07:47.280Z"
    }
  }
]

Yinhua_Zhao · January 14, 2023, 10:35am

Hi Alex,

Thank you so much! Now I know the $facet should functionally work.

But I have a concern about the performance. Each device can contains 100, 000 rows to 10,000,000 of data. I just learned that $facet stage don’t use index. Given the query normally query the latest data, there will be too much data flowing into $facet stage because typically the $match stage filter out all the data belong to current device at that time.

In current data model, the index is ({ device: 1, reader: 1, measurement :1, sourceTimeUtc: -1}). So below query is very fast because keys scanned is 1 in the execution plan.

          db["mydata"].find({   "device": "device1",
                            "reader": "x", 
                           "measurement": "temperature", 
                            "SourceTimeUtc":  { "$lte":ISODate("2023-01-11T06:07:47.280Z") 
  }).sort(SourceTimeUtc:-1).limit(1)
                    }

In this new data model, the index is ({ device: 1, sourceTimeUtc: -1}). The keys scanned could be 10,000,000 rows. The performance will be a major concern.

Best regards,

Jennifer Zhao

system · January 19, 2023, 10:35am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.