Merge first record into the final result of $or query

Hi experts,

I want to have “$or” query to combine multiple clauses. Each clause returns a recordset, and for each record set, only the first record should be returned and merged into final result.

Is it possible to do it in mql?

Best regards,
Jennifer

You can likely do this using the Aggregation Pipeline. If you can share a couple of sample documents and an example of what the expected output would be it would be easier to provide further assistance.

Hi ,
Here is the fake mql I need. But it has grammar error.

`db["mydata"].find({
                    $or:[
                        {   "device": "device1",
                            "reader": "x", 
                            "measurement": "temperature", 
                            "SourceTimeUtc":  { "$lte":ISODate("2023-01-11T06:07:47.280Z") },
                            $sort: { SourceTimeUtc:-1},
                            $limit: 1
                        },
                        {   "device": "device1",
                            "reader": "y", 
                            "measurement": "temperature", 
                            "SourceTimeUtc":  { "$lte":ISODate("2023-01-11T06:07:47.280Z") },
                            $sort: { SourceTimeUtc:-1},
                            $limit: 1
                        },
                        {   "device": "device1",
                            "reader": "x", 
                            "measurement": "humidity", 
                            "SourceTimeUtc":  { "$lte":ISODate("2023-01-11T06:07:47.280Z") },
                            $sort: { SourceTimeUtc:-1},
                            $limit: 1
                        }
                    ]
                }
                )`

Hi @Yinhua_Zhao,

It looks like what you’re looking to do is effectively merge the results of 3 different filters into a single result set. As it appears you are targeting a single collection you could do something like the following using $facet.

First, we’re just going to setup a collection (name test.foo) with some sample data and create an index the pipeline can use to more efficiently retrieve the results.

db.foo.drop();
db.foo.insertMany([
  { device: "device1", reader: "x", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "x", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-12T06:07:47.280Z") },
  { device: "device2", reader: "x", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "y", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "y", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-12T06:07:47.280Z") },
  { device: "device2", reader: "y", measurement: "temperature", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "x", measurement: "humidity", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") },
  { device: "device1", reader: "x", measurement: "humidity", sourceTimeUtc: ISODate("2023-01-12T06:07:47.280Z") },
  { device: "device2", reader: "x", measurement: "humidity", sourceTimeUtc: ISODate("2023-01-11T06:07:47.280Z") }
]);
db.foo.createIndex({ device: 1, sourceTimeUtc: -1 });

Next we’ll filter the collection for common documents that all filter permutations can use ({ device: "device1", sourceTimeUtc: { $lte: ISODate("2023-01-11T06:07:47.280Z") }). The $facet stage allows you to define 3 new filters that can be applied to the results from the previous stage, which we’ll then combine into a single array (using $setUnion) and return as the result of the pipeline by unwinding the resulting array and replacing the pipeline’s output (using $replaceRoot).

db.foo.aggregate([
  { $match: { device: "device1", sourceTimeUtc: { $lte: ISODate("2023-01-11T06:07:47.280Z") } } },
  { $facet: {
    "results1": [
      { $match: { reader: "x", measurement: "temperature"} },
      { $sort: { sourceTimeUtc: -1 } },
      { $limit: 1 }
    ],
    "results2": [
      { $match: { reader: "y", measurement: "temperature"} },
      { $sort: { sourceTimeUtc: -1 } },
      { $limit: 1 }
    ],
    "results3": [
      { $match: { reader: "x", measurement: "humidity"} },
      { $sort: { sourceTimeUtc: -1 } },
      { $limit: 1 }
    ],
  }},
  { $project: { results: { $setUnion: [ "$results1", "$results2", "$results3" ] } } },
  { $unwind: "$results" },
  { $replaceRoot: { newRoot: "$results" } }
])

For the sample documents above the result should be:

[
  {
    "_id": {
      "$oid": "63c17b47415a20047f425a07"
    },
    "device": "device1",
    "reader": "x",
    "measurement": "temperature",
    "sourceTimeUtc": {
      "$date": "2023-01-11T06:07:47.280Z"
    }
  },
  {
    "_id": {
      "$oid": "63c17b47415a20047f425a0a"
    },
    "device": "device1",
    "reader": "y",
    "measurement": "temperature",
    "sourceTimeUtc": {
      "$date": "2023-01-11T06:07:47.280Z"
    }
  },
  {
    "_id": {
      "$oid": "63c17b47415a20047f425a0d"
    },
    "device": "device1",
    "reader": "x",
    "measurement": "humidity",
    "sourceTimeUtc": {
      "$date": "2023-01-11T06:07:47.280Z"
    }
  }
]
1 Like

Hi Alex,

Thank you so much! Now I know the $facet should functionally work.

But I have a concern about the performance. Each device can contains 100, 000 rows to 10,000,000 of data. I just learned that $facet stage don’t use index. Given the query normally query the latest data, there will be too much data flowing into $facet stage because typically the $match stage filter out all the data belong to current device at that time.

In current data model, the index is ({ device: 1, reader: 1, measurement :1, sourceTimeUtc: -1}). So below query is very fast because keys scanned is 1 in the execution plan.

          db["mydata"].find({   "device": "device1",
                            "reader": "x", 
                           "measurement": "temperature", 
                            "SourceTimeUtc":  { "$lte":ISODate("2023-01-11T06:07:47.280Z") 
  }).sort(SourceTimeUtc:-1).limit(1)
                    }

In this new data model, the index is ({ device: 1, sourceTimeUtc: -1}). The keys scanned could be 10,000,000 rows. The performance will be a major concern.

Best regards,

Jennifer Zhao

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.