How to get one document of each when multiple exists?

I’m having some trouble understanding how to query a specific collection. The collection looks something like:

[{
  "_id": {
    "$oid": "625737ddbbd1522b17c6e1f8"
  },
  "server": "LS-FB1-P",
  "check": "device offline",
  "status": "Passed",
  "comment": "No comment",
  "timestamp": "2022-04-13 13:55",
  "system": "Skynet"
},{
  "_id": {
    "$oid": "6261bf6a27866b323667e569"
  },
  "server": "LS-FB2-P",
  "check": "device offline",
  "status": "Failed",
  "comment": "No comment",
  "timestamp": "2022-04-13 14:55",
  "system": "Skynet"
},{
  "_id": {
    "$oid": "6261bf7427866b323667e56a"
  },
  "server": "LS-FB11-P",
  "check": "device offline",
  "status": "Passed",
  "comment": "No comment",
  "timestamp": "2022-04-13 14:55",
  "system": "Skynet"
},{
  "_id": {
    "$oid": "6261c18127866b323667e56b"
  },
  "server": "LS-FB1-P",
  "check": "server offline",
  "status": "Passed",
  "comment": "No comment",
  "timestamp": "2022-04-12 11:55",
  "system": "Skynet"
},{
  "_id": {
    "$oid": "6261c18a27866b323667e56c"
  },
  "server": "LS-FB2-P",
  "check": "server offline",
  "status": "Passed",
  "comment": "No comment",
  "timestamp": "2022-04-13 14:55",
  "system": "Skynet"
},{
  "_id": {
    "$oid": "6261c19127866b323667e56d"
  },
  "server": "LS-FB11-P",
  "check": "server offline",
  "status": "Passed",
  "comment": "No comment",
  "timestamp": "2022-04-13 14:55",
  "system": "Skynet"
},{
  "_id": {
    "$oid": "62685c7f27866b323667e5e6"
  },
  "server": "LS-FB1-P",
  "check": "device offline",
  "status": "Passed",
  "comment": "No comment",
  "timestamp": "2022-04-14 13:55",
  "system": "Skynet"
}]

This is to simulate some kind of logging where different checks are performed on different servers at some point in time. As you see the same server might appear multiple times for the same check, as the check can happen multiple times during the same day or a number of times over a number of days.

I want to query this data so that I only get all the servers and the different checks, but only the latest one for each server-check combination. I plan to use this data to create a matrix of servers and checks and show a pass/fail text dependant on the latest check results for each server-check combination.

Is there a way to make such a query in one go based on the structure of the collection?

Try, this untested attempt:

sort = { "$sort" : { "timestamp" : -1 } }

group = { "$group" :
  { "_id" : "$server" ,
  // latest uses $first because we sort timestamp:-1
  // which may or may not be faster than sort timestamp:1 with $last
  { "latest" : { "$first"  : "$$ROOT" } }
}

pipeline = [ sort , group ]

Your timestamp should not be string but real date/timestamp.

  1. Strings takes more space
  2. Comparing the string “2022-04-13 14:55” to the string “2022-04-13 15:55” is much slower than comparing the same when store as date/timestamp.

Rather than comment:“No comment”, I would leave the field out or use null. Much more efficient space wise, in particular if partial index is used.

I would also optimize status:Passed, which hopefully is the most frequent, by leaving it out and have error:Blabla for status that are not Passed. So if no error field, status is Passed otherwise it is the error.

Updated and tested.

// same as before
sort = { "$sort" : { "timestamp" : -1 } }

// braces corrected
group = { "$group" :
  { "_id" : "$server" ,
    "latest" : { "$first"  : "$$ROOT" }
  }
}

// added so that resulting documents have same structure as the original
replace_root = { "$replaceRoot" : { "newRoot" : "$latest" } }

pipeline = [ sort , group , replace_root ]

// result set when aggregated on original documents (excepts for _id)
{ _id: ObjectId("62689d901c470497296ee9f0"),
  server: 'LS-FB1-P',
  check: 'device offline',
  status: 'Passed',
  comment: 'No comment',
  timestamp: '2022-04-14 13:55',
  system: 'Skynet' }
{ _id: ObjectId("62689d901c470497296ee9eb"),
  server: 'LS-FB2-P',
  check: 'device offline',
  status: 'Failed',
  comment: 'No comment',
  timestamp: '2022-04-13 14:55',
  system: 'Skynet' }
{ _id: ObjectId("62689d901c470497296ee9ec"),
  server: 'LS-FB11-P',
  check: 'device offline',
  status: 'Passed',
  comment: 'No comment',
  timestamp: '2022-04-13 14:55',
  system: 'Skynet' }

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.