Grouping By Sort and Limit

Hello,

I don’t seem to be able to figure out how to construct an aggregation to group by, then sort and then take one record from each group which is the latest added one.
Here is what my collection looks like:

{
	"thresholds": [
		{"name": "A", "violations": "0", "notes": "No violations detected"},
		{"name": "B", "violations": "2", "notes": "Some info on detected violations goes here"}, 
		{"name": "C", "violations": "1", "notes": "Some info on detected violations goes here"},
	]
}

Then 3 more records are added and I get this:

{
	"thresholds": [
		{"name": "A", "violations": "0", "notes": "No violations detected"},
		{"name": "B", "violations": "2", "notes": "Some info on detected violations goes here"}, 
		{"name": "C", "violations": "1", "notes": "Some info on detected violations goes here"},
                {"name": "A", "violations": "4", "notes": "Some details"},
		{"name": "B", "violations": "3", "notes": "Some info on detected violations goes here"}, 
		{"name": "C", "violations": "0", "notes": "All good"},
	]
}

What I need the aggregation to return is the most recent added record for each group; i.e. “name”

Thanks in advance.

Publish the expected results.

Publish what you tried and indicate how it fails to provide the expected results.

This will prevent us from offering a solution that does not produce the expected results and to pursue avenue that you already know that fails.

At first glance the solution provably involves $unwind and $last as the $group accumulator.

1 Like

Here is what I have in the collection

{
“thresholds”: [
{“name”: “A”, “violations”: “0”, “notes”: “Set 1 record”},
{“name”: “B”, “violations”: “0”, “notes”: “Set 1 record”},
{“name”: “C”, “violations”: “0”, “notes”: “Set 1 record”},
{“name”: “A”, “violations”: “0”, “notes”: “Set 2 record”},
{“name”: “B”, “violations”: “1”, “notes”: “Set 2 record”},
{“name”: “C”, “violations”: “0”, “notes”: “Set 2 record”},
{“name”: “A”, “violations”: “2”, “notes”: “Set 3 record”},
{“name”: “B”, “violations”: “0”, “notes”: “Set 3 record”},
{“name”: “C”, “violations”: “0”, “notes”: “Set 3 record”},
]
}

I need a query that would group the records by “name” (A, B,C) and return a record from each group that was added last.

Specifically, the query would return the following 3 record:

"thresholds": [
	{"name": "A", "violations": "2", "notes": "Set 3 record"},
	{"name": "B", "violations": "0", "notes": "Set 3 record"}, 
	{"name": "C", "violations": "0", "notes": "Set 3 record"}
]

}

I hope this explains the ask. Thanks.

What is not clear is the structure of your documents.

Is thresholds the name of your collection and each the rows are top level documents,
or
Is thresholds the field name of an array within a top level document that contains the rows as object.

You are right, the data is not accurate and that “threshold” key should not be there.
The collection has the following:

{“name”: “A”, “violations”: “0”, “notes”: “Set 1 record”},
{“name”: “B”, “violations”: “0”, “notes”: “Set 1 record”},
{“name”: “C”, “violations”: “0”, “notes”: “Set 1 record”},
{“name”: “A”, “violations”: “0”, “notes”: “Set 2 record”},
{“name”: “B”, “violations”: “1”, “notes”: “Set 2 record”},
{“name”: “C”, “violations”: “0”, “notes”: “Set 2 record”},
{“name”: “A”, “violations”: “2”, “notes”: “Set 3 record”},
{“name”: “B”, “violations”: “0”, “notes”: “Set 3 record”},
{“name”: “C”, “violations”: “0”, “notes”: “Set 3 record”}

And the end result of the query would be
{“name”: “A”, “violations”: “2”, “notes”: “Set 3 record”},
{“name”: “B”, “violations”: “0”, “notes”: “Set 3 record”},
{“name”: “C”, “violations”: “0”, “notes”: “Set 3 record”}

Basically the latest inserted record from each group given that we Group By “name” field.

Thanks.

You will not be able to that without a timestamp field.

Here is a solution that I would not trust because it uses the ObjectId _id field.

sort = { "$sort" : { "_id" : -1 } }

group = { "$group" :
  { "_id" : "$name" ,
    "latest" : { "$first"  : "$$ROOT" }
  }
}

replace_root = { "$replaceRoot" : { "newRoot" : "$latest" } }

pipeline = [ sort , group , replace_root ]

This is the same solution as the following that uses the more reliable timestamp field.

Appreciate your help @steevej . One question though - is there a way to run these as a single command ? So far I was running find() commands from Mongo shell, or from a Python code using pymongo.
I’ll see what Google will have to say about that too

Thanks again!

It is one command.

See https://www.mongodb.com/docs/manual/reference/method/db.collection.aggregate/

okay, so it all goes into db.collection.aggregate(). Got it. Thanks

1 Like