MongoDB query: select all documents with latest timestamp

I’m a complete beginner with MongoDB and we use MongoTemplate in our project.

I have a collection with documents in this form:

{
    "_id" : ObjectId('...'), // This is generated automatically by MongoTemplate, I suppose
    "key" : 0,
    "value" : {
        (complex object)
    }
    "timestamp" : 1668584029237
    ...
}

The tricky part for me is that we can have documents with the same “key” that are outdated (can be judged by the “timestamp”).

I’d like to to query all documents in the database, but in the case of documents with the same “key”, then only the one with the latest timestamp (the greatest number) should be returned.

So far I’ve written this:

    List<AggregationOperation> aggregations = new ArrayList<>();

    ...

    aggregations.add(sort(Sort.by(DESC, "timestamp")));
    aggregations.add(group("key").first("timestamp").as("timestamp"));
    
    final List<Map> result = mongoTemplate.aggregate(newAggregation(aggregations), tableName, Map.class).getMappedResults();

I know that I’m very close to the solution, but I cannot manage to get what I want by the group method :frowning: Please help.

Note: I need to return all fields expect “_id”

Hi @Capitano_Giovarco ,

You can use a simple find to query the data for a specific key, sort by timestamp (make sure it is indexed with -1 order) :
Example:

db.collection.find({key : 0}).sort({timestamp : -1}).limit(1)

With index {key : , timestamp : -1} this will be efficient…

Have I not understood you correctly?

Thanks
Pavel

2 Likes

Hi Pavel,

If I understand correctly, your query returns one record for a specific key. However, this isn’t what I need.

I don’t want to specify the key because the query should return all documents with all keys. Maybe I wasn’t clear enough, so I’ll share an example.

If I have a collection with these documents

{ “key” : 0, “timestamp” : 0 }
{ “key” : 0, “timestamp” : 1 }
{ “key” : 0, “timestamp” : 2 }
{ “key” : 1, “timestamp” : 0 }
{ “key” : 1, “timestamp” : 1 }
{ “key” : 1, “timestamp” : 2 }

I want a generic query that returns:

{ “key” : 0, “timestamp” : 2 }
{ “key” : 1, “timestamp” : 2 }

because these documents are the latest in the collection for the corresponding keys. The query should be efficient because we have million of documents and the code should be as flexible as possible because in our code we have also flexible limit, skip, filtering, sorting etc. I just omitted the fluff that wasn’t needed :slight_smile:

So from my understanding I need to use Aggregation to do the above. However I’m no expect and therefore I’m stuck :frowning: Also, MongoTemplate should be used because this is what we have in place.

Hi @Capitano_Giovarco ,

Ok now I understand.

Going the aggregation route in my opinion would not be the optimal way.

The optimal way is to have a label placed on the latest version of the key document and set/unset it as new data is generated.

Then you can use a partial index set only on the latest data to support the query:

{label : 1, timestamp : -1}, {partialFilterExpression: {label: "latest"} }

And then query only {label : “latest”}

Otherwise, you aggregation will need to involve sorting and grouping and pushing into arrays and replacing roots which is expensive.

If you wish to go this route I can help you with a simple aggregation syntax and you will need to find some help with MongoTemplate specifically… I don’t think it should be that hard to port

Thanks
Pavel

Hi Pavel!

Your method is good, but I think that this is not feasable in my case. Long story.

Therefore let’s keep going even with Vanilla MongoDb. I think we can still understand each other!

I wrote this today:

    Aggregation aggregation = newAggregation(
        group("$key")
            .max("$timestamp").as("timestamp")
    );

This returns the key and the largest timestamp which is a good start. But the result does not include all the fields in the original document, in this case “value” is the most important.

I’ve read that I need to use this “$$ROOT” thingy, but it’s still not clear how to use this.

Wait, I just noticed that this returns what I need:

    Aggregation aggregation = newAggregation(
        group("$key")
            .max("$timestamp").as("timestamp")
            .first("$value").as("value")
    );

The only difference is that “key” is called “_id” in the output. Not sure if one can rename it.

Is this an inefficient way of solving the problem?

Hi @Capitano_Giovarco ,

Yes ok.

[{$sort : {timestamp : -1}},
 {$group : {_id : "$key",
                  docs : {$push : "$$ROOT" }}},
 {$replaceRoot : {$first : "$docs"}}]

Does that help?

Thanks

Hi Pavel!

Thanks for the hint. Sorry for the late response. Just got back from holidays to work.

I try to implement your query and eventually let you know :slight_smile:

So, I couldn’t quite reproduce your query unfortunately, but I’ve found something that equally works (and is a lot faster than my previous implementation).

aggregations.add(
            group("$" + Record.getKeyName())
                .max("$" + Record.getTimestampName()).as(Record.getTimestampName())
                .push("$$ROOT").as("docs")
        );
        
        aggregations.add(
            replaceRoot().withValueOf(ObjectOperators.valueOf("$docs").merge())
        );

:slight_smile:

1 Like

Could this be improved somehow?

Hi @Capitano_Giovarco ,

Not sure, maybe you can share what is indexed?

Thanks
Pavel

Nothing is indexed by me :frowning: How do I choose what fields should be indexed and what type these should have?

Sounds like {key : 1 , timestamp : -1} is a good option.

Thanks
Pavel

@Capitano_Giovarco , Hi there, I noticed you said about being new.

I noticed you are struggling to learn in the last 2 weeks. So I am here to suggest something else: MongoDB University.

If you haven’t done so, have a visit to improve your basic understanding (M001,M121) as well as higher management topics. It is from MongoDB and all courses are free. (for its new face and learning paths, use the “explore new university” button).

Hi!

I’m back after a while because we still experience performance struggles.

We tried to add a composite index:

mongoTemplate.indexOps(tableName).ensureIndex(
    new Index().on(Record.getOffsetName(), Direction.DESC).unique()
        .on(Record.getKeyName(), Direction.ASC)
);

but this decreases performance by 4% on average, compared to having a single index on { OFFSET : DESC }.

I can paste an example of aggregation pipeline:

{
   "aggregate":"__collection__",
   "pipeline":[
      {
         "$match":{
            "offset":{
               "$lte":999999 (NOTE: NEEDED FOR SOME BUSINESS LOGIC)
            }
         }
      },
      {
         "$sort":{
            "offset":-1
         }
      },
      {
         "$group":{
            "_id":"$key",
            "offset":{
               "$max":"$offset"
            },
            "docs":{
               "$first":"$$ROOT"
            }
         }
      },
      {
         "$replaceRoot":{
            "newRoot":"$docs"
         }
      },
      {
         "$project":{
            "key":1,
            "value":1,
            "_id":0
         }
      },
      (NOTE: HERE YOU CAN ADD EXTRA SORTING OR FILTERING 
      CONDITIONS DEPENDING ON THE REQUEST)
      {
         "$skip":0
      },
      {
         "$limit":101
      }
   ],
   "allowDiskUse":true
}

I think that most of the time is spend here:

{
         "$group":{
            "_id":"$key",
            "offset":{
               "$max":"$offset"
            },
            "docs":{
               "$first":"$$ROOT"
            }
         }
      },
      {
         "$replaceRoot":{
            "newRoot":"$docs"
         }
      }

This is needed to pick only the latest version of a message before applying filters and other sorting conditions.

I’m wondering if this can be improved somehow :pray: