Aggregation and Lookup

Hi - I need some opinions on my requirements. I have a simple case of creating posts with related comments and likes. For the posts I want to build a really fast news feed for the users of a particular community. I have the following options.

  1. I can add all the comments and likes to a single post document, but I fear that might run into the maximum document size limit. Or should I go for three collections, one each for posts, comments, and likes?

  2. Should I use a MongoDB aggregation pipeline to compute results at runtime, or should I prepare the total-likes and total-comments counts through some other process/mechanism? My fear is that the query speed of such aggregates for each post could be a concern, since the aggregation might be redone from scratch every time it runs.

  3. Should I build a separate process to maintain a merged aggregate, like the on-demand materialized views described here:
    https://docs.mongodb.com/manual/core/materialized-views/

Can you tell me the best way to accomplish this?

Hi @Tariq_Jawed,

Considering the described use case and your considerations, I would suggest having 2-3 collections:

  1. Posts, which will hold the post content and keep track of the likes (list or count) and the number of comments, plus a reference to the comments or comment documents.
  2. Comments, which can have one comment per document or one document per post.
  3. Optional: a Likes collection, which will hold an array of the users who liked a specific post; this is only needed if you want to list who liked a post once you click on it.
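As a rough sketch, a post document following option 1 above might look like this (all field names here are illustrative assumptions, not a prescribed schema):

```javascript
// Hypothetical shape of a post document with embedded counters and
// comment references (field names are illustrative only).
const post = {
  _id: "post123",
  communityId: "community42",   // used to build the per-community feed
  content: "Post body text",
  likesCount: 42,               // maintained with $inc on every like
  commentsCount: 7,             // maintained with $inc on every comment
  commentIds: ["c1", "c2"]      // references into the Comments collection
};
```

Reading the feed then only needs this one document per post; the Comments and Likes collections are consulted only when a post is opened.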

Let me know if you have any further questions.

pavel


Thank you very much for your reply. We have created a Post collection for post contents, and a PostActivity collection which holds both comments and likes. I am trying to create an aggregate view of the total comments and likes per post, by grouping and then running $merge on the result. See the example here:
https://docs.mongodb.com/manual/core/materialized-views/

We are following similar logic to create the aggregation, but we have a problem. The first run aggregates fine; when we then run it periodically from a scheduler, it should, as the documentation above shows, pick up only the new records and update the existing ones. The example in the docs does exactly that: it adds the new records' sums to the existing aggregate view rather than replacing them with only the new sums for the existing groupings. For some reason, when I do this kind of grouping on PostId, the pipeline doesn't add to the result; it replaces it completely with only the newly found group sum. But when I use date-wise grouping as shown in the example, it works as expected. Here is my function which is doing the aggregation:

exports = async function() {
  const mongodb = context.services.get("Cluster0");
  const collection = mongodb.db("test").collection("PostActivity");

  // fetch the datetime of the last run so we only pick up new activity
  const lastDate = await context.functions.execute("getPostAggregateHistory");
  const currDate = new Date();

  console.log(`executing aggregatePost with LastRunDateTime: ${lastDate.LastRunDateTime}.`);

  return collection.aggregate([
    // only activity recorded since the last run
    { $match: { DateTime: { $gte: lastDate.LastRunDateTime } } },
    // sum likes and comments per post
    { $group: {
        _id: "$PostId",
        likesCount: { $sum: "$Like" },
        commentsCount: { $sum: "$Comment" }
    } },
    // write the results into the materialized view
    { $merge: { into: "PostAggregate", whenMatched: "replace" } }
  ]).toArray().then(result => {
    console.log(`Successfully ran post-aggregate pipeline`);

    // record this run's start time for the next incremental run
    context.functions.execute("updatePostAggregateHistory", currDate);
    return result;
  }).catch(err => console.error(`Failed to run post-aggregate pipeline: ${err}`));
};

Just one correction: date-wise grouping aggregates the results fine on subsequent incremental runs, as the documentation shows,
https://docs.mongodb.com/manual/core/materialized-views/

but PostId-based grouping replaces the existing record with only the new sum; it doesn't add to the existing group (if one exists). I also tried whenMatched: "merge" instead of "replace" in the $merge stage. The result is the same.
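One option worth testing here: since MongoDB 4.2, whenMatched also accepts an update pipeline, in which the $$new variable refers to the incoming grouped document, so $merge can be made to add to the existing counts instead of overwriting them. A sketch, reusing the field names from the pipeline above:

```javascript
// $merge stage that adds the newly computed sums to the existing
// PostAggregate document instead of overwriting it.
const mergeStage = {
  $merge: {
    into: "PostAggregate",
    on: "_id",                      // the group key, i.e. the PostId
    whenMatched: [
      { $set: {
          likesCount:    { $add: ["$likesCount",    "$$new.likesCount"] },
          commentsCount: { $add: ["$commentsCount", "$$new.commentsCount"] }
      } }
    ],
    whenNotMatched: "insert"        // first time a post is seen
  }
};
```

This stage would replace the last stage of the pipeline above; the rest of the function stays unchanged.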

Hi @Tariq_Jawed,

The $merge command will update the document with the number calculated during the current run; it doesn't know how to add a number to the merged document. Also, you don't have an on clause or a whenNotMatched option. Why is that?

This is the reason I am not in favor of this design, where you separate the information and re-run everything to calculate a consistent view.

Why not use a trigger on Post and PostActivity that catches updates/inserts and does a targeted $inc on the counts? Then you can keep the counts in the post collection document, as they clearly do not need to be in this other collection; keeping them there is a complete antipattern in MongoDB.
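A minimal sketch of that trigger logic, assuming the field names from the function above (in Atlas this would run as a database trigger on PostActivity inserts, receiving a change event):

```javascript
// Translate one PostActivity document into a single targeted $inc
// against the matching post document (field names are assumptions).
function buildCounterUpdate(activity) {
  return {
    filter: { _id: activity.PostId },
    update: { $inc: {
      likesCount:    activity.Like    || 0,
      commentsCount: activity.Comment || 0
    } }
  };
}

// In an Atlas trigger function the call site might look like:
//   const { filter, update } = buildCounterUpdate(changeEvent.fullDocument);
//   await posts.updateOne(filter, update);
```

For deletes, the same shape works with negated values, so one small function covers both directions.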

Best
Pavel


Yes, I understand your solution and your point; my only concern is that for each like or comment you would have to fire the trigger and then update the post collection. That seems to be three operations for each activity type and each operation type (create, update, and delete of likes and comments).

I am thinking maybe to schedule a job similar to the one above, but run it only for the distinct post IDs which have changed since the lastRunTime of the aggregate job.

exports = async function() {
  const mongodb = context.services.get("Cluster0");
  const collection = mongodb.db("test").collection("PostActivity");

  // only posts with activity since the last aggregate run
  const postIds = await context.functions.execute("GetPostIdsByGreaterThenCurrentDate");
  const currDate = new Date();

  return collection.aggregate([
    // recompute totals only for the changed posts
    { $match: { PostId: { $in: postIds } } },
    { $group: {
        _id: "$PostId",
        likesCount: { $sum: "$Like" },
        commentsCount: { $sum: "$Comment" }
    } },
    // each changed post is fully recomputed, so "replace" is safe here
    { $merge: { into: "PostAggregate", whenMatched: "replace" } }
  ]).toArray().then(result => {
    console.log(`Successfully ran post-aggregate pipeline`);
    context.functions.execute("updatePostAggregateHistory", currDate);
    return result;
  }).catch(err => console.error(`Failed to run post-aggregate pipeline: ${err}`));
};

I think both approaches have their costs; which one do you think would be best for a high-traffic site?

Regards,

Hi @Tariq_Jawed,

I am in favour of my first solution :slight_smile:

where you add a comment-id array to the post document and maintain the counters there.

Every time a new comment or like is added, you insert it into the comment collection and run another update to $inc the counters in the post. You can execute both in a bulk set.
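A sketch of the pair of writes described above, expressed as plain operation descriptions (collection and field names are assumptions, not the thread's actual schema):

```javascript
// Build the two operations for one new comment: insert the comment
// document, then apply a single targeted $inc to the post's counter.
function commentWrites(postId, commentDoc) {
  return [
    { collection: "Comments", op: "insertOne",
      doc: { PostId: postId, ...commentDoc } },
    { collection: "Post", op: "updateOne",
      filter: { _id: postId },
      update: { $inc: { commentsCount: 1 } } }
  ];
}
```

Since the two writes hit different collections, they are two driver calls rather than one bulkWrite; issuing them back to back from the same function keeps the counter close to consistent without any aggregation job.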

Best
Pavel

This makes sense. Will it work for parallel requests trying to update the same post? Since many comments or likes could happen at the same time for a popular post, I am afraid it could lead to locking/consistency issues on one document.

Hi @Tariq_Jawed,

I believe that since $inc is a quick command and the lock is taken per post document, I don't expect the locking to have a noticeable impact.

How many people will create comments and likes within the same 50-100 ms window for a single post? I expect that to be a low number.

Of course index the filter expression and use unique indexes for fields like postId…

Best
Pavel