Hi @martin_daniels,
To me it sounds like the outlier pattern with story documents is the way to go.
The idea is that each story is a document in the stories collection, holding its data and an embedded array of the users who viewed the post:
{
  _id : "doc1",
  storyId : "xxx",
  storyCreateDate : ISODate(...),
  S3Url : "....",
  users : [ { userId : "embedded1", avatar : "...", dateViewed : ... }, ..., { userId : "embeddedN" } ],
  overFlowIndex : 1,
  totalViewed : 300,
  hasOverflow : true
}
...
{
  _id : "doc2",
  storyId : "xxx",
  storyCreateDate : ISODate(...),
  S3Url : "....",
  users : [ { userId : "embeddedN+1", avatar : "...", dateViewed : ... }, ..., { userId : "embeddedN+M" } ],
  overFlowIndex : 2,
  hasOverflow : false
}
When a specific story gets more than “N” distinct views (let's say ~200), we open a new document and page the overflow viewers into it. You can index { storyId : 1, "users.userId" : 1 } so a query can determine whether a user has already viewed that story or is a new viewer.
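For illustration, here is a minimal sketch in the shell, assuming the collection is called stories, N = 200, and a placeholder viewer "someUser" (those names are my assumptions, not part of your schema):

// Has this user already viewed story "xxx"? Served by the { storyId, "users.userId" } index.
const alreadyViewed = db.stories.findOne(
  { storyId : "xxx", "users.userId" : "someUser" },
  { _id : 1 }
);

if (!alreadyViewed) {
  // Append the view to the bucket that still has room
  // ("users.199" missing means the array has fewer than 200 entries).
  const res = db.stories.updateOne(
    { storyId : "xxx", hasOverflow : false, "users.199" : { $exists : false } },
    { $push : { users : { userId : "someUser", avatar : "...", dateViewed : new Date() } } }
  );
  // If res.matchedCount is 0, the last bucket is full: flip its hasOverflow to true
  // and insert a new document with the next overFlowIndex (omitted here).
}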
A TTL index on “storyCreateDate” means all the documents of a story will be deleted in the same cycle. The total views for a story can then either be calculated as a sum of the views on each document, or maintained in the “overFlowIndex : 1” document and incremented on every update.
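A sketch of both options, assuming a 24-hour story lifetime (expireAfterSeconds is something you would tune):

// TTL index: documents expire ~24h after storyCreateDate.
db.stories.createIndex({ storyCreateDate : 1 }, { expireAfterSeconds : 86400 });

// Option A: compute total views by summing the array sizes across all buckets.
db.stories.aggregate([
  { $match : { storyId : "xxx" } },
  { $group : { _id : "$storyId", totalViewed : { $sum : { $size : "$users" } } } }
]);

// Option B: keep the counter on the first bucket and increment it per new view.
db.stories.updateOne(
  { storyId : "xxx", overFlowIndex : 1 },
  { $inc : { totalViewed : 1 } }
);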
Now, to get all the documents of story xxx, you query:
db.collection.find({ "storyId" : "xxx", overFlowIndex : { $gt : 0 } })
If you need to sort the documents based on insert order:
db.collection.find({ "storyId" : "xxx", overFlowIndex : { $gt : 0 } }).sort({ overFlowIndex : 1 })
With an index on { "storyId" : 1, overFlowIndex : 1 }, both queries above use the index to fetch all the overflow documents, and the sort on overFlowIndex is served by the index as well.
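For example, assuming the same stories collection name:

db.stories.createIndex({ storyId : 1, overFlowIndex : 1 });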
Regarding the trigger to delete the S3 files, there are 2 solutions:
- Triggers now have a pre-image feature that lets you read the document right before its deletion; see the sketch after this list.
- The S3 storage can have a retention/lifecycle policy that you tune yourself, which decouples the S3 cleanup from the MongoDB document deletion.
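As a rough sketch of the first option (an Atlas database trigger on delete events, with Document Preimages enabled; the actual S3 deletion call depends on how you integrate the AWS SDK, so it is only indicated here):

exports = async function (changeEvent) {
  // With Document Preimages enabled, the deleted document is available here.
  const deletedDoc = changeEvent.fullDocumentBeforeChange;
  if (deletedDoc && deletedDoc.S3Url) {
    // Call your S3 deletion logic with deletedDoc.S3Url,
    // e.g. via the AWS SDK or an HTTPS endpoint you own.
    console.log(`Would delete S3 object at ${deletedDoc.S3Url}`);
  }
};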
Thanks
Pavel