Joining multiple document types that exist in a single MongoDB collection / container

Please consider the following data model, which is designed to store multiple types of documents in a single sharded MongoDB collection or container.

{
    "id": "<post-id>",
    "type": "post",
    "postId": "<post-id>",
    "userId": "<post-author-id>",
    "title": "<post-title>",
    "content": "<post-content>",
    "creationDate": "<post-creation-date>"
}

{
    "id": "<comment-id>",
    "type": "comment",
    "postId": "<post-id>",
    "userId": "<comment-author-id>",
    "content": "<comment-content>",
    "creationDate": "<comment-creation-date>"
}

{
    "id": "<like-id>",
    "type": "like",
    "postId": "<post-id>",
    "userId": "<liker-id>",
    "creationDate": "<like-creation-date>"
}

We have a requirement to combine / join these document types to produce the following output.

Example Input -

{
    "id": "post-id",
    "type": "post",
    "postId": "post-id",
    "userId": "post-author-id-1",
    "title": "post-title",
    "content": "post-content"
}

{
    "id": "comment-id-1",
    "type": "comment",
    "postId": "post-id",
    "userId": "comment-author-id-1",
    "content": "comment-content-1"
}

{
    "id": "comment-id-2",
    "type": "comment",
    "postId": "post-id",
    "userId": "comment-author-id-2",
    "content": "comment-content-2"
}

{
    "id": "comment-id-3",
    "type": "comment",
    "postId": "post-id",
    "userId": "comment-author-id-3",
    "content": "comment-content-"
}

Example Output -

{
    "id": "post-id",
    "type": "post",
    "postId": "post-id",
    "userId": "post-author-id-1",
    "title": "post-title",
    "content": "post-content",
    "comments": [
        {
            "id": "comment-id-1", ….
        },
        {
            "id": "comment-id-2", ….
        },
        {
            "id": "comment-id-3", ….
        }
    ],
    "totalCommentCount": 3
}

Is it possible to do the above transformation efficiently in a single query or aggregation pipeline in MongoDB? I would also like to compare a single container holding all the document types against a separate container for each type. What are the pros and cons of each data model, and will there be scalability concerns with sharding?

Hi @Kevin_Ratnasekera ,

In general, we recommend that data which is queried together be stored together, to avoid the kind of joins you are looking to do.

In your case, it sounds like there could be a single post document with a comments array storing objects that hold the user and the content:

comments : [ { user-id : ... , content : ... } ... ]
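To make the embedded shape concrete, here is a minimal sketch of what adding a comment could look like under that model. The collection name `posts`, the helper `buildAddCommentUpdate`, and the stored `totalCommentCount` counter are assumptions for illustration, not part of the original schema.

```javascript
// Hypothetical embedded shape: one document per post, comments stored inline.
const post = {
  _id: "post-id",
  userId: "post-author-id-1",
  title: "post-title",
  content: "post-content",
  comments: [],
  totalCommentCount: 0,
};

// Builds the filter/update pair that appends one comment to the post
// and increments the stored counter in the same write.
function buildAddCommentUpdate(postId, comment) {
  return {
    filter: { _id: postId },
    update: {
      $push: { comments: comment },
      $inc: { totalCommentCount: 1 },
    },
  };
}

const { filter, update } = buildAddCommentUpdate(post._id, {
  userId: "comment-author-id-1",
  content: "comment-content-1",
});

// In mongosh or a driver (collection name "posts" is an assumption):
// db.posts.updateOne(filter, update);
```

With this shape, reading a post together with its comments is a single document fetch and no join is needed. Keep in mind that a single document is limited to 16 MB, so posts with very long comment threads may need a different layout.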

Now, if you still require the current data model, you will need to do a self-join using the aggregation $lookup stage if all documents are in the same collection, or a join to a different collection if all comments are stored there.
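A self-join of that kind could be sketched roughly as follows. The collection name `content` and the helper `buildPostWithCommentsPipeline` are assumptions; the field names come from the documents in the question.

```javascript
// Hypothetical name of the single collection that holds posts, comments and likes.
const COLLECTION = "content";

// Builds a pipeline that returns one post with its comments embedded
// and a totalCommentCount field.
function buildPostWithCommentsPipeline(postId) {
  return [
    // Start from the post document only.
    { $match: { type: "post", postId: postId } },
    // Self-join: pull comment documents of the same collection with the same postId.
    {
      $lookup: {
        from: COLLECTION,
        let: { pid: "$postId" },
        pipeline: [
          { $match: { type: "comment", $expr: { $eq: ["$postId", "$$pid"] } } },
        ],
        as: "comments",
      },
    },
    // Derive the count from the joined array.
    { $addFields: { totalCommentCount: { $size: "$comments" } } },
  ];
}

const lookupPipeline = buildPostWithCommentsPipeline("post-id");

// In mongosh or a driver:
// db.getCollection(COLLECTION).aggregate(lookupPipeline);
```

The first `$match` keeps the join limited to a single post, so the `$lookup` sub-pipeline runs once per request rather than once per post in the collection.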

In the case of sharding, we need to remember that in MongoDB versions before 5.1 the from collection of a $lookup cannot be sharded, so on those versions you cannot use such a join once the collection is sharded.
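If the current model has to stay and $lookup is not an option, one possible alternative is to fetch the post and its comments with a single `$match` and reshape them with `$group`. This is only a sketch: the helper name `buildGroupedPostPipeline` and the collection name `content` are assumptions, and it presumes `postId` is the shard key so that the query can be routed to one shard.

```javascript
// Builds a pipeline that combines a post and its comments without $lookup.
function buildGroupedPostPipeline(postId) {
  return [
    // One query fetches the post and all its comments (likes are skipped).
    { $match: { postId: postId, type: { $in: ["post", "comment"] } } },
    // Collect every matching document under its postId.
    { $group: { _id: "$postId", docs: { $push: "$$ROOT" } } },
    // Split the collected documents back into the post and its comments.
    {
      $project: {
        post: {
          $arrayElemAt: [
            { $filter: { input: "$docs", as: "d", cond: { $eq: ["$$d.type", "post"] } } },
            0,
          ],
        },
        comments: {
          $filter: { input: "$docs", as: "d", cond: { $eq: ["$$d.type", "comment"] } },
        },
      },
    },
    // Promote the post to the root and attach the comments and their count.
    {
      $replaceRoot: {
        newRoot: {
          $mergeObjects: [
            "$post",
            { comments: "$comments", totalCommentCount: { $size: "$comments" } },
          ],
        },
      },
    },
  ];
}

const groupedPipeline = buildGroupedPostPipeline("post-id");

// In mongosh or a driver (collection name "content" is an assumption):
// db.content.aggregate(groupedPipeline);
```

Because every document of a post shares the same `postId`, the whole result is built from documents that live together, which is the main argument for keeping the types in one collection keyed on that field.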

For those reasons, the best solution is to redesign the data as richer, nested documents rather than forming joins and relationships between documents (an approach better suited to relational databases).

Please read the following :

Thanks
Pavel
