Data consistency in Mongodb good practice

Hello, I’m pretty new to mongodb. The senario is there are two collections Employee and Lead.

  1. Employee
    "_id" : ObjectId("5e8f9cb7be4a20002cc7539a"),
    "email" :
    "name" : "John Doe",
  1. Lead
   "_id" : ObjectId("6e8f9cb7be4a20002cc7739b"),
    "leadNumber" : "L00023"
    "createdBy" : "John Doe",

Leads are created by employees, and I need to know which one created the lead. The problem is that employee names are updated because they get replaced or can leave the organization. Only the email id is consistent.

When fetching leads, how to get the current employee’s name. Do I have to store ID and execute lookup every time or embed employee details in lead and update all lead records when the employee name gets updated?


I would recommend to store _id of the Employee and then perform the $lookup.

If you want to display the name of the createdBy and you want to optimize the query, you can add one more field createdByName that would store the name of the user who created the lead. In that case, you would not need to perform the $lookup since you will also have that info inside the document. To summarize, you would store both the _id and name of the employee. Downside of this is that you would have to manually update the name in all leads documents when employee name is changed, but since that will probably not happen that often, it’s great optimization. :smiley:


I have two options now,

  1. Store _id of the Employee in Lead, the downside is if I’m fetching 1000 records, 1000 lookups have to be performed.

  2. Embed employee details such as ‘name’ and ‘_id’ on the lead document. The downside is, if the employee name is updated, all lead documents with the employee details have to be updated.

On the second approach, employee names will not be updated often, but when it does, it’s gonna affect a larger chunk of records. My only concern is that, is it okay to execute an update operation even if it is going to affect millions of records, wont there be any performance issue while updating large volume of records.


Second option is better.

It will optimize the reads a lot. So instead of doing $lookup on all the documents for each find query, you will have the data that you need already inside the document.

Yes, you would have to update all leads records when employee change the name, but that probably will not happen that often.

So you will increate performance of the reads and that is important because you will probably do much more reads then writes.


Is this approach still okay if I have multiple collections similar to lead, which are also storing createdByName ? I would be updating multiple collections. I would also have to index createdBy_Id in order to update the records right.