Some about data modeling

Hi,

I have a collection with around 5M documents. This collection contains the history of a certain balances together with the current balance like this:

{
   _id: ObjectId('12345')
   UserId: 1
   BalanceChanged: "10"
   BalanceCurrent: "10"
   Date: 2022-11-18T01:00:00
   ...
},
{
   _id: ObjectId('12346')
   UserId: 1
   BalanceChanged: 5
   BalanceCurrent: "15"
   Date: 2022-11-18T02:00:00
   ...
}

When I want the current balance of all users, i am facing performance issues:

  • The data must be sorted first (desc) in order to get the most recent balance (takes around 2 seconds with a desc index on “Date”);
  • The data must be grouped by “UserId” and get the first document (another 2 seconds);
  • Replacing root in order to get the inner document of the group (another 2 seconds);

I was wondering if I could do this another way where I can get the current balance of all users in lets say a few milliseconds?

I am aware that I can create a inner collection like this:

{
   _id: ObjectId('12345')
   UserId: 1
   BalanceCurrent: "15"
   History: [
      {
         BalanceChanged: "10"
        Date: 2022-11-18T01:00:00
      },
      {
         BalanceChanged: 5
         Date: 2022-11-18T02:00:00
         ...
      },
   ]
}

But I’m worried about running into limits (max document size for instance). Some users already have 50K documents in less then 3 months.

Or is there another way to query this kind of data in a few milliseconds?

Thanks!

The above looks huge but it may be normal. What is the total size of your databases? What are the specs of your installation? How did you obtain the 2secs? Can you provide the explain plan? What other indexes do you have? How many unique UserId?

About being

You could use the bucket pattern to store monthly balance history. So you would go midway from having one history document per balance change to one document per month.

Something like:

{
   _id: ObjectId('12345')
   UserId: 1 ,
   BalanceCurrent: "15" ,
   Month : 2022-11-01T00:00:00
   History: [
      {
         BalanceChanged: "10"
        Date: 2022-11-18T01:00:00
      },
      {
         BalanceChanged: 5
         Date: 2022-11-18T02:00:00
      },
   ]
}

You could also have 2 collections; CurrentBalance and BalanceHistory. It is not because we can keep things together that we have to. The motto is to keep together things that are access together, but I see getting my current balance as a different use-case from getting the history.