Aggregation is slow

solo_dios · December 23, 2021, 12:01pm

Hi.
I am working on a MongoDB database with a collection that contains 12 million records of user payments.
The schema of a document in this collection:

{
  "_id": ObjectId(),
  "amount": 0,
  "sharing_plan": 30,
  "start_time": 0,
  "end_time": 0,
  "pay_code": "",
  "user_id": "",
  "date" : {
    "year" : "2021",
    "month" : "01",
    "day" : "01"
  }
}

I have to analyze this data annually, monthly and daily (for example, how much we earned in each month of the year).
I have to use $group here.
But it is very slow and takes up to 15 seconds.
My aggregation:

database.collection("payments").aggregate([
    {
        $match: {
            $and: [
                { "date.year": "2021" },
                { "date.month": "01" }
            ]
        }
    },
    {
        $group: {
            "_id": "$date.month",
            "sumAmount": { "$sum": "$amount" }
        }
    },
    {
        $project: {
            "_id": 0,
            "month": "$_id",
            "sumAmount": "$sumAmount"
        }
    }
])

How can I speed up this process?
Please help.

solo_dios · December 23, 2021, 12:17pm

It also takes 15 seconds when I want to get the sum of all amounts.

solo_dios · December 23, 2021, 1:28pm

Do I have to switch to MySQL?

steevej · December 23, 2021, 1:59pm

You are a little bit impatient.

Storing dates as an object with 3 strings is one of the slowest and most space-wasteful ways to do it, in SQL or NoSQL.

So the first thing to do is to fix your model to properly use an appropriate data type.

What indexes do you have? Indexes are important, in SQL or NoSQL.

You do not need an explicit $and in your $match.
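
As a rough sketch of that last point: conditions listed in one query document are combined with an implicit AND, so the same filter can be written without $and. The helper name monthMatchStage is made up for illustration; the field names come from the schema posted above.

```javascript
// Hypothetical helper (not part of any MongoDB API): builds the $match stage
// for one year/month, relying on the implicit AND between fields.
function monthMatchStage(year, month) {
  return { $match: { "date.year": year, "date.month": month } };
}

// Same filter as the original $and version, just shorter.
const simplifiedPipeline = [
  monthMatchStage("2021", "01"),
  { $group: { _id: "$date.month", sumAmount: { $sum: "$amount" } } },
  { $project: { _id: 0, month: "$_id", sumAmount: 1 } },
];

console.log(JSON.stringify(simplifiedPipeline[0]));
```

This only tidies the query. It is not expected to change the run time by itself.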

solo_dios · December 23, 2021, 2:46pm

So how should I model the data so that I can query it?
Please guide me.
I am using the indexes (amount : 1) (date.year : 1) & …
Please help.
Thanks.

steevej · December 23, 2021, 2:53pm

A compound index that starts with amount is surely useless for most use cases, and it certainly is for this one. You $match on date.year and date.month, so that is the index you need. Your date should be a date, not an object of 3 strings. See https://docs.mongodb.com/manual/reference/bson-types/#date
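
A minimal sketch of what that change could look like, assuming the three strings are always zero-padded numbers and that the dates are meant as UTC. The names toNativeDate and monthRange are hypothetical helpers, not driver functions. With a real date field, one month becomes a simple range that an index on date can serve.

```javascript
// Hypothetical helper: turn { year: "2021", month: "01", day: "01" } into a
// native Date at midnight UTC. JavaScript months are 0-based, hence the -1.
function toNativeDate({ year, month, day }) {
  return new Date(Date.UTC(Number(year), Number(month) - 1, Number(day)));
}

// Hypothetical helper: half-open range
// [first day of the month, first day of the next month).
function monthRange(year, month) {
  const start = new Date(Date.UTC(year, month - 1, 1));
  const end = new Date(Date.UTC(year, month, 1)); // month 12 rolls over to January
  return { $gte: start, $lt: end };
}

const migratedDate = toNativeDate({ year: "2021", month: "01", day: "01" });
const matchJanuary = { $match: { date: monthRange(2021, 1) } };

console.log(migratedDate.toISOString());
console.log(matchJanuary.$match.date.$lt.toISOString());
```

The half-open range avoids having to know how many days each month has.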

solo_dios · December 23, 2021, 5:31pm

How can I sum the amount across all 12 million records faster?

steevej · December 23, 2021, 7:55pm

What is the total size (in MB) of your collection?

What is the RAM in your machine?

What kind of disks?

To sum all amounts, an index on amount is sufficient. You might have to $project amount first. To sum the amount by date, a compound index on date, amount may help, possibly together with a $project after the $match.
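
One way to read this suggestion, as a sketch only: with a compound index on date and amount, a pipeline that touches nothing but those two fields gives the server a chance to answer from the index alone. This assumes date has already been converted to a native date. The index spec and the helper name sumByDateRange are illustrative, and whether the index is really used should be checked with explain().

```javascript
// Assumed index, to be created once, e.g. with createIndex(dateAmountIndex).
const dateAmountIndex = { date: 1, amount: 1 };

// Hypothetical helper: sum `amount` for documents with start <= date < end.
function sumByDateRange(start, end) {
  return [
    { $match: { date: { $gte: start, $lt: end } } },
    // Keep only the indexed field needed downstream. Dropping _id matters
    // because _id is not part of the assumed index.
    { $project: { _id: 0, amount: 1 } },
    { $group: { _id: null, sumAmount: { $sum: "$amount" } } },
  ];
}

const januaryPipeline = sumByDateRange(
  new Date("2021-01-01T00:00:00Z"),
  new Date("2021-02-01T00:00:00Z")
);
console.log(JSON.stringify(januaryPipeline, null, 2));
```

The resulting array can be passed to aggregate() on the payments collection.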

solo_dios · December 23, 2021, 8:04pm

The collection size is 2.7 GB.
Machine RAM → 32 GB DDR4 ECC 2700
Hard drive → 512 GB NVMe SSD
This is the only problem I have with MongoDB. If it is solved, MongoDB is the best database I have worked with.
Thank you

steevej · December 24, 2021, 1:30pm

I see nothing wrong with the specs. I hope these are the specs of the server on which you run mongod. Do you run anything else on it? Can you share your mongod configuration file? I want to make sure you do not play with some parameters that restrict the storage engine.

Have you replaced your date object with a single native date field?

Have you played with the indexes I proposed?

If your application creates transactions only in the current month, it is worth considering using https://www.mongodb.com/blog/post/building-with-patterns-the-computed-pattern.
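
As a hedged illustration of the computed pattern for this case: each time a payment is stored, a small per-month totals document is incremented as well, so the monthly report reads one document instead of scanning the payments. The monthly_totals collection and the monthlyTotalUpdate helper are invented names, and the payment date is assumed to be a native date.

```javascript
// Hypothetical helper: build the filter/update/options for a per-month
// running total, intended to be used as
//   db.collection("monthly_totals").updateOne(filter, update, options)
function monthlyTotalUpdate(payment) {
  const d = payment.date; // assumed to be a native Date
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return {
    filter: { _id: `${d.getUTCFullYear()}-${month}` },
    update: { $inc: { sumAmount: payment.amount, count: 1 } },
    options: { upsert: true }, // create the month document on its first payment
  };
}

const totalsOp = monthlyTotalUpdate({
  amount: 250,
  date: new Date("2021-01-15T10:30:00Z"),
});
console.log(JSON.stringify(totalsOp));
```

The trade-off is one extra write per payment in exchange for a very cheap read.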

I do not think anybody can answer that. What I can say is that it is my preferred database. So much so that, as an independent, old and grumpy veteran contractor, I now refuse contracts that involve SQL because I have more fun with MongoDB. It helps me solve a bigger variety of problems with less planning, coding and maintenance work.

solo_dios · December 24, 2021, 2:30pm

Please wait until I send you the config file tomorrow.

solo_dios · December 25, 2021, 7:24am

# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
#  engine:
#  wiredTiger:

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1


# how the process runs
processManagement:
  timeZoneInfo: /usr/share/zoneinfo

security:
  authorization: "enabled"
  keyFile: /home/serverOne/Documents/AuthFile/key

#operationProfiling:

replication:
   replSetName: "rs0"

#sharding:

## Enterprise-Only Options:

#auditLog:

#snmp:

steevej · December 26, 2021, 2:47pm

I see nothing that would stop the storage engine from using your RAM correctly.

One thing to remember is that if you sum your 12M documents on a cold system, the 12M documents need to be read from disk, which may impact performance. You should always evaluate performance on a warm running server where the working set is in RAM.

solo_dios · December 26, 2021, 6:49pm

Is there no way to $sum other than with $group?

steevej · December 26, 2021, 8:01pm

I do not think there is.

What is wrong with $group and $sum?

solo_dios · December 27, 2021, 8:48am

The problem is that it does not use an index.

steevej · December 27, 2021, 1:27pm

It should. But it needs to be the correct one. See

and