We have more than 25 crore (250 million) documents in a collection. I am performing a group-by aggregate operation, but it takes more than 24 hours to run a single aggregate.


Please suggest any solution to make the group-by operation faster with large data.

Please post sample documents with the aggregation you tried.

The words you used do not help us very much to determine the size of your collection. How many documents? What is the average size? What are your indexes? What is your hardware configuration?

Thanks for the reply. Please help me with any suggestions.

I am using the aggregate below; this is an example:

db.CollectionName.aggregate([
    { $match: { $expr: { $eq: ['$exchange', 'bse'] } } },
    { $lookup: { from: 'CollectionName', localField: 'code', foreignField: 'code', as: 'j' } },
    { $unwind: '$j' },
    { $match: { $expr: { $eq: ['$exchange', '$j.exchange'] } } },
    { $group: {
        _id: '$code',
        data: { $addToSet: '$close' }
    } },
    // { $limit: 1000 }
])

Size of your collection: 7.3 GB
How many documents?: 18 documents
Count: 256683221
Indexes: two indexes, "date": -1 and "code": 1
h/w config: 8 GB RAM, Intel(R) Xeon(R) CPU E3-1220 v6 @ 3.00GHz, 400 GB HDD free space

In the above example please ignore the other things; only the group-by is giving me a problem. It will take more than a day to execute.

We need sample documents that we can import in our environment in order to help you.

Your first $match stage can be simplified to

{ "$match" : { "exchange" : "bse" } }
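For that simplified $match to be fast, the server also needs an index it can use. The thread only mentions indexes on "date" and "code", so the following compound index is a suggestion, not something from the original posts; it would let the first stage filter on exchange without a collection scan and keep the join key "code" available:

```javascript
// Sketch (assumption): an index starting with "exchange" is proposed here
// hypothetically — the thread only lists indexes on "date" and "code".
// The pipeline filters on exchange and joins/groups on code, so a compound
// index on both fields would serve the $match and help the $lookup.
const indexSpec = { exchange: 1, code: 1 };

// In the mongo shell this would be created with:
//   db.CollectionName.createIndex({ exchange: 1, code: 1 })
```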

Rather than matching $exchange against $j.exchange in a separate stage after the $lookup, you could use an alternate form of $lookup that limits the looked-up documents to the correct exchange, in order to avoid that stage.


you could use an alternate form of $lookup that limits the looked-up documents to the correct exchange, in order to avoid that stage

About the above, can you please give me an example?

The following example is taken from https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/. Note that this form of $lookup, combining localField/foreignField with let and pipeline, requires MongoDB 5.0 or later.

db.orders.aggregate( [
   {
      $lookup: {
         from: "restaurants",
         localField: "restaurant_name",
         foreignField: "name",
         let: { orders_drink: "$drink" },
         pipeline: [ {
            $match: {
               $expr: { $in: [ "$$orders_drink", "$beverages" ] }
            }
         } ],
         as: "matches"
      }
   }
] )

So you move your second $match (the one comparing $exchange with $j.exchange) into the $lookup pipeline, so as to limit the number of documents in j.
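Applied to the pipeline from the question, that could look something like this. This is a sketch, not tested against the actual data; it assumes MongoDB 5.0+ (required for the combined localField/foreignField + pipeline form of $lookup), and the collection and field names are taken from the question:

```javascript
// Sketch: the exchange comparison is moved inside the $lookup, so "j" only
// ever contains documents from the same exchange as the outer document.
// Assumes MongoDB 5.0+ for this $lookup form.
const pipeline = [
  { $match: { exchange: 'bse' } },        // simplified first stage, index-friendly
  { $lookup: {
      from: 'CollectionName',
      localField: 'code',
      foreignField: 'code',
      let: { ex: '$exchange' },
      pipeline: [
        // replaces the post-$lookup $match on $j.exchange
        { $match: { $expr: { $eq: ['$exchange', '$$ex'] } } }
      ],
      as: 'j'
  } },
  { $unwind: '$j' },
  { $group: { _id: '$code', data: { $addToSet: '$close' } } }
];
```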

We still lack sample input documents, so I will assume that close comes from CollectionName. If that is the case then you could add a $project within the $lookup pipeline so that you have less data in j.
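A sketch of that $project, assuming close is the only field $group needs from the joined documents (again using names from the question):

```javascript
// Sketch: project away everything but "close" inside the $lookup pipeline,
// so each element of "j" carries only the field $group actually uses.
// Assumes "close" is the only field needed downstream.
const lookupStage = {
  $lookup: {
    from: 'CollectionName',
    localField: 'code',
    foreignField: 'code',
    let: { ex: '$exchange' },
    pipeline: [
      { $match: { $expr: { $eq: ['$exchange', '$$ex'] } } },
      { $project: { _id: 0, close: 1 } }   // shrink the joined documents
    ],
    as: 'j'
  }
};
```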

With a collection size of 7.3 GB and RAM of 8 GB, you can hold only about half of the potential working set in memory. I would say your setup is undersized.