How to improve the performance of $lookup and $group aggregation?

I am using Compass v1.36.0. Most of the time I get errors like “Operation exceeded the time limit. Please try increasing the maxTimeMS for the query in the expanded filter options.”

I have three collections.

studenteachers collection

{
  "user_id": "313031393137303000",
  "studen_Ind": "Y",
  "client_Id": "LA0XDA",
  "clientName": "LA0XDA",
  "country_id": "CRYVN",
  "country_name": "VIETNAM",
  "userClient_id": "101908617",
  "user_status": "A",
  "geo": "ASEAN",
  "programName": "X",
  "site_id": "W26872",
  "site_name": "VNXG02",
  "state_id": "VN_REGION",
  "subprogram_name": "LAXSub",
  "supervisor_email": "",
  "supervisor_id": "101483574",
  "teacher_IND": "N"
}

in the studentteacher collection

  • we have more than 4 lakh (400,000) records
  • student, teacher, and their respective site and client details
  • along with user_status details (A = active, I = inactive)
  • We can group the students or teachers client-wise / site-wise / geo-wise / country-wise.

AccessSettings Collection


In the AccessSettings collection, we have account and site details (status A means active clients under the site; status I means inactive clients under the site).
We can get all the access-enabled clients and their site details from the AccessSettings collection.

Attendance Collection

{
  "_id": "64071b221078082370f5c4e8",
  "attendance_date": "2023-03-07T00:00:00.000Z",
  "user_id": "313031383337333000",
  "geo": "ASEAN",
  "programName": "X",
  "site_id": "W26872",
  "site_name": "VNXG02",
  "state_id": "VN_REGION",
  "studen_Ind": "Y",
  "client_Id": "LA0XDA",
  "clientName": "LA0XDA",
  "country_id": "CRYVN",
  "country_name": "VIETNAM",
  "userClient_id": "101908617"
}

We can expect a minimum record count for this collection of more than 50 lakh (5,000,000) per month.

we need the output like below

{
  Date :
  client_name :
  geo :
  country :
  site :
  ActiveStudentCount :
  ActiveTeacherCount :
  StudentAttendanceCount :
  TeacherAttendanceCount :
}

Query :

[
  {
    $group: {
      _id: {
        client: "$clientName",
        clientId: "$client_Id",
        geo: "$geo",
        country: "$country_name",
        site: "$site_name",
        studentInd: "$studen_Ind",
        teacherInd: "$teacher_Ind",
        userid: "$user_id",
        attendanceDate: "$attendance_date"
      },
      attendancecount: { $sum: 1 }
    }
  },
  {
    $group: {
      _id: {
        client: "$_id.client",
        geo: "$_id.geo",
        country: "$_id.country",
        site: "$_id.site",
        attendanceDate: "$_id.attendanceDate"
      },
      userAttendanceCount: { $sum: "$attendancecount" },
      studAttendanceCount: { $sum: { $cond: [{ $eq: ["$_id.studentInd", "Y"] }, 1, 0] } },
      teacherAttendanceCount: { $sum: { $cond: [{ $eq: ["$_id.teacherInd", "Y"] }, 1, 0] } }
    }
  },
  {
    $lookup: {
      from: "accessSettings",
      let: { site: "$_id.site", client: "$_id.client" },
      pipeline: [
        { $match: { $expr: { $and: [
          { $eq: ["$site_name", "$$site"] },
          { $eq: ["$client_name", "$$client"] },
          { $eq: ["$status", "A"] }
        ] } } }
      ],
      as: "result"
    }
  },
  {
    $lookup: {
      from: "studentTeacher",
      let: { site: "$_id.site", client: "$_id.client" },
      pipeline: [
        { $match: { $expr: { $and: [
          { $eq: ["$site_name", "$$site"] },
          { $eq: ["$clientName", "$$client"] }
        ] } } }
      ],
      as: "res"
    }
  },
  { $unwind: { path: "$res", preserveNullAndEmptyArrays: true } },
  {
    $group: {
      _id: {
        client: "$_id.client",
        geo: "$_id.geo",
        country: "$_id.country",
        state: "$_id.state",
        site: "$_id.site",
        attendanceDate: "$_id.attendanceDate"
      },
      userAttendanceCount: { $max: "$userAttendanceCount" },
      studAttendanceCount: { $max: "$studAttendanceCount" },
      teacherAttendanceCount: { $max: "$teacherAttendanceCount" },
      ActiveuserCount: { $sum: 1 },
      ActiveAgentCount: { $sum: { $cond: [{ $eq: ["$res.studen_Ind", "Y"] }, 1, 0] } },
      ActiveSupervisorCount: { $sum: { $cond: [{ $eq: ["$res.teacher_IND", "Y"] }, 1, 0] } }
    }
  }
]

Most of the time I am getting the “Operation exceeded the time limit.” error. Please verify the query and suggest a way to improve the performance. Please help me with this.

You can increase maxTimeMS for aggregations:
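As a minimal mongosh-style sketch of that option: maxTimeMS is passed in the aggregate options object (the one-stage pipeline below is just a placeholder for the real pipeline from the post; allowDiskUse is included because large $group stages can spill to disk).

```javascript
// Placeholder pipeline; substitute the real stages from the post above.
const pipeline = [{ $group: { _id: "$site_name", n: { $sum: 1 } } }];

// 60-second time budget, and let big $group stages use temporary disk files.
const options = { maxTimeMS: 60000, allowDiskUse: true };

// In mongosh: db.attendance.aggregate(pipeline, options)
console.log(options.maxTimeMS);
```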

You can also look at potentially creating indexes to make the lookup faster.


@Massimiliano_Marcon : Thank you for your response,

In our application we are integrating the Metabase reporting tool, and Metabase is returning errors because the database server times out. Please check the query and correct it in case of any issues, or suggest a better way to write the lookup and group aggregation query.

Increasing maxTimeMS may be a solution at the Compass level, but we need the DB response within 5 seconds. We need your help with this, please.

Have you tried breaking the aggregation down into parts to check which is taking the time, the groupings or the lookups?
If you run an explain() on the query, it will highlight any collection scans, which indicate that no index is being used.

I could not see in your post which indexes are being used, but if you add an index covering the grouped fields, with a sort before the group is run, it should massively reduce the run time, assuming this is the bottleneck.

On the lookups also ensure you have indexes in place to support them.
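For illustration, here is a sketch of compound index specs matching the equality keys used in the two $lookup stages; the field names are taken from the sample documents above, so adjust them to the actual schema. In mongosh you would pass each keys object to createIndex, e.g. db.accessSettings.createIndex({ site_name: 1, client_name: 1, status: 1 }).

```javascript
// Hypothetical index specs covering the $lookup equality matches.
const lookupIndexes = [
  // accessSettings lookup matches on site_name, client_name, status
  { collection: "accessSettings", keys: { site_name: 1, client_name: 1, status: 1 } },
  // studentTeacher lookup matches on site_name, clientName
  { collection: "studentTeacher", keys: { site_name: 1, clientName: 1 } },
];

for (const ix of lookupIndexes) {
  console.log(ix.collection, Object.keys(ix.keys).join(", "));
}
```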

@John_Sewell : Thank you for your support.

I have created the required indexes on the above three collections.

  • created indexes for all the fields we use inside the $match
  • created indexes for all the fields we use for the $group

Group aggregation takes 500+ ms.
Lookup aggregation takes a minimum of 2 seconds and a maximum of 4.5 seconds.

My worry is that we are getting time-out errors in the dev environment, where we have fewer records; I can’t imagine how MongoDB will perform in production, where we can expect at least 10 times more records in the attendance collection.

In the explain plan, I can see that only two indexes were used throughout.

In the query above I cannot see a $match, only the $group stages; is part of it missing? Do you have a $sort before the group? I think recent optimisations have reduced the need for this, but I normally do it explicitly.
How much data do you have; does 4 lakh mean 400,000 records? How many records are running through the $lookup stage?
It does seem the $lookup is the lion’s share of the time. I’ll try to set up a simulated collection with an appropriate number of records and have a play.

Taking a further look this morning…

  • The “result” lookup does not seem to be used.
  • Combine the top two groups into one group, or all the groups into one, and then do the lookup.
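A minimal sketch of that combining idea, assuming the field names from the sample documents: one $group collects the distinct user ids per (date, client, geo, country, site) into sets, and a $project turns set sizes into head counts (teacher counts would follow the same pattern as students; nulls from non-students are stripped with $setDifference).

```javascript
// Hypothetical combined stages replacing the first two $group stages.
const combinedStages = [
  { $group: {
      _id: { attendanceDate: "$attendance_date", client: "$clientName",
             geo: "$geo", country: "$country_name", site: "$site_name" },
      // Distinct users per key: the set deduplicates repeated logins.
      users:    { $addToSet: "$user_id" },
      // Collect student user ids (null for non-students, removed below).
      students: { $addToSet: { $cond: [{ $eq: ["$studen_Ind", "Y"] }, "$user_id", null] } }
  } },
  { $project: {
      userAttendanceCount: { $size: "$users" },
      studAttendanceCount: { $size: { $setDifference: ["$students", [null]] } }
  } }
];
```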

You have an un-masked email in the sample data…

How often does this need to run? There is no filter on the aggregation; if you were to process just a day at a time, would that make things easier? Or run it monthly and save the results to another collection that stores pre-calculated attendance records?
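One way to sketch that pre-calculation idea: narrow to one day with a $match first (so the groups and lookups only see that day's records), then upsert the results into a summary collection with $merge. The collection name "attendanceDailySummary" here is hypothetical.

```javascript
// Process a single day and persist the summary.
const day = new Date("2023-03-07T00:00:00Z");
const next = new Date(day.getTime() + 24 * 60 * 60 * 1000); // day + 24h

const dailyStages = [
  // Filter first so the rest of the pipeline sees only one day's records.
  { $match: { attendance_date: { $gte: day, $lt: next } } },
  // ... the grouping/lookup stages from the post go here ...
  // Upsert the summary rows into a pre-calculated collection.
  { $merge: { into: "attendanceDailySummary", on: "_id",
              whenMatched: "replace", whenNotMatched: "insert" } }
];
```

Metabase (or the reports) would then read from the small summary collection instead of re-aggregating the full attendance collection on every request.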

AccessSettings Collection - minimum of 55,000 and maximum of 1,00,000 (100,000) documents
studenteachers collection - as of now we have 4,00,000 (400,000) documents (it may grow by about 10,000 per month)
Attendance Collection - 2 entries per day for each student and teacher, so as of now we can expect a minimum of 60,00,000 (6,000,000) entries per month.

I have tried with a $match also; there is no difference.
As far as I can see, the major time is consumed by the $lookup.

First $group: basically, in the attendance collection we will have more than one record for a user on a particular date. Since we are extracting the head count, the query should remove the duplicate logged-in user records based on the date (date-wise distinct users).
— Due to this group, we will get fewer documents.

Second $group: getting the logged-in head count by login date, geo, country, site, state, and client from the attendance collection.
— Due to this group, we will get only the summary data, i.e. fewer documents.

If we do the lookup before the $group, it takes more time; I have tested this (the lookup takes less time with fewer documents).
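The dedupe-then-count logic described above can be simulated in plain JavaScript on toy documents (hypothetical sample data) to check the expected counts: stage 1 keeps one record per (user, date), stage 2 counts heads per (date, site).

```javascript
// Toy attendance documents; u1 has a duplicate login on the same date.
const docs = [
  { user_id: "u1", attendance_date: "2023-03-07", site_name: "VNXG02", studen_Ind: "Y" },
  { user_id: "u1", attendance_date: "2023-03-07", site_name: "VNXG02", studen_Ind: "Y" },
  { user_id: "u2", attendance_date: "2023-03-07", site_name: "VNXG02", studen_Ind: "N" },
];

// Stage 1: date-wise distinct users (mirrors the first $group).
const distinct = new Map();
for (const d of docs) distinct.set(`${d.user_id}|${d.attendance_date}`, d);

// Stage 2: head counts per (date, site) (mirrors the second $group).
const summary = {};
for (const d of distinct.values()) {
  const key = `${d.attendance_date}|${d.site_name}`;
  summary[key] ??= { userAttendanceCount: 0, studAttendanceCount: 0 };
  summary[key].userAttendanceCount += 1;
  summary[key].studAttendanceCount += d.studen_Ind === "Y" ? 1 : 0;
}

console.log(summary["2023-03-07|VNXG02"]);
// { userAttendanceCount: 2, studAttendanceCount: 1 }
```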

Can you list all the indexes you have, so we can check that they are able to support the lookup criteria?

To be honest this is quite a chunky query and may just take time to run with large datasets; you may need to pre-calculate parts of it if you need to run over ALL the data every time. I assume this is not something that’s being called a lot; if it is, you may want to change the schema to reflect the most common usage, as opposed to having everything in different collections.

“clientId” is being grouped on in the first group and not used in the second, so that’s work that’s thrown away. It may be an idea to make a clean start and write it from scratch again, knowing what you want it to achieve.

Can you attach an explain of it running?