Ideas to fit complex aggregation in acceptable runtime

Hi, I’m trying to understand why my aggregations’ execution time increased from 1s to more than 30s after increasing the number of documents.
Background… I have only one collection in my db, which aggregates daily metrics.

Schema:

{
    "_id": {
        "site": "111",
        "path": "index.html",
        "macro": "macro1=value1&macro2=value2",
        "granularity": "DAILY",
        "period": {
            "$date": "2022-09-01T00:00:00.000Z"
        }
    },
    "createdAt": {
        "$date": "2022-09-22T12:52:38.307Z"
    },
    "metrics": {
        "metric1": {
            "value": {
                "$numberLong": "1"
            }
        },
        "metric2": {
            "value": {
                "$numberLong": "1"
            }
        },
		...
		"metricN": {
            "value": {
                "$numberLong": "1"
            }
        }
	}
}

My aggregation:

[{
        "$match": {
            "_id.site": "111",
            "_id.granularity": "DAILY"
        }
    }, {
        "$group": {
            "_id": {
                "path": "$_id.path",
                "period": "$_id.period"
            },
            "metric1": {
                "$sum": "$metrics.metric1.value"
            },
            "metric2": {
                "$sum": "$metrics.metric2.value"
            },
			.
			.
			.
            "metricN": {
                "$sum": "$metrics.metricN.value"
            }
        }
    }, {
        "$project": {
            "fields": "$_id",
            "metrics": {
                "metric1.value": "$metric1",
                "metric2.value": "$metric2",
                .
				.
				.
                "metricN.value": "$metricN"
            },
            "metrics.calculatedField1.value": {
				...
            },
            "metrics.calculatedField2.value": {
				...
            },
            "metrics.calculatedFieldN.value": {
				...
            }
        }
    }, {
        "$sort": {
            "fields.period": -1
        }
    }, {
        "$facet": {
            "grandTotal": [{
                    "$group": {
                        "_id": null,
                        "total": {
                            "$sum": 1
                        },
                        "metric1": {
                            "$sum": "$metrics.metric1.value"
                        },
                        "metric2": {
                            "$sum": "$metrics.metric2.value"
                        },
						.
						.
						.
                        "metricN": {
                            "$sum": "$metrics.metricN.value"
                        }
                    }
                }, {
                    "$project": {
                        "total": 1,
                        "totalMetrics": {
                            "metric1.value": "$metric1",
                            "metric2.value": "$metric2",
							.
							.
							.
                            "metricN.value": "$metricN"
                        },
                        "totalMetrics.calculatedField1.value": {
							...
                        },
                        "totalMetrics.calculatedField2.value": {
							...
                        },
                        "totalMetrics.calculatedFieldN.value": {
							...
                        }
                    }
                }
            ],
            "paginatedData": [{
                    "$skip": 0
                }, {
                    "$limit": 40
                }
            ]
        }
    }
]

I’m running this aggregation in a M50 cluster tier and I have 30 milions documents in my collection.
Also, I have a compound index on all key attributes and avg document size is 1kb.
My aggregation should be very flexible. It should be possible to group by any key attribute and filter by any field.
When I goup by PERIOD and PATH for a specific site, for example, according to explain, FETCH stage uses IXSCAN and returns 416151 documents, meaning it uses 416MB for this aggregation pipeline stage, which exceeds 100MB and uses disk (allowDiskUse is true). This aggregation takes 51s to execute, even paginated.

Question: How to improve this aggregation performance to reach max 5 secs execution time? Should I increase aggregation pipeline stage memory? How? Is this schema good enough for this query?