Indexing during aggregation

Hi there,

I’ve been working a lot with MongoDB, but there is one question I’ve never really gotten an answer to. I have a lot of data in one collection (10+ million docs), and thousands of docs are added each day. In my aggregations I want to filter all documents within a specific timeframe, usually one month. The first stage in the pipeline is a $match that reduces the number of documents to around 50–100 thousand. With an index this is extremely fast; however, the next two stages are always either $unwind or $group stages on multiple attributes. This then takes at least 3–10 seconds to return the desired result.

My question now is: is there any way to speed up the $group or $unwind stages when processing lots of documents? Or is there any technique to summarize numerical values faster? Besides that, we are currently using an M30 cluster. Does the cluster tier also have a major impact on pipeline duration? Any advice would help me enormously.
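For illustration, a pipeline of the shape described might look like this (collection and field names here are placeholders, not the actual schema):

```javascript
// Placeholder pipeline: an index-supported $match on a one-month window,
// then $unwind of an array field, then $group on multiple attributes.
const pipeline = [
  { $match: { timestamp: { $gte: new Date("2023-01-01"), $lt: new Date("2023-02-01") } } },
  { $unwind: "$items" },
  { $group: {
      _id: {
        category: "$items.category",
        day: { $dateToString: { format: "%Y-%m-%d", date: "$timestamp" } }
      },
      total: { $sum: "$items.value" }
  } }
];
```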

Hi @Cornelius_Blank

Unwind and group stages over tens of thousands of documents will probably be expensive to execute.

For the group stage, building an index that covers it can help a lot, but an index will not help the unwind stages.
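As a sketch, an index that supports the initial $match (and any $sort feeding the group) might look like this in mongosh (collection and field names are assumptions, not from your schema):

```javascript
// Hypothetical: a compound index on the time field plus the group key,
// so the $match stage (and a $sort before the $group) can use it.
db.events.createIndex({ timestamp: 1, category: 1 });
```

Note that $group itself generally cannot use an index directly; the win comes from the earlier stages feeding it fewer, already-sorted documents.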

Have you tried creating some documents with the Computed Pattern?
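As a minimal sketch of the idea in plain JavaScript (the event shape and field names are assumptions): instead of re-grouping raw documents on every query, a per-day summary is incremented as events arrive, so a monthly report only has to read about 30 small summary docs.

```javascript
// Computed Pattern, simulated in memory: maintain a per-day summary
// that is updated on every insert instead of recomputed on every query.
const dailySummaries = new Map(); // key: "YYYY-MM-DD", value: summary doc

function recordEvent(event) {
  const day = event.timestamp.slice(0, 10);
  const summary = dailySummaries.get(day) ?? { day, count: 0, total: 0 };
  summary.count += 1;
  summary.total += event.amount;
  dailySummaries.set(day, summary);
}

// Simulated inserts
recordEvent({ timestamp: "2023-01-15T10:00:00Z", amount: 5 });
recordEvent({ timestamp: "2023-01-15T12:30:00Z", amount: 7 });
recordEvent({ timestamp: "2023-01-16T09:00:00Z", amount: 2 });

console.log(dailySummaries.get("2023-01-15")); // { day: '2023-01-15', count: 2, total: 12 }
```

In MongoDB itself this would typically be an `updateOne` with `$inc` and `upsert: true` against a summary collection, run alongside each raw insert.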

There are several schema design patterns for structuring your documents that can help a lot with performance issues. You can take a look at some of them in this blog article.

Hope this helps.


It would be easier to give meaningful recommendations if you could share sample documents from the collection, the expected results, and the aggregation pipeline you are running.