Precomputing Chart data while supporting filters

Dylan_Pierce · May 14, 2024, 6:56pm

Hi there,

I’m using embedded Charts to display charts to my users. It works great, and I’ve also set up a filter on a timestamp field (created_at), for example.

The only issue my customers face is that the charts are very slow. I have a million documents in this collection, and I’ve even used a Data Lake to reduce the amount of data per document in the collection.

Is there a strategy that I can follow to pre-compute the `$group` stages that Charts performs, but still support filtering by date?

As soon as you pre-compute the `$group`, you’re stuck with `yyyy-mm` values, which are problematic because:

- `yyyy-mm` isn’t filterable with date queries
- `yyyy-mm` isn’t as granular as specific days, which customers expect to be able to filter by

Any help with performance tips or strategies is much appreciated.

tomhollander · May 14, 2024, 10:55pm

Hi @Dylan_Pierce -

You could definitely precompute this to improve performance - the normal strategy is to create a trigger (either set to run on a schedule or when the data changes, depending on your requirement), run an aggregation query and use $out to write the results to a different collection.

Your first problem is easily solved by using a date field to identify each precomputed group, e.g. the first date of the month. You could still format it as yyyy-mm on the chart but use the date filter.

The second problem is unfortunately a casualty of the performance boost. You could consider a compromise solution where you pre-group the data by day instead of by month. Assuming you have a number of raw documents per day, you would still get a performance boost by precalculating each day, while retaining the ability to do granular filtering. However, the performance increase would be less than if you precalculated each month.
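As a rough illustration of the precompute step, here is a minimal Python sketch that only builds the aggregation stages as plain dictionaries. It assumes MongoDB 5.0 or later (for `$dateTrunc`) and the `created_at` field from the original post; the target collection `daily_counts` and the output fields `bucket` and `count` are hypothetical names chosen for the example.

```python
def precompute_pipeline(unit="day", date_field="created_at", target="daily_counts"):
    """Return aggregation stages that pre-group documents into date buckets.

    `target` and the output field names are hypothetical; `unit` may be
    "day", "week" or "month".
    """
    return [
        # Truncate each timestamp to the start of its bucket, so the key
        # stays a real date that a date filter can still query.
        {"$group": {
            "_id": {"$dateTrunc": {"date": "$" + date_field, "unit": unit}},
            "count": {"$sum": 1},
        }},
        # Expose the bucket start under a named field.
        {"$project": {"_id": 0, "bucket": "$_id", "count": 1}},
        # Overwrite the cache collection with the fresh results.
        {"$out": target},
    ]


pipeline = precompute_pipeline(unit="day")
```

With PyMongo this could be run as `db.events.aggregate(pipeline)`, where `events` is a placeholder for the source collection. Inside a Trigger function, the same stages would be written as a JavaScript array. Because `bucket` is a real date, the chart can still display it as yyyy-mm while filtering on it with a date filter.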

HTH
Tom

Dylan_Pierce · May 15, 2024, 2:10pm

Thanks for the tips @tomhollander , that’s very useful.

I’ve set up a chart with this proof of concept of a cached daily grouping, and the performance is definitely better. Really appreciate that.

However, it is really tedious to go through this workflow:

1. Design the query to create the initial cached collection ($out)
2. Design the chart and make sure all filterable variables are included
3. Design the Trigger with a slightly modified version ($merge, and filter on recent entries)
4. Whoops, don’t forget to add a unique index on the cached collection for merging to work properly
5. Rinse and repeat for every single chart that needs to group by a different category
6. Rinse and repeat again for every single chart that needs to show a different binning (weekly instead of monthly)
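For anyone following the same workflow, the Trigger variant might look roughly like the Python sketch below, which again only builds the stages as dictionaries. The two-day lookback window, the `daily_counts` collection, and the `bucket` and `count` fields are assumptions made for the example, not details from my actual setup. It also assumes MongoDB 5.0 or later for `$dateTrunc`.

```python
from datetime import datetime, timedelta, timezone


def refresh_pipeline(now, lookback_days=2, unit="day",
                     date_field="created_at", target="daily_counts"):
    """Return stages that recompute recent buckets and merge them into the cache."""
    # Start at midnight so every affected day is recomputed in full.
    since = (now - timedelta(days=lookback_days)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return [
        # Only look at recent entries instead of the whole collection.
        {"$match": {date_field: {"$gte": since}}},
        {"$group": {
            "_id": {"$dateTrunc": {"date": "$" + date_field, "unit": unit}},
            "count": {"$sum": 1},
        }},
        # Drop _id so replacing a cached document never tries to change it.
        {"$project": {"_id": 0, "bucket": "$_id", "count": 1}},
        # Upsert by bucket; this needs a unique index on `bucket` in the target.
        {"$merge": {"into": target, "on": "bucket",
                    "whenMatched": "replace", "whenNotMatched": "insert"}},
    ]


pipeline = refresh_pipeline(datetime(2024, 5, 15, 14, 10, tzinfo=timezone.utc))
```

The unique index from step 4 could be created once with something like `db.daily_counts.create_index("bucket", unique=True)` in PyMongo. Changing `unit` to "week" or "month" covers the different binnings, though each one still needs its own cached collection.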

Embedding directly is really great, I just hope in the future there are shortcuts to support binning, or an easier way to set up caches for performance.

Thanks again!

tomhollander · May 16, 2024, 3:25am

Thanks Dylan, glad the approach worked, even if it wasn’t as easy to get running as you’d hoped. We do have some plans to try to automate this flow at some point in the future.
