Is there a way to force leverage the indexes defined on source collections, for the fields used as dashboard filters.

Prakash_Dewan · July 4, 2024, 12:32pm

How can we leverage the indexes defined on source collections, for the fields used as dashboard filters.

We have indexes defined on some fields in the source collections, on which standard db view is created, which is what charts are created on. Even though the dashboard/chart filters are using the same fields having indexes defined on source collection, the indexes are not being hit/used.

tomhollander · July 4, 2024, 9:22pm

A chart query runs in Atlas and uses indexes under the same conditions as any other query. Dashboard filters are the first stage (after views) so normally indexes will be used.

You mentioned you are using a DB view; this is likely the issue in your case. A view is just additional aggregation stages that run before your own stages. This will push the dashboard filters later in the pipeline which means it may not be able to use indexes.

Prakash_Dewan · July 5, 2024, 4:36am

thanks @tomhollander for your response. Based on your inputs, I have also tried to create a chart (under a dashboard) directly on a collection (having the indexes defined). I added a filter to the dashboard on the same field as the one being indexed in collection.

I was hoping the index to come in play for this setup atleast, when loading the dashboard and exercising difference selections for the filter - but even that’s not hitting the index.

tomhollander · July 5, 2024, 5:22am

That is surprising; there must be something more going on. As per the Backing Aggregation Pipeline docs the dashboard filters are run very early in the pipeline. Are you able to turn the Atlas profiler on to capture the full query being executed?

Prakash_Dewan · July 5, 2024, 5:29am

Well. I am on M2 cluster, so don’t have profiler or any other monitoring/logging available to me at the moment.

Prakash_Dewan · July 8, 2024, 5:00am

@tomhollander - given the fact that, i can’t leverage the profiler - are there any more pointers you may want me to check?
Also, if you can direct me to an example of a sample dashboard from the sample data load we have on Atlas, which behaves the way you/document outline - it will be great.

thanks!

Prakash_Dewan · July 8, 2024, 4:29pm

@tomhollander - I could explore Profiler in our prod cluster (M10). Also, missed to mention earlier that we are using mongo db charts sdk (latest version) to embed charts/dashboard into our application. The dashboard filters we are passing through chart’s sdk are being set in the eventual query (as captured from profiler/query insights) between the view & chart aggregate pipeline, however it’s not hitting the index (though defined).

While with dashboard loaded within Atlas, the index does come in play, though interestingly I can’t see the filter’s $match stage between the view & chart pipeline.

tomhollander · July 9, 2024, 12:27am

OK. So when there is a view (whether a Charts view or a database view) in play, those stages will come before the filter and will almost certainly prevent an index from being used (although it may depend on what exactly the view does).

If you are injecting filters using the SDK, these come is stage 7 (see docs) so this will be too late for filters to apply. However we have introduced a new preFilter method which allows you to push the filter up to stage 2 (same place as the dashboard filters) which will allow the index to be used provided you aren’t using views.

Unfortunately the preFilter method is in SDK 3.3 which we needed to roll back due to a last minute bug, but this will be available again very soon.

Prakash_Dewan · July 9, 2024, 5:11am

thanks @tomhollander . Yeah, we did read about the preFilter when you shared earlier. And we got to know that though it is documented, it will be available with sdk v3.3. Do we have any ETA for the same?

Prakash_Dewan · July 9, 2024, 6:04am

Also, what I have noticed is…

I can confirm that, when loading the dashboard within Atlas (one thats built on db view) - indexes are being hit for dashboard filters (stage 2 in the list), chart filters (stage 8 in the list, which is applied even after embedded filters, which are stage 7).
When an index is hit for #1 cases, i.e. “planSummary”: “IXSCAN { xxxxxxx : 1 }”
is captured in the effective query - only the view aggregate pipeline is captured in the query, not the chart pipeline.
When we are loading embedded charts/dashboard within our app, the dashboard filters are added between the view pipeline & chart pipeline (as captured in the effective command query), however no index coverage for the query.

So, I am wondering - when a stage 8 chart filter can hit an index, why not same fields from embedded filter (which is supposed to apply on stage 7) ?

tomhollander · July 9, 2024, 6:08am

Re the SDK release - the team is working on this now and will release a new SDK as soon as it’s ready.

Re the index usage - this is all handled by MongoDB’s query optimiser, so the behaviour isn’t specific to Charts. In general the query engine will try to use as many indexes as it can, but pipeline stages that change the shape of the data will often make it impossible to use indexes for later stages. But there’s no simple answer to when indexes will be used, it will depend on the exact shape of the query and behaviour of the query optimiser.

Prakash_Dewan · July 10, 2024, 6:09am

We have multiple dashboards/charts embedded in our application, using charts SDK v3.2.1. We often see them load pretty slow, so we started looking into what could be the cause. We have evaluated the DB query side and that seems to be fairly quick, so we need to know what all contributes to the entire process of serving the embedded charts to the end user. For e.g. we aren’t sure of many recurring calls to

/endpoint/embedding_adf (in case of individual chart)
/endpoint/dashboard_shared_adf (in case of whole dashboard)

with the eventual last instance of the call (taking reasonable time waiting for the server) returning with the data/documents for the charts. We assume this is the call, which eventually waited for the query to get executed on the db server. If thats the case, what are earlier calls about and can they be avoided?

Kristina_Stefanova · July 12, 2024, 9:27am

Hi @Prakash_Dewan

We are using an asynchronous approach to fetch data for these endpoints which allows us to get data for long running queries. This works by sending continuous polling requests to determine if the aggregation pipeline has finished executing, and if so, to display the data. This is why for requests fetching fresh data, you will see multiple requests with a pending status, and one final one with a complete status and the data. The pending requests can’t be avoided.

Depending on the dashboard refresh setting you have set for your embedded dashboard, the result from the query will be cached, and then directly served (without the pending requests) for the same query if it falls into the refresh setting.

Prakash_Dewan · July 12, 2024, 10:35am

thanks @Kristina_Stefanova. Appreciate the response.

So, the first instance of the asynchronous call itself submits the aggregation pipeline for execution on the db server? And then subsequent calls poll every 1+ seconds. The eventual recurring call, which comes back with data in response takes decently higher time - i.e. 4-8 seconds. Do we know, what all goes in for that instance?

It would be great to know, if the view pipeline (if applicable) & chart pipeline are executed on DB server together as one? i.e. the first async call submits the effective aggregate pipeline (view + filters + chart) for execution to the DB server, and then keep looking for the results?

In case of a dashboard with multiple charts, are the aggregation pipelines for all of them submitted in first instance? And are they run in parallel on the DB server?

I am putting these questions, since we have observed that, on mongoDB samples, i.e. Movies dashboard & the embedded chart examples hosted on https://codesandbox.io/ - when a dashboard is loaded, separate api calls are being made for each chart in the dashboard, all at same time, and in parallel.

While on our cluster, one /dashboard_shared_adf call is made for a batch of charts together. e.g. For a dashboard with 10 charts, 3 calls are being made with 4, 4, 2 chart details in the request payload. What could be causing this?

Thanks in advance!
Prakash

Prakash_Dewan · July 15, 2024, 6:08am

@Kristina_Stefanova , @tomhollander - looking forward to having answers to my questions in last comment.

Kristina_Stefanova · July 15, 2024, 7:40am

Hi @Prakash_Dewan

Yes, precisely, the first request submits the whole aggregation pipeline for each chart, and then we poll via other requests to see if the data fetching has finished, and pull that data. This new process allows us to run aggregations that can take up to 10 minutes while before the timeout was 2 mins. We combine a few chartIds into a single poll request but the more charts you have on your dashboard, the more requests there will be. If you have more charts, they will be sent in subsequent requests.
The data fetching requests do run in parallel, and we load the charts individually as soon as their data comes back. The time when the requests are sent or in pending state can vary because of multiple factors - browser request limitations, cluster load, polling time, optimization of the query for caching purposes.

For every chart we execute one aggregation pipeline that is constructed in the order described here (the same link that Tom shared earlier).

Good question about the SDK examples. This asynchronous polling feature is being rolled out in stages, so the tenant that has the embedding charts from the examples is still not upgraded (and is working in the old way) - which does not allow long running queries, and is time limit restricted.

We released the preliminary filter in the SDK last week, this allows an embedding dashboard filter to be executed right after the Charts view stage. That might use the index you mentioned before. Please check the Embedding SDK version 3.3.1

I hope that helps. Please let us know if you have any more questions.

Prakash_Dewan · July 15, 2024, 8:22am

thanks @Kristina_Stefanova. Appreciate your detailed response.

Not sure which code contributes to the support of long running queries, however the SDK examples on https://codesandbox.io/ also used v3.2.1 of charts SDK, similar to what we use. What do you mean by “optimization of the query for caching purposes” - when you cite it as one of the factor for data requests.

Good to know about availability of ‘preFilter’. Will be exploring it soon.

Thanks!

Prakash_Dewan · July 15, 2024, 12:52pm

Also, the difference about “individual calls for each chart” Vs “clubbed call for multiple charts”. Is that also something, which is because of a different code on example/sample hosting tenant & our cluster?

Thanks!

Kristina_Stefanova · July 16, 2024, 5:43am

Hi @Prakash_Dewan

The Atlas Charts app itself contains the support for long running queries. The Embedding SDK connects to the Charts project to get data, so it doesn’t matter which version of the package is used if the tenant/project/app is using the long running queries.
The difference of the way the requests are made comes entirely from the Charts app. The long running queries feature have been rolled out to the majority of Chart apps now, and our plan is to get it to 100% very soon.

This is something that gets done in a lower level than Charts. It’s part of a caching mechanism, so that this query gets results faster when requested a second time.

Prakash_Dewan · July 16, 2024, 7:51am

Thanks @Kristina_Stefanova. Can you answer the other question I posted?