Massage your Charts data with Data Source Pipelines
MongoDB Charts now lets you attach aggregation pipelines to data sources, giving you a new option to pre-process data used to build visualizations.
MongoDB Charts is the best way to build visualizations from your MongoDB data, because it natively understands the MongoDB document model and effortlessly handles complex structures such as nested documents and arrays. However, sometimes the data in your collections is not optimized for data visualization. Perhaps it includes sensitive data that you don’t want available for charting; maybe it uses unsuitable types, such as dates stored as strings; or it may be split across multiple collections, making it hard to include it all on a single chart.
In the past there have been two possible solutions to this problem:
- You could create a View using the mongo shell. This approach is powerful but requires using a different tool and may require different permissions.
- You could use the Charts Query Bar to add an aggregation pipeline to a chart. This avoids the need to use other tools, but the pipelines can’t easily be shared across charts, and this approach doesn’t support joining collections using $lookup.
With this month’s update to Charts, we’re pleased to provide a third option, giving you the power of Views without needing to leave the Charts interface. In Charts, a data source is a reference to a specific collection that you want to use for data visualization. Charts now allows data source owners to attach an aggregation pipeline directly to a data source, guaranteeing that any charts using it receive the same, “massaged” view of the data. You can edit the aggregation pipelines by hand, or use the Aggregation Pipeline Builder from MongoDB Compass or MongoDB Atlas to help you create the perfect data processing logic.
Let’s look at a few examples of how you could use this feature. First, to redact data, you can use a $match stage to filter out whole documents, or $project to remove fields from the documents that remain. Here’s a simple pipeline that shows both of these together.
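As a minimal sketch of what such a pipeline could look like, here is one written as a JavaScript array literal. The field names `internalOnly`, `ssn` and `customer.email` are hypothetical, so substitute the fields from your own collection.

```javascript
// Hypothetical redaction pipeline; all field names are assumptions.
const redactionPipeline = [
  // Stage 1: filter out whole documents that should never be charted.
  { $match: { internalOnly: { $ne: true } } },
  // Stage 2: remove sensitive fields from the documents that remain.
  { $project: { ssn: 0, "customer.email": 0 } }
];
```

Because this `$project` stage only excludes fields, every other field passes through to the chart unchanged.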
Next, let’s look at data type conversions. For a variety of reasons, some people create documents which use suboptimal types, such as using strings or integers to represent dates. Charts expects dates to use the Date BSON type in order to perform operations such as binning, so you’ll want to make sure all of your types are correct. Fortunately, a data source pipeline now makes it easy to correct the types once for every chart that uses the data source, with a pipeline like this:
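A conversion pipeline along these lines might look as follows. This is a sketch that assumes a hypothetical `orderDate` field holding date strings, and a MongoDB version of 4.0 or later, where the `$toDate` operator is available.

```javascript
// Hypothetical type-conversion pipeline; "orderDate" is an assumed field name.
const dateConversionPipeline = [
  // Overwrite the string field with a real BSON Date so Charts can bin it.
  { $addFields: { orderDate: { $toDate: "$orderDate" } } }
];
```

Note that `$toDate` raises an error for strings it cannot parse. If your data is messy, the more general `$convert` operator with its `onError` option may be a better fit.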
Finally, we come to one of the most requested Charts features, the ability to join data from multiple collections. While the MongoDB document model means that you can often avoid joins by nesting data into structured documents, there are also times when modelling your data across collections makes sense. By creating a pipeline using $lookup, you can join data from multiple collections into a single document, ready for charting. Here’s an example:
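A minimal sketch of such a join, attached to a data source on the movies collection, could look like this. The join fields `_id` and `movie_id` are assumptions based on how the Atlas sample movie data is commonly structured; use whichever ID fields link your own collections.

```javascript
// Hypothetical join pipeline for a data source on the "movies" collection.
// The localField and foreignField values are assumptions.
const joinPipeline = [
  {
    $lookup: {
      from: "comments",         // collection to join with
      localField: "_id",        // ID field on each movie document
      foreignField: "movie_id", // matching field on each comment document
      as: "comments"            // name of the new nested array
    }
  }
];
```

Each movie document that passes through this stage gains a `comments` array holding the matching comment documents. A movie with no matching comments gets an empty array.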
Data sources with aggregation pipelines look just like any other data source, but the saved pipeline will always execute when the chart data is retrieved. Let’s look at the chart builder for the data source shown above. Note how the documents from the movies collection now contain a new nested array called comments containing the corresponding documents from the comments collection, joined using the specified ID fields. You’re now free to use all of the array reduction capabilities in Charts to visualize the data from the joined collections.
These are just a few examples of how you can use aggregation pipelines on Charts data sources. Of course, you can still use Views and chart-level pipelines as well, so you can pick the approach that works best for your scenario.
If you want to try this yourself, you can activate Charts on your Atlas project today. Alternatively, you can install Charts locally in your own environment; make sure you’re using the latest 19.09 release to get access to this new feature.