Atlas Mapped: Analytics Nodes To Power Your BI Now Available

Leo Zheng and Dj Walker-Morgan

In this edition of Atlas Mapped, your regular update on MongoDB's cloud managed service, we're introducing you to a feature on MongoDB Atlas, Analytics Nodes. We mentioned them in the last Atlas Mapped and wanted to explain how they work. Analytics nodes allow you to have one or more nodes in your MongoDB cluster that are dedicated to handling your analytics workloads.

What is an Analytics node?

First, a quick refresher on MongoDB replica sets, which are at the core of MongoDB's high availability story. By grouping three or more database servers together, you can create a replica set with a primary server and secondary replicas duplicating the primary's data. This architecture is designed primarily with high availability in mind: it automatically handles failover if one of the servers goes down and recovers automatically when that server comes back online. We call all these nodes electable because an election is held between them to work out which one is primary.

But there's so much more we can do with this because MongoDB's architecture is flexible. First of all, there's read scaling. You can add nodes that handle only queries which read from the database. Such a node is another replica, but its job is to answer those queries; if there's a failure, it won't take part in the automatic recovery or election. You can have these read-only secondary nodes in any region where you want faster access to data. Historically though, you haven't been able to direct your query traffic exclusively at read-only nodes - you could only select the more generic group of "secondary nodes" as a target for your queries. That meant that long-running queries could still have an impact on your operational workload.

And this is where Analytics nodes come in.

Analytics nodes are like read-only nodes, but you can target your queries exclusively at them by setting your read preference to select analytics-type nodes. That means analytics work is isolated to these nodes, so your operational performance isn't affected. This makes Analytics nodes ideal to query from the BI Connector as, no matter how complex the analysis, it won't slow down the operational work of the cluster.

Analytics nodes in practice

Let's look at an example. Assume you have long-running analytical queries that you want to run against your cluster and you don't want them competing with your regular operational workload for resources. The simplest way to address this is to add an analytics node to your cluster and then target it in your connection string using an Atlas replica set tag. Analytics nodes can be added in Atlas by editing your cluster configuration, turning on support for "Multi-Region, Workload Isolation, and Replication Options" and then adding a node in the "Analytics nodes for workload isolation" section.

Atlas Configuration

Note that Analytics nodes are a feature reserved for M10 and larger Atlas clusters.

Then it's a matter of adding the appropriate targeting to your connection string like so:

mongodb+srv://<USERNAME>:<PASSWORD>@foo-q8x1v.mycluster.com/test?readPreference=secondary&readPreferenceTags=nodeType:ANALYTICS

Note that the readPreference is still for secondary nodes, but that the readPreferenceTags set the required nodeType to ANALYTICS. That's pretty much all you need to get Analytics nodes up and running and isolating your analytical workloads.
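If you assemble connection strings in code, a small helper can keep the read preference and tags in one place. The following is only a sketch using the Python standard library: the `build_uri` function, the `analyst` user and its password are hypothetical, and the host is the one from the example above. It also percent-encodes the credentials, which is needed whenever they contain characters such as `@` or `:`.

```python
from urllib.parse import quote_plus


def build_uri(host, database, username, password, mode, tag_sets):
    """Assemble a mongodb+srv URI with a read preference and tag sets.

    Each tag set is a dict and becomes one readPreferenceTags parameter
    made of comma-separated key:value pairs.
    """
    # Percent-encode credentials so '@' or ':' cannot break the URI.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    options = [f"readPreference={mode}"]
    for tag_set in tag_sets:
        pairs = ",".join(f"{key}:{value}" for key, value in tag_set.items())
        options.append(f"readPreferenceTags={pairs}")
    return f"mongodb+srv://{credentials}@{host}/{database}?" + "&".join(options)


uri = build_uri(
    host="foo-q8x1v.mycluster.com",  # host from the example above
    database="test",
    username="analyst",              # hypothetical user
    password="p@ss:word",            # hypothetical password
    mode="secondary",
    tag_sets=[{"nodeType": "ANALYTICS"}],
)
print(uri)
```

Passing more than one dict in `tag_sets` produces repeated readPreferenceTags parameters, which drivers treat as an ordered list of tag sets to try.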

Analytics nodes and the BI Connector

One useful thing to know is that, by default, the BI Connector for MongoDB Atlas clusters will always try to target analytics nodes with its queries. This means that adding the BI Connector and an Analytics node is all you need to do to configure an isolated analytics system. Of course, the BI Connector isn't the only thing that can make use of an isolated workload node, as we showed above.

A little more on ReadPreferenceTags

If you haven't come across ReadPreferenceTags, they are a generic way of telling the driver which members of your cluster may serve your reads, based on the tags attached to each member. They aren't just for analytics nodes. For example, you can set your read preference tags to push traffic to a specific region where you may have configured a read-only secondary.

Let's say that your primary cloud region is AWS US East 1 but you also have application clients in Germany where your audience is growing. By creating a local read-only node in Germany and using Atlas replica set tags, you can avoid making a trip across the ocean to fetch data.

mongodb+srv://<USERNAME>:<PASSWORD>@foo-q8x1v.mycluster.com/test?readPreference=nearest&readPreferenceTags=provider:AWS,region:EU_CENTRAL_1
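To make the matching rule concrete, here is a simplified model of how a tag set narrows down the members of a cluster. It is an illustration, not driver code: the `select_by_tags` function, the host names and the member list are hypothetical, and a real driver also applies the read preference mode and latency checks. A member matches a tag set when it carries every tag in that set, and tag sets are tried in order until one matches something.

```python
def select_by_tags(members, tag_sets):
    """Return the members matching the first tag set that matches anything."""
    for tag_set in tag_sets:
        matches = [
            member for member in members
            # A member matches when it has every key:value pair in the set.
            if all(member["tags"].get(k) == v for k, v in tag_set.items())
        ]
        if matches:
            return matches
    return []


# Hypothetical cluster: two US nodes, one German read-only node, one analytics node.
members = [
    {"host": "node-0", "tags": {"provider": "AWS", "region": "US_EAST_1", "nodeType": "ELECTABLE"}},
    {"host": "node-1", "tags": {"provider": "AWS", "region": "US_EAST_1", "nodeType": "ELECTABLE"}},
    {"host": "node-2", "tags": {"provider": "AWS", "region": "EU_CENTRAL_1", "nodeType": "READ_ONLY"}},
    {"host": "node-3", "tags": {"provider": "AWS", "region": "US_EAST_1", "nodeType": "ANALYTICS"}},
]

# Same tags as the connection string above.
german = select_by_tags(members, [{"provider": "AWS", "region": "EU_CENTRAL_1"}])
print([m["host"] for m in german])

# An empty tag set matches every member, so it works as a last-resort fallback.
fallback = select_by_tags(members, [{"region": "AP_SOUTHEAST_2"}, {}])
print(len(fallback))
```

Listing a fallback tag set is a design choice: without one, reads that require a tag fail when no matching member is available instead of going to a different node.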

And it's not just connection strings that can activate this preferential behavior. If your MongoDB driver supports setting read preferences and tags at query time - which it should - then you can change your preferences per query without having to reconnect.

Atlas replica set tags are available for Atlas clusters M10 and higher. To learn more about Atlas replica set tags and how you can use them, check out the tags documentation.