Use MongoDB for reporting

Marbin_Diaz-Diaz · July 3, 2023, 6:43pm

I’m new to Mongo but have many years of experience in the business intelligence world.
Our team was acquired by another company. This company uses MongoDB as their primary database.
Data events are usually handled by a message queue, in this case Google Pub/Sub; from there, they insert the events into Mongo.
Any new application that would like to consume those events will use the Topic (pubsub) instead of retrieving the data from Mongo.
Management made it clear that they don’t want us to interact directly with Mongo. They are worried about server performance being affected, slowing down the response of the primary application that end users depend on.

I don’t know if Mongo offers the option of a secondary replica that we can use for reporting purposes,
one that we can hit as much as we want (batch loads, change streams, etc.) without affecting the primary Mongo server.

What are our options?

John_Sewell · July 3, 2023, 7:25pm

Search for reporting in this link:

You can set up a hidden replica set member for reporting so you do not hit the primary. You could also set a secondary read preference so that a reporting engine does not hit the primary.
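
As a rough illustration of the hidden-member idea, the sketch below builds the member document that could be passed to rs.add() in mongosh. The host name is hypothetical and the helper is not part of any MongoDB library; it only shows the two fields involved, since a hidden member must also have priority 0.

```python
def hidden_member(host):
    """Return a replica set member document for a hidden reporting node."""
    return {
        "host": host,
        "priority": 0,   # a hidden member must never be eligible to become primary
        "hidden": True,  # not visible to clients using a replica set connection
    }


# Hypothetical host name, for illustration only.
member = hidden_member("reporting-1.example.net:27017")
print(member)
```

The resulting document has the same shape as the argument to rs.add({ host: ..., priority: 0, hidden: true }).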

I’ve not needed to use this, as our system is low enough volume that reporting from the primary does not cause an issue, but at several of the Mongo conferences I’ve been to recently this has been mentioned as a use case for reporting.

Marbin_Diaz-Diaz · July 5, 2023, 3:37pm

Thanks.
I will look into it.
Our prod env is high volume, saturated with transactions. That explains why management is so jealous about it.

Kobe_W · July 6, 2023, 4:23am

A hidden node cannot be reached by read queries from clients. To serve that “specific traffic” from a “specific node”, you can try using tag sets.
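
A minimal sketch of the tag-set approach, assuming a non-hidden, priority-0 member tagged for reporting. The tag name "use", the value "reporting", the replica set name and the host names are all made up for illustration; readPreference and readPreferenceTags are standard connection string options.

```python
def tagged_member(host, tags):
    """Member document for a visible secondary that should never become primary."""
    return {"host": host, "priority": 0, "tags": dict(tags)}


def reporting_uri(hosts, replica_set, tag_name, tag_value):
    """Connection string that routes reads to secondaries carrying the given tag."""
    options = "&".join([
        f"replicaSet={replica_set}",
        "readPreference=secondary",
        f"readPreferenceTags={tag_name}:{tag_value}",
    ])
    return f"mongodb://{','.join(hosts)}/?{options}"


# Hypothetical hosts and tag, for illustration only.
member = tagged_member("reporting-1.example.net:27017", {"use": "reporting"})
uri = reporting_uri(
    ["db-1.example.net:27017", "db-2.example.net:27017", "reporting-1.example.net:27017"],
    "rs0", "use", "reporting",
)
print(uri)
```

Only the reporting application would use this connection string; the primary application keeps its own.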

steevej · July 6, 2023, 11:05am

One thing you have to be aware of is that a hidden node, being a member of the replica set, receives the same write load as the other nodes. With the extra indexes and the reporting workload, it needs to be sized correctly.

steevej · July 6, 2023, 11:08am

A little correction is in order: a hidden node cannot be reached via a replica set connection string, but it is reachable with a direct, non-replica-set connection string. That is how you are able to create reporting-specific indexes that are not replicated to the other members of the replica set.
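
To make the difference concrete, here is a small sketch of what a direct connection string might look like. The host name is hypothetical; directConnection is a standard connection string option that tells the driver to talk to that one host instead of discovering the rest of the replica set.

```python
def direct_uri(host, port=27017):
    """Connection string that targets a single node and skips replica set discovery."""
    return f"mongodb://{host}:{port}/?directConnection=true"


# Hypothetical hidden node, for illustration only.
uri = direct_uri("reporting-1.example.net")
print(uri)
```

A driver or mongosh given this string connects only to the named node, which is what makes a hidden member reachable.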

John_Sewell · July 6, 2023, 12:15pm

I realised my original link was to archived documentation; this seems to be the more recent version:

Marbin_Diaz-Diaz · July 6, 2023, 3:30pm

Doesn’t this fit better in our case?

How about heavy workloads on an analytic node?
Will they affect the whole cluster?

steevej · July 11, 2023, 3:31am

It does if you are on Atlas. But I suspect they are just the same as hidden replica set nodes as mentioned before but manageable via the Atlas GUI and CLI.

An analytic node handles the same write load as the other nodes of the cluster. If your usual traffic is mostly writes, the analytic node needs to be bigger in order to handle the extra analytic reads. If your usual traffic is mostly reads, the analytic node might be smaller, unless the analytic reads are high.

Since your prod env is high volume and saturated with transactions, it sounds like your traffic is mostly writes, so you might need a bigger node.
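
For the Atlas case, a sketch of how a reporting client might target analytics nodes, assuming the cluster has them. Atlas predefines the nodeType:ANALYTICS replica set tag; the cluster host name below is hypothetical and credentials are left out on purpose.

```python
def atlas_analytics_uri(cluster_host):
    """Connection string that sends reads to Atlas analytics nodes only."""
    options = "readPreference=secondary&readPreferenceTags=nodeType:ANALYTICS"
    return f"mongodb+srv://{cluster_host}/?{options}"


# Hypothetical cluster host name, for illustration only.
uri = atlas_analytics_uri("cluster0.example.mongodb.net")
print(uri)
```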