Where does my change stream watcher need to run?

I’m looking to capture live changes from my MongoDB Atlas cluster and send them to BigQuery. This is much in line with this article, which uses Node.js to listen for change stream events and then writes them to Pub/Sub; it recommends using a change stream event listener for this purpose.

The part the article doesn’t cover is where to run your code. Can this be a serverless function, or does it always need to be “on” (i.e. a virtual machine)?

The other slightly confusing area is Atlas Database Triggers. These seem to offer similar functionality in that they automatically listen for change events in the database and can then be set to run a function (I think only on Atlas/Realm). Am I right in thinking this is an alternative method of listening to the change stream, where I wouldn’t need to run the watcher myself because that part is already handled? I would just need to run the desired action (post a message to Pub/Sub in my case).

In summary, please can someone confirm:

  1. If the watcher code can run as a serverless function (on Realm or elsewhere)
  2. If Atlas Database Triggers are an alternative to running your own watcher code which achieves exactly the same thing. Are there any downsides?

Thanks.

Hi Ian!

1. If the watcher code can run as a serverless function (on Realm or elsewhere)
1. If the watcher code can run as a serverless function (on Realm or elsewhere)
In general, serverless functions are short-lived, meaning that they run until one of these things happens:

  • a result is returned;
  • an error/exception is thrown;
  • some timeout is exceeded (up to 15 minutes for AWS Lambda, up to 9 minutes for Google Cloud Functions).

On the other hand, if we subscribe to a MongoDB change stream, for example with collection.watch(), we need to keep the process running so we can receive any new events (changes in the collection). Using a serverless function for that scenario therefore wouldn’t work.

You’ll need an instance that runs continuously, such as EC2 or Google App Engine.
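For illustration, here’s a minimal sketch of such a long-running watcher in Node.js, along the lines of the approach in the blog post. The connection string, database, collection, and topic names are placeholders, and error handling and resume-token persistence are left out:

```js
// Long-running change stream watcher. This needs an always-on instance
// (e.g. EC2, GCE, App Engine flexible), not a serverless function.
const { MongoClient } = require('mongodb');
const { PubSub } = require('@google-cloud/pubsub');

// Placeholder values - adjust to your own cluster, database, collection and topic.
const uri = process.env.MONGODB_URI;
const topicName = 'mongodb-changes';

async function main() {
  const client = new MongoClient(uri);
  await client.connect();
  const collection = client.db('mydb').collection('mycollection');
  const topic = new PubSub().topic(topicName);

  // Open the change stream and keep the process alive to receive new events.
  const changeStream = collection.watch([], { fullDocument: 'updateLookup' });
  for await (const change of changeStream) {
    // Forward each change event to Pub/Sub as a JSON message.
    await topic.publishMessage({ data: Buffer.from(JSON.stringify(change)) });
  }
}

main().catch(console.error);
```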

2. If Atlas Database Triggers are an alternative to running your own watcher code which achieves exactly the same thing. Are there any downsides?

I’d say Atlas Triggers are a great alternative to what’s described in the blog post. If you go with them, you’d need to define an Atlas Function (a JavaScript function executed on Atlas) that publishes messages to Cloud Pub/Sub. I played around with this scenario and I can confirm it’s possible. There are a couple of caveats though:

  1. Atlas Functions don’t support the Node.js client library for Cloud Pub/Sub (@google-cloud/pubsub). However, you can use the Pub/Sub REST API instead (see the Cloud Pub/Sub API reference in the Google Cloud documentation).

  2. Authenticating to Google Cloud from an Atlas Function can be a bit tricky. I recommend using the google-auth-library package with JWT (service account) authentication; see the sketch after this list.
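As a rough illustration (not an official recipe), an Atlas Function attached to a Database Trigger could look something like the following. It assumes the Google Cloud project ID, topic ID, and service-account credentials are stored as Values/Secrets in the App Services app; all of those names are hypothetical, and runtime support for built-in modules (e.g. buffer) should be checked against the Atlas Functions documentation:

```js
// Sketch of an Atlas Function attached to a Database Trigger.
// The Value/Secret names (gcpProjectId, gcpTopicId, gcpClientEmail, gcpPrivateKey)
// are hypothetical - define your own in the App Services app.
exports = async function (changeEvent) {
  const { JWT } = require('google-auth-library');
  const { Buffer } = require('buffer');

  const projectId = context.values.get('gcpProjectId');
  const topicId = context.values.get('gcpTopicId');

  // Authenticate with a service account via JWT, since the
  // @google-cloud/pubsub client library doesn't load in Atlas Functions.
  const client = new JWT({
    email: context.values.get('gcpClientEmail'),
    key: context.values.get('gcpPrivateKey'),
    scopes: ['https://www.googleapis.com/auth/pubsub'],
  });

  // Publish the change event through the Pub/Sub REST API.
  const url = `https://pubsub.googleapis.com/v1/projects/${projectId}/topics/${topicId}:publish`;
  const payload = Buffer.from(JSON.stringify(changeEvent)).toString('base64');

  const res = await client.request({
    url,
    method: 'POST',
    data: { messages: [{ data: payload }] },
  });

  // The response contains the Pub/Sub message IDs.
  return res.data;
};
```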

3. [Bonus] Alternative integration between Atlas and BigQuery

Another way to set up a stream between Atlas and BigQuery is by using Dataflow — https://cloud.google.com/blog/products/data-analytics/mongodb-atlas-and-bigquery-dataflow-templates.


Hi @Stanimira_Vlaeva. Thanks for taking the time to write such a helpful answer.

  1. That’s what I expected. Ideally I’m looking to stay serverless on this one.

  2. That sounds like a semi-viable option. Why don’t Atlas Functions support the Node.js library for Pub/Sub? Surely it would be quite necessary in this type of situation? Thanks also for the tip about authenticating.

  3. Using Dataflow is a good option, but don’t you still have the same dilemma, where you first need to listen for changes and write messages to Pub/Sub to be consumed by Dataflow? That loops back to needing one of the first two approaches. In my particular case I don’t really need any transformations on the data; I just want the changes synced up to BigQuery. That’s why I was looking at using BigQuery streaming ingest from Pub/Sub as per the blog post, or via Atlas Triggers.

Thanks again.

Hi Ian, if you’re still stuck on this, you could try out Streamkap. We have a number of companies using our service to stream data from MongoDB to BigQuery.

Hello @Ian, I am facing a similar challenge: I need to stream data from MongoDB to BigQuery. How did you proceed?