Five Minute MongoDB - Change Streams and MongoDB 4.x

Knowing what changes are happening in your database deployments can be the key to synchronizing different data services around MongoDB. Rather than regularly querying a collection for changed documents, it can be easier to filter a stream of changes and take action immediately. This is a style of reactive programming and can be very powerful. These days, it's remarkably simple to get those streams of change information.

A bit of history first. Before MongoDB 3.6, if you wanted to listen to what was changing in your MongoDB deployment, you had to "tail the oplog", a collection used in replication which logs changes. Tailing the oplog often led to complex, unsupported, and fragile code, which is not something you want in production. As a result, people tended to avoid the reactive programming style.

Change Streams and Collections

That situation began to change with MongoDB 3.6 when Change Streams arrived. Change streams make it simple and supported to listen to changes in a collection. How simple? Let's watch for some changes happening in a movieDetails collection in this Node.js example.

const MongoClient = require("mongodb").MongoClient;

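// Replace with the connection string for your own deployment.
const uri = "MONGODBURL";

const client = new MongoClient(uri, { useNewUrlParser: true });
client.connect().then(() => {
  // Open a change stream on the movieDetails collection.
  const changeStream = client.db("video").collection("movieDetails").watch();
  // Print each change event as it arrives.
  changeStream.on("change", next => {
    console.log(next);
  });
});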

This code connects to the deployment, selects a database and collection, and calls watch() to create a change stream. We then add an event listener with .on("change", …), which calls our function for each change event in the stream. In this case, it just prints out the change event whenever a document is changed. If I run this code and then edit the details of a movie using MongoDB Compass, I get…

{ _id:
   { _data:
      '825C51D03F0000000129295A1004E515B4338C574BA2B9603CB1C7FB3B0446645F696400645C0EC4B74B052F9E2EF0C3810004' },
  operationType: 'replace',
  clusterTime:
   Timestamp { _bsontype: 'Timestamp', low_: 1, high_: 1548865599 },
  fullDocument:
   { _id: 5c0ec4b74b052f9e2ef0c381,
     title: 'PS I Love You',
     year: 2007,
     ...
     awards: { wins: 2, nominations: 4, text: '2 wins & 4 nominations.' },
     type: 'movie' },
  ns: { db: 'video', coll: 'movieDetails' },
  documentKey: { _id: 5c0ec4b74b052f9e2ef0c381 } 
}

You can read more about what all that means in the Change Events documentation, but the quick version is that the most important information, the kind of change, is in the operationType field. When we watch a collection, it can have a value of insert, update, replace, delete, or invalidate. The first four of these types do what their names say. The replace we see in the event above is a result of Compass making edits by replacing the whole document in the collection.

The invalidate operationType turns up in a change stream when the collection you are watching is dropped or renamed, or when the database the collection is in is dropped. It's a signal to close the change stream. The rest of the event is information about the change: which namespace it happened in, what the document looks like now, and when the change happened.
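As a minimal sketch of acting on operationType, the helper below turns a change event into a one-line description and closes the stream when an invalidate arrives. The names describeChange and handleChanges are hypothetical, made up for this illustration, and the code assumes only the event fields shown in the example output above.

```javascript
// Hypothetical helper: summarize a change event based on its operationType.
function describeChange(event) {
  // invalidate events carry no documentKey, so fall back to a placeholder.
  const id = event.documentKey ? String(event.documentKey._id) : "(unknown)";
  switch (event.operationType) {
    case "insert":
      return `inserted ${id}`;
    case "update":
      return `updated ${id}`;
    case "replace":
      return `replaced ${id}`;
    case "delete":
      return `deleted ${id}`;
    case "invalidate":
      return "stream invalidated";
    default:
      return `unhandled operation: ${event.operationType}`;
  }
}

// Hypothetical wiring: log every event and close the stream on invalidate.
// `changeStream` is assumed to be the object returned by watch().
function handleChanges(changeStream, log = console.log) {
  changeStream.on("change", event => {
    log(describeChange(event));
    if (event.operationType === "invalidate") {
      changeStream.close();
    }
  });
}
```

With the earlier example, you would call handleChanges(changeStream) in place of the inline .on("change", …) listener.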

By the way, the example change event above was generated on MongoDB 4.x. Its _id field, which wraps the _data value, is a resume token: applications that keep a record of it can use it to restart the stream at that point.
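Here is one way recording and reusing a resume token might look. It is a sketch, not a complete recovery strategy: the store object and the function names trackResumeToken and resumeOptions are hypothetical, and a real application would persist the token somewhere durable. It relies on the driver's resumeAfter option for watch(), which takes the whole _id document of a change event.

```javascript
// Hypothetical helper: remember the most recent resume token seen on a stream.
// `store` is any object with a `token` property; here it only lives in memory.
function trackResumeToken(changeStream, store) {
  changeStream.on("change", event => {
    // The event's _id document (including _data) is the resume token.
    store.token = event._id;
  });
}

// Hypothetical helper: build the options argument for watch().
// With a saved token the stream resumes after it; otherwise it starts fresh.
function resumeOptions(store) {
  return store.token ? { resumeAfter: store.token } : {};
}

// Usage, assuming `collection` comes from a connected MongoClient:
//   const store = { token: null };
//   const changeStream = collection.watch([], resumeOptions(store));
//   trackResumeToken(changeStream, store);
```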

Beyond Collections

If you wanted to track changes in a collection, MongoDB 3.6 was great, but the people who'd used the oplog in the past to detect changes were often people who wanted to take actions on all the changes in the database to duplicate them in another system. That's where MongoDB 4.0 comes in. It added the ability to create change streams on databases or entire deployments - replica sets or sharded clusters. Instead of doing a watch() on a collection, 4.0 allowed you to do a watch() on the database or an entire deployment.

const MongoClient = require("mongodb").MongoClient;

const uri = "MONGODBURL";

const client = new MongoClient(uri, { useNewUrlParser: true });
client.connect().then(() => {
  // Watch the whole deployment: every collection in every database.
  const changeStream = client.watch();
  changeStream.on("change", next => {
    console.log(next);
  });
});

Now, whenever there are any updates on any collection in any database, they are printed to the console. These aren't the only change events we are going to get either. As we've zoomed all the way out to the widest scope of changes, we will now see drop events when a collection is dropped, dropDatabase events when a database is dropped and rename events when a collection is renamed.

If you are just interested in what happens within a particular database, you can open the database and do a watch() on that. You'll get all the updates for collections in that database, along with the drop and rename events. You won't get the dropDatabase event though; if your database is dropped, you'll get an invalidate instead as your database has gone.
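A database-level watch could be sketched like this. The function name watchDatabase and its callback shape are assumptions made for the illustration; the only driver calls used are client.db() and watch() on the resulting database object.

```javascript
// Hypothetical helper: watch a single database and report each event.
// `client` is assumed to be an already connected MongoClient.
function watchDatabase(client, dbName, onEvent) {
  const changeStream = client.db(dbName).watch();
  changeStream.on("change", event => {
    // Events such as invalidate have no ns field, so report null for them.
    const coll = event.ns && event.ns.coll ? event.ns.coll : null;
    onEvent(event.operationType, coll, event);
  });
  return changeStream;
}

// Usage, inside the .then() of client.connect():
//   watchDatabase(client, "video", (type, coll) => {
//     console.log(type, coll);
//   });
```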

What Next?

Depending on your tracking needs, you can follow a single collection's changes, changes in a database's collections, or all changes in a deployment's databases and collections. There are some changes you won't see explicitly though; the creation of new collections and databases has to be inferred from the creation of documents in a collection.

This isn't a big problem when replicating to another MongoDB as database and collection creation is inferred when needed. If you need to spot newly created databases and collections, you'll want to make a copy of the current set of collections and check if a new change is affecting a previously unmentioned collection. The other thing you won't see is the creation of indexes and other actions, which don't reflect as changing documents.
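The "copy of the current set of collections" idea can be sketched as a small tracker that remembers which namespaces it has already seen. The name createNamespaceTracker is hypothetical, and the initial list is assumed to come from wherever your application already gets its collection names.

```javascript
// Hypothetical helper: returns a function that reports whether a change
// event touches a namespace (db.collection) that has not been seen before.
function createNamespaceTracker(knownNamespaces = []) {
  const seen = new Set(knownNamespaces);
  return function isNewNamespace(event) {
    // Events without a collection (for example invalidate) are never "new".
    if (!event.ns || !event.ns.coll) {
      return false;
    }
    const name = `${event.ns.db}.${event.ns.coll}`;
    if (seen.has(name)) {
      return false;
    }
    seen.add(name);
    return true;
  };
}

// Usage with a deployment-wide change stream:
//   const isNewNamespace = createNamespaceTracker(["video.movieDetails"]);
//   changeStream.on("change", event => {
//     if (isNewNamespace(event)) {
//       console.log("first change seen in", event.ns.db, event.ns.coll);
//     }
//   });
```

Seeding the tracker with the existing collections matters: without it, the first change in every collection would look like a new collection.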

The 4.0 changes do mean that it's easier now to watch your database and deployment activity and the feature opens up a whole new way to present your MongoDB to another system - in real-time as it changes. We recommend you give it a try today. Your next stop? The documentation for Change Streams... And if you don't have MongoDB 4.0, remember you can get a free M0 cluster to learn and experiment with on MongoDB Atlas. Just sign up here.