An Introduction to Change Streams

Raphael Londner and Alyson Cabral
February 6, 2018 | Updated: June 18, 2020

There is tremendous pressure for applications to immediately react to changes as they occur. As a new feature in MongoDB 3.6, change streams enable applications to stream real-time data changes by leveraging MongoDB’s underlying replication capabilities. Think powering trading applications that need to be updated in real-time as stock prices change. Or creating an IoT data pipeline that generates alarms whenever a connected vehicle moves outside of a geo-fenced area. Or updating dashboards, analytics systems, and search engines as operational data changes. The list, and the possibilities, go on, as change streams give MongoDB users easy access to real-time data changes without the complexity or risk of tailing the oplog (operation log). Any application can readily subscribe to changes and immediately react by making decisions that help the business to respond to events in real-time.

Change streams can notify your application of all writes to documents (including deletes) and provide access to all available information as changes occur, without polling that can introduce delays, incur higher overhead (due to the database being regularly checked even if nothing has changed), and lead to missed opportunities.

Characteristics of change streams

Targeted changes
Changes can be filtered to provide relevant and targeted changes to listening applications. As an example, filters can be on operation type or fields within the document.
Resumability
Resumability was top of mind when building change streams to ensure that applications can see every change in a collection. Each change stream response includes a resume token. In cases where the connection between the application and the database is temporarily lost, the application can send the last resume token it received and change streams will pick up right where the application left off. In cases of transient network errors or elections, the driver will automatically attempt to reestablish a connection using its cached copy of the most recent resume token. However, to resume after an application failure, the application needs to persist the resume token, as drivers do not maintain state across application restarts.
Total ordering
MongoDB 3.6 has a global logical clock that enables the server to order all changes across a sharded cluster. Applications will always receive changes in the order they were applied to the database.
Durability
Change streams only include majority-committed changes. This means that every change seen by listening applications is durable in failure scenarios such as a new primary being elected.
Security
Change streams are secure – users are only able to create change streams on collections to which they have been granted read access.
Ease of use
Change streams are familiar – the API syntax takes advantage of the established MongoDB drivers and query language, and is independent of the underlying oplog format.
Idempotence
All changes are transformed into a format that’s safe to apply multiple times. Listening applications can use a resume token from any prior change stream event, not just the most recent one, because reapplying operations is safe and will reach the same consistent state.
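
To make the idempotence property more concrete, here is a minimal sketch in plain JavaScript (it needs no MongoDB connection). The applyUpdateDescription helper and the sample payload are hypothetical and not part of any driver; the sketch only assumes that an update event carries an updateDescription with updatedFields and removedFields, and it handles top-level field names only. Because the event records the resulting field values rather than the operator that produced them (such as $inc), applying it a second time leaves the document unchanged.

```javascript
// Hypothetical helper: applies the "delta" of an update change event
// to a local copy of a document. Handles top-level fields only.
function applyUpdateDescription(doc, updateDescription) {
  const result = Object.assign({}, doc);
  // updatedFields holds resulting values, not the operator that produced them
  Object.assign(result, updateDescription.updatedFields || {});
  for (const field of updateDescription.removedFields || []) {
    delete result[field];
  }
  return result;
}

const original = { _id: 1, name: "pineapple", quantity: 12, onSale: true };

// Illustrative payload: quantity ended up at 9 and onSale was unset
const updateDescription = {
  updatedFields: { quantity: 9 },
  removedFields: ["onSale"]
};

const once = applyUpdateDescription(original, updateDescription);
const twice = applyUpdateDescription(once, updateDescription);

console.log(JSON.stringify(once));
console.log(JSON.stringify(twice)); // same state as after the first application
```

Had the event carried the original $inc operation instead, replaying it would have decremented the quantity twice.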

An example

Let’s imagine that we run a small grocery store. We want to build an application that notifies us every time we run out of stock for an item. We want to listen for changes on our stock collection and reorder once the quantity of an item gets too low.

{
	_id: "123UAWERXHZK4GYH",
	product: "pineapple",
	quantity: 3
}

Setting up the cluster

MongoDB is a distributed database, and replication is one of its core features: changes are mirrored from the primary replica set member to secondary members, enabling applications to maintain availability in the event of failures or scheduled maintenance. Replication relies on the oplog (operation log), a capped collection that records the most recent writes and that secondary members use to apply changes to their own local copy of the database. In MongoDB 3.6, change streams enable listening applications to easily leverage the same internal, efficient replication infrastructure for real-time processing.

To use change streams, we must first create a replica set. Download MongoDB 3.6 and after installing it, run the following commands to set up a simple, single-node replica set (for testing purposes).

mkdir -pv data/db 
mongod --dbpath ./data/db --replSet "rs"

Then in a separate shell tab, run: mongo

After the rs:PRIMARY> prompt appears, run: rs.initiate()

If you have any issues, check out our documentation on creating a replica set.

Seeing it in action

Now that our replica set is ready, let’s create a few products in a demo database using the following Mongo shell script:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

var docToInsert = {
  name: "pineapple",
  quantity: 10
};

function sleepFor(sleepDuration) {
  var now = new Date().getTime();
  while (new Date().getTime() < now + sleepDuration) {
    /* do nothing */
  }
}

function create() {
  sleepFor(1000);
  print("inserting doc...");
  docToInsert.quantity = 10 + Math.floor(Math.random() * 10);
  res = collection.insert(docToInsert);
  print(res)
}

while (true) {
  create();
}

Copy the code above into a createProducts.js text file and run it in a Terminal window with the following command: mongo createProducts.js.

Creating a change stream application

Now that we have documents being constantly added to our MongoDB database, we can create a change stream that monitors and handles changes occurring in our stock collection:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

const changeStreamCursor = collection.watch();

pollStream(changeStreamCursor);

//this function polls a change stream and prints out each change as it comes in
function pollStream(cursor) {
  while (!cursor.isExhausted()) {
    if (cursor.hasNext()) {
      change = cursor.next();
      print(JSON.stringify(change));
    }
  }
  pollStream(cursor);
}

By using the parameterless watch() method, this change stream will signal every write to the stock collection. In the simple example above, we’re logging the change stream's data to the console. In a real-life scenario, your listening application would do something more useful (such as replicating the data into a downstream system, sending an email notification, reordering stock...). Try inserting a document through the mongo shell and see the changes logged in the Mongo Shell.
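
As a rough illustration of what "something more useful" might start from, the sketch below branches on the event's operationType. The describeChange function and the hand-written sample events are hypothetical; the sketch assumes insert events carry the new document in fullDocument, while delete events carry only documentKey. It is plain JavaScript, so you can try it without a running cluster.

```javascript
// Hypothetical handler: turns a change event into a short log line.
function describeChange(change) {
  switch (change.operationType) {
    case "insert":
      return "inserted " + change.fullDocument.name +
        " (quantity " + change.fullDocument.quantity + ")";
    case "update":
      return "updated " + JSON.stringify(change.documentKey) +
        ": " + JSON.stringify(change.updateDescription.updatedFields);
    case "delete":
      return "deleted " + JSON.stringify(change.documentKey);
    default:
      return "ignored " + change.operationType;
  }
}

// Hand-written sample events, shaped like the ones the stream prints
const sampleEvents = [
  { operationType: "insert", documentKey: { _id: 1 },
    fullDocument: { _id: 1, name: "pineapple", quantity: 14 } },
  { operationType: "update", documentKey: { _id: 1 },
    updateDescription: { updatedFields: { quantity: 8 }, removedFields: [] } },
  { operationType: "delete", documentKey: { _id: 1 } }
];

sampleEvents.forEach(function (event) {
  console.log(describeChange(event));
});
```

In the polling loop above, a function like this would take the place of print(JSON.stringify(change)).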

Creating a targeted change stream

Remember that our original goal wasn’t to get notified of every single update in the stock collection, just when the inventory of each item in the stock collection falls below a certain threshold. To achieve this, we can create a more targeted change stream for updates that set the quantity of an item to a value no higher than 10. By default, update notifications in change streams only include the modified and deleted fields (i.e. the document “deltas”), but we can use the optional parameter fullDocument: "updateLookup" to include the complete document within the change stream, not just the deltas.

const changeStream = collection.watch(
  [{
    $match: {
      $and: [
        { "updateDescription.updatedFields.quantity": { $lte: 10 } },
        { operationType: "update" }
      ]
    }
  }],
  {
    fullDocument: "updateLookup"
  }
);

Note that the fullDocument property above reflects the state of the document at the time the lookup was performed, not the state of the document at the exact time the update was applied. This means that later changes may also be reflected in the fullDocument field. Since this use case only deals with updates, it is preferable to build match filters using updateDescription.updatedFields instead of fullDocument.
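
To see which events the filter above lets through, it can help to mirror it in application code. The following is only an approximation in plain JavaScript: isLowStockUpdate is a hypothetical helper, it assumes numeric quantities, and it does not reproduce every nuance of the server-side $match semantics.

```javascript
// Hypothetical client-side mirror of the $match stage:
// operationType must be "update" and the updated quantity must be <= threshold.
function isLowStockUpdate(change, threshold = 10) {
  if (change.operationType !== "update") {
    return false;
  }
  const updated = (change.updateDescription || {}).updatedFields || {};
  // Updates that did not touch quantity have no such key and are rejected
  return typeof updated.quantity === "number" && updated.quantity <= threshold;
}

const lowStock = {
  operationType: "update",
  updateDescription: { updatedFields: { quantity: 7 }, removedFields: [] }
};
const plentiful = {
  operationType: "update",
  updateDescription: { updatedFields: { quantity: 15 }, removedFields: [] }
};
const freshInsert = {
  operationType: "insert",
  fullDocument: { name: "pineapple", quantity: 3 }
};

console.log(isLowStockUpdate(lowStock));
console.log(isLowStockUpdate(plentiful));
console.log(isLowStockUpdate(freshInsert));
```

Note that the insert is rejected even though its quantity is low: the filter only looks at updates, so an item created with a low quantity would not trigger a notification.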

The full Mongo shell script of our filtered change stream is available below:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

let updateOps = {
  $match: {
    $and: [
      { "updateDescription.updatedFields.quantity": { $lte: 10 } },
      { operationType: "update" }
    ]
  }
};

const changeStreamCursor = collection.watch([updateOps]);

pollStream(changeStreamCursor);

//this function polls a change stream and prints out each change as it comes in
function pollStream(cursor) {
  while (!cursor.isExhausted()) {
    if (cursor.hasNext()) {
      change = cursor.next();
      print(JSON.stringify(change));
    }
  }
  pollStream(cursor);
}

To test our change stream above, let’s run the following script, which repeatedly decrements the quantity of every product above 10 until all quantities are no higher than 10:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;
let updatedQuantity = 1;

function sleepFor(sleepDuration) {
  var now = new Date().getTime();
  while (new Date().getTime() < now + sleepDuration) {
    /* do nothing */
  }
}

function update() {
  sleepFor(1000);
  res = collection.update({quantity:{$gt:10}}, {$inc: {quantity: -Math.floor(Math.random() * 10)}}, {multi: true});
  print(res)
  updatedQuantity = res.nMatched + res.nModified;
}

while (updatedQuantity > 0) {
  update();
}

You should now see the change stream window display the update shortly after the script above updates our products in the stock collection.

Resuming a change stream

In most cases, drivers have retry logic to handle loss of connections to the MongoDB cluster (such as timeouts, transient network errors, or elections). In cases where our application fails and wants to resume, we can use the optional parameter resumeAfter : <resumeToken>, as shown below:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
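collection = db.stock;

// busy-wait helper (same as in the earlier scripts), needed by resumeStream below
function sleepFor(sleepDuration) {
  var now = new Date().getTime();
  while (new Date().getTime() < now + sleepDuration) {
    /* do nothing */
  }
}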

const changeStreamCursor = collection.watch();
resumeStream(changeStreamCursor, true);

function resumeStream(changeStreamCursor, forceResume = false) {
  let resumeToken;
  while (!changeStreamCursor.isExhausted()) {
    if (changeStreamCursor.hasNext()) {
      change = changeStreamCursor.next();
      print(JSON.stringify(change));
      resumeToken = change._id;
      if (forceResume === true) {
        print("\r\nSimulating app failure for 10 seconds...");
        sleepFor(10000);
        changeStreamCursor.close();
        const newChangeStreamCursor = collection.watch([], {
          resumeAfter: resumeToken
        });
        print("\r\nResuming change stream with token " + JSON.stringify(resumeToken) + "\r\n");
        resumeStream(newChangeStreamCursor);
      }
    }
  }
  resumeStream(changeStreamCursor, forceResume);
}
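
The script above keeps the resume token in a local variable, which would not survive a real application restart. As a minimal sketch of persisting it, the Node.js snippet below (not a mongo shell script) saves the token to a file and reads it back. The function names and file location are hypothetical, and the sketch assumes the token is JSON-serializable (for example an object with a string _data field); a binary token would need a BSON-aware serializer instead.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical location; a real application would pick a durable path or a database
const TOKEN_FILE = path.join(os.tmpdir(), "stock-change-stream.token.json");

function saveResumeToken(token, file = TOKEN_FILE) {
  // Write to a temporary file and rename it, so that a crash mid-write
  // is less likely to leave a truncated token behind
  const tmp = file + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(token));
  fs.renameSync(tmp, file);
}

function loadResumeToken(file = TOKEN_FILE) {
  if (!fs.existsSync(file)) {
    return undefined; // first run: no token yet, start a fresh stream
  }
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Illustrative token value; real tokens come from change._id
saveResumeToken({ _data: "example-token" });
console.log(JSON.stringify(loadResumeToken()));
```

On startup, a listener would call loadResumeToken() and, if a token comes back, pass it as resumeAfter when opening the stream. Whether to save the token before or after handling each event is a design choice: saving afterwards means an event may be replayed following a crash, which fits the at-least-once behavior discussed next.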

With this resumability feature, MongoDB change streams provide at-least-once semantics. It is therefore up to the listening application to make sure that it has not already processed the change stream events. This is especially important in cases where the application’s actions are not idempotent (for instance, if each event triggers a wire transfer).
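
One simple way to guard against processing an event twice is to remember which events have already been handled, keyed by their resume token. The sketch below is plain JavaScript with hypothetical names, and it keeps the seen tokens in memory only; a real application would likely store them durably, ideally together with the side effect they protect.

```javascript
// Hypothetical wrapper: skips events whose resume token (_id) was already handled.
function createDeduplicatingHandler(handler) {
  const seen = new Set();
  return function (change) {
    const key = JSON.stringify(change._id);
    if (seen.has(key)) {
      return false; // duplicate delivery, skip it
    }
    handler(change);
    // Mark as processed only after the handler succeeded
    seen.add(key);
    return true;
  };
}

const processed = [];
const handle = createDeduplicatingHandler(function (change) {
  processed.push(change.operationType);
});

// Illustrative event, delivered twice as could happen after a resume
const event = { _id: { _data: "example-token-1" }, operationType: "update" };
handle(event);
handle(event);

console.log(processed.length);
```

Because the token is recorded only after the handler returns, an event whose handler throws will be retried on the next delivery rather than silently dropped.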

All of the shell script examples above are available in the following GitHub repository. You can also find similar Node.js code samples here, where a more realistic technique is used to persist the last change stream token before it is processed.

Next steps

I hope that this introduction gets you excited about the power of change streams in MongoDB 3.6.

If you want to know more:

Watch Aly’s session about Change Streams
Read the Change Streams documentation
Try out Change Streams examples in Python, Java, C, C# and Node.js
Read the What’s new in MongoDB 3.6 white paper
Take MongoDB University’s M036: New Features and Tools in MongoDB 3.6 course

If you have any questions, feel free to file a ticket at https://jira.mongodb.org or connect with us through one of the social channels we use to interact with the developer community.

About the authors – Aly Cabral and Raphael Londner

Aly Cabral is a Product Manager at MongoDB. With a focus on Distributed Systems (i.e. Replication and Sharding), when she hears the word election she doesn’t think about politics. You can follow her or ask any questions on Twitter at @aly_cabral

Raphael Londner is a Principal Developer Advocate at MongoDB. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

