MongoDB Atlas as the Data Hub for Smart Manufacturing with Microsoft Azure

All the source code used in this project, along with a detailed deployment guide, is available on our public Github page.

Manufacturing companies are emerging from the pandemic with a renewed focus on digital transformation and smart factories investment. COVID-19 has heightened the need for Industrial IoT technology and innovation as consumers have moved towards online channels, forcing manufacturers to compete in a digitalized business environment.

The manufacturing ecosystem can be viewed as a multi-dimensional grouping of systems designed to support the various business units in manufacturing organizations such as operations, engineering, maintenance, and learning & development functions. Process and equipment data is generated on the shop floor from machines and systems such as SCADA and then stored in a process historian or an operational database. The data originating from shop floor devices are generally structured time series data acquired through regular polling and sampling. Historians provide fast insertion rates of time series data, with capacities that reach up to tens of thousands of PLC tags processed per second. They rely on efficient data compression engines which can either be lossy or lossless.

Traditional RDBMS storage comes packaged with the manufacturing software applications such as a Manufacturing Execution System (MES). Relational databases are traditionally common in manufacturing systems and thus the choice of database systems for these manufacturing applications are typically driven by historical preferences.

Manufacturing companies have long relied on using several databases and data warehouses to accommodate various transactional and analytical workloads. The strategy of separating operational and analytical systems has worked well so far and has caused least interference with the operational process. However this strategy will not fare well in the near future for two reasons:

  1. Manufacturers are generating high volume, variety and veracity data using advanced IIoT platforms to create a more connected product ecosystem. The growth of IIoT data has been rapid and in fact, McKinsey and Company estimates that companies will spend over $175B in IIoT and edge computing hardware by 2025. A traditional manufacturing systems setup necessitates the deployment and maintenance of several technologies including graph databases (for asset digital models and relationships) and time series databases (for time series sensor data) and leads to IT sprawl across the organization.

  2. A complex infrastructure causes latency and delays in data access which leads to non-realization of real time insights for improving manufacturing operations. To establish an infrastructure that can enable real time analytics, companies need real time access to data and information to make the right decision in time. Analytics can no longer be a separate process, it needs to be brought into the application. The applications have to be supplied with notifications and alerts instantly. This is where application-driven analytics platforms such as MongoDB Atlas come into picture.

We understand that to build smarter and faster applications, we can no longer rely on maintaining separate systems for different transactional and analytical workloads. Moving data between disparate systems takes time and energy and results in longer time to market and slower speed of innovation. Many of our customers start out using MongoDB as an operational database for both new cloud-native services as well as modernized legacy apps. More and more of these clients are now improving customer experience and speeding business insight by adopting application-driven analytics within the MongoDB Atlas platform. They use MongoDB to support use cases in real-time analytics, customer 360, internet of Things (IoT) and mobile applications across all industry sectors.

As mentioned before, Manufacturing ecosystem employs a lot of databases just to run production operations. Once IIoT solutions are added to the mix, each solution (shown in yellow in Figure 1) may come with its own database (Time Series, relational, graph etc.) and the number of databases will increase dramatically. With MongoDB Atlas, this IT sprawl can be reduced as multiple use cases can be enabled using MongoDB Atlas (Figure 2). The versatility of the document model to structure data any way the application needs, coupled with an expressive API and indexing that allows you to query data any way you want is a powerful value proposition.

The benefits of MongoDB Atlas are amplified by the platform’s versatility to address almost any workload. Atlas combines transactional processing, application-driven analytics, relevance-based search, and mobile edge computing with cloud sync. These capabilities can be applied to almost every type of modern applications being built for the digital economy by developers.

Figure 1: IT sprawl with IIoT and analytics solutions deployment in Manufacturing

Figure 2: MongoDB Atlas simplifying road to Smart Manufacturing

MongoDB and Hyperscalers leading the way for smart manufacturing

Manufacturers who are actively investing in digital transformation and IIoT are experiencing an exponential growth in data. All this data offers opportunities for new business models and digital customer experiences. To drive the right outcomes from all this data, manufacturers are setting up scalable infrastructures using Hyperscalers such as Azure, AWS and GCP. These hyperscalers offer a suite of components for efficient, scalable implementation of IIoT platforms. Companies are leveraging these accelerators to quickly build solutions, which help access, organize, and analyze previously untapped data from sensors, devices, and applications.

In this article, we are focused on how MongoDB integrates with Microsoft Azure IoT modules and acts as the digital data hub for smart manufacturing use cases. MongoDB and Microsoft have been partners since 2019, but last year it was expanded, enabling developers to build data intensive applications within the Azure marketplace and Azure portal. This enables an enhanced developer experience and allows burn down of their Microsoft Azure Consumption Commitment. The alliance got further boost when Microsoft included MongoDB as a partner in its newly launched Microsoft Intelligent Data Platform Ecosystem. MongoDB Atlas can be deployed in 35 regions in Azure and has seamless integration with most of the Azure Developer services (Azure functions, App services, ADS), Analytics services (Azure Synapse), Data Governance (Microsoft Purview), ETL (ADF) and cross cutting services (AD, KMS, AKS etc.) powering building of innovative solutions.

Example scenario: Equipment failure prediction

Imagine a manufacturing facility that has sensors installed in their Computer Numerical Control (CNC) machines measuring parameters such as temperature, torque, rotational speed and tool wear. A sensor gateway converts analog sensor data to digital values and pushes it to Azure IoT Edge which acts as a gateway between factory and the Cloud. This data is transmitted to Azure IoT Hub where the IoT Edge is registered as an end device.

Once we have the data in the IoT Hub, Azure Stream Analytics can be utilized to filter the data so that only relevant information flows into the MongoDB Atlas Cluster. The connection between Stream Analytics and MongoDB is done via an Azure Function.

This filtered sensor data inside MongoDB is used for following purposes:

  1. To provide data for machine learning model that will predict the root cause of machine failure based on sensor data.

  2. To act as a data store for prediction results that can be utilized by business intelligence tools such as PowerBI using Atlas SQL Interface.

  3. To store the trained machine learning model checkpoint in binary encoded format inside a collection.

The overall architecture is shown in Figure 3.

Figure 3: Overall architecture


The sensors in the factory are sending time series measurements to Azure IoT Hub. These sensors are measuring for multiple machines:

  • Product Type

  • Air Temperature (°C)

  • Process Temperature (°C)

  • Rotational Speed

  • Torque

  • Tool Wear (min)

IoT Hub will feed these sensor data to Azure Stream Analytics, where the data will be filtered and pushed to MongoDB Atlas time series collections. The functionality of Stream Analytics can be extended by implementing machine learning models to do real-time predictive analytics on streaming input data. The prediction results can also be stored in MongoDB in a separate collection. The sensor data contains the device_id field which helps us filter data coming from different machines. As MongoDB is a document database, we do not need to create multiple tables to store this data, in fact we can just use one collection for all the sensor data coming from various devices or machines.

Once the data is received in MongoDB, sum and mean values of sensor data will be calculated for the predefined production shift duration and the results will be pushed to MongoDB Atlas Charts for visualization. MongoDB Time series window functions are used in an aggregation pipeline to produce the desired result.

When a machine stoppage or breakdown occurs during the course of production, it may lead to downtime because the operator has to find out the cause of the failure before the machine can be put back into production. The sensor data collected from the machines can be used to train a machine learning model that can automatically predict the root cause when a failure occurs and significantly reduce the time spent on manual root cause finding on the shop floor. This can lead to increased availability of machines and thus more production time per shift.

To achieve this goal, our first task is to identify the types of failures we want to predict. We can work with the machine owners and operators to identify the most common failure types and note that down. With this important step completed, we can identify the data sources that have relevant data about that failure type. If need be, we can update the Stream Analytics filter as well. Once the right data is identified, we train a Decision Tree Classifier model in Azure Machine Learning and deploy it as a binary value as a separate collection inside MongoDB.

Atlas Scheduled Triggers are used to trigger the model (via an Azure Function) and the failure prediction results are written back results into a separate Failures collection in MongoDB. Scheduled triggers’ schedule can be aligned to production schedule so that it only fires when a changeover occurs for example.

After a failure is detected, the operator and supervisor needs to be notified immediately. Using App Services, a mobile application is developed to send notifications and alerts to floor supervisor and machine operator once a failure root cause is predicted. Figure 4 shows the mobile app user interface where the user has an option to acknowledge the alert. Thanks to Atlas Device Sync, even when the mobile device is facing unreliable connectivity, the data keeps in sync between Atlas cluster and Realm database in the app. MongoDB’s Realm, is an embedded database technology already used on millions of mobile phones as part of mobile apps as well as infotainment like systems.

FIgure 4: Alert app user interface

Business benefits of using MongoDB Atlas as smart manufacturing data hub

  1. Scalability: MongoDB is a highly scalable document based database that can handle large amounts of structured, semi-structured and unstructured data. Native time series collections are available that help with storing large amounts of data generated by IIoT enabled equipment in a highly compressed manner.

  2. Flexibility: MongoDB stores data in a flexible, JSON-like format, which makes it easy to store and query data in a variety of ways. This flexibility makes it well-suited for handling the different data structures needed to store sensor data, ML models and prediction results, all in one database. This removes the need for maintaining separate databases for each type of data reducing IT sprawl in manufacturing organizations.

  3. Real-time Analytics: As sensor data comes in, MongoDB aggregation pipelines can help in generating features to be used for machine learning models. Atlas Charts can be set up in minutes to visualize important features and their trends in near real time.

  4. BI Analytics: Analysts can use the Atlas SQL interface to access MongoDB data from SQL based tools. This allows them to work with rich, multi-structured documents without defining a schema or flattening data. In a connected factory setting, this can be useful to generate reports for failures over a period of time and comparison between different equipment failures types. Data can be blended from MongoDB along with other sources of data to provide a 360 degree view of production operations.

  5. Faster Mobile Application Development: Atlas device sync bidirectionally connects and synchronizes Realm databases inside mobile applications with the MongoDB Atlas backend, leading to faster mobile application development and less time needed for maintenance of deployed applications.


The MongoDB Atlas developer data platform is designed and engineered to help speed up your journey towards smart manufacturing. It is not just suitable for high speed time series workloads but also for workloads that power mobile applications and BI Dashboards – leading to smarter applications, increased productivity and eventually smarter factories.

Learn more

  • All the source code used in this project, along with a detailed deployment guide, is available on our public Github page.

  • To learn more about how MongoDB enables IIoT for our customers, please visit our IIoT use cases page.