Real-Time Inventory Tracking with Computer Vision & MongoDB Atlas

Karolina Ruiz Rogelj and Arnaldo Vera
August 1, 2023
#Atlas

In today’s rapidly evolving manufacturing landscape, digital twins of factory processes have emerged as a game-changing technology. But why are they so important? Digital twins serve as virtual replicas of physical manufacturing processes, allowing organizations to simulate and analyze their operations in a virtual environment. By incorporating artificial intelligence and machine learning, organizations can interpret and classify objects, leading to cost reductions, faster throughput speeds, and improved quality levels. Real-time data, especially inventory information, plays a crucial role in these virtual factories, providing up-to-the-minute insights for accurate simulations and dynamic adjustments.

In the first blog, we covered a 5-step high level plan to create a virtual factory. In this blog, we delve into the technical aspects of implementing a real-time computer vision inventory inference solution as seen in Figure 1 below. Our focus will be on connecting a physical factory with its digital twin using MongoDB Atlas, which facilitates real-time interaction between the physical and digital realms. Let's get started!

Diagram depicting how a physical factory with it's digital twin using MongoDB Atlas. In the first step, images are sent to MongoDB for storage and processing from the Physical factory usined AWS IoT Core. In the second step, the computer vision is used to keep track of inventory items and updating virtual factory using AWS sagemaker. In the 3rd step, Computer vision model results about inventory are sent to virtual factory via Device Sync. And finally, the virtual factory informs user of inventory status and enables/disables order placement. — Figure 1: High Level Overview

Part 1: The physical factory sends data to MongoDB Atlas

Let’s start with the first task of transmitting data from the physical factory to MongoDB Atlas. Here, we focus on sending captured images of raw material inventory from the factory to MongoDB for storage and further processing as seen in Figure 2. Using the MQTT protocol, we send images as base64 encoded strings. AWS IoT Core serves as our MQTT broker, ensuring secure and reliable image transfer from the factory to MongoDB Atlas.

Diagram depicting images being sent to MongoDB Atlas via AWS IoT Core. — Figure 2: Sending images to MongoDB Atlas via AWS IoT Core

For simplicity purposes, in this demo, we directly store the base64 encoded image strings in MongoDB documents. This is because each image received from the physical factory is small enough to fit into one document. However, this is not the only method to work with images (or generally large files) in MongoDB. Within our developer data platform, we have various storage methods, including GridFS for larger files or binary data for smaller ones (less than 16MB). Moreover, object storage services like AWS S3 or Google Cloud Storage, coupled with MongoDB data federation are commonly used in production scenarios.

In this real-world scenario, integrating object storage services with MongoDB provides a scalable and cost-efficient architecture. MongoDB is excellent for fast and scalable reads and writes of operational data, but when retrieving images with very low latency is not a priority, the storage of these large files in ‘buckets’ helps reduce costs while getting all the benefits of working with MongoDB Atlas. Robert Bosch GmbH, for instance, uses this architecture for Bosch's IoT Data Storage, which helps service millions of devices worldwide efficiently.

Coming back to our use case, to facilitate communication between AWS IoT Core and MongoDB, we employ Rules defined in AWS IoT Core, which helps us send data to an HTTPS endpoint. This endpoint is configured directly in MongoDB Atlas and allows us to receive and process incoming data. If you want to learn more about MongoDB Data APIs, check this blog from our Developer Center colleagues.

Part 2: MongoDB Atlas to AWS SageMaker for CV prediction

Now it’s time for the inference part! We’ve trained a built-in multi-label classification model provided by Sagemaker, using images like in Figure 3. The images were annotated with using an .lst file format following the schema:

So in an image where only the red and white pieces are present, but no blue is present in the warehouse, we would have an annotation such as:

Photo of a physical factory piece. — Figure 3: Sample image used for the Computer Vision model

The model was built using 24 training images and 8 validation images, which was a simplicity-based decision to demonstrate the capabilities of the implementation rather than building a powerful model. Regardless of the extremely low training/validation sample, we managed to achieve a 0.97 validation accuracy. If you want to learn more about how the model was built, check out the Github repo.

With a model trained and ready to predict, we created a model endpoint in Sagemaker where we send new images through a POST request so it answers back with the predicted values.

We use an Atlas Function to drive this functionality. Every minute, it grabs the latest image stored in MongoDB and sends it to the Sagemaker endpoint. It then waits for the response. When the response is received, we get an array with three decimal values between 0 and 1 representing the likelihood of each piece (blue, red, white) being in the stock. We interpret the numeric values with a simple rule: if the value is above 0.85, we consider the piece being in stock. Finally, the same Atlas function writes the results in a collection (Figure 4) that keeps the current state of the inventory of the physical factory. More details about the function here.

Screenshot of collection storing real time stock status of the factory in the AWS Sagemaker Stock Inference. — Figure 4: Collection storing real time stock status of the factory

The beauty comes when we have MongoDB Realm incorporated on the Virtual Factory as seen in Figure 5. It’s automatically and seamlessly synced with MongoDB Atlas through Device Sync. The moment we update the collection with the inventory status of the physical factory in MongoDB Atlas, the virtual factory, with Realm, is automatically updated. The advantage here, besides not needing to include any additional lines of code for the data transfer, is that conflict resolution will be handled out of the box and when connection is lost, the data won’t be lost and rather updated as soon as the connection is re-established. This essentially enables a real-time synchronized digital twin without the hustle of managing data pipelines, configuring your code for edge cases and lose time in non-competitive work.

Diagram of connecting Atlas and Realm via device sync. — Figure 5: Connecting Atlas and Realm via Device Sync

Just as an example of how companies are implementing Realm and Device Sync for mission-critical applications: The airline Cathay Pacific revolutionized how pilots logged critical flight data such as wind speed, elevation, and oil pressure. Historically, it was done manually via pen and paper until they switched to a fully digital, tablet-based app with MongoDB, Realm, and Device Sync. With this, they eliminated all papers from flights and did one of the first zero-paper flights in the world in 2019. Check out the full article here.

As you can see, the combination of these technologies is what enables the development of truly connected, highly performant digital twins within just one platform.

Part 3: CV results are sent to Digital Twin via Device Sync

In the process of sending data to the digital twin through device sync, developers can follow a straightforward procedure. First, we need to navigate to Atlas and access the Realm SDK section. Here, they can choose their preferred programming language and the data models will be automatically pre-built based on the schemas defined in the MongoDB collections. MongoDB Atlas simplifies this task by offering a copy-paste functionality as seen in Figure 6 , eliminating the need to construct data models from scratch. For this specific project, the C# SDK was utilized. However, developers have the flexibility to select from various SDK options, including Kotlin, C++, Flutter, and more, depending on their preferences and project requirements. Once the data models are in place, simply activating device sync completes the setup. This enables seamless bidirectional communication. Developers can now send data to their digital twin effortlessly.

Screenshot of the Ream C# SDK Object Model — Figure 6: Realm C# SDK Object Model example

One of the key advantages of using device sync is its built-in conflict resolution capability. Whether facing offline interruptions or any conflicting changes, MongoDB Atlas automatically manages conflict resolution. The "Always on '' feature is particularly crucial for Digital Twins, ensuring constant synchronization between the device and the MongoDB Atlas. This powerful feature saves developers significant time that would otherwise be spent on building custom conflict resolution mechanisms, error-handling functions, and connection-handling methods. With device sync handling conflict resolution out of the box, developers can focus on building and improving their applications. They can be confident in the seamless synchronization of data between the digital twin and MongoDB Atlas.

Part 4: Virtual factory sends inventory status to the user

For this demonstration, we built the Digital Twin of our physical factory using Unity so that it can be interactive through a VR headset. With this, the user can order a piece on the physical world by interacting with the Virtual Twin, even if the user is thousands of miles away from the real factory.

In order to control the physical factory through the headset, it’s crucial that the app informs the user whether or not a piece is present in the stock, and this is where Realm and Device Sync come into play.

Example of a pop-up window informing the user that a part they are requesting is out of stock. — Figure 7: User is informed of which pieces are not in stock in real time.

In Figure 7, the user intended to order a blue piece on the Digital Twin and the app is informing that the piece is not in stock, and therefore not activating the order neither on the physical factory nor its digital twin. What’s happening behind on the backend is that the app is reading the Realm object that stores the stock status of the physical factory and deciding if the piece is orderable or not. Remember that this Realm object is in real-time sync with MongoDB Atlas, which in turn is constantly updating the stock status on the collection in Figure 4 based on Sagemaker inferences.

Conclusion

In this blog, we presented a four-part process demonstrating the integration of a virtual factory and computer vision with MongoDB Atlas. This solution enables transformative real-time inventory management for manufacturing companies. If you're interested in learning more and getting hands-on experience, feel free to explore our accompanying GitHub repository for further details and practical implementation.

← Previous

The Great Data Divide: Here's What's Hindering Your AI Goals

Organizational data is arguably the lifeblood of most digital-era companies. And yet, despite its significance and importance to the organization as a whole, the creation and subsequent management of data in most organizations are bifurcated - split between whether the data is transactional or analytical (operational vs after-the-fact & historical). Between these two worlds, a great divide exists. Like the Equator line which divides our planet into northern and southern hemispheres, many of our organizations operate with separate transactional and analytics hemispheres. Rooted in hardware and software limitations, transactional and analytics data processing workloads are run against different systems and hardware, which are run by separate teams as well. While this has been an effective strategy for managing organizational data assets for a very long time, advances in hardware and software, and the availability of cloud infrastructure, have changed that. When it comes to an organization's ability to deploy AI at scale now, we need to change this approach towards processing and managing data if we want to deliberately increase our organization's overall data processing proficiency. We’re here to suggest a different operating model for consideration, one that’s based on the collective experiences of working with over 40K data-processing customers, many of whom are leading the way when it comes to reorganizing themselves for high data proficiency, to support their AI ambitions and programs. Check out our AI resource page to learn more about building AI-powered apps with MongoDB. Treating data as a product Let’s erase the line between transactional and analytics for a moment, and instead, view the overall flow and use of information within an organization. It’s created, it’s updated, and it’s read by employees, customers, data & analytics workers, and executives. Sometimes it finds itself inside an application, sometimes it manifests itself on a month-end report. Sometimes it’s used to train and retrain machine learning models. It’s this last scenario that’s starting to reveal significant deficiencies associated with traditional methods of managing data found within many organizations. Thanks to things like mobile, cloud, and IoT, data is moving at a breakneck pace. 40 years ago, we primarily transacted in a business application and then shuttled the deltas overnight into a data warehouse. Why? Because it was simply not possible to execute analytics queries against a running transactional system. At best, the queries would time out. At worst, you would slow or halt business transactional processing, and bring the business to a stand-still. In addition, all analytics were after-the-fact. We didn’t need to try and execute analytics queries against transactional systems. A single enterprise data warehouse repository was good enough to satisfy the reporting demands the organization placed on the data. Today, however, our historical data assets are becoming ever more significant, and sometimes even within real-time transactional business processing. Insights that can be gleaned from historical data, can be fed into decision-making transactional systems, to drive better, or more efficient outcomes. Think of automated decisions and inferences. Machine learning models are now supplementing some of the data analysis and decision-making that humans have traditionally had to perform. As the benefits from these models become more commonplace within transactional business systems, it’s important that they make accurate decisions, especially in heavily regulated industries such as insurance and financial services . A machine-learning model, as such, may need to be retrained often, and many models now demand access to data that is real-time, or as near real-time as possible. It’s this hunger for data that is causing AI models to cross over the great data equator. Not being satisfied with historical data, these models are increasingly demanding to be trained and retrained on data that is as fresh from having been created or updated, as possible. When we treat our data as a product, we see it as a thing, a business entity, or a noun. A customer, a policy, a claim, etc. However, it also has characteristics like state, age, and context. Is it in motion, or is it at rest? Has it just been created? Is it in the process of being updated, or is it years old, sitting in a warehouse? For which business context is it being leveraged - a customer browsing products, or a data scientist looking for trends in past sales? Across all of these characteristics and contexts, the data itself isn’t any more or less important. It’s simply important because it’s the data. Worlds apart When we task entirely separate teams, however, to manage it - transactional vs analytics - we lose this holistic data-as-a-product perspective. Instead, we put very different lenses on, whether we’re looking at a software delivery team, vs a data engineering team supporting data scientists. The meaning of data, after it’s transacted, for example, may change once it’s landed in the enterprise data warehouse or data lake. Transformations and manipulations are applied to it as it crosses over the great data equator, sometimes creating very different instances of the data. The journey often alters it from its original ground-truth state, done so while in between being copied from a transactional database, and loaded into an analytics one. After that data lands in analytics databases and platforms, it’s often further transformed and copied into even more subsequent databases and platforms. For the past decade, most AI efforts have been executed within the analytics hemisphere. Historical data assets in our data warehouses and data lakes have been sufficient to serve experiments and even production AI use cases. The more AI becomes commonplace, however, the more we can expect that AI models will want both historical and real-time data. As such, we should be re-aligning our bifurcated transactional and analytics organizations to help them operate as efficiently as possible, to serve the right state of the data to the right consumer, for the right context. Uniting with Domain Driven Design Some of the best things that have come from software delivery organizations embracing Domain Driven Design come from aligning developers, architects, business SMEs, and scrum masters into the same team, or team of teams. A bounded context in which all the folks who care about, interact with, or manipulate the software and the data, can work together without having to cross departmental boundaries, or bureaucracy that can cause friction when trying to deliver working software. If we consider the goal of being highly proficient and effective with data, especially complicated data (data that has fast changed state and context), it stands to reason that an Agile team of teams, or Bounded Context, should include not only the business SME’s, the software developers, and the architects and site reliability engineers (SRE’s) who maintain applications, but also the data engineers and the data scientists who currently manage after-the-fact data assets, and are using it to bring AI models to life. If we truly want to embrace and treat data as a product, however, we need to eradicate the notion that data should be managed in two different hemispheres across the organization - transactional and analytics. The data will change state, often, and only continue to do so for the foreseeable future. Engineering the organization for success - efficiency, and accuracy when it comes to data processing - requires deliberateness. For that, we have to actively seek out and make our goals happen. Those goals should be focused on removing known friction points. The junctions at which the exchange, or processing of information is inefficient, struggling to scale, costing too much effort and dollars, or all of the above. All hands on deck When it comes to building sophisticated digital applications, when it comes to managing data (in whatever state or context), when it comes to building and maintaining AI models, when it comes to incorporating those models into actual business workflows and applications, it truly takes a village. As AI begins to accelerate the ability to write and deploy code, for example, the pace of application feature delivery in most organizations will increase. In short, we’re going to be expected to do more, in less time, thanks to the forthcoming generation of AI-enabling assistants. This will place even greater demands and expectations on the organization's technology and data workers, and especially the data infrastructure. Similarly, as AI models consume either real-time or historical data, our ability to accurately, efficiently, and quickly process and manage all of this data will need to increase significantly. The way forward Aligning people and resources to common goals is an effective way to transform an organization. Setting goals like treating data as a product, and embracing principles of domain-driven when it comes to an organization’s data-engineering practices, can help tremendously in moving towards more accurate, efficient, and performant data processing. In organizations we work with, large and small, this transformation is beginning, and it’s erasing the hard line that’s existed between two distinct data hemispheres in the organization. As AI becomes more significant, so do your developers, data scientists, and data engineers. We need them working together as efficiently and effectively as possible, to meet our organization’s aspirations. A way to achieve this comes from reducing the friction when it comes to working with data - for developers, data scientists, and AI models alike. We invite you to learn more about our work in insurance .

August 1, 2023

Next →

Building AI With MongoDB: Integrating Vector Search And Cohere to Build Frontier Enterprise Apps

Cohere is the leading enterprise AI platform, building large language models (LLMs) which help businesses unlock the potential of their data. Operating at the frontier of AI, Cohere’s models provide a more intuitive way for users to retrieve, summarize, and generate complex information. Cohere offers both text generation and embedding models to its customers. Enterprises running mission-critical AI workloads select Cohere because its models offer the best performance-cost tradeoff and can be deployed in production at scale. Cohere’s platform is cloud-agnostic. Their models are accessible through their own API as well as popular cloud managed services, and can be deployed on a virtual private cloud (VPC) or even on-prem to meet companies where their data is, offering the highest levels of flexibility and control. Cohere’s leading Embed 3 and Rerank 3 models can be used with MongoDB Atlas Vector Search to convert MongoDB data to vectors and build a state-of-the-art semantic search system. Search results also can be passed to Cohere’s Command R family of models for retrieval augmented generation (RAG) with citations. Check out our AI resource page to learn more about building AI-powered apps with MongoDB. A new approach to vector embeddings It is in the realm of embedding where Cohere has made a host of recent advances. Described as “AI for language understanding,” Embed is Cohere’s leading text representation language model. Cohere offers both English and multilingual embedding models, and gives users the ability to specify the type of data they are computing an embedding for (e.g., search document, search query). The result is embeddings that improve the accuracy of search results for traditional enterprise search or retrieval-augmented generation. One challenge developers faced using Embed was that documents had to be passed one by one to the model endpoint, limiting throughput when dealing with larger data sets. To address that challenge and improve developer experience, Cohere has recently announced its new Embed Jobs endpoint . Now entire data sets can be passed in one operation to the model, and embedded outputs can be more easily ingested back into your storage systems. Additionally, with only a few lines of code, Rerank 3 can be added at the final stage of search systems to improve accuracy. It also works across 100+ languages and offers uniquely high accuracy on complex data such as JSON, code, and tabular structure. This is particularly useful for developers who rely on legacy dense retrieval systems. Demonstrating how developers can exploit this new endpoint, we have published the How to use Cohere embeddings and rerank modules with MongoDB Atlas tutorial . Readers will learn how to store, index, and search the embeddings from Cohere. They will also learn how to use the Cohere Rerank model to provide a powerful semantic boost to the quality of keyword and vector search results. Figure 1: Illustrating the embedding generation and search workflow shown in the tutorial Why MongoDB Atlas and Cohere? MongoDB Atlas provides a proven OLTP database handling high read and write throughput backed by transactional guarantees. Pairing these capabilities with Cohere’s batch embeddings is massively valuable to developers building sophisticated gen AI apps. Developers can be confident that Atlas Vector Search will handle high scale vector ingestion, making embeddings immediately available for accurate and reliable semantic search and RAG. Increasing the speed of experimentation, developers and data scientists can configure separate vector search indexes side by side to compare the performance of different parameters used in the creation of vector embeddings. In addition to batch embeddings, Atlas Triggers can also be used to embed new or updated source content in real time, as illustrated in the Cohere workflow shown in Figure 2. Figure 2: MongoDB Atlas Vector Search supports Cohere’s batch and real time workflows. (Image courtesy of Cohere) Supporting both batch and real-time embeddings from Cohere makes MongoDB Atlas well suited to highly dynamic gen AI-powered apps that need to be grounded in live, operational data. Developers can use MongoDB’s expressive query API to pre-filter query predicates against metadata, making it much faster to access and retrieve the more relevant vector embeddings. The unification and synchronization of source application data, metadata, and vector embeddings in a single platform, accessed by a single API, makes building gen AI apps faster, with lower cost and complexity. Those apps can be layered on top of the secure, resilient, and mature MongoDB Atlas developer data platform that is used today by over 45,000 customers spanning startups to enterprises and governments handling mission-critical workloads. What's next? To start your journey into gen AI and Atlas Vector Search, review our 10-minute Learning Byte . In the video, you’ll learn about use cases, benefits, and how to get started using Atlas Vector Search.

April 25, 2024