How Predii Rapidly Produces Predictive Analytics Tools for IoT with MongoDB

IoT and digitalization initiatives are transforming manufacturing and servicing in many industries. With the ability to blend the physical and digital worlds, organizations can unlock new revenue streams and drive operational efficiencies that were unimaginable a few years ago. Predii turned to MongoDB to build a solution that enables its customers to process huge volumes of sensor data, technical manuals, and service orders, expose them to machine learning processes, and extract actionable insights.

With MongoDB, Predii was able to build new apps faster, collect and analyze data with machine learning at high scale, and reduce system infrastructure by up to 5x. We recently spoke with Tilak Kasturi, CEO and Founder of Predii, about his purpose-built machine learning platform built on top of MongoDB.

Can you tell us a bit about Predii?

Predii Inc. is a prescriptive analytics and AI software company based in Palo Alto, CA. Predii's AI platform is a purpose-built machine learning platform enabling predictive repair and maintenance for complex equipment, designed specifically to learn from service data (service orders, IoT data, technical manuals). Predii's insights significantly reduce the time needed for repairs, and also reduce the need for repair by enhancing maintenance operations. The AI platform is called Predii Repair Intelligence™, and it understands industrial equipment, the technician ecosystem, and the value inherent in servicing data. We provide enterprise solutions for the maintenance and design of mission-critical assets.

The Predii formula

What challenge were you trying to take on?

This was a new project. We process and store metadata from technical manuals and billions of IoT sensor readings, while also referencing data in automotive service orders, to build intelligent and responsive products from them. How do you use hundreds of millions of noisy service orders to guide technicians in specific repairs? Handling the huge amounts of data our customers have was the largest issue we had to address. Not all of the detail is valuable in analysis, but customers need to retain metadata that describes the raw content without retaining all of the volume.

We could've taken the data and slapped it onto the database as-is, but it wouldn't have solved our issue. We needed to be smart enough to pull the data in a manner that cleansed it and removed any unnecessary attributes. We also make heavy use of the native compression methods available in MongoDB's WiredTiger storage engine. We found it to reduce data size by 3-5x (depending on data type), and its optimizations for high performance disk drives provide low latency data access.
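To illustrate the kind of cleansing step described above, here is a minimal Python sketch. It is not Predii's code: the field names, the `KEEP_FIELDS` whitelist, and the `cleanse` helper are all hypothetical. The `COLLECTION_OPTIONS` dictionary shows the shape of the `storageEngine` option that MongoDB accepts when a collection is created with a non-default WiredTiger block compressor; the choice of `zlib` here is an assumption, not something stated in the interview.

```python
from typing import Any, Dict

# Hypothetical whitelist of attributes worth retaining for analysis.
KEEP_FIELDS = {"order_id", "vehicle", "symptom", "resolution", "labor_hours"}

# Shape of the storageEngine option accepted at collection creation time.
# The compressor choice (zlib) is an assumption made for this sketch.
COLLECTION_OPTIONS = {
    "storageEngine": {"wiredTiger": {"configString": "block_compressor=zlib"}}
}


def cleanse(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep whitelisted, non-empty attributes and tidy up string values."""
    cleaned: Dict[str, Any] = {}
    for key, value in raw.items():
        if key not in KEEP_FIELDS:
            continue  # drop attributes that are not useful for analysis
        if isinstance(value, str):
            value = " ".join(value.split())  # collapse runs of whitespace
        if value is None or value == "":
            continue  # drop empty values
        cleaned[key] = value
    return cleaned


# Hypothetical raw service order with noise and unused attributes.
raw_order = {
    "order_id": "SO-1001",
    "symptom": "  engine   overheats at idle ",
    "resolution": "",
    "labor_hours": 1.5,
    "printer_id": "P-7",
    "ui_theme": "dark",
}

cleaned_order = cleanse(raw_order)
print(cleaned_order)
```

Filtering before insertion keeps the stored documents small, and block compression then applies to what remains.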

What led you to go with MongoDB?

We previously tried MySQL, then Couchbase, and then MemSQL, but none of them fully met all the demands of our app. MongoDB was flexible enough to meet all our requirements, giving us the features we needed. MongoDB has a huge and active community of developers, which allowed us to ask questions when needed and kept our IT troubleshooting costs to a minimum. The search and facet abilities allowed us to store the data in a way that enabled slicing and dicing it as new use cases and applications emerged, all without ever having to modify the document structure.

The built-in features within the database also kept the custom application development requirements down and avoided the need for us to integrate 3rd party tools, such as external search engines and caching layers that would have complicated the platform and slowed down the pace of development.

With regard to scalability, the sharding capabilities in MongoDB gave us high performance data distribution. The ease of horizontal scaling allowed us to build higher availability into our system in order to meet customer product support requirements inherent to our industry.

How are you measuring the impact of MongoDB on your business?

The increased productivity of our team has helped our engineers streamline the POC process. Many AI engagements take 6-12+ months to complete a pilot, and we are able to compress that to just two months with MongoDB because we don't have to create new data structures and database schemas. Since we were able to achieve up to 5x data compression using MongoDB, we saved money on infrastructure costs, and as a nimble team, we were able to leverage the built-in features of MongoDB that streamlined our engineering efforts.

Where is the Predii platform deployed?

The Predii platform can be deployed by our customers on-premises or in the cloud. We use Microsoft Azure, but are flexible. MongoDB gives us complete infrastructure agility with the freedom to run anywhere.

Can you give us an idea of the structure and size of your MongoDB deployments?

The Predii Platform

Our deployments vary in size from 100 GB to 10+ TB. When our data size gets larger, we leverage sharding. For example, we have one deployment on 3 shards with 5 TB of data compressed down to 1 TB. This deployment has a processing environment and a delivery environment, each with its own set of shards and replicas. From a throughput standpoint, we see thousands of operations a second with a workload profile that is write-heavy. We also use the MongoDB Spark Connector.
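The figures quoted for that example deployment can be worked through in a few lines of Python. This is only back-of-the-envelope arithmetic: it assumes data is spread evenly across the 3 shards and ignores the extra copies held by replicas, neither of which is stated in the interview. The function names are illustrative.

```python
def compression_ratio(raw_tb: float, stored_tb: float) -> float:
    """Ratio of uncompressed size to on-disk size."""
    return raw_tb / stored_tb


def stored_per_shard_tb(stored_tb: float, shard_count: int) -> float:
    """On-disk size per shard, assuming a perfectly even distribution."""
    return stored_tb / shard_count


ratio = compression_ratio(5.0, 1.0)      # 5 TB raw, 1 TB on disk
per_shard = stored_per_shard_tb(1.0, 3)  # 1 TB spread over 3 shards

print(f"compression ratio: {ratio:.1f}x")
print(f"stored per shard: {per_shard:.2f} TB")
```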

How do you use Machine Learning and AI in the Predii platform?

The raw data, models, and processed data are all stored in MongoDB. We use both Java and Python, and leverage homegrown proprietary algorithms alongside others like Stanford NLP (Natural Language Processing) for statistical, deep learning, and rule-based NLP.

Data from algorithms are indexed for query performance. The core of our Predii Repair Intelligence™ pulls from these data models in order to provide insights such as diagnostic guidance, repair procedures, suggested parts and labor operations, equipment durability, geo-specific analytics, etc.

What makes MongoDB particularly well-suited for your machine learning use case?

We transform data coming from unstructured service orders and line items, XML-formatted OEM manuals, parts transaction data, and name-value-pair log entries into JSON objects. We found that the MongoDB aggregation pipeline and indexing options were very effective in querying our data model.
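As a rough illustration of turning such inputs into JSON-style documents, here is a small Python sketch that uses only the standard library. The log format (`name=value` pairs separated by semicolons), the XML element names, and both helper functions are assumptions made for this example; they are not Predii's actual formats or code.

```python
import json
import xml.etree.ElementTree as ET


def parse_log_entry(line: str) -> dict:
    """Turn a 'name=value; name=value' log line into a dict."""
    doc = {}
    for pair in line.split(";"):
        if "=" not in pair:
            continue  # skip fragments that are not name-value pairs
        name, value = pair.split("=", 1)
        doc[name.strip()] = value.strip()
    return doc


def parse_manual_section(xml_text: str) -> dict:
    """Extract a title and ordered steps from a hypothetical manual snippet."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title", default="").strip(),
        "steps": [s.text.strip() for s in root.findall("step") if s.text],
    }


log_doc = parse_log_entry("sensor=coolant_temp; value=104.5; unit=C")
manual_doc = parse_manual_section(
    "<procedure><title>Replace thermostat</title>"
    "<step>Drain coolant</step><step>Remove housing</step></procedure>"
)

# Both sources end up as plain dicts, which serialize directly to JSON.
combined_json = json.dumps({"log": log_doc, "manual": manual_doc}, sort_keys=True)
print(combined_json)
```

Once every source is expressed as a dictionary, documents of different shapes can sit in the same database without a shared schema.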

We could easily aggregate sensor data and extract data to create a model based on standardized symptom/failure/resolution phrases, codes, and labor operations. These features, in conjunction with WiredTiger compression, in-memory caching, and sharding capabilities, provided the most competitive resolution to the challenges involved in high-performance data retrieval and low-cost storage, and they were some of the factors that led to our decision to use MongoDB.
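For readers unfamiliar with the aggregation pipeline, the sketch below shows what a query over symptom and resolution codes might look like. It is a hypothetical example, not Predii's data model: the field names (`equipment_model`, `symptom_code`, `resolution_code`) are invented. The pipeline is built as plain Python data that could be passed to a driver's `aggregate` method, and `run_in_memory` mimics the same match, group, sort, and limit steps on a list so the example runs without a database.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_symptom_pipeline(equipment_model: str, limit: int = 5) -> List[dict]:
    """Count symptom/resolution pairs for one model, most frequent first."""
    return [
        {"$match": {"equipment_model": equipment_model}},
        {"$group": {
            "_id": {"symptom": "$symptom_code", "resolution": "$resolution_code"},
            "count": {"$sum": 1},
        }},
        {"$sort": {"count": -1}},
        {"$limit": limit},
    ]


def run_in_memory(orders: List[Dict[str, str]], equipment_model: str,
                  limit: int = 5) -> List[Tuple[Tuple[str, str], int]]:
    """Pure-Python stand-in for the pipeline, for illustration only."""
    counts = Counter(
        (o["symptom_code"], o["resolution_code"])
        for o in orders
        if o["equipment_model"] == equipment_model
    )
    return counts.most_common(limit)


# Hypothetical cleansed service orders.
orders = [
    {"equipment_model": "X1", "symptom_code": "OVERHEAT", "resolution_code": "THERMOSTAT"},
    {"equipment_model": "X1", "symptom_code": "OVERHEAT", "resolution_code": "THERMOSTAT"},
    {"equipment_model": "X1", "symptom_code": "NO_START", "resolution_code": "BATTERY"},
    {"equipment_model": "Z9", "symptom_code": "OVERHEAT", "resolution_code": "RADIATOR"},
]

pipeline = build_symptom_pipeline("X1", limit=2)
top_pairs = run_in_memory(orders, "X1", limit=2)
print(top_pairs)
```

An index on the field used in the `$match` stage would let the database narrow the documents before grouping, which is the role indexing plays in the passage above.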

If someone wants more information about the Predii platform, where should they go?

They should visit our use cases page at:

Thank you for taking the time to talk with us today about MongoDB. Those interested in learning more about AI with MongoDB can download our Deep Learning and the Artificial Intelligence Revolution whitepaper.

For more information about MongoDB and the Internet of Things, check out our MongoDB Internet of Things use case page.

Leaf in the Wild posts highlight real world MongoDB deployments. Read other stories about how companies are using MongoDB for their mission-critical projects.