Using MongoDB, Kafka and Spark to Build Infrastructure for India’s First Affordable Smart-Homes Project
March 11, 2016
By Gautam Rege, Co-Founder of Josh Software and Co-Founder of SimplySmart Solutions
In Sheltrex, a growing community about two hours outside of Mumbai, India, we’re part of a project that will put more than 100,000 people in affordable smart homes. To make those homes truly smart we’re building infrastructure that streams data from millions of sensors in near real-time. Citizens can then access the data through a mobile application that allows them to better manage their homes.
It’s a fantastic example of how technology can improve our lives, but building scalable and fast infrastructure is not simple. In this blog, I want to highlight how my team at Josh Software, one of India’s leading internet of things and web application specialists, is overcoming those challenges by using a stack of interesting data tools like Apache Kafka, Apache Spark and MongoDB.
Of the planned 20,000 homes in Sheltrex, more than 1,500 have already been completed, and many people are already living on site. The pilot is a proving ground for a whole host of smart township technologies, from mobile-connected security to smart meters that monitor power consumption.
Along with the mobile application for individual citizens we’ve also built software that will aggregate this data for the entire community. This gives the township the ability to negotiate more competitive rates from India’s electricity providers.
Providing homeowners and the community with accurate and timely utility data means processing information from millions of sensors quickly, then storing it in a robust and efficient way. The Smart City Application communicates with our stack's APIs to turn that data into information that makes business sense for residents and the township management. The entire solution is split into two “universes.”
Universe One is where we stream all the sensor data that is flooding in from the homes in real time. This could include data points like temperature or energy usage. The sensor and smart-meter data is first ingested into a messaging system powered by Kafka (an open source, high-throughput, distributed, publish-subscribe platform that can quickly process real-time data feeds at a large scale). Through Kafka the data is dropped into Spark, a large-scale data processing engine that is basically a much faster and simpler alternative to MapReduce. It’s in Spark, using Java and Python, that we do the processing and aggregation of the data - before it’s written on to our second “universe.”
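To make the aggregation step concrete, here is a minimal sketch in plain Python of the kind of windowed roll-up a Spark job could perform on readings arriving from Kafka. It is not our production code and deliberately uses no Kafka or Spark APIs, so it runs anywhere. The field names (`home_id`, `metric`, `ts`, `value`) and the five-minute window are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 300  # assumed 5-minute aggregation window


def window_start(ts, size=WINDOW_SECONDS):
    """Round a timestamp down to the start of its window (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def aggregate(readings):
    """Group readings by (home, metric, window) and compute count, total, mean."""
    buckets = defaultdict(list)
    for r in readings:
        key = (r["home_id"], r["metric"], window_start(r["ts"]))
        buckets[key].append(r["value"])
    return {
        key: {"count": len(vals), "total": sum(vals), "mean": sum(vals) / len(vals)}
        for key, vals in buckets.items()
    }


# Hypothetical smart-meter readings for a single home.
readings = [
    {"home_id": "H-1", "metric": "energy_kwh",
     "ts": datetime(2016, 3, 11, 10, 1, tzinfo=timezone.utc), "value": 0.25},
    {"home_id": "H-1", "metric": "energy_kwh",
     "ts": datetime(2016, 3, 11, 10, 4, tzinfo=timezone.utc), "value": 0.5},
    {"home_id": "H-1", "metric": "energy_kwh",
     "ts": datetime(2016, 3, 11, 10, 6, tzinfo=timezone.utc), "value": 0.25},
]
summary = aggregate(readings)
```

The first two readings fall into the 10:00 window and the third into the 10:05 window, so `summary` holds two entries. In a real pipeline the same grouping key would typically be expressed as a keyed, windowed reduction inside the streaming job rather than an in-memory dictionary.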
Universe Two is where the smart home data is stored and accessed by the mobile application. We need something fast, flexible and robust, so we turned to MongoDB. It is the primary database for all storage, analysis and archiving of the smart home data. This includes time-series data like regular temperature information, as well as enriched metadata such as accumulated electricity costs and usage rates. To connect the analytical and operational data sets we use the MongoDB Connector for Hadoop.
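As an illustration of how time-series readings and enriched metadata can live together in one document, the sketch below builds the filter and update documents for an hourly "bucket" per home and metric. This is a common MongoDB pattern for time-series data, not a description of our actual schema: the field names, the hourly bucket size and the `RATE_PER_KWH` tariff are all hypothetical. The code is plain Python and needs no database connection.

```python
from datetime import datetime, timezone

RATE_PER_KWH = 5.0  # hypothetical tariff, in rupees per kWh


def hourly_bucket_update(home_id, metric, ts, value):
    """Build the (filter, update) pair for one reading.

    All readings for the same home, metric and hour land in one document.
    """
    hour = ts.replace(minute=0, second=0, microsecond=0)
    flt = {"home_id": home_id, "metric": metric, "hour": hour}
    update = {
        # Append the raw reading to the bucket's time series.
        "$push": {"readings": {"ts": ts, "value": value}},
        # Maintain running aggregates alongside the raw data.
        "$inc": {"count": 1, "total": value},
    }
    if metric == "energy_kwh":
        # Enriched metadata: accumulated electricity cost for the hour.
        update["$inc"]["cost"] = value * RATE_PER_KWH
    return flt, update


flt, update = hourly_bucket_update(
    "H-1", "energy_kwh",
    datetime(2016, 3, 11, 10, 4, tzinfo=timezone.utc), 0.5,
)
```

With the PyMongo driver, a pair like this could be applied with `collection.update_one(flt, update, upsert=True)`, so the first reading of each hour creates the bucket and later readings update it in place. The mobile application can then read the running totals directly instead of recomputing them from raw readings.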
We’ve found that the three technologies work well in harmony, creating a resilient, scalable and powerful big data pipeline, without the complexity inherent in other distributed streaming and database environments. This holds both in development, where it’s relatively simple to integrate them, and in production, where the data flows smoothly between each stage.
I’ve been using MongoDB since the beginning; in fact, I’ve written a couple of books on the subject. It’s been great to see how the database itself has matured and kept adding the right features at the right time. Another big advantage for us is how much more productive MongoDB makes developers and operations staff.
The devops team is continuously delivering code to support new requirements, so they need to make things happen fast. MongoDB’s ease of use means we can accelerate our development process and get new features integrated, tested and deployed quickly.
Right now we’re operating across eight Amazon Web Services instances in the same zone. As the project expands and more citizens move into Sheltrex we expect to see huge growth. That’s why it’s been so important for us to leverage technologies that operate efficiently at scale.
So far the pilot has been incredibly successful and we’re pleased with how our infrastructure is steadily increasing its capacity as thousands of new homes come online. But what we’re doing in Sheltrex is only the beginning. Housing is a volume game: the more people who live in smart affordable homes, the greater the effect will be for the community and the environment.
I believe this type of affordable and intelligent housing should become standard across the world. Minor initial costs lead to massive efficiencies over the lifetime of the building. These are not simply monetary - consider the wasted water and electricity that we could save.
To get there it will take political will and, of course, considerable funding, but from my point of view the technology is ready to go today. By building our giant idea on modern and mature technologies like MongoDB, we’re ready to change the world.
About Josh Software & SimplySmart
Driven by enthusiasm and passion, Josh is India’s leading company in building innovative web applications, working exclusively in Ruby on Rails since 2007. The company thrives only on three basic needs - disruption, innovation, and learning. It builds products for customers who are able to fulfil at least two of these needs. Details are available at www.joshsoftware.com.
Due to the diverse nature of building smart solutions for townships, Josh has incorporated another company called SimplySmart Solutions that builds and implements these solutions. As the name suggests, SimplySmart relies on simple solutions for making things smarter. Details are available at www.simplysmart.tech.