Note: This article originally appeared in The New Stack.
Data that powers applications and data that powers analytics typically live in separate domains in the data estate. This separation exists mainly because they serve different strategic purposes for an organization: applications engage customers, while analytics generate insight. The two classes of workloads have different requirements, such as read and write access patterns, concurrency, and latency. As a result, organizations typically deploy purpose-built databases and duplicate data between them to satisfy the unique requirements of each use case.
As distinct as these systems are, they're also highly interdependent in today's digital economy. Application data is fed into analytics platforms, where it's combined and enriched with other operational and historical data, supplemented with business intelligence (BI), machine learning (ML), and predictive analytics, and sometimes fed back to applications to deliver richer experiences. Picture, for example, an ecommerce system that segments users by demographic data and past purchases and then serves relevant recommendations when they next visit the website.
The process of moving data between the two types of systems is here to stay. But today, that alone is not enough. The current digital economy, with the seamless user experiences customers have come to expect, requires that applications also become smarter, autonomously taking intelligent actions in real time on our behalf. Along with smarter apps, businesses want insights faster so they know what is happening “in the moment.”
To meet these demands, we can no longer rely only on copying data out of our operational systems into centralized analytics stores. Moving data takes time and creates too much separation between application events and analytical actions. Instead, analytics processing must be “shifted left” to the source of the data: the applications themselves. We call this shift application-driven analytics, and it's a shift that both developers and analytics teams need to be ready to embrace.
Defining required capabilities
Embracing the shift is one thing; having the capabilities to implement it is another. In this article, we break down the capabilities required to implement application-driven analytics into the following five critical questions for developers:
How do developers access the tools they need to build sophisticated analytics queries directly into their application code?
How do developers make sense of voluminous streams of time series data?
How do developers create intelligent applications that automatically react to events in real time?
How do developers combine live application data in hot database storage with aged data in cooler cloud storage to make predictions?
How can developers bring analytics into applications without compromising performance?
To take a deeper dive into app-driven analytics—including specific requirements for developers compared with data analysts and real-world success stories—download our white paper: Application-Driven Analytics.
1. How do developers access the tools they need to build sophisticated analytics queries directly into their application code?
To unlock the latent power of application data that exists across the data estate, developers rely on the ability to perform CRUD operations, sophisticated aggregations, and data transformations. The primary tool for delivering these capabilities is an API that lets them query data any way they need, from simple lookups to sophisticated data processing pipelines. Developers need that API implemented as an extension of their preferred programming language so they can stay "in the zone" as they work through problems.
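As a rough illustration of expressing such a pipeline in the application's own language, the sketch below uses Python with plain dictionaries in the shape PyMongo accepts for MongoDB's aggregation framework. The `orders` collection and its fields (`status`, `created_at`, `items.sku`, `items.price`, `items.qty`) are hypothetical, and the code only builds the pipeline; running it would require a PyMongo database handle connected to a live server.

```python
from __future__ import annotations

from datetime import datetime, timezone


def top_products_pipeline(since: datetime, limit: int = 5) -> list[dict]:
    """Build an aggregation pipeline that ranks products by revenue.

    The collection and field names are hypothetical placeholders.
    """
    return [
        # Keep only completed orders placed on or after `since`.
        {"$match": {"status": "completed", "created_at": {"$gte": since}}},
        # Emit one document per line item so each item can be grouped.
        {"$unwind": "$items"},
        # Sum revenue (price * qty) and units sold per SKU.
        {
            "$group": {
                "_id": "$items.sku",
                "revenue": {"$sum": {"$multiply": ["$items.price", "$items.qty"]}},
                "units": {"$sum": "$items.qty"},
            }
        },
        # Highest revenue first, then keep only the top `limit` SKUs.
        {"$sort": {"revenue": -1}},
        {"$limit": limit},
    ]


if __name__ == "__main__":
    pipeline = top_products_pipeline(datetime(2024, 1, 1, tzinfo=timezone.utc))
    # Against a live server with PyMongo, this would run as:
    #     results = list(db.orders.aggregate(pipeline))
    print([next(iter(stage)) for stage in pipeline])
```

Because the pipeline is ordinary data, it can be composed, parameterized, and unit tested like any other code before it ever reaches the database.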
Alongside a powerful API, developers need a versatile query engine and indexing that return results as efficiently as possible. Without an index, the database engine must scan every record to find matches. With an appropriate index, it can locate matching results faster and with less overhead.
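To make the scan-versus-index difference concrete, here is a toy Python model that counts how many records each approach examines. It is an illustration under simplifying assumptions, not how MongoDB implements indexes (MongoDB uses B-tree indexes); the sorted list searched with binary search merely stands in for an ordered index, and `user_id` is a hypothetical field.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Stats:
    examined: int = 0  # number of records inspected to answer a query


def collection_scan(records, field, value, stats):
    """Inspect every record; cost grows linearly with collection size."""
    matches = []
    for rec in records:
        stats.examined += 1
        if rec[field] == value:
            matches.append(rec)
    return matches


class SortedIndex:
    """Toy single-field index: (key, position) pairs kept in sorted order."""

    def __init__(self, records, field):
        self.records = records
        self.entries = sorted((rec[field], pos) for pos, rec in enumerate(records))
        self.keys = [key for key, _ in self.entries]

    def find(self, value, stats):
        # Binary search locates the run of matching keys in O(log n) comparisons.
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        # Only the matching records are fetched and counted as examined.
        stats.examined += hi - lo
        return [self.records[pos] for _, pos in self.entries[lo:hi]]


if __name__ == "__main__":
    records = [{"user_id": i % 1000, "amount": i} for i in range(10_000)]
    scan_stats, index_stats = Stats(), Stats()
    scanned = collection_scan(records, "user_id", 42, scan_stats)
    indexed = SortedIndex(records, "user_id").find(42, index_stats)
    assert scanned == indexed
    print(scan_stats.examined, index_stats.examined)  # 10000 10
```

With 10,000 records and 10 matches, the scan examines every record, while the index touches only the 10 matching entries plus a logarithmic number of key comparisons to find them.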
Once developers start interacting with the database systematically, they need tools that give them visibility into query performance so they can tune and optimize. Tools like MongoDB Compass let users monitor real-time server and database metrics and visualize performance issues. Additionally, a column-oriented representation of data can power in-app visualizations and analytics on top of transactional data. Other MongoDB Atlas tools provide performance recommendations, such as index and schema suggestions, to further streamline database queries.
2. How do developers make sense of voluminous streams of time series data?
Time series data is common in many modern applications. Internet of Things (IoT) sensor data, financial trades, clickstreams, and logs enable businesses to surface valuable insights. To help manage this data, MongoDB developed a highly optimized time series collection type, along with clustered indexes. Built on a highly compressible columnar storage format, time series collections can reduce storage and I/O overhead by as much as 70%.
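Creating a time series collection is a one-time configuration step. The sketch below assumes PyMongo and MongoDB 5.0 or later; `timeField`, `metaField`, and `granularity` are the documented `timeseries` options, while the collection and field names (`sensor_readings`, `timestamp`, `sensor`) are hypothetical. The helper only assembles and validates the options, so it runs without a server.

```python
VALID_GRANULARITIES = {"seconds", "minutes", "hours"}


def time_series_options(time_field, meta_field=None, granularity="minutes"):
    """Assemble keyword options for creating a time series collection."""
    if granularity not in VALID_GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    spec = {"timeField": time_field, "granularity": granularity}
    if meta_field is not None:
        # metaField labels each series (e.g. a sensor ID) and should rarely change.
        spec["metaField"] = meta_field
    return {"timeseries": spec}


if __name__ == "__main__":
    opts = time_series_options("timestamp", meta_field="sensor")
    # With PyMongo against MongoDB 5.0+, this would create the collection:
    #     db.create_collection("sensor_readings", **opts)
    print(opts)
```

Choosing a granularity close to the actual ingestion rate helps the server group measurements into storage buckets sensibly.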
Developers need the ability to query and analyze this data across rolling time windows while filling any gaps in incoming data. They also need a way to visualize this data in real time to understand complex trends.
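MongoDB's aggregation framework provides stages such as `$densify`, `$fill`, and `$setWindowFields` for this kind of work inside the database. To show the underlying computation rather than the exact stage syntax, the plain-Python sketch below fills gaps by carrying the last observed value forward and then computes a trailing rolling average. It assumes readings are `(timestamp, value)` pairs aligned to a fixed step; the helper names are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


def densify_and_fill(readings, step):
    """Emit a point at every `step` from the first to the last timestamp.

    Missing points take the last observed value (last observation carried forward).
    Assumes observed timestamps fall exactly on the step grid.
    """
    observed = dict(readings)
    ts, end = min(observed), max(observed)
    filled, last = [], None
    while ts <= end:
        if ts in observed:
            last = observed[ts]
        filled.append((ts, last))
        ts += step
    return filled


def rolling_mean(points, window):
    """Trailing mean over (ts - window, ts] for time-sorted (ts, value) points."""
    buf, total, out = deque(), 0.0, []
    for ts, value in points:
        buf.append((ts, value))
        total += value
        # Evict points that have slid out of the trailing window.
        while buf[0][0] <= ts - window:
            _, old = buf.popleft()
            total -= old
        out.append((ts, total / len(buf)))
    return out


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    minute = timedelta(minutes=1)
    # Readings for minutes 2 and 3 never arrived.
    raw = [(t0, 20.0), (t0 + minute, 22.0), (t0 + 4 * minute, 26.0)]
    filled = densify_and_fill(raw, minute)
    for ts, avg in rolling_mean(filled, window=3 * minute):
        print(ts.strftime("%H:%M"), round(avg, 2))
```

Carrying the last value forward is only one gap-filling strategy; linear interpolation is often a better fit for smoothly varying signals such as temperature.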
Another key requirement is a mechanism that automates management of the time series data lifecycle. As data ages, it should be moved out of hot storage to avoid congestion on live systems; however, that data still has value, especially in aggregated form for historical analysis. Organizations therefore need a systematic way to tier aging data into low-cost object storage while retaining the ability to access and query it for the insights it can surface.
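The tiering policy itself can be sketched in a few lines. The hypothetical Python below splits readings into a hot set that stays in the database and a cold set destined for object storage, then rolls the cold set up into per-sensor daily summaries so historical analysis stays cheap. The field names (`sensor`, `ts`, `value`) and the seven-day hot window are assumptions; in Atlas, a managed feature such as Online Archive performs the actual data movement, and this sketch covers only the decision logic.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def tier_by_age(docs, now, hot_window):
    """Split documents into hot (recent) and cold (older than now - hot_window)."""
    cutoff = now - hot_window
    hot = [d for d in docs if d["ts"] >= cutoff]
    cold = [d for d in docs if d["ts"] < cutoff]
    return hot, cold


def daily_summaries(cold_docs):
    """Roll cold readings up to per-sensor, per-day statistics for historical queries."""
    groups = defaultdict(list)
    for d in cold_docs:
        groups[(d["sensor"], d["ts"].date())].append(d["value"])
    return {
        key: {"count": len(vals), "min": min(vals), "max": max(vals),
              "mean": sum(vals) / len(vals)}
        for key, vals in sorted(groups.items())
    }


if __name__ == "__main__":
    now = datetime(2024, 3, 10)
    docs = [
        {"sensor": "a", "ts": datetime(2024, 3, 9, 10), "value": 5.0},
        {"sensor": "a", "ts": datetime(2024, 3, 1, 1), "value": 10.0},
        {"sensor": "a", "ts": datetime(2024, 3, 1, 23), "value": 20.0},
        {"sensor": "b", "ts": datetime(2024, 2, 28, 12), "value": 7.0},
    ]
    hot, cold = tier_by_age(docs, now, hot_window=timedelta(days=7))
    print(len(hot), "hot;", len(cold), "to archive")
    print(daily_summaries(cold))
```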
3. How do developers create intelligent applications that automatically react to events in real time?
Modern applications must be able to continuously analyze data in real time as they react to live events. Dynamic pricing in a ride-hailing service, recalculating delivery times in a logistics app as traffic conditions change, triggering a service call when a factory machine component starts to fail, and initiating a trade when stock markets move are just a few examples of in-app analytics that require continuous, real-time data analysis.
MongoDB Atlas has a host of capabilities to support these requirements. With change streams, for example, applications can subscribe to changes on a collection, a database, or an entire deployment and filter them so they are notified only when an event matches predefined criteria. Atlas triggers and functions can then automatically execute application code in response to the event, allowing you to build reactive, real-time, in-app analytics.
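A minimal sketch of this pattern in Python, assuming PyMongo: the `$match` filter is the kind of pipeline passed to `Collection.watch()`, so the server delivers only matching change events, and `handle_change` is where application logic would react. The collection, the `machine_id` and `vibration_mm_s` fields, and the threshold are hypothetical. The handler re-checks the condition so it can be unit tested against hand-built event dictionaries without a running server.

```python
from __future__ import annotations

from typing import Optional


def failing_component_pipeline(threshold: float) -> list[dict]:
    """Change stream filter: only inserts whose vibration exceeds `threshold`."""
    return [
        {
            "$match": {
                "operationType": "insert",
                "fullDocument.vibration_mm_s": {"$gt": threshold},
            }
        }
    ]


def handle_change(event: dict, threshold: float) -> Optional[str]:
    """React to one change event; returns the action taken, if any."""
    if event.get("operationType") != "insert":
        return None
    doc = event.get("fullDocument") or {}
    if doc.get("vibration_mm_s", 0.0) > threshold:
        # In a real app this might call a ticketing API or enqueue a job.
        return f"open service ticket for machine {doc['machine_id']}"
    return None


if __name__ == "__main__":
    # With PyMongo against a replica set or Atlas cluster, the loop would be:
    #     with db.machine_readings.watch(failing_component_pipeline(7.1)) as stream:
    #         for event in stream:
    #             handle_change(event, 7.1)
    sample = {
        "operationType": "insert",
        "fullDocument": {"machine_id": "press-4", "vibration_mm_s": 9.3},
    }
    print(handle_change(sample, threshold=7.1))
```

Filtering on the server keeps irrelevant events off the network, while the local check keeps the handler safe if the pipeline is ever loosened.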
4. How do developers combine live application data in hot database storage with aged data in cooler cloud storage to make predictions?
Data is increasingly distributed across different applications, microservices, and even cloud providers. Some of it, such as newly ingested time series measurements or orders placed in your ecommerce store, resides in hot database storage. Other data sets consist of older data that might be archived in lower-cost cloud object storage.
Organizations must be able to query, blend, and analyze fresh data arriving from microservices and IoT devices together with cooler data in object stores, data exposed through APIs, and third-party sources, which a conventional operational database on its own is not built to do. Bringing all key data assets together is critical for understanding trends and making predictions, whether a human does the analysis or it runs as part of a machine learning process.
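The blending step can be illustrated with a small, hypothetical Python sketch: monthly totals from a hot store and an archived cold store are merged (preferring the fresher hot value when a month appears in both), and an ordinary least-squares trend line stands in for a real forecasting model. Both stores are modeled as plain dictionaries; in practice the cold side might come from a federated query over object storage.

```python
from __future__ import annotations


def blend_monthly(hot: dict[str, float], cold: dict[str, float]) -> list[tuple[str, float]]:
    """Merge per-month totals; on overlap the hot (fresher) value wins."""
    merged = {**cold, **hot}
    # "YYYY-MM" keys sort chronologically as strings.
    return sorted(merged.items())


def linear_forecast(values: list[float], steps_ahead: int = 1) -> float:
    """Fit y = a + b*x by ordinary least squares over x = 0..n-1 and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


if __name__ == "__main__":
    cold = {"2024-01": 100.0, "2024-02": 110.0, "2024-03": 120.0}  # archived
    hot = {"2024-04": 130.0, "2024-05": 140.0}                     # live
    series = blend_monthly(hot, cold)
    print(series[-1], linear_forecast([v for _, v in series]))
```

The point of the sketch is that neither store alone holds enough history for a meaningful trend; the forecast only becomes possible once both sides are combined.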
5. How can developers bring analytics into applications without compromising performance?
Live, customer-facing applications need to serve many concurrent users while ensuring low, predictable latency, and they must do so consistently at scale. Any slowdown degrades the customer experience and drives customers toward competitors. In one frequently cited study, Amazon found that just 100 milliseconds of extra load time cost them 1% in sales. So, it's critical that analytics queries on live data don't affect app performance.
A distributed architecture can help you enforce isolation between the transactional and analytical sides of an application within a single database cluster, for example by directing analytical queries to dedicated replica set members so they do not compete with transactional traffic. You can also use replication techniques to move data to fully isolated systems that still appear to the application as a single system.
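One concrete way to apply this isolation from application code is through read preferences. The Python sketch below rewrites a MongoDB connection string so that analytical reads go to secondary members tagged `nodeType:ANALYTICS`, which is the tag MongoDB Atlas documents for its analytics nodes; treat the tag as an assumption to verify against your own deployment. Transactional code would keep using the original URI with its default primary reads.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ANALYTICS_TAG = "nodeType:ANALYTICS"  # assumed Atlas tag for analytics nodes


def with_analytics_read_preference(uri: str) -> str:
    """Return `uri` with read preference options that target analytics secondaries."""
    parts = urlsplit(uri)
    # Drop any existing read preference settings; keep all other options.
    options = [
        (k, v)
        for k, v in parse_qsl(parts.query)
        if k not in ("readPreference", "readPreferenceTags")
    ]
    options += [("readPreference", "secondary"), ("readPreferenceTags", ANALYTICS_TAG)]
    # The connection string format expects a "/" before the options, even with no database.
    path = parts.path or "/"
    return urlunsplit(parts._replace(path=path, query=urlencode(options, safe=":")))


if __name__ == "__main__":
    oltp_uri = "mongodb+srv://app:secret@cluster0.example.net/shop?retryWrites=true"
    print(with_analytics_read_preference(oltp_uri))
```

Keeping two clients, one per URI, lets dashboards and reports run heavy aggregations without sharing nodes with customer-facing writes.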
Next steps to app-driven analytics
As application-driven analytics becomes pervasive, the MongoDB Atlas developer data platform unifies the core data services needed to make smarter apps and improved business visibility a reality.
Atlas does this by seamlessly bridging the traditional divide between transactional and analytical workloads in an elegant and integrated data architecture. With MongoDB Atlas, you get a single platform managing a common data set for both developers and analysts.
With its flexible document data model and unified query interface, the Atlas platform minimizes data movement and duplication, eliminates data silos and architectural complexity, and unlocks faster, lower-cost analytics on live operational data. It does all this while meeting the most demanding requirements for resilience, scale, and data privacy.
For more information about how to implement app-driven analytics and how the MongoDB developer data platform gives you the tools needed to succeed, download our white paper, Application-Driven Analytics.