Combine Data from Multiple Sources
In practice, data usually comes from multiple sources. In our machine-monitoring example, one sensor might record the machine's temperature, another its pressure, and a third its humidity. To perform a complete analysis, we need to combine the data from all of these sources. This convergence of data from multiple sources is the second part of the real-time analytics process.
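As a rough illustration of what converging sources means, the sketch below merges readings from three sensors into one record per machine and timestamp. The field names (`machine_id`, `ts`, `temperature`, `pressure`, `humidity`) and the sample values are assumptions made for this example and do not come from any particular system.

```python
from collections import defaultdict


def combine_readings(*sources):
    """Merge readings from several sensors into one record per (machine, timestamp)."""
    combined = defaultdict(dict)
    for readings in sources:
        for reading in readings:
            key = (reading["machine_id"], reading["ts"])
            # Copy every measurement field; the key fields are re-added below.
            for field, value in reading.items():
                if field not in ("machine_id", "ts"):
                    combined[key][field] = value
    return [
        {"machine_id": machine_id, "ts": ts, **fields}
        for (machine_id, ts), fields in sorted(combined.items())
    ]


# Hypothetical sample data: each sensor reports on its own schedule.
temperature = [
    {"machine_id": "m1", "ts": 1, "temperature": 71.5},
    {"machine_id": "m1", "ts": 2, "temperature": 72.0},
]
pressure = [{"machine_id": "m1", "ts": 1, "pressure": 30.1}]
humidity = [{"machine_id": "m1", "ts": 2, "humidity": 0.45}]

merged = combine_readings(temperature, pressure, humidity)
for record in merged:
    print(record)
```

Each merged record carries only the measurements that were actually reported for that timestamp, so a missing sensor reading shows up as an absent field rather than a made-up value.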
In many cases, converging data means relying on slow ETL (extract, transform, and load) processes or custom-built pipelines. These solutions are costly, difficult to maintain, and delay the real-time analytics process. Adding new data sources can also be frustrating and difficult to manage. MongoDB allows you to run aggregation queries in place, without first moving the data through such a pipeline. With the MongoDB aggregation framework, you can perform intricate analytics and generate pre-aggregated reports in real time.
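To make the aggregation idea more concrete, here is a minimal sketch of a pipeline that unions several sensor collections and computes per-machine averages. The collection names, field names, and the helper `build_report_pipeline` are hypothetical. The stages `$unionWith`, `$match`, and `$group` are standard aggregation stages (`$unionWith` requires MongoDB 4.4 or later). The code only builds the pipeline as plain Python data, so it runs without a database.

```python
from datetime import datetime


def build_report_pipeline(extra_collections, since):
    """Build a pipeline that unions sensor collections and averages per machine."""
    pipeline = []
    # Pull documents from the other sensor collections into the same stream.
    for name in extra_collections:
        pipeline.append({"$unionWith": {"coll": name}})
    # Keep only readings taken at or after the cutoff time.
    pipeline.append({"$match": {"ts": {"$gte": since}}})
    # One output document per machine; $avg ignores documents missing the field.
    pipeline.append(
        {
            "$group": {
                "_id": "$machine_id",
                "avg_temperature": {"$avg": "$temperature"},
                "avg_pressure": {"$avg": "$pressure"},
                "avg_humidity": {"$avg": "$humidity"},
            }
        }
    )
    return pipeline


pipeline = build_report_pipeline(
    ["pressure_readings", "humidity_readings"],  # hypothetical collection names
    since=datetime(2024, 1, 1),
)
print(pipeline)
```

With a driver such as PyMongo, a pipeline like this would typically be passed to `aggregate` on the first collection, for example `db["temperature_readings"].aggregate(pipeline)`, where `db` is an assumed database handle.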
Another important consideration with real-time analytics is that a more complete analysis requires combining transactional (current) data with analytical (recent and historical) data. As we mentioned earlier, data is generated at rapid rates and in large volumes. A reasonable approach is to extract insights from the transactional data and then move it to cheaper storage. However, querying data in cheaper storage is slower and somewhat limited, which can be an obstacle for real-time analytics.
How can we combine transactional and historical data to create a more complete analysis while also keeping costs low? This is where solutions like MongoDB Atlas Online Archive come in. With Online Archive, you can automatically archive aged data while still being able to query it in real time.
Combining current and historical data leads to more complete real-time analysis
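The general idea of age-based archiving with a combined query can be sketched in plain Python. This is a conceptual illustration only and does not use or represent the Online Archive API. The functions `archive_aged` and `query_all`, the 14-day threshold, and the sample records are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def archive_aged(hot, archive, now, max_age):
    """Move records older than max_age from the hot tier to the archive tier."""
    cutoff = now - max_age
    aged = [record for record in hot if record["ts"] < cutoff]
    # Update the hot list in place so callers keep the same list object.
    hot[:] = [record for record in hot if record["ts"] >= cutoff]
    archive.extend(aged)
    return len(aged)


def query_all(hot, archive, machine_id):
    """Query both tiers together and return matches from oldest to newest."""
    matches = [r for r in archive + hot if r["machine_id"] == machine_id]
    return sorted(matches, key=lambda r: r["ts"])


# Hypothetical readings for a single machine.
hot = [
    {"machine_id": "m1", "ts": datetime(2024, 1, 1), "temperature": 70.0},
    {"machine_id": "m1", "ts": datetime(2024, 1, 20), "temperature": 74.0},
    {"machine_id": "m1", "ts": datetime(2024, 1, 30), "temperature": 75.5},
]
archive = []

moved = archive_aged(hot, archive, now=datetime(2024, 1, 31), max_age=timedelta(days=14))
history = query_all(hot, archive, "m1")
print(moved, len(hot), len(archive), len(history))
```

The point of the sketch is that the caller of `query_all` does not need to know which tier a record lives in, which is the property that makes combined current and historical analysis practical.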