What is a Modern Data Stack?

Data has become one of the most sought-after commodities, and to many large corporations, it is the single most valuable resource. Data is so valuable that it has become integral to sustaining our current economy, becoming as necessary as oil, labor, capital, and land. Data drives so much of our day-to-day, but to understand what goes on behind the scenes, we need to first ask: What is a data stack? A data stack is a collection of technologies that process raw data before it can be used. A modern data stack (MDS) consists of the specific tools that are used to organize, store, and transform data. These tools take the data from “inedible data” (data that cannot be worked with) to “edible data” (data that can be worked with).

What data can be used for is well known: We are all aware of data breaches, social media’s reliance on personal data, data being used for artificial intelligence, etc. But what about what happens behind the scenes? How can a company take personal data collected from a website and turn it into a well-targeted advertisement? For those who are not familiar with the intricacies of data, the transformation process can look like a black box. This article breaks down that process and focuses on a crucial term worth learning about: the data stack.

The main functions of data stacks

This process can be simplified down to four steps:

  1. Data pipeline (ELT), where the data is gathered and moved into a position where it can be analyzed. This is the inedible state.
  2. Data warehouse or data lake, where the data can be properly stored.
  3. Data transformation.
  4. Data tools suite, where the data can be properly analyzed. This is the edible state.
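The four steps above can be sketched end to end in a few lines of Python. This is only an illustrative sketch, not a production pipeline: it assumes an in-memory SQLite database standing in for the warehouse, and the event data, table names, and function names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical raw events, standing in for data collected from a website.
RAW_EVENTS = [
    '{"visitor": "ana", "page": "/pricing", "ms": 5400}',
    '{"visitor": "ben", "page": "/pricing", "ms": 1200}',
    '{"visitor": "ana", "page": "/docs", "ms": 9100}',
]


def extract_and_load(conn, raw_events):
    """Steps 1 and 2: move raw records, untouched, into storage."""
    conn.execute("CREATE TABLE raw_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(event,) for event in raw_events],
    )


def transform(conn):
    """Step 3: parse the raw payloads into a typed, queryable table."""
    conn.execute(
        "CREATE TABLE page_views (visitor TEXT, page TEXT, ms INTEGER)"
    )
    rows = conn.execute("SELECT payload FROM raw_events").fetchall()
    for (payload,) in rows:
        record = json.loads(payload)
        conn.execute(
            "INSERT INTO page_views VALUES (?, ?, ?)",
            (record["visitor"], record["page"], record["ms"]),
        )


def analyze(conn):
    """Step 4: answer a business question from the transformed data."""
    query = (
        "SELECT page, COUNT(*), AVG(ms) "
        "FROM page_views GROUP BY page ORDER BY page"
    )
    return conn.execute(query).fetchall()


def run_pipeline(raw_events):
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, raw_events)
    transform(conn)
    return analyze(conn)


if __name__ == "__main__":
    for page, views, avg_ms in run_pipeline(RAW_EVENTS):
        print(page, views, avg_ms)
```

The raw JSON strings are the "inedible" state; the per-page view counts and average time on page returned by `analyze` are the "edible" state.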

The tools used in each company are different, but they should integrate easily and have distinct uses. Some example tool categories are data pipelines, data catalogs, data quality tools, and data lakes. Data stacks originate from technology stacks: Exactly as it sounds, a technology stack is the set of layers that make up a product produced by the company. Take a web application as an example: The necessary layers are the front-end user interface (all the HTML, CSS, and JS that make the application pretty), on top of the back-end software that actually makes the application run. A modern data stack is very similar.

Why is a data stack important?

“Time is money.” A cliché, but true, especially when it comes to a data-driven corporation. The more efficiently a data stack transforms raw data, the faster data teams can monetize it. Having the proper tools in your modern data stack is critical for your company's overall success.

Modern data stack vs legacy data stack

A legacy data stack is what came before the modern data stack. It’s an infrastructure-heavy method of preparing data for analytical use. Even though the move towards modern data stacks is gaining popularity, legacy data stacks are still vital for businesses. They hold essential company information and need to be integrated properly into your MDS. The key differences between the two are outlined below:

Legacy data stack:

  • Technically heavy
  • Requires lots of infrastructure
  • Time-consuming

Modern data stack:

  • Cloud configured
  • Ease of use
  • Suite of tools is designed with non-technical employees in mind
  • Saves time

Advantages of a modern data stack

The four main advantages of switching from an outdated stack to a modern data stack are:

Modularity

Modularity means building a product from components that can stand alone but still integrate with one another. In a data stack, this means building your stack layer by layer, choosing the technologies and tools that best fit your organization.
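As a rough illustration of modularity, the sketch below defines one storage interface and two interchangeable implementations. Everything here (the `Storage` protocol, both classes, and the `ingest` function) is hypothetical and exists only to show that one layer can be swapped without touching the rest of the stack.

```python
from typing import Iterable, Protocol


class Storage(Protocol):
    """The contract every storage layer in this sketch must satisfy."""

    def save(self, records: Iterable[dict]) -> None: ...

    def load(self) -> list: ...


class InMemoryStorage:
    """Keeps every record exactly as it arrives."""

    def __init__(self) -> None:
        self._records = []

    def save(self, records: Iterable[dict]) -> None:
        self._records.extend(records)

    def load(self) -> list:
        return list(self._records)


class DeduplicatingStorage:
    """A drop-in replacement that keeps only the first record per id."""

    def __init__(self) -> None:
        self._by_id = {}

    def save(self, records: Iterable[dict]) -> None:
        for record in records:
            self._by_id.setdefault(record["id"], record)

    def load(self) -> list:
        return list(self._by_id.values())


def ingest(storage: Storage, records: Iterable[dict]) -> int:
    """Pipeline code depends only on the interface, not on a vendor."""
    storage.save(records)
    return len(storage.load())


if __name__ == "__main__":
    records = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
    print(ingest(InMemoryStorage(), records))
    print(ingest(DeduplicatingStorage(), records))
```

Because `ingest` only relies on `save` and `load`, replacing one storage component with another requires no change to the pipeline code itself.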

Speed

The modern data stack is a cloud-based solution, which makes processing data dramatically faster. Work that took hours with a legacy data stack can now take minutes. The automation involved also makes this a faster option.

Cost

Hardware and complicated infrastructure are no longer needed in a modern data stack. This cuts costs down drastically, while allowing more authority over your data processing methods.

Time

Setting up a modern data stack can take as little as 30 minutes. Modern data stacks are also automated, meaning fewer working hours need to be involved in the data process.

Data stack use cases

As the need for data storage grew, new technologies (MongoDB among them) found more efficient ways of dealing with data. Cloud technology moved to the forefront of modern engineering in the early 2010s and dramatically changed big data. The launch of Amazon Redshift in 2012 pushed forward the cloud data warehouse and paved the way for data optimization and transformation as we know it today. Because cloud computing made storage readily available, data could now be loaded before being transformed (a process known as ELT: extract, load, transform) instead of following its sister method (ETL: extract, transform, load). Well-known data stack tools include Snowflake, Google BigQuery, and Amazon Redshift. These services help provide companies with data storage, data transformation, and access to the business intelligence tools necessary to conduct data manipulation.

Figure: Timeline of big data.

Summary

Technology stacks are crucial for developers in every sort of corporation. These are not a new concept, and the modern data stack is an addition to what should be going on in the background of your organization. Almost every application produced in a company is born through some sort of stack pipeline.

Here at MongoDB, some of the better-known technology stacks are MEAN and MERN. These stacks are not the extent of what MongoDB can do: MongoDB even allows integration with Apache Hadoop, so complex data analytics can be conducted on data stored in your MongoDB clusters. This combination, along with important business intelligence tools, allows for a deeper analysis of your raw data.

Every data-driven organization needs a personalized modern data stack. There are a multitude of companies offering competing services with pay-as-you-go methods, so integrating an efficient and elegant data pipeline into your organization is now easier than ever before.

FAQ

What is a data analytics stack?

This is a set of technological tools that allow for data to be properly processed through the pipeline. It takes raw data straight from the source and turns it into data that can be analyzed through various business intelligence tools.

Is dbt the future?

dbt, or data build tool, is an open source tool that lets data engineers transform the data in their warehouse by writing SQL SELECT statements. It is the “transform” part of the data process pipeline. dbt is already a large part of the modern data stack and is changing the way data is handled, so it is likely to remain important.

What is reverse ETL?

ETL stands for extract, transform, load, and describes how data moves through the pipeline. ETL is how data has been processed in legacy data stacks; modern data stacks commonly use a reordered form, ELT (extract, load, transform). ETL was used in legacy data stacks because, prior to cloud options, storage was an issue. Data had to be transformed before it was stored (removing duplicates, keeping only the data the company needed, etc.) to save space and avoid paying for additional legacy infrastructure. With the rise of cloud computing, storage is no longer an issue, since companies can pay for as much or as little as they need. Data can now be loaded first and transformed as necessary, a process referred to as ELT. Reverse ETL is a separate step rather than another name for ELT: it copies transformed data out of the warehouse and back into the operational tools (such as CRM or marketing platforms) where teams work.
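The difference in ordering between ETL and ELT can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the order data, table names, and function names are hypothetical.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount). One row is duplicated.
SOURCE_ROWS = [(1, 20.0), (2, 35.5), (2, 35.5), (3, 12.0)]


def etl(rows):
    """ETL: transform first (drop duplicates in application code), then load."""
    cleaned = sorted(set(rows))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", cleaned)
    return conn


def elt(rows):
    """ELT: load the raw rows as-is, then transform inside the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE orders AS "
        "SELECT DISTINCT order_id, amount FROM raw_orders"
    )
    return conn


def count(conn, table):
    """Return the number of rows in one of this sketch's tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    print("ETL orders:", count(etl(SOURCE_ROWS), "orders"))
    elt_conn = elt(SOURCE_ROWS)
    print("ELT raw_orders:", count(elt_conn, "raw_orders"))
    print("ELT orders:", count(elt_conn, "orders"))
```

Both approaches end with the same cleaned `orders` table. The ELT version also keeps the untouched `raw_orders` table, which costs more storage but lets the data be transformed again later in a different way.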

What is a data warehouse used for?

Data warehouses store data for various business and analytical purposes. The data stored in warehouses is normally known as “historical data,” or copies of data from various sources.

What is a data lake?

A data lake is where all sorts of data, structured or not, are stored. Raw data is most commonly stored in a data lake, where it is held until it gets transformed.