Big Data: An In-Depth Introductory Guide

Big Data means new opportunities for organizations to create business value — and extract it. MongoDB offers products and services that get you to production faster with less risk and effort.

Picture this: You watch a video on YouTube, like it, and share it with a few friends. You then purchase groceries and medicine online, and search for cool places to vacation. You open Netflix and watch your favorite web series. You pay your parents’ phone and electricity bills, and update their details on a health portal to apply for insurance. A friend calls you up to like their content on Instagram, so you log in to your account and post comments on a few of their photos. Then, you book your flight to your parents’ place for next weekend.

With all these transactions, you keep generating data and sharing personal information about yourself, the people you are connected to (your parents, your friends), and your preferences, such as your favorite series and travel destinations.

[Image: a network of activities including online purchases, searches, and videos]

As you keep transacting in various ways, the magnitude and variety of data grow at a very fast rate. And that’s just your data! Imagine the amount of data that the 4.66 billion active internet users worldwide produce every day!

Data that is huge in volume (size), variety, and velocity (speed) is known as big data. In this article, we will explore what big data is and how it is transforming businesses to help them increase revenue and improve their business strategies and processes.

What is Big Data?

We saw above that a user can generate data in many ways: the fitness app you use, the doctor visits you schedule, the videos you watch, the Instagram posts you like, the groceries you buy online, the games you play, and the vacations you book. Every transaction that you make (or cancel) generates data. More often than not, businesses analyze that data to better understand their users and present them with customized content.

Big data is used in almost all major industries to streamline operations and reduce overall costs.

For example, big data in healthcare is becoming increasingly important—early detection of diseases, discovery of new drugs, and customized treatment plans for patients are all examples of big data applications in healthcare.

It’s a complex and massive undertaking to capture and analyze so much data (for example, data about thousands of patients). To perform big data analytics, data scientists require big data tools, as traditional tools and databases are not sufficient.

Types of Big Data

Structured, unstructured, and semi-structured data are all types of big data. Most of today’s big data is unstructured, including videos, photos, webpages, and multimedia content. Each type of big data requires a different set of big data tools for storage and processing:

Structured data

Structured data is stored in an organized and fixed manner in the form of tables and columns. Relational databases are well-suited to store structured data. Developers use the Structured Query Language (SQL) to process and retrieve structured data.

Here is an example of structured data, with order details of a few customers:

OrderID | CustomerID | BillAmount | BillDate
ORD334567 | CUST00001234 | $250 | 17-04-2021 17:00:56
ORD334568 | CUST00009856 | $300 | 17-04-2021 17:00:56
ORD334569 | CUST00001234 | $100 | 17-04-2021 17:01:57

The Order table has a reference to the CustomerID field, which refers to the customer details stored in another table called Customer.
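
As a minimal sketch of how structured data like this can be stored and queried with SQL, the snippet below uses Python's standard-library sqlite3 module with an in-memory database. The Order rows come from the example above; the Customer table's Name column and the name "Example Customer" are assumptions added for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID TEXT PRIMARY KEY,
        Name       TEXT NOT NULL   -- assumed column, for illustration only
    );
    CREATE TABLE "Order" (
        OrderID    TEXT PRIMARY KEY,
        CustomerID TEXT NOT NULL REFERENCES Customer(CustomerID),
        BillAmount INTEGER NOT NULL,   -- whole dollars
        BillDate   TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("CUST00001234", "Ben Kinsley"), ("CUST00009856", "Example Customer")],
)
conn.executemany(
    'INSERT INTO "Order" VALUES (?, ?, ?, ?)',
    [
        ("ORD334567", "CUST00001234", 250, "17-04-2021 17:00:56"),
        ("ORD334568", "CUST00009856", 300, "17-04-2021 17:00:56"),
        ("ORD334569", "CUST00001234", 100, "17-04-2021 17:01:57"),
    ],
)


def total_billed_per_customer(connection):
    """Join Order to Customer on CustomerID and sum the bill amounts."""
    return connection.execute("""
        SELECT c.CustomerID, c.Name, SUM(o.BillAmount)
        FROM "Order" AS o
        JOIN Customer AS c ON c.CustomerID = o.CustomerID
        GROUP BY c.CustomerID, c.Name
        ORDER BY c.CustomerID
    """).fetchall()


totals = total_billed_per_customer(conn)
print(totals)
```

The fixed columns and the CustomerID reference are what make the data "structured": every row has the same shape, so a join and an aggregate can be expressed in a single query.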

Semi-structured data

Semi-structured data is structured but not rigid. It is not in the form of tables and columns. Some examples are data from mobile applications, emails, logs, and IoT devices. JSON and XML are common formats for semi-structured data:

{
"customerID": "CUST00001234",
"name" : "Ben Kinsley",
"address": {
    "street": "piccadilly",
    "zip" : "W1J9LL",
    "city" : "London",
    "state" : "England" 
},
"orders": [{
    "orderid":"ORD334567",
    "billamount":"$250",
    "billdate":"17-04-2021 17:00:56"
}, {
    "orderid":"ORD334569",
    "billamount":"$100",
    "billdate":"17-04-2021 17:01:57"
}]
}

The data has a more natural structure here and is easier to traverse. MongoDB is a good example of semi-structured data storage.
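
To illustrate what "easier to traverse" can mean in practice, here is a small sketch that parses a trimmed copy of the document above with Python's standard-library json module. The helper name total_billed is hypothetical, and the parsing of amounts assumes they are always strings such as '$250'.

```python
import json

# A trimmed copy of the customer document shown above.
raw = """
{
  "name": "Ben Kinsley",
  "address": {"city": "London", "zip": "W1J9LL"},
  "orders": [
    {"orderid": "ORD334567", "billamount": "$250"},
    {"orderid": "ORD334569", "billamount": "$100"}
  ]
}
"""
customer = json.loads(raw)


def total_billed(doc):
    """Sum bill amounts, assuming they are stored as strings like '$250'."""
    return sum(int(order["billamount"].lstrip("$")) for order in doc.get("orders", []))


# Nested fields are reached by following keys; no join is needed.
city = customer["address"]["city"]
order_ids = [order["orderid"] for order in customer["orders"]]
print(city, order_ids, total_billed(customer))
```

Because the customer and their orders live in one document, a reader follows keys rather than joining tables. A document without an "orders" field is still valid, and the helper simply returns 0 for it.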

Multi-structured/unstructured data

Multi-structured data is raw and comes in varied formats. It can contain sensor data, web logs, social media data, audio files, videos and images, documents, text files, binary data, and more. Because this data has no particular structure, it is categorized as unstructured data.

[Image: unstructured data]

It is difficult to store and process unstructured data because of its varied formats. However, non-relational databases, such as MongoDB Atlas, can easily store and process various formats of big data.

The Three Vs of Big Data

Big data has three distinguishing characteristics: Volume, Velocity, and Variety. These are known as the three Vs of big data.

Volume

Data isn’t “big” unless it comes in truly massive quantities. Just one cross-country airline trip can generate 240 terabytes of flight data. IoT sensors on a single factory shop floor can produce thousands of simultaneous data feeds every day. Other common examples of big data are Twitter data feeds, webpage clickstreams, and mobile apps.

Velocity

The tremendous volume of big data means it has to be processed at lightning-fast speed to yield insights in useful timeframes. Accordingly, stock-trading software is designed to log market changes within microseconds. Internet-enabled games serve millions of users simultaneously, each of them generating several actions every second. And IoT devices stream enormous quantities of event data in real time.
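
One common way to reason about velocity is to measure how many events arrive within a recent time window. The following is a minimal, single-process sketch of a sliding-window counter; the class name and the sample timestamps are hypothetical, and real streaming systems distribute this work across many machines.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record an event and return how many events are in the window.

        Timestamps are in seconds and must arrive in non-decreasing order.
        """
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


counter = SlidingWindowCounter(window_seconds=1.0)
counts = [counter.record(t) for t in (0.0, 0.2, 0.9, 1.1, 2.5)]
print(counts)
```

Each event is appended once and removed at most once, so the cost per event stays small even when the stream is fast.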

Variety

Big data comes in many forms, such as text, audio, video, geospatial, and 3D, which fit poorly into the rigid table formats of traditional relational databases. These older systems were designed for smaller volumes of structured data and to run on just a single server, imposing real limitations on speed and capacity. Modern big data databases such as MongoDB are engineered to accommodate the need for variety: not just multiple data types, but also a wide range of enabling infrastructure, including scale-out storage architecture and concurrent processing environments.

Nowadays, more Vs are being added to the definition of big data. The most prominent are:

  • Veracity—the accuracy of big data.
  • Value—the business value gained by analyzing the big data.
  • Variability—the different data types and changes in the big data over time.

[Image: the three Vs of big data: variety, velocity, and volume]

History of Big Data

Big data has come a long way since the term was coined in 1980 by sociologist Charles Tilly. Many researchers and experts anticipated an information explosion in the 21st century. In the late 1990s, analysts and researchers started talking more about what big data is and mentioning it in their research papers. In 2001, Douglas Laney, an industry analyst at Gartner, introduced the three Vs in the definition of big data—volume, velocity, and variety.

The year 2006 was another milestone with the development of Hadoop, an open-source distributed storage and processing framework. Since then, there have been constant improvements in the big data tools for analytics. By 2016, the world had already generated more than four zettabytes of data, and by 2021 the total was estimated to be about 74 zettabytes (1 zettabyte = 1 trillion gigabytes).

Big data analytics has become quite advanced today, with at least 53% of companies using big data to generate insights, save costs, and increase revenues. There are many players in the market and modern databases are evolving to get much better insights from big data.

[Image: timeline of big data]

Why is Big Data Important?

Big data is used for gaining practical insights for process and revenue improvements. Big data analysis can aid in:

  • Cost optimization: Through big data analytics, companies are able to improve their business strategies, boost productivity by handling disasters before they occur, and focus more on the business rather than worrying about operational aspects, thus reducing overall cost.
  • Innovative products and services: Through big data technologies, businesses are able to understand customer preferences better, and form their marketing strategies accordingly. This enables them to develop better products and services in future.
  • Better, quicker decision-making: With the help of big data tools like Spark, Hadoop, NoSQL databases like MongoDB Atlas, visualization tools like MongoDB Charts, and others, analysts are able to get faster insights and big data solutions. This helps in quick decision-making for business.

How Big Data Works

To better understand what big data is, we should know how big data works. Here is a simple big data example:

Defining Business Goal(s)

A clothing company XYZ wants to expand its business by acquiring new users.

Data Collection and Integration

To do this:

  • They need data from platforms like Facebook, Instagram, and Google to understand user behavior: the posts users like, their engagement on particular pages, and so on.
  • They create a website and track events on their website, including the number of clicks and minutes a user spends on a page.
  • For the customers who browsed a particular section (like women’s ethnic wear), XYZ wants to send customized emails giving them offers and discounts.
  • For queries and support, XYZ has chatbots and customer support available.

All of this information cannot be collected from a single source. Each step sends its information to its own data store. The data collected from these sources should be combined in one place to get a unified view. Such a place is commonly referred to as a data lake or data warehouse. The process of collecting and combining data from various sources is called data integration.
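
As a rough sketch of data integration, the snippet below combines records from three separate sources into one profile per customer. All source names, field names, and values are hypothetical; a real pipeline would read from the actual systems and write the result to a data lake or warehouse.

```python
from collections import defaultdict

# Hypothetical records from three separate sources (all values are made up).
web_events = [
    {"customer_id": "C1", "section": "womens-ethnic-wear", "seconds_on_page": 120},
    {"customer_id": "C2", "section": "menswear", "seconds_on_page": 30},
    {"customer_id": "C1", "section": "womens-ethnic-wear", "seconds_on_page": 45},
]
email_log = [{"customer_id": "C1", "campaign": "ethnic-wear-discount", "opened": True}]
support_chats = [{"customer_id": "C2", "topic": "delivery delay"}]


def integrate(web, emails, chats):
    """Combine per-source records into one unified profile per customer."""
    profiles = defaultdict(lambda: {
        "sections_viewed": set(),
        "seconds_on_site": 0,
        "campaigns_opened": [],
        "support_topics": [],
    })
    for event in web:
        profile = profiles[event["customer_id"]]
        profile["sections_viewed"].add(event["section"])
        profile["seconds_on_site"] += event["seconds_on_page"]
    for mail in emails:
        if mail["opened"]:
            profiles[mail["customer_id"]]["campaigns_opened"].append(mail["campaign"])
    for chat in chats:
        profiles[chat["customer_id"]]["support_topics"].append(chat["topic"])
    return dict(profiles)


unified = integrate(web_events, email_log, support_chats)
print(unified["C1"])
```

The shared customer_id key is what makes the merge possible. In practice, matching identities across sources is often the hardest part of integration.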

Data Management

Next, XYZ has to store all the above data in a reliable and highly available environment, where it can be easily retrieved for business use. XYZ finds out that most companies prefer cloud-based storage so that the infrastructure can be easily managed. One such cloud-based data storage solution is MongoDB Atlas, which offers flexibility and scalability, among other features, and is also compatible with major cloud providers like AWS, Azure, and more. Data can be easily updated and governed with big data cloud storage.

The process of storing the integrated data, so that it can be retrieved by applications as required, is called data management.

Data Analysis

Once XYZ knows that the big data is managed well, the next step is to figure out how the data should be put to use to get the maximum insights. The process of big data analytics involves transforming data, building machine learning and deep learning models, and visualizing data to get insights and communicate them to stakeholders. This step is known as data analysis.
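
A very small example of the analysis step, assuming XYZ's page views are already available as (customer, section, seconds) tuples: compute the average time spent per section, then pick the customers who spent enough time in one section to receive a targeted offer. The data, function names, and 60-second threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical page views: (customer id, site section, seconds on page).
page_views = [
    ("C1", "womens-ethnic-wear", 120),
    ("C1", "womens-ethnic-wear", 45),
    ("C2", "menswear", 30),
    ("C3", "womens-ethnic-wear", 15),
]


def average_seconds_by_section(views):
    """Average time per page view, grouped by site section."""
    by_section = defaultdict(list)
    for _customer, section, seconds in views:
        by_section[section].append(seconds)
    return {section: mean(values) for section, values in by_section.items()}


def customers_to_target(views, section, min_total_seconds):
    """Customers whose total time in `section` meets the threshold."""
    totals = defaultdict(int)
    for customer, viewed_section, seconds in views:
        if viewed_section == section:
            totals[customer] += seconds
    return sorted(c for c, total in totals.items() if total >= min_total_seconds)


averages = average_seconds_by_section(page_views)
targets = customers_to_target(page_views, "womens-ethnic-wear", min_total_seconds=60)
print(averages, targets)
```

Real analysis would add statistical models and visualization on top, but the pattern is the same: group, aggregate, then act on the result.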

Let’s summarize how big data works:

Company XYZ big data example | Mapping to big data process | Name of the big data analytics stage | Big data tools
XYZ wants to acquire new customers | Define business goals | Problem definition and understanding user needs: Why do we want to go for big data analytics? | Interviews, research data, web logs, demographics, mobile data, emails
XYZ finds out multiple ways to ingest data | Know where data can be sourced from and consolidate | Data collection, ingestion, and integration from IoT, social media, cloud, etc. | Kafka, NiFi, Kinesis, MongoDB data lake
XYZ finds out about cloud storage | Store big data, keep data updated | Data management | AWS, MS Master Data Services, Talend, MongoDB Atlas, Google Cloud
XYZ hires data analysts and data scientists to get insights | Analyze big data | Data visualization and analysis | Spark, SAS, MongoDB Charts, R, Python, Power BI

This enables companies like XYZ to make data-driven decisions to create intelligent organizations. Big data is the key to building a competitive, highly performant environment which can benefit businesses and customers alike.

[Image: data evaluation process]

MongoDB can help at each stage of big data analytics with its host of tools like MongoDB Atlas, MongoDB Data Lake, and MongoDB Charts.

MongoDB Atlas is a fully managed cloud-based database service. Atlas takes care of complete database management, including security, reliability, optimal performance, and more, so that developers can focus on building the application logic.

Big Data Challenges

Collecting, storing, and processing big data comes with its own set of challenges:

  • Big data is growing exponentially, and existing data management solutions have to be constantly updated to cope with the three Vs.
  • Organizations do not have enough skilled data professionals who can understand and work with big data and big data tools.

Big Data Examples and Use Cases

Before we get into domain-specific big data examples, let’s first understand what big data is commonly used for.

What is big data used for?

Big data can address a range of business activities from customer experience to analytics. Here are some examples:

  • Compliance and fraud protection: Big data lets you identify usage patterns associated with fraud and parse through large quantities of information much faster, speeding up and simplifying regulatory reporting.
  • Machine learning: Big data is a key enabler for algorithms that teach machines and software how to learn from their own experience, so they can perform faster, achieve higher precision, and discover new and unexpected insights.
  • Product development: Companies analyze and model a range of big data inputs to forecast customer demand and make predictions as to what kinds of new products and attributes are most likely to suit them.
  • Predictive maintenance: Using sophisticated algorithms, manufacturers assess IoT sensor inputs and other large datasets to track machine performance and uncover clues to imminent problems. The goal is determining the ideal intervals for preventive maintenance to optimize equipment operation and maximize uptime.
  • Improving productivity and minimizing costs: To hone their edge in low-margin competitive markets, manufacturers utilize big data to improve quality and output while minimizing scrap. Government agencies can employ social media to identify and monitor outbreaks of infectious diseases. Retailers routinely fine-tune campaigns, inventory SKUs, and price points by monitoring web click rates that reveal otherwise hidden changes in consumer behavior.
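
To make the predictive maintenance use case above more concrete, here is a deliberately simple sketch: it treats the first few sensor readings as a healthy baseline and flags later readings that deviate from the baseline mean by more than a chosen number of standard deviations. The readings, the baseline size, and the threshold are hypothetical, and production systems typically use far more sophisticated models.

```python
from statistics import mean, stdev


def flag_anomalies(readings, baseline_size=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the baseline.

    The first `baseline_size` readings define the baseline mean and
    standard deviation; later readings are flagged when they differ from
    the mean by more than `threshold` standard deviations.
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return []  # A flat baseline gives no scale to compare against.
    return [
        index
        for index, value in enumerate(readings[baseline_size:], start=baseline_size)
        if abs(value - mu) > threshold * sigma
    ]


# Hypothetical vibration readings from one machine; index 7 is a spike.
vibration = [0.50, 0.52, 0.49, 0.51, 0.50, 0.51, 0.53, 0.95, 0.50]
anomalies = flag_anomalies(vibration)
print(anomalies)
```

A flagged index is only a clue that something may be wrong. Deciding when to schedule maintenance would combine many such signals over time.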

Big Data Examples

Enterprises and consumers are producing data at an equally high rate. The data can be used by several streaming and batch processing applications, predictive modeling, dynamic querying, machine learning, AI applications, and so on.

We touched upon big data applications in healthcare, marketing, and customer experience. Other common big data examples are:

  • Fraud detection and prevention: By identifying suspicious transactions and activities, financial institutions can detect fraud and distinguish it from legitimate activity. Real-time tracking and machine learning algorithms help in the detection and prevention of cyber theft, insurance scams, identity theft, and many other kinds of online fraud.
  • Recommendation systems: Apps like Netflix and Amazon Prime have now become the primary source of at-home entertainment. These services recommend programs that are similar to videos that the user, or other users with similar tastes, liked previously. Amazon product recommendations work on the same principle.
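
The idea behind the recommendation systems mentioned above can be sketched with a tiny "users who liked what you liked" example. This is an illustrative toy, not how Netflix or Amazon actually implement recommendations; the user names, titles, and scoring rule are all hypothetical.

```python
from collections import Counter

# Hypothetical viewing data: which titles each user liked.
likes = {
    "ana": {"Series A", "Series B", "Series C"},
    "ben": {"Series A", "Series B"},
    "cho": {"Series B", "Series D"},
}


def recommend(user, likes_by_user, limit=3):
    """Suggest unseen titles liked by users with overlapping tastes.

    Each candidate title is scored by the number of liked titles the
    recommending user shares with `user`. Ties are broken alphabetically.
    """
    seen = likes_by_user[user]
    scores = Counter()
    for other, other_likes in likes_by_user.items():
        if other == user:
            continue
        overlap = len(seen & other_likes)
        if overlap == 0:
            continue  # No shared taste, so ignore this user's likes.
        for title in other_likes - seen:
            scores[title] += overlap
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _score in ranked[:limit]]


suggestions = recommend("ben", likes)
print(suggestions)
```

With millions of users and titles, the same overlap idea has to be computed at scale, which is where big data tooling comes in.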

How Does Big Data Work in MongoDB Atlas?

As we have seen earlier, MongoDB has a document-based structure, which is a more natural way to store unstructured data. Its flexible schema accepts data in any form and volume—so you don't have to worry about storage as the amount of data increases.

MongoDB Atlas is a secure, highly available, fully managed database-as-a-service that is compatible with all the major cloud providers like AWS, Microsoft Azure, and GCP. MongoDB Atlas is highly scalable, and has built-in tools like charts for advanced big data analytics and insights. Atlas’ data lake allows users to query for data in any format on the cloud (currently supported for AWS).

FAQ

What is an example of big data?

Big data is used in almost every business domain, like healthcare, logistics, retail, manufacturing, and so on. For example, big data in healthcare finds much use in new drug discovery, disease research, early detection of diseases, personalized patient care, efforts towards fewer doctor visits, and more.

What are big data tools?

Big data tools are used to collect, transform, and analyze big data since traditional tools and relational databases are no longer enough to handle it. Some of the top big data tools are:

  • Apache Spark: Spark is an open-source engine for large-scale data processing, covering both batch and streaming workloads. It can process large amounts of data very quickly because it performs computations in memory.
  • MongoDB: MongoDB is a NoSQL document database with a flexible schema. It stores huge amounts of data in a naturally traversable format, making it a good choice to store, query, and analyze big data. MongoDB Atlas provides the database as a fully managed service, with features like data encryption, security, advanced analytics, and data lakes.
  • Apache Hadoop: The framework that changed the way big data analytics was viewed, Hadoop is still widely used for batch processing of large datasets.
  • Kafka: Kafka is an open-source distributed event streaming platform that can handle huge volumes of events. It offers high throughput and high fault tolerance. Kafka is used for stream processing, event sourcing, and building activity tracking pipelines.
  • R: R is a popular big data statistical tool that can perform advanced statistical analytics. R provides advanced graphs and charting features for easy visualization of data.

What is big data and how is it used?

Big data refers to data that is huge in volume, variety, and velocity. It contains both structured and unstructured data, which can include customer order details, video files, audio messages, documents, social media engagements, and patient and healthcare data. Businesses use big data to improve their internal operations as well as their products and services, in order to serve consumers better. In healthcare, for example, big data is used for research, early detection of diseases, and keeping track of patient health.

Where is big data stored?

Traditional approaches to storing data in relational databases, data silos, and data centers are no longer sufficient due to the size and variety of today’s data. Nowadays, cloud-based systems, data lakes, and data warehouses are becoming popular options to store, integrate, and process big data. MongoDB Atlas is a good example of database as a service. Atlas is compatible with major cloud providers and offers high security, flexibility, data availability, and other important features to easily store and manage big data.

How is big data collected?

Big data is collected from different offline and online channels. It can be generated by:

  • Interviews, documents, surveys, audio, videos, or social media posts.
  • IoT devices and sensors.
  • Network logs, server logs, web logs.
  • Web scraping and search engine results.
  • Virtual assistants like Alexa, Cortana, or Siri.
  • Mobile apps, real-time data from streaming apps like Netflix or YouTube.
  • Online transactions and purchases.
  • Location data from vehicles, human movement, or satellites.

What do you mean by big data?

Data that is big in volume, contains a lot of variety, and arrives at high velocity constitutes big data. Big data should also have high veracity and provide value for businesses.

  • Volume: Big data is huge in size. Businesses and consumers will generate about 180 zettabytes of data by 2025, which is more than double the amount of data (64.2 zettabytes) generated in 2020.
  • Velocity: Big data comes at high speed, like real-time data that needs immediate analysis and action. An ATM transaction is one common example of this. Every transaction should be immediately reflected in the user’s account as well as the ATM system to keep track of cash availability. Each transaction also needs to be checked for authenticity right then and there.
  • Variety: Data comes in all types of formats: unstructured, structured, or semi-structured. For structured data, like a purchase order made by a customer, relational databases are sufficient. However, unstructured and semi-structured data, which are more common forms of big data, require specialized big data tools to store and process.
  • Value: Big data feeds big data analytics, which leads to insights and action. This provides business value and helps in increasing overall revenue and growth of the business.
  • Veracity: Veracity includes not just the quality of data, but also the truthfulness of the data source. One such example is social media content, where everything from user profiles to sentiments to trends can change very fast.

Who is using big data?

Almost all industries use big data in some way. This includes:

  • Big data in healthcare: Electronic Medical Records (EMRs) help in tracking patient and hospital records, detecting diseases in early stages, discovering new drugs, supporting biomedical research, and monitoring health through IoT devices.
  • Big data in banking and finance: Big data is used for fraud detection and prevention, identifying loyal customers, providing better security, and more.
  • Big data in marketing and retail: To understand customer behavior, support customer segmentation, recommend products and services, and provide targeted marketing, retailers and other marketers often turn to big data analysis.

Big data also has applications in manufacturing, logistics, insurance, education, entertainment, and many other sectors.

What is big data, in simple terms?

Big data is raw data obtained from multiple sources and analyzed to get business insights. This data is huge in volume, comes in a variety of forms (like videos or images), and arrives at high velocity (like streaming data).