BlogAnnounced at MongoDB.local NYC 2024: A recap of all announcements and updates — Learn more >

Introduction to Convolutional Neural Networks

Deep learning and machine learning have proven themselves to be highly effective tools for delivering vast amounts of information quickly, thanks in large part to training data through convolutional neural networks (CNNs). These networks are the core building block of many of today’s applications, including deep learning, image recognition, autonomous vehicles, and even medical diagnostics.

Table of contents

What is a convolutional neural network?

CNNs are a specific type of advanced deep learning network that have become the method of choice for many machine learning tasks. Standard neural networks are like straightforward calculators; they look at data and follow a set pattern every time to process it. CNNs are more like detectives. They don’t just look at the data; they explore the bigger picture, understanding how each piece of data relates to others around it.

This relational recognition makes CNNs great at noticing patterns and details in a way that regular networks can’t. Also, CNNs remember the data they’ve seen before and how it was arranged, which helps their systems make better sense of new data. Memory of past data makes CNNs especially good for tasks where understanding the context or the relationship between different parts of the data is critical, such as recognizing objects in pictures.

CNNs in action

Consider a master chef tasting a complex dish, aiming to identify the myriad of individual flavors that compose the culinary masterpiece. The chef doesn’t just experience the dish as a whole; a true master also focuses on the subtle notes of each ingredient: the hint of rosemary, the zest of lemon, or the richness of saffron.

How a chef experiences food is reminiscent of how a CNN processes information. It doesn’t just see an image as a collection of pixels, it discerns the nuances. It observes the gradient of light, the curve of a line, and the texture of a surface.

When it comes to audio files, CNN distinguishes the layers of sound, separating the tones, rhythms, and harmonics.

Just like any chef worth his salt, a CCN breaks down complex inputs into their most elemental parts in order to understand and classify the data with precision, and extract each part for later use.

What are CNNs used for?

We interact with CNNs throughout the day – more often than you might realize. Each time you scroll through your social media feeds, which are typically based on your previous behavior, you’re interacting with CNNs. These sophisticated algorithms are the masterminds behind how Facebook identifies your friends in photos or how Google Photos organizes your memories.

What is the OpenCV OSS?

OpenCV OSS (Open Source Computer Vision Library) is a free software library that provides vast programming functions and algorithms primarily aimed at real-time image and video processing. OpenCV OSS is a go-to resource for tasks like recognizing faces in images, detecting objects in videos, or even analyzing medical scans. Its broad range of capabilities and efficient processing make it invaluable in applications where quick and accurate interpretation of visual data is crucial, such as in autonomous vehicles, security systems, and digital healthcare systems.

OpenCV OSS’s versatility and robustness make it an essential toolkit for developers and researchers working on projects that require sophisticated image and video analysis, including some of the applications listed below.

Image classification

Image classification is a cornerstone task in the realm of computer vision, where the system is trained to recognize and categorize images into a set of predefined classes or groups. Training the model is like teaching a child to sort objects: apples from oranges, cats from dogs, or cars from trucks. Once trained, the CNN places individual pieces of image data in each category.

Object detection

Object detection is a sophisticated capability that extends beyond the broad strokes of image classification. While image classification tells us what objects are present in an image, object detection takes it a step further by pinpointing exactly where each object is located in the image and what its boundaries are. It’s like not only identifying that there’s fruit on a table but also outlining each piece to show there’s an apple sitting between grapes and bananas.

Common applications of object detection in the real world include facial recognition systems, self-driving cars, automated construction equipment, video surveillance, and wildlife monitoring, to name just a few.

Facial recognition

This process involves analyzing the unique features of a person’s face, such as the jawline shape, space between the eyes, curve of lips, angle of cheekbones, and size of forehead. It’s a biometric process similar to a digital fingerprint, where each face has its own set of distinguishable landmarks and characteristics.

Medical image analysis

In the realm of healthcare, medical image analysis is an area where CNNs are making a profound impact. The technology assists doctors in diagnosing diseases by meticulously analyzing medical scans such as X-rays, MRIs, and CT scans.

The process begins by training a CNN on using medical image data, where the condition of interest has been labeled by medical professionals. Once trained, the CNN may be able to detect the presence of diseases like tumors, fractures, and tissue abnormalities.

Video analysis

Unlike static image analysis, video analysis involves the added dimension of time, which requires the CNN to not only recognize individual frames but also to track the changes and movements across those frames. It’s transforming the way we interpret and use video content across a range of applications.

Security surveillance

In security surveillance, CNNs monitor and flag unusual activities as well as unauthorized entry, enhancing safety and response times. For activity recognition, CNNs can discern between different types of movements.


CNNs can analyze traffic patterns, aiding in congestion reduction, urban planning, and infrastructure management. In the retail sector, insights into customer behavior gleaned from video analytics can influence store layouts and product placements for better sales outcomes.

Natural language processing

In natural language processing (NLP) systems, CNNs are particularly effective for tasks like sentence classification, sentiment analysis, and other language-related tasks where spatial relationships need to be used. This adaptability showcases the versatility of CNNs, extending their application beyond visual contexts to understanding and interpreting language-based data.

Why is it called a convolutional neural network?

It’s called a convolutional neural network because it uses mathematical operations called convolutions, which are a specialized kind of linear operation.

The structure of CNNs: peeling back the layers

CNNs consist of sections, or layers, that each handle specific aspects of data processing. But how exactly is a CNN structured to achieve this type of result? Let’s unravel the hidden layers that constitute this powerful neural network.

What are the 4 layers of CNN?

To fully grasp the workings of a CNN, let’s delve into the four key layers that define its architecture and functionality: convolutional layer, activation layer, pooling layer, and fully connected layer.

Convolutional layers

The convolutional layer in a CNN is the first step in understanding an image or other data type. The layer itself is composed of filters. Imagine using a magnifying glass to intricately examine a picture. As you guide the glass over the image, it brings into focus critical elements like lines, colors, textures, and boundaries. Filters recognize specific elements (e.g. one for circles, one for straight lines) in order to create a feature map that highlights key features of the data, enabling later layers to detect and recognize patterns, from simple shapes to complex textures.

Feature maps are helpful because they break down images into simpler parts. Instead of seeing the whole picture, the CNN now has a collection of maps that show where all these different patterns and textures are, which makes it easier for the CNN to understand and analyze the image in more detail later on.

Pooling layers

The pooling layer’s role is to shrink the width and height of the feature maps. This pooling operation is done by down-sampling, which involves summarizing the features in a certain area of the pooling layer.

Fully connected layers (dense layers)

After convolutional layers and pooling layers do their parts, fully connected layers connect neurons to all activations from the previous layers. A fully connected layer is similar to a traditional neural network and is typically placed near the end of a CNN.

The role of dense layers is to take the high-level filtered information from the convolutional and pooling layers and use it for classifying the input image into various categories. This final layer is where the network combines all the learned features to make a final decision, like identifying the object in an image.


The convolutional neural network is a cornerstone of modern AI and deep learning models, with its ability to parse, interpret, and understand vast amounts of visual data. CNNs have opened doors to innovations that seemed like science fiction a decade ago. So next time your car brakes automatically for a pedestrian, or your phone unlocks with your face, remember the unsung hero – the convolutional neural network.

Get Started With MongoDB Atlas

Try Free