Understanding What Ephemeral Storage is and the Different Types

In Kubernetes, the containers running in a pod (a group of containers) can access files stored in a directory through a volume. If one container crashes and a new one is created in the pod, the data that was there may or may not still be accessible, depending on the type of volume. Ephemeral storage, in the context of Kubernetes, is storage tied to the lifecycle of a pod, so when a pod finishes or is deleted and replaced, that storage is cleared out.

So, in short, what is ephemeral storage? It’s temporary storage that lives only as long as the pod (or, for files written inside a container's own filesystem, the container) that uses it, and it is wiped out and lost when that pod or container goes away.

The age of containers

Each day, more and more workloads are moved to the cloud. Although we'll always have some workloads running on premises, the benefits of a managed solution are obvious: less downtime due to power outages or dropped Internet connections; instant scalability, both vertical (adding more resources to a computing node) and horizontal (distributing a workload among multiple nodes); no need to pay big air conditioning bills to keep our servers cool; etc. And especially: no need to shut down our servers and get them off the rack to upgrade RAM. Been there, done that.

When we start using computing nodes in the cloud, these can range from full-blown virtual machines (which emulate hardware and include a full guest operating system) to containers: lightweight computing nodes that share the kernel of the host machine and just add a filesystem with the OS files and the applications needed. On top of the "hardware" benefits described above, fully managed containerized apps add "software" benefits: ease of deployment, configuration management, replicability, etc.

Containerized applications are a perfect solution for multiple problems:

  • As they contain just the files needed to run our app, and we control exactly which versions are installed, creating new running containers from an image is trivial and results are consistent: no more "this works on my machine."
  • Having these images makes it trivial to start up clusters of machines, all running the same application. Horizontal scaling is easier this way.
  • They are lightweight and make better use of the available resources.
  • Billing is way easier as we can assign a node per project, knowing exactly how much to charge a customer or what our computing costs are. Running several applications on the same machine, as we used to do on premises in the past, always raised questions about who was responsible for paying for maintenance, updates, upgrades, management of the servers, etc.

But even running in containers, these are just apps. And every interesting application (even useful ones) uses some kind of data. Containers can use data from different sources:

  • Connecting through the network, getting data using any network protocol, like HTTP or RPC.
  • Connecting to a database, like MongoDB, to manage their data.
  • Accessing data on disk.

But here lies the question: Which kind of storage does a container accept? A container can access files using volumes:

  • On our own physical storage, using iSCSI, NFS, or Fibre Channel.
  • Mounting volumes on cloud storage solutions. These are virtual disks that can be easily added to a container, with the benefit that it is easy to change the storage provider.
  • Temporary or ephemeral storage. Data here disappears when the pod running the containers is stopped.

Understanding what ephemeral storage is and the different types

There are different kinds of ephemeral volumes:

emptyDir

This is a volume that's empty at pod startup. Files are stored locally in the kubelet base directory (usually the root disk) or RAM. To define a volume attached to a pod that starts empty, we use this config:

apiVersion: v1
kind: Pod
metadata:
  name: empty-folder-demo
  namespace: default
spec:
  containers:
  - name: empty-dir-demo-ctr
    image: httpd:alpine
    volumeMounts:
      - mountPath: /test
        name: emptydir-test-volume
  volumes:
  - name: emptydir-test-volume
    emptyDir: {}

Note how we define the volume as emptyDir under volumes and then reference it by name in volumeMounts. If a container in this pod crashes and is restarted, the contents survive, but if the pod itself is removed from its node (deleted, evicted, or rescheduled elsewhere), all content is lost.
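
Since an emptyDir can also live in RAM, here is a minimal sketch of that variant. The pod, container, and volume names and the 64Mi cap are made up for illustration; medium and sizeLimit are the standard emptyDir fields.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: empty-folder-ram-demo     # hypothetical name
  namespace: default
spec:
  containers:
  - name: empty-dir-ram-ctr       # hypothetical name
    image: httpd:alpine
    volumeMounts:
      - mountPath: /cache
        name: ram-cache-volume
  volumes:
  - name: ram-cache-volume
    emptyDir:
      medium: Memory              # back the volume with tmpfs (RAM) instead of the node's disk
      sizeLimit: 64Mi             # illustrative cap on how much the volume may hold
```

Keep in mind that files written to a memory-backed emptyDir count against the memory usage of the container that writes them, so this variant suits small, hot scratch data rather than large caches.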

configMap, downwardAPI, secret

These inject different kinds of Kubernetes data into a pod. ConfigMaps are used to store non-confidential data in key-value pairs, as can be seen in the example configuration below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: game-demo
data:
  # property-like keys; each key maps to a simple value
  player_initial_lives: "3"
  ui_properties_file_name: "user-interface.properties"
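
A ConfigMap on its own is just an API object; to see it as files, a pod has to mount it as a volume. The following is a minimal sketch, assuming the game-demo ConfigMap above exists in the same namespace. The pod name, container name, image, and mount path are illustrative choices.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo-pod        # hypothetical name
spec:
  containers:
  - name: demo
    image: alpine
    command: ["sleep", "3600"]    # keep the container alive so we can inspect the files
    volumeMounts:
    - name: config
      mountPath: /config          # each ConfigMap key appears as a file in this directory
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: game-demo             # the ConfigMap defined above
```

With this in place, the container would see one file per key, for example /config/player_initial_lives, whose content is the value stored under that key.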

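Secrets are used to store sensitive information, such as passwords or API keys. In a manifest, the values under data are written as base64-encoded strings. Note that base64 is an encoding, not encryption, so anyone who can read the manifest can decode the values: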

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  username: YWRtaW4=
  password: MWYyZDFlMmU2N2Rm
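
To round out this group, here is a hedged sketch of a pod that mounts both the Secret above and a downwardAPI volume. It assumes mysecret exists in the same namespace. The pod name, label, image, and mount paths are invented for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-downward-demo      # hypothetical name
  labels:
    app: demo                     # illustrative label, exposed below through the downward API
spec:
  containers:
  - name: demo
    image: alpine
    command: ["sleep", "3600"]
    volumeMounts:
    - name: credentials
      mountPath: /etc/credentials # one file per Secret key, holding the decoded value
      readOnly: true
    - name: podinfo
      mountPath: /etc/podinfo     # files describing the pod itself
  volumes:
  - name: credentials
    secret:
      secretName: mysecret        # the Secret defined above
  - name: podinfo
    downwardAPI:
      items:
      - path: "name"              # file /etc/podinfo/name contains the pod's name
        fieldRef:
          fieldPath: metadata.name
      - path: "labels"            # file /etc/podinfo/labels contains the pod's labels
        fieldRef:
          fieldPath: metadata.labels
```

Inside the container, the Secret values are already decoded, so /etc/credentials/username would contain admin rather than the base64 string.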

Generic ephemeral volumes

These can be provided by all storage drivers that also support persistent volumes.
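
A generic ephemeral volume is declared inline in the pod spec through ephemeral.volumeClaimTemplate. Kubernetes then creates a PersistentVolumeClaim owned by the pod and deletes it when the pod goes away. The sketch below assumes the cluster has a storage class named scratch-storage-class, which is a placeholder. The pod name, container name, and size are likewise illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: generic-ephemeral-demo    # hypothetical name
spec:
  containers:
  - name: demo
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: scratch-volume
      mountPath: /scratch
  volumes:
  - name: scratch-volume
    ephemeral:
      volumeClaimTemplate:        # template for the per-pod PersistentVolumeClaim
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: "scratch-storage-class"   # placeholder; use a class from your cluster
          resources:
            requests:
              storage: 1Gi        # illustrative size
```

Compared with emptyDir, this lets the scratch space come from any provisioner the cluster supports (network-attached or local), with a fixed size, while still being cleaned up together with the pod.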

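Independently of the volume type, we can also budget how much local ephemeral storage a container may use. The following example requests 2 GiB and sets a limit of 4 GiB; a pod whose container exceeds its limit can be evicted from the node: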

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"

Summary

We can run apps on our on-premises servers, or in virtual machines in the cloud. The former forces us to maintain everything: hardware and software, updates, security patches, capacity planning, etc. The latter allows us to focus just on the software maintenance side. But it's still a big burden, and achieving scalability and ease of configuration replication is not easy.

Containers allow us to run any application in the cloud with the convenience of replicable configuration. For instance, we can run and manage our own MongoDB containers in our favorite cloud provider.

MongoDB Atlas is easy to use with your containerized application and makes it simple for you and your team to access your data.

FAQs

Which storage type is ephemeral?

By default, all storage used by a container is ephemeral. That is, it’ll get deleted if the container crashes and is restarted. Container images are immutable: a running container can write to its filesystem, but those writes go to a temporary writable layer that is discarded when the container is replaced by a new one created from the image. If we want to persist changes, we either need to define a non-ephemeral volume or store our data using some database. The MongoDB Kubernetes operator can be handy in this case.

How much is ephemeral storage in Kubernetes?

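By default, all storage in Kubernetes is ephemeral. Only if we define volumes that are not ephemeral do we get persistent storage. How much ephemeral storage is available depends on the local disk of the node running the pod, and we can reserve and cap it per container with the ephemeral-storage requests and limits.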

Is cloud storage ephemeral?

Cloud storage is not ephemeral. If we need to create files and store them, we can use a block storage solution such as OpenEBS or the managed block storage services of the major cloud providers (for example, Amazon Elastic Block Store).

What is the difference between ephemeral storage and persistent storage?

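Ephemeral storage is tied to the lifecycle of a pod and its containers. When the pod goes away (or, for files inside a container's own filesystem, when the container is restarted), all data in ephemeral storage is deleted and lost. Persistent storage, on the other hand, survives container restarts and outlives the pod that uses it.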
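
To make the contrast concrete, here is a minimal sketch of a pod that mounts one volume of each kind. It assumes the cluster has a default storage class able to satisfy the claim. All names, the image, and the 1Gi size are illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim                # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi                # illustrative size
---
apiVersion: v1
kind: Pod
metadata:
  name: storage-contrast-demo     # hypothetical name
spec:
  containers:
  - name: demo
    image: alpine
    command: ["sleep", "3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch         # ephemeral: gone when the pod is deleted
    - name: data
      mountPath: /data            # persistent: kept as long as the claim exists
  volumes:
  - name: scratch
    emptyDir: {}
  - name: data
    persistentVolumeClaim:
      claimName: data-claim
```

If this pod is deleted and recreated, files under /data would still be there, while /scratch would start empty again.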

Are pods ephemeral?

Yes. Running pods (groups of containers) are ephemeral by default. The idea is to run a service, let's say a web server, using several containers. This way, we can start more containers to handle a growing number of requests. If we create a cached file and it’s stored inside a container, what happens to that file when the container stops (because a problem was found, etc.)? Restarting the container creates a new one that starts from zero, so the old files are gone and cannot be moved over. So, pods have to be ephemeral to allow for a simple restart in case of any problem, so they continue serving (in this example, web pages). And the storage inside each container is also volatile if we don't configure a persistent volume.