In today’s competitive world, software applications need to be fast and responsive. With more people going digital, accommodating high loads on a particular application is imperative. Crashes caused by high load are far less common than they once were, largely thanks to containers. Containers allow many instances of a given application service to be deployed quickly and easily, so services can be scaled dynamically in response to greater load. For larger applications, where many containers are needed, container orchestration tools play a major role in managing the overall lifecycle of containers. Without intelligent, automated orchestration, managing hundreds or thousands of containers by hand would be impractical and error-prone.
To understand container orchestration (in simple words, “container management”), we first need to understand how containers came into existence and the real-world problems they solve.
In this article, we will discuss the need for containerized applications, container orchestration, some popular container orchestration tools, and how stateful applications like MongoDB work with container orchestration tools to deliver highly scalable, available, and resilient applications while automating continuous integration and deployment.
Let’s say you want to build an exam registration portal, with features like existing user login, form details for registration, and direct application filing as a guest. In a monolithic application, the three features will be bundled as a single service into an application, deployed onto one server and connected to one database.
There are multiple problems with this architecture:
The application has to go through extensive testing and redeployment even if a small change is made to any of the features.
Since the code is written as a single application and is interconnected, each part is dependent on the others to run.
Dynamic scaling is not possible for individual features.
To solve these issues, we can break the application down into small services for individual features, each with its own storage and computing power, and distribute them across different servers. These small services are called microservices.
This way, each feature can work independently and can be accessed via the API gateway. Microservices are loosely coupled: they can communicate with each other, but do not depend completely on one another.
How microservices help solve the challenges posed by monolithic applications
However, if one of the features (say, new registration) gets more hits and needs more capacity at one time, whereas at another time the login feature is accessed more, that cannot be managed dynamically. On traditional infrastructure, each service is allocated a fixed amount of storage and RAM which, once assigned, cannot easily be reallocated from one service to another. Further, something that works in the development environment can break in testing, staging, or even production because of issues other than code, like version incompatibility or dependency issues. Plus, if your apps are installed on virtual machines, each virtual machine needs its own guest operating system, a fixed allocation of RAM, and its own copy of the dependencies.
Components of a virtual machine
While microservices provide a host of benefits over monolithic applications, they still pose some challenges in terms of scaling, deployment, and management when run on traditional infrastructure.
These infrastructure and hardware challenges can be addressed by using containers.
Containers are lightweight packages of an application and the dependencies required to run it. They are lightweight because, unlike virtual machines, they do not carry their own guest operating system or virtualized hardware. Instead, they share the kernel and resources of the host machine. Containers are run by container engines like Docker and, at scale, managed by orchestrators like Kubernetes.
Containers do not reserve resources of their own. As and when needed, a container orchestrator allocates the necessary resources to a container and deletes the container when it is not in use, so that resources like CPU and memory are freed up for use by other containers.
Components of a container
Let’s take the example of the popular entertainment platform Netflix. Suppose most users watch Netflix from 8 p.m. to 10 p.m. As the load peaks at this time, the app needs more containers to service the additional requests. During non-peak hours, the same number of containers may not be required, so the extra ones can be deleted.
Container images consist of the code, system libraries, tools, runtime, and other settings required to run an application. Container images are lightweight, standalone, executable packages. An image becomes a container at runtime, and a single image is often used to create multiple running instances of the container, which makes it very easy to create many instances of the same service. Docker is a popular engine that runs container images as containers.
Creating, managing, and deleting containers manually can become tough as your application grows. Container orchestration automates container availability, provisioning, scheduling, and deployment, and it manages the complete lifecycle of a container:
Container deployment: Specify how many containers you want running at any given time.
Managing containers: Some containers might need additional configuration, which orchestration simplifies.
Resource allocation: Some containers might have access to only limited resources from the server to help flatten an otherwise spiky workload.
Scaling: As more traffic is expected, you might want to scale up your application vertically (resources) or horizontally (running copies of a given container). You can also scale down once this traffic bump is done.
Load balancing: Distribute incoming traffic among the running instances of a given container.
Networking: Containers will need to communicate with each other internally or be exposed externally.
Scheduling: Some containers might run on a specific schedule, like a cron job.
Monitoring: Monitoring needs to be done on each container to verify container health.
Resilience: Run different instances of a container on different physical hardware so that if one physical machine is lost, not all instances of the container are lost.
Everything that container orchestration includes
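The load balancing item in the list above can be illustrated with a minimal round-robin sketch. This is only one possible strategy, not how any particular orchestrator implements it, and the instance names are hypothetical.

```python
from itertools import cycle


def make_round_robin(instances):
    """Return a function that hands out the next instance on each call."""
    if not instances:
        raise ValueError("at least one running instance is required")
    rotation = cycle(instances)  # endlessly repeats the list in order
    return lambda: next(rotation)


# Hypothetical instance names, for illustration only.
pick_instance = make_round_robin(["login-0", "login-1", "login-2"])

# Five incoming requests are spread evenly over the three instances.
assignments = [pick_instance() for _ in range(5)]
print(assignments)
```

Each request goes to the next instance in turn, so no single container receives all of the traffic.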
To achieve the above, developers or system administrators write a declarative configuration (for example, a .yaml or .json file) that describes the desired state of the container(s) at any given point. Container orchestration platforms continuously monitor the container(s) and ensure that the desired state, as specified in the declared configuration (manifest), is consistently maintained.
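As a rough illustration of such a declarative configuration, the sketch below builds a Kubernetes Deployment manifest as a Python dictionary and prints it as JSON. The service name, image reference, and replica count are hypothetical placeholders. A real manifest would usually carry more settings, such as resource limits and health probes.

```python
import json


def deployment_manifest(name, image, replicas):
    """Build a minimal Deployment manifest describing the desired state."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Desired number of identical pods.
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical service name and image, for illustration only.
manifest = deployment_manifest("registration", "example.com/registration:1.0", 3)
print(json.dumps(manifest, indent=2))
```

The manifest states only what should exist (three copies of one container), not the steps needed to get there. Working out those steps is left to the orchestrator.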
What a container orchestration tool does
The most popular container orchestration tool is Kubernetes. Some other container orchestration tools are Docker Compose, Docker Swarm, and Apache Marathon (Mesos). You can find out more about using Docker, Kubernetes, and Marathon with MongoDB in our guide on enabling microservices.
The main benefits of container orchestration are listed below:
By automating the deployment, scaling, and management of containerized applications, container orchestration systems make it easier to promote new code to production automatically once it has passed its tests. Orchestration platforms integrate well with CI tools to automate building, testing, and packaging containers as part of the CI process, thus aligning with DevOps principles.
When an application is built from multiple instances of the same container, adding more containers for a given service increases capacity and throughput. Similarly, containers can be removed when demand falls. Using container orchestration frameworks further simplifies elastic scaling.
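To make elastic scaling concrete, here is a small sketch of how a replica count might be derived from load. The formula follows the one documented for the Kubernetes Horizontal Pod Autoscaler, which this article does not cover in detail. The floor of one replica is an assumption of this sketch, and real autoscalers also apply tolerances, bounds, and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Scale the replica count in proportion to observed vs. target load."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    scaled = math.ceil(current_replicas * current_metric / target_metric)
    # Assumption of this sketch: never scale below one replica.
    return max(1, scaled)


# Peak hours: 4 replicas averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))

# Off-peak: 6 replicas averaging 20% CPU against the same target.
print(desired_replicas(6, 20, 60))
```

With these hypothetical numbers, the first call scales out from 4 to 6 replicas and the second scales in from 6 to 2.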
Every container running on the same host is independent and isolated from the others and the host itself. The same equipment can simultaneously host development, support, test, and production versions of your application — even running different versions of tools, languages, databases, and libraries without any risk that one environment will impact another.
By running multiple containers, redundancy can be built into the application much more easily. If one container fails, the surviving peers continue to provide the service. With a container orchestration service, failing containers can be automatically recreated, restoring full capacity and redundancy.
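The automatic recreation of failed containers can be pictured as a single reconciliation pass that compares the desired count with what is actually healthy. The sketch below is a simplified illustration with a hypothetical data model (dictionaries with an `index` and a `healthy` flag). It is not the implementation used by any specific orchestrator.

```python
def reconcile(desired_count, containers):
    """Return the container list after one reconciliation pass."""
    # Keep only the containers that are still healthy.
    healthy = [c for c in containers if c["healthy"]]

    # Create replacements until the desired count is reached.
    next_index = max((c["index"] for c in containers), default=-1) + 1
    while len(healthy) < desired_count:
        healthy.append({"index": next_index, "healthy": True})
        next_index += 1

    # Remove any surplus if the desired count was lowered.
    return healthy[:desired_count]


# One of three containers has failed.
state = [
    {"index": 0, "healthy": True},
    {"index": 1, "healthy": False},
    {"index": 2, "healthy": True},
]
state = reconcile(3, state)
print([c["index"] for c in state])
```

Running a pass like this repeatedly means the system keeps converging on the declared state, whether the difference was caused by a failure or by a change to the desired count.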
There are many container orchestration tools available; some of the most common are described here.
Docker Compose takes a file defining a multi-container application (including dependencies) and deploys the described application by creating the required containers. It is mainly aimed at development, testing, and staging environments.
Docker Swarm is an open-source container orchestration framework that produces a single, virtual Docker host by clustering multiple Docker hosts together. It presents the same Docker API, allowing it to integrate with any tool that works with a single Docker host.
Kubernetes was created by Google and is one of the most feature-rich and widely used open-source container orchestration platforms. Its key features include:
Automated deployment and replication of containers.
Vertical or horizontal scaling.
Load balancing over groups of containers.
Rolling upgrades of application containers.
Resilience, with the automated rescheduling of failed containers.
Controlled exposure of network ports to systems outside of the cluster.
Kubernetes itself has many different distributions or flavours, some free and some paid for. Some are intended for self-management on-premises or in a cloud, while many are also available as a platform as a service (PaaS) offering. Examples of well-known Kubernetes flavours include Google Kubernetes Engine (GKE), Microsoft Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS), Red Hat OpenShift, VMware Tanzu, and Rancher, amongst others.
Apache Mesos uses the Marathon orchestrator and was designed to scale to tens of thousands of physical machines. Mesos is in production with some large enterprises such as Twitter, Airbnb, and Netflix. An application running on top of Mesos comprises one or more containers and is referred to as a framework. Mesos offers resources to each framework, and each framework must then decide which to accept. Mesos is less feature-rich than Kubernetes and may involve extra integration work.
Kubernetes (k8s) is an open-source multi-cloud container orchestration platform developed by Google.
Of all the container orchestration tools available in the market, Kubernetes is the most popular because Kubernetes:
Takes care of scaling up and scaling out of containers.
Can handle load balancing and auto scaling.
Allows clustering of any number of containers running on a different network or even different hardware. For example, a Kubernetes cluster can span virtual and physical machines without requiring changes to the applications being run.
Supports communication between containers.
Can create new server instances and create containers on those newly created instances too.
Ensures continuous health monitoring of containers.
Can roll back to an older version of an app.
Kubernetes consists of clusters, where each cluster has a control plane (one or more machines managing the orchestration services) and one or more worker nodes. Each node is able to run pods, with a pod being a collection of one or more containers run together. The Kubernetes control plane manages the nodes and the pods rather than the containers directly. The lifecycle of each container is tied to the pod it belongs to.
Components of Kubernetes
Let’s talk about the key components making up Kubernetes:
Kubernetes Control Plane is a set of services in every cluster that manage the overall state of the cluster and all services running within it.
A cluster is a collection of one or more bare-metal servers or virtual machines (referred to as nodes) providing the resources used by Kubernetes to run one or more applications.
etcd database stores the cluster configuration, pod configuration, authentication and metadata.
Scheduler creates or moves the pods onto nodes based on the resource requirements and availability.
Controllers take care of cluster-level tasks like replication, node management, and endpoints.
Kubelet is the main agent on each node that interacts with the Kubernetes Control Plane and ensures that all the pods scheduled to that node are healthy, up, and running.
Kube-proxy is a network proxy that runs on each node. It maintains the network rules that route traffic addressed to a service to the pods behind it. Cluster DNS and pod IP address assignment are handled by other components, typically a DNS add-on such as CoreDNS and the cluster's network plugin.
A Pod is a group of one or more containers and volumes co-located on the same host. Containers in the same pod share the same network namespace and can communicate with each other using localhost.
Services act as a network interface to pods and allow communication from inside or outside the cluster.
Ingress is an optional component that allows more granular control over traffic to the pods than services alone offer — for example, directing traffic to the right web server based on the URL.
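To illustrate the Scheduler described in the list above, the sketch below places a pod on a node based on resource requirements and availability. It first filters out nodes that cannot fit the pod, then scores the rest. The scoring rule (prefer the node left with the most free CPU, then memory), the field names, and the node names are assumptions of this sketch. The real Kubernetes scheduler considers many more factors.

```python
def schedule(pod, nodes):
    """Pick a node for the pod and reserve its resources; None if none fits."""
    # Filter: keep only nodes with enough free CPU and memory.
    candidates = [
        n for n in nodes
        if n["free_cpu"] >= pod["cpu"] and n["free_mem"] >= pod["mem"]
    ]
    if not candidates:
        return None  # the pod stays pending

    # Score: prefer the node that would be left with the most headroom.
    best = max(
        candidates,
        key=lambda n: (n["free_cpu"] - pod["cpu"], n["free_mem"] - pod["mem"]),
    )

    # Reserve the resources on the chosen node.
    best["free_cpu"] -= pod["cpu"]
    best["free_mem"] -= pod["mem"]
    return best["name"]


# Hypothetical cluster: CPU in cores, memory in GiB.
cluster = [
    {"name": "node-a", "free_cpu": 2, "free_mem": 4},
    {"name": "node-b", "free_cpu": 4, "free_mem": 8},
]
print(schedule({"cpu": 1, "mem": 2}, cluster))
```

A pod that fits nowhere returns `None`, which corresponds to a pod waiting until capacity becomes available.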
Kubernetes is able to support almost any type of application, as long as the correct configuration is used to ensure that the application's needs are met. This includes, but is not limited to, highly stateful applications like databases and stateless deployments.
A container orchestration tool like Kubernetes manages the container lifecycle based on configuration files (.yaml) declared by the project admin or developer. The Kubernetes control plane applies that configuration to the pods. If a pod is deleted or restarted, any data stored inside its containers is lost and a new pod is deployed with a clean state. This is because containers are ephemeral by default.
However, in some applications (for example, a stateful application like MongoDB), data needs to be persisted, and pods should be created or restarted with the same identity (sticky identity). For this, Kubernetes provides StatefulSets, a workload API for managing stateful applications. Data is persisted using Kubernetes persistent volumes, the configuration of which can be defined in the manifest.
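The idea of a sticky identity can be shown with a short sketch of the naming convention Kubernetes uses for StatefulSets. Each pod gets a stable name made of the set name and an ordinal, and each volume claim is named after its claim template and pod. The set name `mongodb` and claim template name `data` are hypothetical.

```python
def stateful_identities(set_name, replicas, claim_template):
    """List the stable pod and volume claim names for a StatefulSet."""
    return [
        {
            # Pods are numbered from 0 and keep their name across restarts.
            "pod": f"{set_name}-{ordinal}",
            # Each pod is paired with its own volume claim.
            "volume_claim": f"{claim_template}-{set_name}-{ordinal}",
        }
        for ordinal in range(replicas)
    ]


# Hypothetical three-member set, for illustration only.
for identity in stateful_identities("mongodb", 3, "data"):
    print(identity["pod"], "->", identity["volume_claim"])
```

Because the names are deterministic, a restarted pod comes back under the same name and is paired with the same volume claim, and therefore the same data.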
MongoDB fully supports use of Kubernetes by providing three operators — extensions to the Kubernetes control plane that make using specific applications much easier.
MongoDB Community Operator: used to run MongoDB Community Edition in a Kubernetes cluster
MongoDB Enterprise Operator: used to run MongoDB Enterprise Advanced in a Kubernetes cluster, including running Ops Manager
MongoDB Atlas Operator: used to make it much easier to manage Atlas alongside your existing application stack in Kubernetes
MongoDB is a NoSQL database, popularly used for data-driven applications. MongoDB Atlas is a fully managed data platform provided by MongoDB that allows you to manage and access your database in the cloud. MongoDB Atlas provides self-service apps and tools to create scalable, highly available applications. Atlas supports the three big cloud providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.
The MongoDB Atlas Kubernetes Operator works a little differently to MongoDB's other Kubernetes Operators. Where they run MongoDB within the Kubernetes cluster, the Atlas Operator enables you to manage your deployments in MongoDB Atlas (running on one of the supported public clouds) using configuration in Kubernetes that can be managed in the same way as you manage configuration for applications running in your Kubernetes environment.
To manage Atlas infrastructure via Kubernetes, MongoDB provides custom resources, like AtlasDeployment, AtlasProject, AtlasDatabaseUser, and many more. A custom resource is a new Kubernetes object type provided with the Atlas Operator, and each of these custom resources represents and allows management of the corresponding object type in Atlas. For example, creating an AtlasDeployment resource and deploying it to Kubernetes will cause the Atlas Operator to create a new deployment in Atlas. The Operator manages Atlas using the Atlas Admin APIs.
Atlas offers a developer data platform that is not only incredibly powerful, but also comes at a much lower total cost of ownership (TCO), thanks to the high degree of intelligent automation built into the platform. In this way, it's a fantastic complement to Kubernetes-based applications, in that both together can enable developers to focus on iterating on their applications rather than spending all their time manually managing infrastructure.
The Atlas Kubernetes Operator adds to this by providing the means to manage Atlas in a way that is becoming more and more common for other infrastructure: as IaC (infrastructure as code), via declarative configuration files. With IaC, infrastructure provisioning and management can be automated, saving time that is otherwise consumed in manually scaling, restarting, and spinning up the infrastructure from time to time. The admin or developer only configures the desired state and not how to reach it. This also simplifies and speeds up the CI/CD pipeline and promotes the DevOps culture through continuous integration, testing, and deployment of code changes. The infrastructure changes can be automatically tracked, audited, and managed like code, using version control tools like Git.
This shields the developer from much of the complexity of infrastructure. In a simple and consistent format, the developer can define, within their code repository, the desired state of both their application in Kubernetes and, thanks to the Atlas Kubernetes Operator, their Atlas resources. Automation then applies that configuration to Kubernetes, where the Atlas Operator enacts the required changes in Atlas.
The Atlas Operator even helps prevent configuration drift thanks to its automated periodic reconciliation between the configuration applied in Kubernetes and Atlas, ensuring that small manual changes made via another interface don't eventually undermine your applications.
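Configuration drift and its correction can be sketched generically as a comparison between the declared settings and the settings actually in effect. This is not the Atlas Operator's implementation, and the setting names and values below are hypothetical. The sketch only illustrates the idea of periodic reconciliation.

```python
def find_drift(declared, actual):
    """Return {setting: (declared_value, actual_value)} for drifted settings."""
    return {
        key: (value, actual.get(key))
        for key, value in declared.items()
        if actual.get(key) != value
    }


def revert_drift(declared, actual):
    """Return the actual settings with every declared value re-applied."""
    corrected = dict(actual)
    corrected.update(declared)
    return corrected


# Hypothetical settings: someone changed the tier by hand in another interface.
declared = {"tier": "M10", "backup_enabled": True}
actual = {"tier": "M20", "backup_enabled": True, "region": "EU"}

print(find_drift(declared, actual))
actual = revert_drift(declared, actual)
print(find_drift(declared, actual))
```

Settings that are not part of the declared configuration (here, `region`) are left untouched. Only values that disagree with the declaration are reset.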
MongoDB supports all of the above through the Atlas Kubernetes Operator, which can power an Internal Developer Platform (IDP) that combines IaC, CI/CD, and DevOps practices to manage containers via Kubernetes and enable self-service automation.
How MongoDB Atlas and the Atlas Kubernetes Operator benefit applications deployed on Kubernetes