The Apress Certification Study Companion Series offers guidance and hands-on practice to support technical and business professionals who are studying for an exam in the pursuit of an industry certification. Professionals worldwide seek certifications in order to advance in a career role, reinforce knowledge in a specific discipline, or apply for or change jobs. This series focuses on the most widely taken certification exams in a given field. It is designed to be user friendly, tracking topics as they appear in a given exam and working alongside other certification material as professionals prepare for their exam.
More information about this series at https://link.springer.com/bookseries/17100.

This Apress imprint is published by the registered company APress Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
To my pillars of strength—my parents, whose unwavering support and sacrifices have shaped my journey. Your belief in me has been my guiding light through every challenge and triumph.
To Himani, my better half and strongest advocate. Your endless encouragement and steadfast belief in my capabilities have given me the courage to chase my dreams. You make everything possible.
To my brother Lakshya, for being my constant companion and unconditional support in every phase of life.
To my precious bundles of joy—Kashvi and Krishiv. You are my lucky charms who have transformed our lives with your innocent smiles and boundless love. This book is a testament to the inspiration you bring to my life every single day.
This book is a celebration of your love, support, and the beautiful moments we share together. Thank you for being my reason to strive harder and dream bigger.
With love and gratitude,
Piyush Sachdeva
The CKA was created by the Linux Foundation and the Cloud Native Computing Foundation (CNCF) as a part of their ongoing effort to help develop the Kubernetes ecosystem. The exam is an online, proctored, performance-based test that requires solving multiple tasks from a command line running Kubernetes.
Once enrolled you will receive access to an exam simulator, provided by Killer.sh, allowing you to experience the exam environment. You will have two simulation attempts (36 hours of access for each attempt from the start of activation). The simulation includes 20–25 questions that are exactly the same for every attempt and every user, unlike the actual exam. The simulation will provide graded results.
—As per the Linux Foundation website
The Certified Kubernetes Administrator (CKA) exam tests your ability to deploy and manage production-grade Kubernetes clusters. In this book, I will deep-dive into each topic required for the exam and what a Kubernetes Administrator is expected to know. This book is divided into 6 parts and 23 chapters, each focusing on a particular area of Kubernetes, such as fundamentals and core concepts, workloads and scheduling, storage, installation, upgrades, maintenance, etc.
The exam is not Multiple Choice Question (MCQ) based; rather, it is completely hands-on: you will be provided with a sandbox lab environment based on an Ubuntu Linux image and asked to perform certain tasks based on your learning. These could include simple tasks such as writing a kubectl command, intermediate tasks such as creating an Ingress based on the requirements, and advanced tasks such as a Kubernetes cluster upgrade and installation. This book has been curated keeping in mind all the tasks that are important for the exam and from a Kubernetes Administrator point of view.
You will also notice that we will be doing two types of Kubernetes installation: one is KinD and the other is using Kubeadm. Both serve a specific purpose. We will be using KinD in the beginning to get you started without knowing much about Kubernetes’ internal workings and troubleshooting aspects; however, in the latter part of the book, we will use a Kubeadm setup, which will prepare you for more advanced topics. Instead of KinD, you can also use Minikube, but KinD is recommended, as Minikube has some limitations and is heavier on your system compared to KinD, which is a lightweight Kubernetes distribution.
Kubernetes is one of the most popular container orchestration tools that enable businesses to build scalable and resilient applications. However, before diving into Kubernetes, it’s essential to have a strong foundation in key technical areas.
Container Technology Proficiency: Master Docker/Podman, container networking, and troubleshooting.
Linux Administration Skills: Gain expertise in Linux, including systemd, process management, logging, network storage, etc.
Networking Concepts: Understand IP addressing, DNS, load balancing, and security protocols like SSL/TLS.
YAML and JSON Configuration: Learn the essential syntax for defining Kubernetes manifests.
Security Fundamentals: Familiarize yourself with authentication, PKI, RBAC, and service accounts.
Skill in Deploying Containers: Develop the ability to deploy and troubleshoot containerized applications.

Prerequisites to learning Kubernetes
The CKA exam is online and proctored, so there are some requirements when it comes to supported OS, supported browser, system requirements, and so on. Make sure you go through the requirements outlined in the candidate handbook or the following link:
https://docs.linuxfoundation.org/tc-docs/certification/lf-handbook2/candidate-requirements
The minimum passing threshold is 66%. Questions carry different weights, based on their complexity and completion time.
When ready, register for the exam. All the details are available in the Linux Foundation exam handbook:
https://docs.linuxfoundation.org/tc-docs/certification/lf-handbook2
You will have two hours to complete between 15 and 20 practical tasks. Each task tests your ability to solve real Kubernetes administration challenges, so time management is crucial. Appendix A provides some tips on time management.

Exam weightage per topic
In February 2025, the Linux Foundation implemented some changes in the exam curriculum to better align it with the fast-evolving cloud and DevOps technologies. Although I have covered everything in this book as per the new curriculum, it is important to understand what has changed.
The following topics have been added.
Implement StorageClasses and dynamic volume provisioning
Configure workload autoscaling (VPA, HPA)
Configure pod admission and scheduling (limits, node affinity, etc.)
Define and enforce network policies
Use the Gateway API to manage Ingress traffic
Implement and configure a highly available control plane
Use Helm and Kustomize to install cluster components
Understand extension interfaces (CNI, CSI, CRI, etc.)
Understand CRDs and install and configure operators
ETCD backup and restore
Prepare underlying infrastructure for the Kubernetes cluster
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub. For more detailed information, please visit https://www.apress.com/gp/services/source-code.
Writing Certified Kubernetes Administrator Study Companion has been a journey of deep learning, discipline, and focus. I would like to express my heartfelt gratitude to two mentors who, through their books, played a pivotal role in shaping my approach to productivity and consistency—Ali Abdaal and James Clear.
Ali Abdaal’s Feel-Good Productivity taught me how to structure my work in a way that felt both enjoyable and sustainable, while James Clear’s Atomic Habits helped me build systems that kept me on track, focusing on what truly matters. Their insights have been instrumental in helping me balance content creation, technical writing, and continuous learning.
Without the direction and mindset shifts I gained from their work, this book might have remained just an idea. Thank you for the wisdom and frameworks that made this possible.

He founded “The CloudOps Community,” where students and working professionals learn and build cool stuff online. He also hosts monthly challenges that focus on learning and real-time hands-on tasks such as #40daysofkubernetes, #10weeksofcloudops, and more.
He also runs a YouTube channel, TechTutorialsWithPiyush, where he teaches everything from cloud platforms (AWS, Azure, GCP) to DevOps methodologies. His goal is to make complex topics accessible and help others grow in their tech journey.

With over eight years of experience sharing real-world insights on topics like DevOps, cloud, and software engineering, Shubham has become a trusted mentor on YouTube and LinkedIn through his initiative, Train With Shubham.
He is passionate about making tech accessible for everyone, from beginners to advanced professionals, drawing on his background in both start-ups and midsize companies.
Chapters 1–3 provide a solid introduction to Kubernetes, covering its purpose, core concepts, and architecture. This part begins with an overview of what Kubernetes is and why it’s essential for managing containerized applications at scale. We will then look into the Kubernetes architecture, explaining the roles of control plane and worker nodes along with key components such as the API Server, etcd, Scheduler, Controller Managers, Kubelet, and container runtime. Finally, we will explore setting up both single-node and multi-node Kubernetes clusters using KinD, giving readers hands-on experience to reinforce their understanding.
As microservices architectures become more widely adopted, we are rapidly moving toward containerization. However, merely running containers is insufficient for rapid development and seamless integration. In this chapter, we will explore the challenges of managing containerized applications at scale and introduce Kubernetes as a solution.
Consider a simple scenario: you have a small application comprising several containers running on a virtual machine. When everything works smoothly, your development and operations teams are satisfied. However, this setup faces several critical challenges.
24/7 coverage requirements across global time zones
Increased operational costs for maintaining support teams
Response time delays during off-hours
Limited ability to handle multiple simultaneous failures
Managing hundreds or thousands of containers becomes unsustainable manually.
Multiple concurrent container failures require rapid and coordinated responses.
Virtual machine failures (underlying host) can bring down entire applications.
Version/application upgrades across numerous containers become logistically complex.
Service Discovery and Exposure: Determining how to expose applications to users and manage routing
Load Balancing: Distributing traffic effectively across container instances
Resource Management: Allocating and optimizing compute resources
High Availability: Ensuring continuous service despite failures
Security: Managing access controls and network policies
Networking: Handling inter-container communication
Monitoring and Logging: Tracking system health and debugging issues
Container scheduling and placement
Service discovery and load balancing
Self-healing through automatic restarts and replacements
Horizontal scaling
Rolling updates and rollbacks
Resource management and optimization
Managing just a few containers
Running simple applications with minimal scaling needs
Operating with limited DevOps expertise
Working with tight infrastructure budgets
Docker Compose
Single-host container deployments
Virtual private servers
Managed application platforms
Now that we have learned about the need for Kubernetes and the problems it solves, it’s time to understand the Kubernetes architecture and its components.
Kubernetes architecture follows established distributed systems principles to provide a robust container orchestration platform. In this chapter, we will look into the core architectural components that enable Kubernetes to manage containerized applications at scale.

Kubernetes architecture diagram
Kubernetes implements a master-worker (control plane and data plane) architecture that separates cluster management functions from application workload execution.
Each of these components is deployed on separate nodes to ensure reliable cluster operations while maintaining scalability and fault tolerance.
You might be wondering what exactly a node is in Kubernetes.
A Kubernetes node is nothing but a physical machine or a virtual machine with Kubernetes components installed on top of it and connected to form a Kubernetes cluster.
Control Plane/Master Nodes: Manages the cluster state and makes global decisions
Data Plane/Worker Nodes: Executes application workloads and implements networking policies
The master node (a.k.a. control plane) is the brain of the Kubernetes cluster. It manages the cluster's overall state, ensuring that the desired state (as defined by the user) matches the actual state of the cluster. The control plane components take care of all the operational and administrative tasks.
Worker nodes are the machine(s) where the actual application workloads (pods) run. They are responsible for running the containers and ensuring that the application remains highly available, fault tolerant, and highly scalable.
API Server: The primary management endpoint that accepts and processes all REST requests
ETCD: A distributed key-value store that maintains cluster state
Kube-Scheduler: Assigns workloads to worker nodes based on resource requirements
Controller Manager: Manages various controllers that handle routine cluster operations
Cloud Controller Manager: Allows your API Server to interact with your cloud provider to manage and provision cloud resources
Each of these components serves a specific purpose in maintaining the desired cluster state; let’s discuss the control plane components in more detail.
The Kubernetes API Server is a component of the Kubernetes control plane that exposes the Kubernetes API, which is used by all other components of Kubernetes and client applications (such as Kubectl, the CLI tool) to interact with the cluster. It acts as the frontend for the Kubernetes control plane, and the client interacts with the cluster using API Server. It is responsible for validating and processing API requests, maintaining the desired state of the cluster, and handling Kubernetes resources such as pods, services, replication controllers, and others.
API Server is the only control plane component that interacts with all other control plane components such as Scheduler, ETCD, Controller Manager, etc.

Control plane components
The Scheduler in Kubernetes is a component responsible for scheduling workloads (such as pods) to the nodes in the cluster. It watches for newly created pods with no assigned node, filters and ranks the candidate nodes, selects the most suitable node for each pod, and then binds the pod to that node.
The scheduler considers factors such as resource requirements, hardware/software constraints, affinity and anti-affinity specifications, labels and selectors, and other policies defined by the user or cluster administrator when making scheduling decisions.
These scheduling factors and variables will be discussed later in this book.
The Controller Manager in Kubernetes is a component of the control plane that manages different types of controllers to regulate the state of the cluster and perform cluster-wide tasks. Each controller in the Controller Manager manages a specific aspect of the cluster's desired state, such as ReplicaSetController, DeploymentController, NamespaceController, NodeControllers, and others.
These controllers continuously work to ensure that the current state of the cluster matches the desired state specified by users or applications. They monitor the cluster state through the Kubernetes API Server, detect any differences between the current and desired states, and take corrective actions to reconcile them, such as creating or deleting resources as needed.
For instance, if a node becomes NotReady (unhealthy), the NodeController marks it and, if required, evicts its pods so that they can be rescheduled onto healthy nodes.
ETCD is a distributed key-value storage used as the primary datastore in Kubernetes for storing cluster state and configuration information. It is a critical component of the Kubernetes control plane that is responsible for storing information such as cluster configuration, the state of all Kubernetes objects (such as pods, services, and replication controllers), and information about nodes in the cluster. ETCD ensures consistency and reliability by using a distributed consensus algorithm to replicate data across multiple nodes in the ETCD cluster.
The Cloud Controller Manager abstracts the cloud-specific details from the core Kubernetes components, allowing Kubernetes to be used across different cloud providers without requiring changes to the core Kubernetes codebase. In this book, we are not discussing the managed cloud services; hence, Cloud Controller Manager will not be discussed much.
Let’s strengthen our understanding of the worker nodes by examining the essential components that enable them to run your applications. In this section, we will look into the critical roles of the Kubelet, Kube-proxy, and the Container Runtime, the backbone of pod execution and network communication on each node.
In Kubernetes, Kubelet is the primary node agent that runs on each node in the cluster. It is responsible for managing the containers running on the node and ensuring that they are healthy and running as expected.
Pod Lifecycle Management: Kubelet is responsible for starting, stopping, and maintaining containers within a pod as directed by the Kubernetes API Server.
Node Monitoring: Kubelet monitors the health of the node and reports back to the Kubernetes control plane. If the node becomes unhealthy, the control plane can take corrective actions, such as rescheduling pods to other healthy nodes.
Resource Management: Kubelet manages the node's resources (CPU, memory, disk, etc.) and enforces resource limits and requests specified in pod configurations.
Volume Management: Kubelet manages pod volumes, including mounting and unmounting volumes as specified in the pod configuration.
Overall, Kubelet plays a crucial role in ensuring that pods are running correctly on each node in the Kubernetes cluster and that the cluster remains healthy and operational. While the Scheduler assigns pods to the node, the actual work of container execution is taken care of by the Kubelet.
In Kubernetes, kube-proxy is a network proxy that runs on each node in the cluster. It is responsible for implementing part of the Kubernetes service concept, which enables network communication to your pods from network clients inside or outside of your cluster.

kube-proxy running on a worker node
kube-proxy maintains network rules on each node. These network rules allow network communication to be forwarded to the appropriate pod based on IP address and port number.
In Kubernetes, a container runtime is the software responsible for running containers. It is an essential component of the Kubernetes architecture because Kubernetes itself does not run containers directly; instead, it relies on a container runtime to do so.
Pulling container images from a container registry (e.g., Docker Hub, Artifact Registry)
Creating and managing container lifecycle (start, stop, pause, delete)
Managing container networking and storage
Providing container isolation and resource constraints
Docker was the default container runtime for Kubernetes before version 1.24; with the removal of dockershim in 1.24, Containerd became the default choice. Containerd is a CRI (Container Runtime Interface)-compliant industry standard that provides a lightweight and reliable platform for managing containers.
Kubernetes is a modern container orchestration platform that helps run your containerized workload at scale.
Kubernetes implements a master-worker–based architecture in which master or control plane nodes are responsible for administrative and operational tasks and worker nodes are responsible for running customer workloads.
API Server: The primary process that processes all incoming requests and communicates with other components
ETCD: A distributed key-value store that maintains cluster state
Kube-Scheduler: Assigns workloads to worker nodes based on resource requirements
Controller Manager: Manages various controllers that handle routine cluster operations
Kubelet: The node agent that runs on each node and makes sure that containers are running in a pod.
kube-proxy: Maintains network rules that enable communication to your pods from inside or outside the cluster.
Container Runtime: Maintains the execution and lifecycle of a container. Containerd is the default container runtime after Kubernetes version 1.24; earlier, it was Docker.
In the previous chapter, we learned about the Kubernetes architecture and its components; now it’s time to see those concepts in action by performing a Kubernetes installation.
A Kubernetes cluster including its control plane components can be deployed in several ways, and the installation type depends on your use case, budget, and requirements.

Kubernetes cluster installation options
If you are doing a Proof of Concept (POC) or using the cluster for learning purposes or testing something, then you can use either Minikube, KinD, or K3S. You cannot do a lot of things in Minikube as it does not support network policies; hence, we will be starting with KinD, which is Kubernetes inside Docker.
On your local machine, with Docker installed, KinD will provision multiple containers for you, and each of these containers will act as a separate node of the cluster, hence the name KinD. In the first half of this book, we will be working with the KinD cluster; however, once you are familiar with Kubernetes, we will be running the cluster components as static pods using Kubeadm so that we can work on advanced Kubernetes concepts such as Kubernetes installation, upgrade, network policies, and so on.
In this chapter, we’ll be performing both Kubernetes single-node and multi-node installation using KinD.
Make sure you have Go 1.16+ and Docker installed and running on your machine to run and use KinD.
To create a cluster using KinD, you need to first install KinD. There are several options, such as installing from release binaries, building from source, or using a package manager. You can follow the document below for up-to-date instructions:
https://kind.sigs.k8s.io/docs/user/quick-start
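To give you a rough idea (a minimal sketch, assuming you install KinD from a release binary; the cluster name, file name, and node counts are only illustrative), a multi-node cluster can be described in a small config file and created with one command:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

kind create cluster --name cka-cluster --config kind-multi-node.yaml

For a single-node cluster, running kind create cluster with no config file is enough.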

KinD cluster create output
To interact with the cluster, you should already have the kubectl utility installed on your machine; if you do not have it, now is a good time to install it by going to the following link:
https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/
kubectl is a command-line tool that helps communicate with the Kubernetes control plane using the Kubernetes API. It uses a file called kubeconfig located in the $HOME/.kube directory by default, and you can override the file by setting the KUBECONFIG environment variable or by passing the --kubeconfig flag in kubectl commands.
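For example (a minimal sketch; the context name kind-cka-cluster assumes a KinD cluster named cka-cluster):

kubectl config get-contexts                  # list all contexts in the kubeconfig
kubectl config use-context kind-cka-cluster  # switch to the KinD cluster's context
export KUBECONFIG=$HOME/.kube/other-config   # point kubectl at a different kubeconfig file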

Verify cluster health
Remember, these are nothing but the containers running on your machine.
A Kubernetes cluster can be deployed in several ways such as KinD, Minikube, K3S, self-managed, VMs on Cloud, systemd processes, static pods, or even the managed services by cloud providers, such as AKS, EKS, GKE, etc.
We can quickly set up a Kubernetes cluster locally using KinD which creates multiple containers, and each container acts as a Kubernetes node that forms a cluster.
Once the cluster is created, you can interact by switching the context to the cluster name. The kubectl utility is a prerequisite to be installed on the machine to interact with the cluster.
You can run the kubectl get nodes command to verify that your cluster installation is successful.
Understanding deployments and how to perform rolling update and rollbacks
Using configmaps and Secrets to configure applications
Configuring workload autoscaling
Configuring pod admission and scheduling (limits, node affinity, etc.)
Knowing how to scale applications
Understanding the primitives used to create robust, self-healing, application deployments
Understanding how resource limits can affect pod scheduling
Awareness of manifest management and common templating tools
So far, we have completed the cluster installation and understood the Kubernetes fundamentals. From this chapter onward, we will get hands-on with actual Kubernetes resources, starting with pods.
Pods are the smallest deployable units that you can create and manage in Kubernetes. You run your applications on these Kubernetes pods.

Pod running in the cluster
A pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers. Ideally, you should have one container per pod; however, there are some edge cases (we will discuss those later in this chapter) in which you need to use multiple containers inside a pod.
Imperative Way: Through Kubectl commands or API calls
Declarative Way: By creating manifest files (usually in YAML or JSON format) with the desired state

Create a pod using kubectl
Creating a Kubernetes object such as a pod through the manifest file can be done using a YAML or JSON file. (YAML is the preferred way.)
Most of the manifest YAMLs would have four mandatory top-level fields such as apiVersion, kind, metadata, and spec (as highlighted above).
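A minimal pod manifest showing these four fields might look like the following sketch (the pod name, label, and image are only illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    env: demo
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80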
Once the file is created with .yaml or .yml extension, you can simply run the kubectl create -f filename command to apply the changes. To make changes in the pod specification such as label, container, etc., you need to update the file and apply it again using kubectl apply -f filename.
The apply parameter in the command can be used to create a new object or to make changes to an existing object; however, create can only be used to create new objects. Most Kubernetes admins use the apply command rather than create.

Create a pod using the declarative way
Pending: The pod is accepted by the cluster but is waiting for scheduling and not yet running.
Running: The pod is assigned to a node, and at least one container is running or starting.
Succeeded: All containers have completed successfully.
Failed: One or more containers have failed, and the pod will not restart.
Unknown: The pod state cannot be determined due to a node communication failure.
Waiting: The container is preparing to start, pulling images, or processing secrets.
Running: The container is actively executing without errors.
Terminated: The container has either successfully exited or failed, with logs indicating the reason.
To make changes to a running pod, you can edit the manifest file and apply the changes through the kubectl apply command, or you can also directly run the kubectl edit pod <podname> command to edit the live object without making any changes to the manifest file.
To create a YAML with the pre-populated fields that can be used to create a new pod, you can use the below commands to first dry-run the kubectl command and then redirect the output to a sample yaml file:
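For instance (a sketch; the pod and file names are illustrative):

kubectl run nginx-pod --image=nginx --dry-run=client -o yaml > pod.yaml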

Dry-run command to get the YAML
Init Containers: Run and complete before the application containers are started, for instance, to perform some pre-validation/sanity checks or operations
Sidecar Containers: Provide some helper service to the main application container, for example, service mesh, monitoring agent, logging agent, etc.

Nginx container running with a sidecar container
We have already had a brief introduction to multi-container pods; let us see how we can create such pods. For instance, suppose we have to create a multi-container pod with nginx as the main application container and an init container that checks for service availability and completes when the service is up and running. We will be learning about services in Chapter 6, but, for now, let’s just focus on the multi-container pod.
When you apply the YAML, the init container will be executed first and check for the service every two seconds; as soon as the service is created and accessible, the init container will complete, and the app container will start. During this time, the pod status will show Init:0/1, as the init container has not yet finished.
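A sketch of such a manifest (assuming the service is named myservice; the busybox image and the nslookup-based check are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done']
  containers:
  - name: nginx
    image: nginx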
Prior to Kubernetes 1.28, sidecars were essentially just regular containers that ran alongside the main application. With Kubernetes 1.28 (and default in 1.29+), a more formal sidecar concept emerged, which involves defining them within initContainers but with a restartPolicy: Always. This allows them to start before the main app but continue running and also allows them to use probes.
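A sketch of this newer sidecar pattern (the container names, images, and shared log volume are only illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  volumes:
  - name: logs
    emptyDir: {}
  initContainers:
  - name: log-agent                 # sidecar: starts before the app and keeps running
    image: busybox
    restartPolicy: Always
    command: ['sh', '-c', 'touch /var/log/app.log; tail -F /var/log/app.log']
    volumeMounts:
    - name: logs
      mountPath: /var/log
  containers:
  - name: main-app
    image: nginx
    volumeMounts:
    - name: logs
      mountPath: /var/log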
Pods are the smallest deployable units in Kubernetes, designed to run one or more containers with shared storage, network, and runtime specifications. While a single container per pod is common, multi-container pods can also be used for some use cases.
You can create a pod using an imperative approach or a declarative approach.
Imperative Approach: Use kubectl commands to create pods.
Declarative Approach: Use YAML/JSON manifest files to define and manage pods. Apply changes with the command kubectl apply.
Multi-container pods can be used for scenarios like logging, monitoring, or pre-start checks.
Init Containers: Perform setup tasks before the main container starts.
Sidecar Containers: Provide auxiliary services like logging agents.
Environment variables are defined in the env section of pod specs as key-value pairs and accessed inside containers using $<variable_name>.
In this chapter, we will look into the core Kubernetes controllers responsible for maintaining application availability and scalability: Replication Controllers, ReplicaSets, and deployments. We will learn how these resources ensure that the desired number of pod replicas is always running and how deployments provide a higher-level abstraction for managing updates and rollbacks.
We have learned about pods (the smallest deployable unit) in Kubernetes, which do not guarantee high availability and fault tolerance, as there is only a single copy of the container running all the time, and it is not backed by a controller that ensures that the pod is auto-healed upon failure.
To overcome the issue, the Replication Controller was created, which ensures that a specified number of pod replicas are up and running all the time. If there is a pod failure, the controller replaces the failed pod with a healthy pod to maintain the desired number of running replicas.
Replication Controllers are legacy controllers and are replaced by ReplicaSets managed by deployments, so we will focus more on ReplicaSets and deployments.
A ReplicaSet ensures that a given number of pod replicas are running all the time, ensuring high availability of the application. A ReplicaSet is the newer version of the Replication Controller and is mostly used along with the deployment, as it provides some additional features such as rolling updates. Usually, you define a deployment that manages a ReplicaSet automatically, and you do not interact with the ReplicaSet manually.

Kubernetes ReplicaSet
A deployment provides declarative updates for pods and ReplicaSets. You describe a desired state in a deployment, and the deployment controller changes the actual state to match the desired state at a controlled rate.
You specify the pods inside the deployment using a template, and the ReplicaSet will be created automatically by the ReplicaSet Controller (a component of Kube-Controller Manager), which manages those pods. You also define the number of replicas inside the YAML that ensures a certain number of pods running all the time.

Kubernetes deployment diagram
While working with pods, we used v1 as the apiVersion, but that is not the same for each of the Kubernetes objects. For deployment, we use the version as apps/v1; this field can be verified using the command kubectl explain deployment.
Replicas defines the number of pod replicas, that is, the desired number of application instances that should be running all the time.
Selector defines the pod that the deployment manages based on the labels. The matchLabels suggests to control the pod that has a matching label, env: demo in this case.
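Putting these fields together, a sketch of such a deployment manifest (the deployment name and image are illustrative; the env: demo label follows the text above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      env: demo
  template:
    metadata:
      labels:
        env: demo
    spec:
      containers:
      - name: nginx
        image: nginx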
Let’s assume we have created the deployment with three replicas as stated above. It will create three pods and one deployment along with a ReplicaSet. You can check the status using the command kubectl get pods (po for short) or kubectl get deployment (deploy for short).

Validate all the running resources in the cluster

Update container image through kubectl

Deployment revision history

Undo the latest changes to a deployment
Whenever you make any changes, the replicas are updated in a rolling update fashion by default, meaning it updates one replica at a time and keeps the other replicas running while changes are being performed on one replica. Kubernetes also supports another deployment strategy called recreate; as the name suggests, it replaces all the pods at once with newer pods, which could introduce some disruption to the application's availability.
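The typical commands for triggering an update and rolling it back look like this (a sketch; the deployment, container, and image tag are illustrative):

kubectl set image deployment/nginx-deploy nginx=nginx:1.27   # trigger a rolling update
kubectl rollout status deployment/nginx-deploy               # watch the rollout progress
kubectl rollout history deployment/nginx-deploy              # view the revision history
kubectl rollout undo deployment/nginx-deploy                 # roll back to the previous revision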
These commands are hard to remember especially for a beginner; the below quick-reference guide will come in handy for frequently used kubectl commands.
The best part is that the guide along with the Kubernetes official documentation is accessible during the exam.
https://kubernetes.io/docs/reference/kubectl/quick-reference/
The entire https://kubernetes.io/docs site is available during the exam, including its subdomains such as this quick-reference page.
Replication Controller (Legacy): A legacy controller that maintains pod replicas and provides auto-healing capabilities; replaced by ReplicaSet but still supported in Kubernetes.
ReplicaSet: A modern replacement for the Replication Controller that maintains desired pod replicas, provides high availability, and integrates with deployments.
Deployment: A declarative controller that manages ReplicaSets automatically, handles pod updates/rollbacks, and ensures that the desired number of pod replicas is always up and running.
Useful Commands
Deployments are the recommended way to manage ReplicaSets and pods, using RollingUpdate by default, with matching labels/selectors required.
We create pods where we run our workloads for a frontend application. We create a deployment to make sure the pods are highly available. To ensure application pods are accessible to the outside world or to client applications, we need to expose the application as a service on an endpoint even when the workload is split across multiple backends. The service then acts as a load balancer, receives the incoming requests, and redirects them to the backend pods. In this chapter, we’ll review the following four types of Kubernetes services:
ClusterIP (for internal access)
NodePort (to access the application on a particular port)
Load Balancer (to access the application on a domain name or IP address without using the port number)
ExternalName (to map the service to an external DNS name)
ClusterIP is the default service type in Kubernetes that makes your application accessible only within the Kubernetes cluster using an internal IP address. Other services within the cluster can use this IP to communicate with the service, for example, if you have a multi-tier application deployed as a pod with a frontend deployment, backend, and database. You can use ClusterIP to keep communication between multiple tiers without exposing the service externally.

Sample three-tier application that uses ClusterIP
NodePort exposes the service on a specific port on each node in the cluster that allows external traffic to access the service by sending requests to <NodeIP>:<NodePort> where the NodePort is typically between 30000 and 32767.
NodePort: The port on each node through which the service is exposed externally
Internal Service Port: The port on which the service is exposed internally
TargetPort: The port on which your application is running
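A sketch of a NodePort service manifest tying these three ports together (the service name, ports, and env: demo selector are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: nodeport-svc
spec:
  type: NodePort
  selector:
    env: demo
  ports:
  - port: 80          # internal service port
    targetPort: 80    # port the application listens on
    nodePort: 30001   # port exposed on each node (30000-32767)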

NodePort service in Kubernetes
If you are using a KinD cluster, we need to perform an extra step. If you remember, the nodes in KinD are the containers running on your local machine, so we need to expose the nodes (containers) to use the service. You can delete the existing KinD cluster and create a new one using a sample config YAML with extraPortMappings (for port forwarding), as shown below, because you cannot update an existing KinD cluster.
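A sketch of such a config (assuming you want to reach NodePort 30001 via the same port on your local machine):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30001
    hostPort: 30001
    protocol: TCP
- role: worker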

Running services in the Kubernetes cluster
Additionally, the command kubectl describe svc nodeport-svc shows the additional details such as IP addresses, endpoints, port details, labels, selectors, etc.

Inspect a Kubernetes service
To test if your service is working fine, you can run a curl localhost:30001, which should redirect to the pod (running nginx) that was exposed with the service on port 30001 and return the default nginx home page.
As the name suggests, this service will provision an external-facing load balancer with the help of Cloud Controller Manager (CCM) for your application and expose it via a public IP. Your load balancer service will act as NodePort if you are not using any managed cloud Kubernetes, such as GKE, AKS, EKS, etc. In a managed cloud environment, Kubernetes creates a load balancer within the cloud project, which redirects the traffic to the Kubernetes Load Balancer service.

Load Balancer service type in Kubernetes
If you’re deploying a production application that needs to be accessible to the outside world, like a web application or an API, you would use a Load Balancer service. For example, a public-facing ecommerce website running in Kubernetes could be exposed using a Load Balancer service.
Kubernetes services use label selectors to automatically discover and route traffic to the correct pods as the backend. When a service is created, it defines a selector that matches specific pod labels, ensuring that only those pods receive traffic.
For example, if a service defines the selector env: demo, it will target all pods with the label env=demo as the backend for this service. This mapping ensures load balancing and fault tolerance, as pods can be added or removed without modifying the service.
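You can verify this mapping yourself (a sketch, assuming the nodeport-svc service and the env=demo label above):

kubectl get pods -l env=demo -o wide     # the pods selected by the label
kubectl get endpoints nodeport-svc       # the pod IPs registered behind the service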
In Kubernetes, services enable applications running in pods to be accessible, ensuring seamless communication within the cluster or with external clients. Services act as load balancers, directing incoming requests to the appropriate backend pods.
ClusterIP (Default): Provides internal-only access within the Kubernetes cluster and is commonly used for multi-tier applications to enable communication between frontend, backend, and database tiers without external exposure.
NodePort: Exposes the service on a specific port (30000-32767) of each node, allowing external traffic via <NodeIP>:<NodePort>.
Load Balancer: Used for production applications to provision a cloud-managed external load balancer with a public IP and suitable for web applications and APIs needing external access. Please note that the Load Balancer service would default to NodePort behavior in environments without cloud-managed Kubernetes.
ExternalName: Maps a service to an external DNS name without assigning an internal IP and is ideal for connecting to external services outside the cluster.
We have learned about different Kubernetes resources such as pods, deployments, services, etc., and how to manage them, but we have not discussed how to logically group them together for better management or isolate them (if needed).
Namespace-scoped objects, such as pods, deployments, services, etc.
Cluster-scoped objects, such as StorageClasses, nodes, PersistentVolumes, etc.
When you create an object in Kubernetes, you can specify the namespace in which the object should be created; by default, the resources are created in the default namespace.
Default: All the resources you create get created in this default namespace.
Kube-node-lease: This namespace holds the lease object of each node that helps Kubelet to send heartbeats to the API Server to detect any node failure.
Kube-public: This namespace is readable publicly without any authentication.
kube-system: This namespace holds the Kubernetes-managed objects, including the control plane components.
You can use either the keyword namespace or ns.
You do not have to explicitly execute these commands on the exam sandbox as they will already be done for you.
Now, let’s go back a few chapters, when we created our namespace-scoped resources such as pods, deployments, services, etc. We did not specify the namespace; hence, they were created in the default namespace.
What if we want to specify the namespace to be used?
We just have to pass the --namespace mynamespace parameter in the apply command or add the namespace field in the spec section of the resource. You can either use --namespace namespacename or -n namespacename in the kubectl command.
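For example (a sketch; the namespace, file, and resource names are illustrative):

kubectl create namespace mynamespace
kubectl apply -f deploy.yaml -n mynamespace   # same as --namespace mynamespace
kubectl get pods -n mynamespace

# Alternatively, inside the manifest itself:
# metadata:
#   name: nginx-deploy
#   namespace: mynamespace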

Pass the namespace flag in the kubectl command
To access the resources from a particular namespace, you provide the --namespace or -n parameter along with the kubectl get command.
We are using k instead of kubectl as we already set the alias.

Get deployment details for demo namespace
An important point to remember is that resources inside a namespace can communicate with each other with their hostname (service name); however, the resources from different namespaces do not have access to each other with their hostname but can only be accessed using an FQDN (Fully Qualified Domain Name).

How services access each other across namespaces
The above entry can also be obtained from the /etc/resolv.conf file on the pod.
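For example (a sketch, assuming a service named svc-ns2 in namespace ns2 and the default cluster domain cluster.local):

curl svc-ns2.ns2.svc.cluster.local     # FQDN format: <service>.<namespace>.svc.cluster.local
cat /etc/resolv.conf                   # shows the DNS search domains used for short names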
Create two namespaces and name them ns1 and ns2.
Create a deployment with a single replica in each of these namespaces with the image as nginx and the names as deploy-ns1 and deploy-ns2, respectively.
Get the IP address of each of the pods (remember the kubectl command for that?).
Exec into the pod of deploy-ns1 and try to curl the IP address of the pod running on deploy-ns2.
Your pod-to-pod connection should work, and you should be able to get a successful response back.
Now scale both of your deployments from one to three replicas.
Create two services to expose both of your deployments and name them svc-ns1 and svc-ns2.
Exec into each pod and try to curl the IP address of the service running on the other namespace.
This curl should work.
Now try doing the same using the service name instead of the IP. You will notice that you are getting an error saying “cannot resolve the host.”
Now use the FQDN of the service and try to curl again; this should work.
In the end, delete both the namespaces, which should delete the services and deployments underneath them.
Namespaces in Kubernetes provide isolation for resources within a cluster, enabling effective resource management and multi-tenancy.
Namespace-Scoped Resources: Pods, deployments, services, etc.
Cluster-Wide Resources: Nodes, PersistentVolumes, StorageClasses, etc.
By default, resources are created in the default namespace unless specified otherwise. A new Kubernetes cluster initializes with four predefined namespaces:
Default: Default namespace for user-created resources
Kube-node-lease: Contains node lease objects for Kubelet heartbeats
Kube-public: Publicly readable namespace without authentication
Kube-system: Stores Kubernetes-managed objects and control plane components
Within a Namespace: Resources can communicate using their hostname (e.g., service name).
Across Namespaces: Resources require a Fully Qualified Domain Name (FQDN) in the format <service-name>.<namespace>.svc.cluster.local.
By organizing resources with namespaces, Kubernetes ensures better isolation, resource management, and scalability.
Now that we know about the fundamental Kubernetes resources and how to manage them, it’s time we look into more advanced resources that can be deployed for various purposes. Resources such as DaemonSet, CronJob, and jobs will be covered in this chapter.
DaemonSet is a Kubernetes resource that ensures that an identical replica is deployed to each of the available nodes in the cluster.
This is different from deployment in many ways. In deployment, we specify the number of replicas in the manifest file as the desired number of replicas we need irrespective of the number of nodes in the cluster; however, DaemonSet deploys one replica to each of the available nodes except the node that is tainted (we will cover the concept of taints and toleration later in the book).
This is useful for many use cases, such as monitoring agents, logging agents, networking CNIs (Container Network Interfaces), etc., in which we need to process or gather some information from each of the running nodes.
You don’t need to update the replica based on demand; the DaemonSet takes care of it by spinning X number of pods for X number of nodes. If you create a DaemonSet in a cluster of five nodes, then five pods will be created by default. If you add another node to the cluster, a new pod will be automatically created on the new node.
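A sketch of a DaemonSet manifest (the name and the fluentd logging image are only illustrative):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: fluentd
        image: fluentd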

DaemonSet in Kubernetes
A job is a type of Kubernetes object that creates one or more pods to perform a task. Once the tasks are completed, the pods are marked as completed, and the job tracks the successful completion of the task. Deleting a job will clean up the pods that it created.
backoffLimit: Number of retries before a job fails (two attempts).
restartPolicy: Never: The pod won’t restart on failure.
CronJob prints #40daysofkubernetes every five minutes.
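A sketch of such a CronJob, combining the fields described above (the busybox image and object name are assumptions):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: print-message
spec:
  schedule: "*/5 * * * *"            # every five minutes
  jobTemplate:
    spec:
      backoffLimit: 2                # retries before the job is marked failed
      template:
        spec:
          restartPolicy: Never       # the pod is not restarted on failure
          containers:
          - name: printer
            image: busybox
            command: ['sh', '-c', 'echo "#40daysofkubernetes"']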
DaemonSet ensures a replica is deployed on each node in the cluster. Unlike deployments, which require a specified number of replicas, DaemonSets automatically create one pod per node except tainted nodes (the concept of taints and tolerations will be covered later in the book).
Use cases for DaemonSets include monitoring agents, logging agents, networking CNIs, and any task requiring data collection or processing on all nodes.
A job creates one or more pods to perform a specific task and ensures completion. Once the task finishes, the pods are marked as completed, and deleting the job removes all associated pods. Jobs are suitable for tasks like data processing pipelines or something that can be triggered on an ad hoc basis.
CronJob schedules and runs jobs at specified intervals using cron syntax. Common use cases include regular backups, report generation, and periodic data processing.
In addition to learning about Kubernetes components, we have also studied pods, which are the smallest deployable unit. However, how are these control plane components configured in the cluster and managed? We will discuss static pods and scheduling in this chapter.
Static pods are a special type of pod in Kubernetes that are managed directly by the Kubelet on each node rather than by the Kube-Scheduler. Manifest files of static pods are placed directly on the node’s file system in a particular directory; for example, /etc/kubernetes/manifests is the default directory that the Kubelet watches.
Some examples of static pods are control plane components such as API Server, Kube-Scheduler, Controller Manager, ETCD, etc. If you remove the manifest file(s) from the directory, that pod will be deleted from the cluster.
You might be wondering why we are doing a docker exec. If you remember, nodes in our KinD cluster are nothing but the containers, and to enter into a container, we use the docker exec command.
To restart a component such as kube-scheduler, you can move the kube-scheduler manifest to a different directory, and you will see the kube-scheduler is not running anymore, which means new pods will not be assigned to the nodes. Newly created pods will be stuck in a pending state as the scheduler is down.
As soon as you move the manifest back to its original directory, the scheduler pod will start, and the pending workload pod will be created.
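A sketch of this experiment on a KinD cluster (the container name kind-control-plane assumes the default cluster name kind):

docker exec -it kind-control-plane bash
ls /etc/kubernetes/manifests                               # static pod manifests watched by Kubelet
mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/     # scheduler pod stops; new pods stay Pending
mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/     # scheduler pod comes back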
If the pod spec already has this field (nodeName), then the scheduler will not pick that pod for scheduling; the pod will be scheduled on the particular node irrespective of the scheduler health status.
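A sketch of manual scheduling with nodeName (the node name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: manually-scheduled
spec:
  nodeName: node01          # bypasses the scheduler; the Kubelet on node01 runs the pod
  containers:
  - name: nginx
    image: nginx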
Labels are the key-value pairs attached to Kubernetes objects like pods, services, and deployments. They help organize and group resources based on the specific criteria.
Annotations are similar to labels but attach non-identifying metadata to objects, for example, recording the release version of an application for information purposes or last applied configuration details, etc.
Taints are like putting up fences around the node(s): only certain types of pods are allowed to be scheduled on those nodes. A taint marks a node with specific characteristics, such as gpu=true. By default, pods cannot be scheduled on a tainted node unless they have a special permission called a toleration. Only when a toleration on a pod matches the taint on the node will the pod be scheduled on that node.
Let’s look at that with the help of an example.
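A sketch of the taint command (assuming a node named node1 and a gpu=true taint):

kubectl taint nodes node1 gpu=true:NoSchedule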
This command taints node1 with the key “gpu” and the effect “NoSchedule.” Pods without a toleration for this taint won't be scheduled there.
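And a sketch of a pod that tolerates that taint (the pod name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"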
This pod specification defines a toleration for the “gpu” taint with the effect “NoSchedule.” This allows the pod to be scheduled on tainted nodes.
Labels group nodes based on size, type, environment, etc. Unlike taints, labels don't directly affect scheduling but are useful for organizing resources.
Another important point to remember is that taints and tolerations only restrict what type of workloads are allowed to a particular node, but that does not guarantee the scheduling of a particular pod on a specific node.
For example, there are two nodes, node1 and node2, and node1 is tainted with gpu=true, and we have created a pod named nginx with the toleration gpu=true. Node1 will only accept the nginx pod or any other pod that has the same toleration, but the nginx pod can also be scheduled on Node2, as the node is not tainted and can accept the pod based on other constraints such as resource requests and limits, capacity, affinity, and so on (we will look at these concepts later in the book).
You can use the nodeSelector field in your pod manifest to schedule a pod on a node with the labels provided in the selector field; however, if you need more control over scheduling, in which you can indicate whether a rule is hard (required) or soft (preferred), you can use node affinity.
Node affinity lets you define rules for your pods to be scheduled on a particular type of node based on the node labels. Taints give a node the ability to accept only certain types of pods; node affinity works the other way around, giving pods the ability to target a particular type of node for scheduling.
requiredDuringSchedulingIgnoredDuringExecution: The scheduler can’t schedule the pod unless the rule is met.
preferredDuringSchedulingIgnoredDuringExecution: The scheduler will try to schedule the pod on a node that meets the rule. If a matching node is not available, the scheduler will still schedule the pod.
Here’s how it works: you define a list of required node labels in your pod spec, for example, disktype=ssd. Based on that, the scheduler tries to place the pod on the nodes with those exact labels; once scheduled, the pod remains on the node even if the label changes.
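A sketch of the corresponding pod spec, using the disktype=ssd label from the example above (the pod name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx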
In your Kubernetes cluster, each pod requires a certain amount of resources, such as memory, CPU, GPU, etc., to function properly. Kubernetes allows you to define how much resources are required by a particular workload to operate normally (requests) and how much is the maximum it can use (limits).
Scheduling Decisions: Based on the specified requests and limits, the scheduler can decide whether a node has enough capacity to schedule that workload.
Node Safeguarding: If a container tries to use more resources than the limits, Kubernetes will perform CPU throttling; in the case of memory, it will kill the container with an OOM error (out of memory) to prevent the overconsumption of memory from the node that could result in node failure as well.
Minimize the Blast Radius: Limit protects your workloads from resource exhaustion by preventing a single container from occupying the resources that could have been used by other workloads.
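A sketch of how requests and limits appear in a container spec (the values are only illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "250m"        # minimum guaranteed CPU (0.25 core)
        memory: "64Mi"
      limits:
        cpu: "500m"        # usage beyond this is throttled
        memory: "128Mi"    # exceeding this gets the container OOM-killed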
Using the same values in multiple places inside the manifest YAMLs is tedious and inefficient; hence, Kubernetes provides an object called a configmap, with which you can pull shared environment variables out of the manifest and store them as key-value pairs in a separate object. You can then inject the configmap into one or more pods instead of writing the same variables again and again.
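A sketch of creating a configmap and injecting it into a pod (the names and keys are illustrative):

kubectl create configmap app-config --from-literal=APP_MODE=production --from-literal=APP_PORT=8080

# Injecting all keys as environment variables in a pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: nginx
    envFrom:
    - configMapRef:
        name: app-config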
Secrets are similar to configmaps but are specifically intended to hold confidential data. Secrets in Kubernetes are by default base64 encoded, not encrypted, meaning you can decode the secrets without needing any private key or certificate.
Secrets can contain sensitive data such as passwords, tokens, or an SSH key. It is recommended to keep the sensitive data away from the pod specification in a separate object, such as a secret, which can be used inside the pod by providing the reference to the secret object or mounting the secret as a volume on the pod.
Opaque Secrets: User-defined data in a key-value pair
Docker Registry Secrets: Docker credentials to pull images from Docker registries
Basic Auth: Secrets in the form of username and password
ssh-auth: SSH authentication secret
TLS: Data for a TLS client and server
Kubernetes Token: Bootstrap token data
Opaque is the default secret type if you don’t explicitly specify a type in a secret manifest.
The above command creates a secret of type kubernetes.io/dockerconfigjson. You can then retrieve the data from the secret and decode it using base64, as the secret will be stored as a base64-encoded string.
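For example (a sketch; the secret name, credentials, and registry URL are only illustrative):

kubectl create secret docker-registry regcred --docker-server=https://index.docker.io/v1/ --docker-username=myuser --docker-password=mypassword
kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode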
Static pods are managed directly by the Kubelet on each node rather than the Kubernetes Scheduler. Their manifest files are placed on the node’s file system, typically in /etc/kubernetes/manifests, where the Kubelet continuously monitors them.
Examples of static pods include control plane components such as API Server, Kube-Scheduler, Controller Manager, and ETCD.
Removing a manifest file from this directory deletes the corresponding pod from the cluster.
You can restart static pods by moving their manifest files out of the monitored directory and back.
Manual scheduling bypasses the Kubernetes Scheduler by specifying the nodeName field in the pod’s manifest.
While manual scheduling is not recommended for regular use, it is helpful for troubleshooting specific nodes or in case of a custom scheduler.
Labels and selectors facilitate organizing and filtering Kubernetes resources. Labels, defined as key-value pairs, are added to metadata and do not impact the application.
Selectors use these labels to query or manage subsets of resources, while annotations provide non-identifying metadata for informational purposes, such as versioning or configuration details.
Taints and tolerations control which pods can be scheduled on specific nodes.
A taint, defined as key=value:effect, prevents pods without matching tolerations from being scheduled on a node.
Taints and tolerations only restrict unsuitable pods but do not guarantee placement on a specific node.
Node affinity enables more granular control over scheduling using node labels. Rules can be strict (requiredDuringSchedulingIgnoredDuringExecution) or flexible (preferredDuringSchedulingIgnoredDuringExecution). For example, a pod with disktype=ssd required affinity will only schedule on nodes with that label. Node affinity complements taints and tolerations by enabling pods to select preferred nodes.
Resource requests and limits define the minimum and maximum resources a pod can use, such as CPU, memory, or GPU.
Properly configured requests and limits protect against resource exhaustion and optimize cluster performance.
configmaps and secrets manage application configuration and sensitive data, respectively.
configmaps store non-confidential key-value pairs for reuse across manifests, simplifying management.
Secrets store sensitive data like passwords or tokens, encoded in base64 to be injected into pods.
Kubernetes supports various secret types, including opaque (default), docker-registry credentials, and TLS data.
Secrets can be injected into pods as environment variables or mounted as volumes, keeping sensitive information separate from pod specifications.
Imagine you are an ecommerce enterprise and hosted your main web application on Kubernetes. You have 20 nodes running as per your predicted traffic during an event such as Black Friday or Cyber Monday; however, your traffic spike was 20 times more than you predicted. What could happen at that time?
Your workload starts failing, with business impact, user impact, revenue loss, goodwill impact, and so on.
You can try to add more nodes to the cluster during peak hours. But can you keep doing it throughout the event and during the next event and so on?
Moreover, it is not efficient and requires significant manual effort to execute those tasks seamlessly. This is one of the situations where we can use autoscaling, which automatically adds more resources (CPU, memory, nodes, etc.) as the traffic requires.
Autoscaling also helps in cost optimization by deleting extra resources when traffic utilization comes back to normal (off-peak hours).
Horizontal Scaling: Automatically adding/removing resources to/from the existing infrastructure (scale out/scale in)
Vertical Scaling: Automatically resizing a server to a bigger/smaller size (scale up/scale down)

Autoscaling types in Kubernetes
HPA is a concept derived from horizontal scaling: more pods are added when traffic increases (scale out), and pods are deleted when traffic goes down (scale in). HPA is helpful when you can’t afford application downtime during scaling; adding more pods does not impact the existing application.

Horizontal pod autoscaling (HPA) example
VPA is a concept derived from vertical scaling, in which the existing pod is automatically replaced by a bigger pod (more CPU/memory) when demand rises (scale up). Once traffic comes back to normal, the bigger pod is replaced by a smaller one again (scale down). VPA is useful when you can afford some disruption, although the disruption is minimal if you roll out changes using a rolling update or blue-green approach.

Vertical pod autoscaling example
The Metrics Server is an application that runs as a deployment on your Kubernetes cluster and helps collect the resource metrics from Kubelet. These metrics are then exposed to the Kubernetes API Server through the Metrics API. HPA and VPA consume these metrics and take the autoscaling decision.
To deploy the Metrics Server, you can follow the instructions given on the GitHub project: https://github.com/kubernetes-sigs/metrics-server.
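As a reference, the Metrics Server is usually installed by applying the manifest published in that repository, and the metrics pipeline can then be verified with kubectl top (the release URL may change over time):

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify that metrics are being collected
kubectl top nodes
kubectl top pods -A
```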

CPU and memory utilizations per node

HPA setup using the kubectl command
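The setup and load generation referenced below typically look like this; the php-apache deployment name and the 50% CPU threshold follow the example discussed here, and the exact values are illustrative:

```bash
# Create the HPA: keep php-apache between 1 and 10 replicas, targeting 50% CPU
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

# Generate load from a temporary pod that hits the service in a tight loop
kubectl run load-generator --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```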
The above command will keep performing wget against the application’s pod at very short intervals, increasing the load on the application.
Once you have done that, you can run the command kubectl get hpa --watch, and you will see that after a few minutes the CPU utilization increases drastically, breaching the threshold of 50% CPU utilization and adding more pods to the deployment php-apache.

HPA in real time using --watch
Now, you can delete the load-generator pod, and you will see the newly added pods in Terminating state until the deployment scales back down to the minimum number of replicas, which is one.
The Cluster Autoscaler provides functionality similar to HPA and VPA, but instead of autoscaling your pods, it autoscales your nodes. The Kubernetes add-on Cluster Autoscaler adds or deletes nodes for horizontal node autoscaling; it can also replace a node with a bigger size for vertical node autoscaling.
It automatically manages the nodes in your cluster as per the traffic.
The NAP feature mostly comes with managed cloud services such as AKS, EKS, GKE, etc., in which new node pools (collections of similar nodes grouped by resource requirements, size, machine type, etc.) are added or deleted on demand.
How would Kubelet know when to restart an application, when an application is unhealthy, or when to add more pods to replace the unhealthy pods?
It does that by probing the application. This is referred to as health probes, in which the Kubelet keeps checking the application after a certain period and reports if it is healthy or not; based on that, it takes certain actions to recover from the failure.
Readiness Probe: Ensures that your application is ready to receive traffic
Liveness Probe: Restarts the application if the health check fails
Startup Probe: Probes for legacy applications that need a lot of time to start

Health probes in Kubernetes
Readiness Probe: The readiness probe is used to determine if a container is ready to start accepting traffic. If the readiness probe fails, Kubernetes will temporarily remove the pod from the service’s load balancers, and it won’t receive any traffic until it passes the readiness check again.
We normally use readiness probes to prevent traffic from being routed to containers that are not yet ready to serve requests (during startup, initialization, or maintenance).
HTTP request
TCP probe
Readiness command
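A sketch of the HTTP readiness probe described next; the test image is the one used in the Kubernetes documentation, and any HTTP server exposing a health endpoint would work:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
spec:
  containers:
  - name: readiness
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    args: ["liveness"]        # serves a /healthz endpoint on port 8080
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
```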
The YAML above demonstrates the readiness probe of type HTTP request, which uses port 8080 to probe the readiness container on path /healthz every 10 seconds (periodSeconds). However, it will wait 15 seconds (initialDelaySeconds) before beginning the probe.
Liveness Probes: The liveness probe is used to determine if a container is running. If the liveness probe fails, Kubernetes will kill the container, and it will be restarted according to the pod’s restart policy. It ensures that unhealthy containers are terminated and replaced with healthy ones to maintain high availability and self-healing of the application.
HTTP request
TCP probe
Command based
gRPC probe
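A sketch of the command-based liveness probe described next, modeled on the standard busybox example from the Kubernetes documentation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
```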
The YAML above demonstrates the liveness probe of type command, which uses the command cat /tmp/healthy to probe the liveness container every five seconds (periodSeconds). However, it will wait five seconds (initialDelaySeconds) before beginning the probe. In simple words, it checks for the presence of the file (/tmp/healthy) and considers the container healthy when the file is present.
Autoscaling in Kubernetes ensures resource scalability and cost optimization by dynamically adjusting infrastructure based on traffic requirements. This eliminates manual intervention during unexpected traffic surges, such as during Black Friday events, and prevents business impact from application failures.
Horizontal Scaling: Adds or removes pods or nodes (scale out/in) to handle fluctuating traffic.
Vertical Scaling: Adjusts pod or node resources (scales up/down) to meet demand.
You should deploy the Metrics Server to collect and expose resource metrics for HPA and VPA.
Cluster Autoscaling: Manages node-level scaling, adding or upgrading nodes automatically to meet demand. The cluster autoscaler is often used with cloud-managed services like EKS, AKS, and GKE.
Node Auto-provisioning (NAP): Automatically adds or deletes node pools based on traffic requirements.
Kubernetes uses health probes to maintain high availability and self-healing by monitoring application health.
Readiness Probe: Checks if a container is ready to receive traffic. It temporarily removes unhealthy containers from load balancers until they recover.
Liveness Probe: Ensures a container is running. If the probe fails, the kubelet restarts the container based on the pod’s restart policy.
Startup Probe: Used for legacy applications with long initialization times, ensuring readiness before traffic routing.
In this chapter, we will explore Kubernetes manifest management tools—Helm and Kustomize—which simplify and streamline the deployment of complex applications. We will learn how Helm uses templating and charts to package, configure, and version Kubernetes resources, making it ideal for managing reusable and shareable application definitions. Kustomize, on the other hand, takes a patch-based, declarative approach that enables you to customize YAML manifests without duplication.
Helm is a package manager for Kubernetes, just like we have apt for Ubuntu, yum for Red Hat, and so on. When you have to install a package in Ubuntu, you go to the package manager repository and install the package using that. A package manager provides an easy way to install and manage software packages.
In Kubernetes, Helm provides similar functionality; for example, if you want to install Prometheus, you can install it using Helm instead of going to the Prometheus website and downloading its binaries or installers. Helm is also a CNCF (Cloud Native Computing Foundation) project, which gives it fantastic community support.
Helm has three main components:
Helm Chart: A package in Helm is called a chart or a Helm chart, similar to an RPM or DEB package. It’s a bundle of templates, values, and dependencies with multiple deployment manifests.
Repository: A central storage location for charts, similar to Docker Hub for images.
Release: A running instance of a chart, analogous to a container being a running instance of a Docker image.
Charts are reusable, and you can install a single chart multiple times, and each time it will create a new release. You can also search Helm charts in the Kubernetes repository.
You can update values.yaml as per your needs and use a single chart for multiple environments; each install creates a new release. You can also pass a custom values.yaml for each environment, as shown below.
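As an illustration, assuming the public Bitnami repository and its nginx chart (repository, chart, and values files here are examples):

```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm search repo nginx

# Each install of the same chart creates a separate release
helm install web-dev bitnami/nginx -f values-dev.yaml
helm install web-prod bitnami/nginx -f values-prod.yaml
```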
In the previous topic, we learned about the package manager for Kubernetes (Helm) and understood how it uses templating to simplify the management of your Kubernetes manifests, bundling all the files into something known as a Helm chart. Kustomize is a similar tool; however, it provides additional capabilities for managing and organizing application configurations across different environments without using a templating engine like Helm’s.
Kustomize takes a fundamentally different approach to configuration management compared to other tools like Helm. Instead of using templates and variables, it uses a layered approach that builds upon your existing YAML manifests. This makes it particularly appealing for beginners as it requires minimal learning of new concepts while providing powerful customization capabilities.
Kustomize uses a declarative approach in which you define a base configuration (the common configuration used by all environments) and separate configurations (overlays) for each environment, giving you the ability to make per-environment changes without touching the common configuration. You keep the common settings (those that need to be applied to all manifests, such as common tags and common environment variables) in the base and keep the environment-specific configuration (such as environment name, labels, resource requests, and limits) in dedicated directories.
What makes Kustomize particularly accessible is its integration with kubectl. You don’t need to install additional tools—it’s already there, within the kubectl utility.
You start a simple kustomization.yaml file, which acts as your main configuration file.
resources: You can define all your manifest YAMLs here.
commonLabels: The common labels that you wish to be applied to all the resources.
commonAnnotations: The common annotations that you wish to be applied to all the resources.
namePrefix: A valid prefix that will be applied to the resource’s name.
nameSuffix: A valid suffix that will be applied to the resource’s name.
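A minimal kustomization.yaml illustrating these fields might look like this (file names and values are placeholders):

```yaml
resources:
  - deployment.yaml
  - service.yaml
commonLabels:
  app: web
commonAnnotations:
  owner: platform-team
namePrefix: demo-
```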
To apply the configuration, you run kubectl apply -k ., where -k stands for Kustomize. To apply an ordinary manifest to the Kubernetes cluster, we used kubectl apply -f <filename>; with Kustomize, the command instead looks for a kustomization.yaml file in the directory you point to, and the dot (.) refers to the current directory.
Kustomize also provides you the capabilities of generating configmaps from external files, which allows you to separate Kubernetes configuration data from Kubernetes manifests as per best practices.
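For example, a generator entry in kustomization.yaml could look like this (the file path is a placeholder):

```yaml
configMapGenerator:
  - name: app-config
    files:
      - config/application.properties
```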
Update the reference of the configmap name inside the manifest YAML.
The true strength of Kustomize shines in managing multiple environments. By organizing your configurations into base and overlay directories, you create a clear hierarchy of configurations.
This structure allows you to maintain a single source of truth (base) while specifying environment-specific variations/customizations through overlays.
Development-specific deployments should be deployed inside the dev namespace.
Stage-specific deployments should be deployed inside the stage namespace.
For dev deployment, replicas should be two.
For stage deployment, replicas should be four.
For the file overlays/dev/kustomization.yaml, you start by referencing the base folder and then add the fields you wish to override, for example, namespace. You can also include environment-specific fields such as replicas by creating a separate replicas.yaml that contains the configuration down to the replicas field, as shown below; you then repeat the same steps for the stage folder.
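A sketch of such an overlay, assuming a base deployment named web; newer Kustomize versions reference the base under resources (older ones used a bases field):

```yaml
# overlays/dev/kustomization.yaml
resources:
  - ../../base
namespace: dev
patches:
  - path: replicas.yaml
```

```yaml
# overlays/dev/replicas.yaml -- only the fields down to replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
```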
Patches in Kustomize allow you to make delta modifications to your base configurations. This is particularly useful when you need to make environment-specific adjustments without duplicating entire configuration files.
|  | Kustomize | Helm |
|---|---|---|
| Native integration | Can be used from kubectl | No—installed separately |
| Ease of use | Beginner-friendly | Complex |
| Approach | Overlays | Template based |
| Mode | Declarative | Imperative |
| Bundling/packaging | No | Yes |
| Versioning/rollbacks | No | Yes |
Helm is a package manager for Kubernetes, similar to apt or yum, enabling simplified deployment and management of Kubernetes applications.
Helm uses charts (bundles of YAML files, templates, and dependencies), repositories (central storage for charts), and releases (instances of running charts).
Helm streamlines multi-environment deployment by allowing reusable charts with customizable values.yaml configurations, making application management efficient and scalable.
Kustomize, in contrast, provides declarative configuration management without templating.
It focuses on layering and organizing application configurations for multiple environments.
Using kustomization.yaml, Kustomize centralizes common settings (e.g., labels, annotations) and allows environment-specific overrides through overlays.
This approach ensures a single source of truth with minimal duplication and enables configuration separation, such as managing configmaps and hierarchical structures for different environments.
Both tools simplify Kubernetes application management but serve different use cases—Helm specializes in templating and reuse, while Kustomize focuses on declarative configuration and environment-specific adjustments.
In this chapter, we will explore authentication and authorization in Kubernetes, two critical mechanisms that secure access to your cluster. You’ll learn how Kubernetes verifies user identities through various authentication methods such as certificates, tokens, and external identity providers. We will then cover how authorization determines what authenticated users are allowed to do, focusing on mechanisms like RBAC (role-based access control), ABAC (attribute-based access control), node, and webhooks. By the end, you will understand how to implement fine-grained access control to protect your cluster and ensure only the right users and services can perform specific actions.
In a client-server architecture such as Kubernetes, to access or manage the server, the client must first authenticate and then be authorized.
Authentication is a way of validating the identity of a user (human user or service account) making a request to the API Server. Any user that presents a valid certificate signed by the cluster’s certificate authority (CA) is considered authenticated. For service accounts, Kubernetes uses service account tokens that are automatically generated when the account is created. In simple terms, authentication is a way of validating the user’s identity against who they say they are.
Once the user is authenticated, the next step is to validate the level of access they have on the Kubernetes cluster and resources (i.e., what they are allowed to do and what they cannot do).

Authorization types in Kubernetes
AlwaysAllow: All requests are allowed (a huge security risk).
AlwaysDeny: All requests are denied by default.
ABAC: Attribute-based access control.
RBAC: Role-based access control.
Node: Authorize Kubelet for certain actions on the node.
Webhook: Event-based authorization through a webhook REST call.
RBAC in Kubernetes is a method for regulating access to the Kubernetes API. It allows you to specify who can access what resources within a Kubernetes cluster and what actions they can perform on those resources based on certain roles that they get assigned to. Instead of assigning permissions to individual users, it is easier to group related permissions together in a role and assign the role to a user or group.
Role: Defines a set of permissions within a namespace. It contains rules that represent allowed operations on Kubernetes resources.
RoleBinding: Grants the permissions defined in a Role to a user, group, or service account within a specific namespace.
ClusterRole: Similar to a Role, but the ClusterRole is a cluster-scoped resource. It can be used to define permissions across all namespaces or for cluster-scoped resources.
ClusterRoleBinding: Similar to RoleBinding, but it grants the permissions of a ClusterRole to a user, group, or service account across the entire cluster.
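A minimal Role and RoleBinding pair, following the standard upstream example (the user jane and the role name are placeholders):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```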

Role-based access control in Kubernetes
Create a CertificateSigningRequest.
Figure 12-4 shows the certificate creation and approval process in Kubernetes.

Certificate signing request creation and approval process
(This should return the output as no.)
(This should return the output as yes.)
Why? Because we only granted the get/list pod access and not the service access.
(This should return the output as yes.)
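Access checks like the ones above are done with kubectl auth can-i; a sketch, where the user name jane and the resources are placeholders for whatever you granted:

```bash
kubectl auth can-i list services --as jane   # expected: no
kubectl auth can-i list pods --as jane       # expected: yes
```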
While using Kubeconfig, you don’t have to specify the client-key, certificate, etc.; kubeconfig takes care of them.
There are two types of accounts in Kubernetes that interact with the cluster. These could be user accounts used by humans, such as Kubernetes admins, developers, operators, etc., and service accounts primarily used by other applications/bots or Kubernetes components to interact with other services.
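Creating a service account takes a single command (the name ci-bot is a placeholder):

```bash
kubectl create serviceaccount ci-bot
kubectl get serviceaccounts
```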
Then you can add a Role and RoleBinding to grant access. Kubernetes also creates a default service account in every namespace, including the built-in ones such as kube-system, kube-node-lease, and so on.
Kubernetes uses authentication and authorization mechanisms to secure access to the cluster.
Authentication verifies the identity of a user (human or service account) interacting with the cluster. Users authenticate through certificates signed by the cluster’s certificate authority (CA), while service accounts use automatically generated tokens.
Authorization defines what actions authenticated users can perform on the cluster.
AlwaysAllow/AlwaysDeny: Permits or blocks all requests (not recommended)
ABAC: Attribute-based access control
RBAC: Role-based access control
Node: Access that Kubelet gets to perform certain actions on a node
Webhook: Webhook-based authorization
Role/RoleBinding: Namespace-specific permissions and bindings.
ClusterRole/ClusterRoleBinding: Cluster-wide access and bindings.
Access is managed via kubeconfig files containing credentials and certificates, which ensure secure interactions with the Kubernetes API.
Network policy allows you to control inbound and outbound traffic to and from pods in the cluster. For example, you can specify a deny-all network policy that restricts all incoming traffic to a set of pods, or you can create an allow policy that permits only certain pods to access certain services on a specific port.
CNI stands for Container Network Interface. It’s a standard for configuring network interfaces in Linux containers, used by container orchestrators like Kubernetes. CNI provides a framework for plugins to manage container networking, allowing different networking solutions to be easily integrated. To implement a network policy in a Kubernetes cluster, you need to have CNI plugins installed, as it does not come with a vanilla Kubernetes installation.
Weave-net
Flannel and Kindnet (do not support network policies)
Calico
Cilium
CNI is deployed as a DaemonSet; hence, CNI pods will be running on each node in the cluster.
To install a CNI plugin such as Calico, you can follow the below documentation:
https://docs.tigera.io/calico/latest/getting-started/kubernetes/kind
For instance, we have to restrict the access within a Kubernetes cluster in which only backend pods should be allowed to access the database pods, and other pods, such as frontend ones, should not have access to the backend pods.

Network policy sample that only allows the backend pod to access the my-sql pod
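A sketch of such a policy, matching the labels described below (policy and label names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-mysql
spec:
  podSelector:
    matchLabels:
      name: mysql
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: backend
```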
PolicyTypes: Ingress: Valid values are Ingress and Egress to control inbound and outbound access, respectively. With Ingress, we have provided an additional rule that matches the label with the pod label. In this example, we are allowing the pod with the label role:backend to have inbound access to the pod on which this network policy is attached.
PodSelector: This field is responsible for attaching the network policy to the pod with the matching label name: mysql.
Log in to the frontend pod using kubectl exec and try doing a curl on the MySQL service; this curl should throw an error.
Log in to the backend pod using kubectl exec and try doing a curl on the MySQL service; this curl should show a successful response.
If you are facing issues, check the above steps along with the Calico health status.
Network policies in Kubernetes control inbound and outbound traffic to and from pods in the cluster. They allow you to define rules, such as restricting all incoming traffic or permitting access to specific services on designated ports.
Network policies require a Container Network Interface (CNI) plugin for implementation, as Kubernetes does not provide this functionality by default.
CNI is a standard for configuring network interfaces in containers, enabling seamless integration of various networking solutions.
Popular CNI plugins include Calico, Weave Net, and Cilium, with some (e.g., Flannel, Kindnet) not supporting network policies.
CNI is deployed as a DaemonSet, ensuring networking functionality across all cluster nodes.
A common use case is restricting access between pods, such as allowing only backend pods to access database pods while blocking other pods (e.g., frontend pods).
Network policies use labels and selectors to enforce these rules.
Create a deployment that initially has two replicas and uses nginx as a container image, then scale the deployment to four replicas using the kubectl command.
Expose the deployment as a nodePort service on port 8080.
Check for the pods that have label env:demo and redirect the pod names to a file pod.txt.
Create an nginx pod; ensure it is running. Edit the pod and add an init container that uses a busybox image and runs the command sleep 10; echo "hello world".
Create a pod and force schedule it on worker node 01.
Create a multi-container pod with the images as redis and memcached.
Implementing StorageClasses and dynamic volume provisioning
Configuring volume types, access modes, and reclaim policies
Managing PersistentVolumes and PersistentVolumeClaims
In one of the previous chapters, we did the Kubernetes installation on KinD as it is lightweight and easy to set up. While KinD is an ideal choice for local development and for learning purposes, it does not provide the capabilities of a full-fledged production-grade Kubernetes cluster. In this chapter, we will perform the Kubernetes installation using the Kubeadm tool.
Kubeadm is a tool to bootstrap the Kubernetes cluster, which installs all the control plane components (API Server, ETCD, controller manager, and scheduler) as static pods and gets the cluster ready for you. You can perform various tasks such as node initialization, node reset, joining worker nodes with control plane nodes, etc.
Till now, we were using a KinD cluster, but now we will create a fresh multi-node cluster on cloud virtual machines using Kubeadm.
High-level steps of the installation will be as follows.

Kubernetes installation steps using Kubeadm
For this step, you can use a virtualization software (VirtualBox, Multipass, etc.) that is able to create three virtual machines, or you can use virtual machines using any cloud provider.
In this book, I will be using Amazon EC2 servers for this purpose. You can go to the AWS console and provision three EC2 servers, one for master nodes and two for worker nodes.
Security groups in AWS restrict access on certain ports to and from certain sources and destinations. In Kubernetes, different components will be communicating with each other on certain ports; hence, we need to allow the access as below.

Ports required for communication between Kubernetes components
Control plane node(s):

| Protocol | Direction | Port Range | Purpose | Used By |
|---|---|---|---|---|
| TCP | Inbound | 6443 | Kubernetes API Server | All |
| TCP | Inbound | 2379–2380 | ETCD server client API | kube-API Server, ETCD |
| TCP | Inbound | 10250 | Kubelet API | Self, control plane |
| TCP | Inbound | 10259 | kube-scheduler | Self |
| TCP | Inbound | 10257 | kube-controller-manager | Self |
| TCP | Inbound/outbound | 179 | Calico networking | All |
Worker node(s):

| Protocol | Direction | Port Range | Purpose | Used By |
|---|---|---|---|---|
| TCP | Inbound | 10250 | Kubelet API | Self, control plane |
| TCP | Inbound | 10256 | kube-proxy | Self, load balancers |
| TCP | Inbound | 30000–32767 | NodePort services | All |
| TCP | Inbound/outbound | 179 | Calico networking | All |
Disable source/destination checks for master and worker nodes from the EC2 console.
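On the master node, the cluster is then initialized with kubeadm init; a sketch, where the advertise address is a placeholder and the pod CIDR assumes Calico's default:

```bash
sudo kubeadm init \
  --apiserver-advertise-address=<master-private-ip> \
  --pod-network-cidr=192.168.0.0/16
```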
Note: --apiserver-advertise-address is the private IP of the master node.
SSH into the worker nodes and perform steps (1–7) on both nodes.
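Each worker is then joined using the command printed at the end of kubeadm init; if you no longer have it, a fresh one can be generated (all values below are placeholders):

```bash
# On the master node, print a fresh join command
kubeadm token create --print-join-command

# On each worker node
sudo kubeadm join <master-private-ip>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```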
If all the above steps were completed, you should be able to run kubectl get nodes on the master node, and it should return all three nodes in Ready status.
Also, make sure all the pods are up and running by using the command as follows: kubectl get pods -A.
If you are running a self-hosted production Kubernetes cluster, you should be setting up the high availability for control plane nodes as well. This part is not required if you are using a managed Kubernetes service such as Azure Kubernetes Service (AKS), Elastic Kubernetes Service (EKS), or Google Kubernetes Engine (GKE); however, you need to take care of these steps in the case of a self-hosted cluster.
With Stacked Control Plane Nodes: You create multiple control plane nodes with an API Server and ETCD.
With an External ETCD Cluster: You create separate nodes for the API Server and separate nodes for the ETCD members; this approach needs more infrastructure and therefore carries a higher cost.
In this chapter, we will focus on stacked control plane nodes.
At least three machines for control plane nodes that meet Kubeadm’s minimum requirements, such as a supported OS, 2GB RAM, two CPUs, etc. An odd number of machines helps with leader selection in the case of host or zone failure.
At least three machines for worker nodes that meet Kubeadm’s minimum requirements, such as a supported OS, 2GB RAM, two CPUs, etc.
All required ports are open between nodes.
Kubeadm, kubelet, and kubectl installed on all nodes.
Container runtime installed and configured.
All machines have access to each other on a network.
Superuser privileges on all machines (sudo or root access).
The first critical component in your HA setup is a properly configured TCP forwarding load balancer for the API Server. This load balancer will act as the front door for all incoming requests and redirect the traffic to the API Servers as the backend on port 6443.
We need to ensure the load balancer can communicate with all control plane nodes on the API Server port and its address matches Kubeadm’s ControlPlaneEndpoint.

Sample Kubernetes stacked control plane architecture for high availability
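You can check connectivity to the load balancer even before the cluster exists, for example with netcat (the DNS name is a placeholder):

```bash
nc -v <load-balancer-dns-name> 6443
```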
Initially, you’ll receive a “connection refused” error since the API Server isn’t running. A timeout indicates a load balancer configuration issue that needs immediate attention.
Add the remaining control plane nodes to the load balancer’s target group.
Initializing the First Control Plane Node
--control-plane-endpoint: Specifies your load balancer’s DNS and port.
--upload-certs: Enables automatic certificate distribution across control plane nodes.
Optional: Use --pod-network-cidr if your CNI plugin requires it.
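Putting those flags together, initializing the first control plane node looks roughly like this (the endpoint is a placeholder and the pod CIDR assumes Calico):

```bash
sudo kubeadm init \
  --control-plane-endpoint "<load-balancer-dns-name>:6443" \
  --upload-certs \
  --pod-network-cidr=192.168.0.0/16
```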
Certificates are crucial for cluster security. When using --upload-certs, certificates are encrypted and stored in the kubeadm-certs secret. The decryption key and kubeadm-certs secret expire after two hours by default.
Install the CNI Plugin
You can now install the CNI plugin, such as Calico/Cilium, etc., as per your requirements. If you want to install Calico, the steps used in the previous chapter can be followed here.
Steps for Each of the Control Plane Nodes
You can also run the join command in parallel on multiple nodes.
Steps for Worker Nodes
In this chapter, we discussed the process of setting up a Kubernetes cluster using Kubeadm.
Kubeadm is a powerful tool for bootstrapping Kubernetes clusters by installing control plane components like API Server, ETCD, Controller Manager, and Scheduler as static pods.
Kubeadm also facilitates tasks such as node initialization, resetting nodes, and joining worker nodes to the control plane.
The installation process includes provisioning three virtual machines (one master and two worker nodes) using cloud providers like AWS or virtualization software.
Security groups must be configured to enable component communication, and source/destination checks should be disabled.
Key setup steps include disabling swap, configuring networking, and installing necessary components such as the container runtime, CNI plugins, Kubeadm, Kubelet, and Kubectl.
After initializing the control plane and setting up the kubeconfig, a pod network like Calico is deployed for networking as a DaemonSet.
Worker nodes are prepared with similar configurations and joined to the cluster using a token.
Once all steps are completed, the cluster can be validated to ensure all nodes and pods are running correctly.
Storage in Kubernetes is handled by something known as PersistentVolume and PersistentVolumeClaim. In simple words, a storage admin creates the volume, which is the representation of a physical storage that can be consumed by other users and applications; this volume is called a PersistentVolume.
The PersistentVolume has a storage capacity, access mode, StorageClass, etc., using which a user or an application can request a slice of this volume by creating an object called a PersistentVolumeClaim.

PersistentVolumes and PersistentVolumeClaims in Kubernetes
To successfully bind a PersistentVolumeClaim (PVC) with the PersistentVolume (PV), you need to make sure that their properties, such as access mode and StorageClass, match with each other, and the request capacity should be less than or equal to the available capacity.
Static provisioning
Dynamic provisioning
A cluster/storage administrator creates the PersistentVolume(s), which are available to be consumed by users and applications and exist in the Kubernetes API.
When no available PV matches a user’s PVC, the cluster may dynamically provision a volume for that PVC. This provisioning is done through a StorageClass that must already exist in the cluster.
StorageClass is helpful to provision the PV based on the certain storage and performance requirements.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using a StorageClass.
Lifecycle: PVs exist independently of the pods that use them and can be reused or reclaimed as per the defined policies.
Attributes: Includes information about the storage capacity, access modes, and the StorageClass it belongs to.
Reclaim Policy: Defines what happens to the volume when it is released by a PVC (e.g., retain, delete, or recycle).
PersistentVolumeClaim (PVC) is a request for storage by a user or application. It is used to request a specific amount of storage with certain attributes (like access modes) from available PersistentVolumes.
Request: Allows users to specify the amount of storage required and the access mode, for example, 10GB.
Binding: PVCs are bound to available PVs that match the requested criteria. If a suitable PV is found, it will be bound to the PVC.
Dynamic Provisioning: If no suitable PV is available, the PVC may trigger the creation of a new PV based on the StorageClass.
In Kubernetes, access modes and reclaim policies are key attributes for managing PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs). They define how storage resources are accessed and managed within the cluster.
ReadWriteOnce (RWO): This is useful for applications that need to write data and can operate from a single node, such as a single-instance database or an application server.
ReadOnlyMany (ROX): Suitable for scenarios where multiple nodes need to read data from the volume but not write to it, such as serving static content or configuration files.
ReadWriteMany (RWX): This mode is used for applications that require concurrent read and write access from multiple nodes, such as distributed file systems or shared data storage solutions.
Retain: The PV is retained and not deleted after the PVC is deleted. The volume will remain in the cluster and must be manually reclaimed by an administrator.
Delete: The PV and its associated storage are deleted when the PVC is deleted. This is often used with dynamically provisioned volumes where the underlying storage is managed by the cloud provider or storage system.
Recycle (Deprecated): The PV is scrubbed and made available for reuse when the PVC is deleted. Scrubbing typically involves deleting the data on the volume before it is made available for new claims.
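A minimal static-provisioning example along these lines; hostPath is used purely for illustration, and the empty storageClassName pins the claim to classless PVs:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  storageClassName: ""      # bind only to PVs that have no StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
```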
If you run a kubectl describe on the PVC, it should show the status as bound; if that is not the case, you need to check accessModes, storage, etc., or check the events printed by the command.
Create the pod to consume the volume.
StorageClass provides a way to describe the “classes” of storage offered by cluster administrators. Different classes may offer different quality-of-service levels, backup policies, or arbitrary policies determined by cluster administrators.
StorageClass enables dynamic volume provisioning, allowing storage volumes to be created on demand. Without StorageClass, cluster administrators would need to manually provision PersistentVolumes.
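A sketch of a StorageClass using the AWS EBS CSI driver, which is the example walked through next (the class name, volume type, and zone are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.ebs.csi.aws.com/zone
    values:
    - us-east-1a
```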
You might have noticed many additional fields that we have not discussed yet; let’s understand these.
provisioner: ebs.csi.aws.com indicates this StorageClass uses the AWS EBS CSI driver for provisioning storage volumes. This provisioner creates AWS EBS volumes when PVCs request this StorageClass.
Reclaim Policy determines what happens to a PersistentVolume (PV) when its associated PersistentVolumeClaim (PVC) is deleted; we have already discussed reclaim policy in this chapter.
allowVolumeExpansion:true enables the ability to expand volumes after creation and allows users to increase PVC size without recreating the volume.
parameters describe volumes belonging to the StorageClass. Different provisioners support different parameters; if you don’t specify a parameter, a default value is used based on the provisioner.
volumeBindingMode defines when the volume binding and dynamic provisioning should happen. Supported values are Immediate (default value) and WaitForFirstConsumer. Immediate mode guarantees immediate volume binding as soon as it is provisioned; however, WaitForFirstConsumer delays volume binding until a pod using the PVC is created.
Topology constraints (allowedTopologies) restrict volume provisioning to specific zones.
List all the StorageClasses in your cluster:
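This is the standard kubectl listing command:

```bash
kubectl get storageclass
```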
The default StorageClass should be marked as (default).
Patch the Annotation
Set the Annotation on a Different StorageClass
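A sketch of those two steps using the standard is-default-class annotation (the StorageClass names are placeholders):

```bash
# Remove the default marker from the current default StorageClass
kubectl patch storageclass <current-default> \
  -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

# Mark a different StorageClass as the default
kubectl patch storageclass <new-default> \
  -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```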
Kubernetes handles storage through PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs).
PersistentVolume (PV): Represents physical storage provisioned by an admin or dynamically created using a StorageClass. It includes attributes like storage capacity, access modes, and reclaim policies (e.g., retain, delete, or recycle).
PVs exist independently of pods and can be reused or reclaimed as per policies.
PersistentVolumeClaim (PVC): A request for storage by users or applications. PVCs specify required attributes like capacity and access modes. They bind to suitable PVs or trigger dynamic provisioning through StorageClass when no matching PV is available.
Static Provisioning: Admins pre-create PVs for use.
Dynamic Provisioning: Kubernetes creates PVs on demand using a predefined StorageClass based on the requirements.
StorageClass provides a way to describe the classes of storage offered by cluster administrators. If the storageClass field is omitted from the PVC manifest, the default StorageClass will be used to provision the volume.
You can change the default StorageClass by patching the annotation on the StorageClass object.
Access mode: ReadWriteOnce
Storage: 250Mi
You must use the existing, retained PersistentVolume (PV)
Update the deployment to use the PVC you created in the previous step
Create a PV with 1Gi capacity, access mode ReadWriteOnce, and no StorageClass; create a PVC with 500Mi storage and access mode ReadWriteOnce; it should be bound to the PV. Create a pod that utilizes this PVC and uses the mount path /data.
Create a PVC with 10Mi, mount this PVC to the pod at /var/new-vol. Now, edit the PVC and increase the size from 10Mi to 50Mi.
Create a sample StorageClass and update it to become the default storage class.
Understanding host networking configuration on cluster nodes
Understanding connectivity between pods
Understanding ClusterIP, NodePort, and Load Balancer service types and endpoints
Knowing how to use Ingress controllers and Ingress resources
Knowing how to configure and use CoreDNS
Choosing an appropriate Container Network Interface plugin
Defining and enforcing network policies
Using the Gateway API to manage Ingress traffic
This chapter provides a foundational understanding of Kubernetes networking, a critical aspect of how pods communicate within a cluster and with the outside world.
Container-to-Container Communication: Happens within a pod, over localhost.
Pod-to-Pod Communication: We will be mostly discussing this part in this chapter.
Pod-to-Service Communication: Done by services, covered in Chapter 6.
External-to-Service Communication: This is also done by services.

Communication within a Kubernetes cluster
In Kubernetes, multiple applications share multiple machines (nodes); however, we could run into situations where multiple applications compete for the same port. Each control plane component runs on a dedicated, predefined port, but user workloads do not follow any such convention. It is crucial to have a mechanism that controls port and address allocation and makes sure workloads don’t collide with each other on a cluster.
The Network plugin is responsible for assigning IP addresses to pods.
Kube-API Server is responsible for assigning IP addresses to services.
Kubelet (or Cloud Controller Manager in the case of a managed cloud service) is responsible for assigning IP addresses to nodes.
To implement the networking model, the container runtime on each node uses a Container Network Interface (CNI) plugin to manage the security and networking aspects of the cluster by creating an internal virtual network overlay. Pods within a Kubernetes cluster can communicate with each other using their internal IP addresses.
There is a wide range of CNI plugins available from many different vendors, and they can be chosen based on your requirements. For example, CNI plugins such as Flannel do not support network policy implementation; for that, you can use Calico or Cilium.
All the supported CNI plugins can be found here:
https://kubernetes.io/docs/concepts/cluster-administration/addons/#networking-and-network-policy
CNI defines the standard and specifications of how these plugins can be created by making use of the available libraries in the Go source code. It also provides a template for making new plugins and a separate repository containing the reference plugins.
You can check the CNI open source project over here: https://github.com/containernetworking/cni.
CoreDNS is the default Domain Name System (DNS) management service for Kubernetes. Before Kubernetes version 1.21, kube-dns was the default service, which has now been replaced by CoreDNS.
When you create a service in Kubernetes, CoreDNS is responsible for mapping the IP address to the service hostname so that the service can be accessed using its hostname within the cluster without using the IP address.
CoreDNS runs as a deployment in the kube-system namespace exposed through a service of type clusterIP and named kube-dns.
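Inside any pod, /etc/resolv.conf points at that service; a typical sketch is shown below, where the exact IP depends on your cluster's service CIDR:

```
nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```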
where 10.0.0.10 is the service IP address for the CoreDNS deployment. If you were not using a DNS server such as CoreDNS, you would need to add a mapping inside the /etc/hosts file of the pod to allow access from one pod to another. If you are using CoreDNS, you don’t need to perform this step.
After applying the manifest, your dnsutils pod should be provisioned inside the default namespace.
Finally, check the CNI plugin pods to confirm they are all up and healthy.
Ingress helps route HTTP and HTTPS traffic from outside the cluster to the services within the cluster. It works in a similar way to services (NodePort and Load Balancer); however, it adds an additional routing layer on top of the service that does the rule-based routing.
You define the rules inside your Ingress resource; based on the rules, the traffic will be routed to the service inside the cluster.
Cloud Vendor Lock-In: If you are using a cloud provider such as AWS and create a service of type load balancer, the Cloud Controller Manager (CCM), which is a component of the cloud provider and interacts with your API Server, creates an external load balancer; hence, this approach is not feasible if you are on-premises or using your own data center.
Costly Solution: As you are using a load balancer service, your CCM will provision a load balancer for each of the services that you define in your cluster, making it a costly solution.
Security: You need to use the cloud provider’s built-in security for the load balancer.
Ingress Resource: A Kubernetes object that defines routing rules
Ingress Controller: The implementation that enforces these rules, such as the Nginx Ingress controller that watches your Ingress resource and manages the load balancer
Load Balancer: Created and managed by the Ingress controller based on the Ingress resource YAML

Ingress in Kubernetes
To set up Ingress, you should be creating an Ingress resource, an Ingress controller that watches the Ingress resource, and a Load Balancer that is being created and managed by the Ingress controller.
To use an Ingress resource, you should create an Ingress controller such as Nginx that watches your Ingress resource. Using Ingress without the Ingress controller would not work and has no effect.
For example, you can create an Ingress-Nginx controller using the below steps:
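One common approach is the community Helm chart published by the ingress-nginx project (namespace and release name are conventional choices, not requirements):

```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace
```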
An Ingress resource could be created that accepts the incoming traffic from the client and routes it to the backend service(s) based on the defined routing rules. Think of it as a traffic controller that sits at the edge of your Kubernetes cluster, directing incoming requests to the appropriate services based on configured routing rules.
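A sketch of the Ingress resource discussed below, matching the host, service, and annotation described there (names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello-world
            port:
              number: 80
```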
In the above example, we are creating a resource of type Ingress that is based on the nginx ingress class, which would have already been created with the Ingress controller Helm chart. If you edit or run a describe on the Ingress controller pod, you will find the value of the default ingress class name, such as nginx. This is how the controller knows which resources it has to watch.
The YAML also has a routing rule that says if anyone tries to access the application on the path example.com/, redirect the traffic to the service named hello-world on port 80.
The annotation nginx.ingress.kubernetes.io/rewrite-target: / ensures that incoming requests received on the path defined under rules→http→paths→path are forwarded to the path defined in the rewrite target. For example, if the incoming path is /web and the containers serve the web page from /var/html/www/, you can create a rewrite rule that forwards requests from /web to /var/html/www inside the container.
If you are using a KinD cluster or Kubeadm cluster, the Ingress controller will not create a load balancer resource, and your Ingress will not get an external IP address because this is done by the Cloud Controller Manager, which is a part of managed cloud services such as AKS, EKS, or GKE.
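A sketch of a simple path-based (fanout) Ingress matching the description that follows (service names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fanout-ingress
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /app1
        pathType: Prefix
        backend:
          service:
            name: app1
            port:
              number: 80
      - path: /app2
        pathType: Prefix
        backend:
          service:
            name: app2
            port:
              number: 80
```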
In this example, the app listening on the /app1 path will be redirected to the app1 service on port 80, and the app listening on the /app2 path will be redirected to the app2 service on port 80 using the nginx ingress class.
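Similarly, a sketch of a host-based (name-based virtual hosting) Ingress matching the next description:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: name-based-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: foo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: foo-service
            port:
              number: 80
  - host: bar.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: bar-service
            port:
              number: 80
```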
In this example, the traffic listening on foo.example.com on the / path will be redirected to the backend service called foo-service on port 80, and the traffic listening on bar.example.com will be redirected to the backend service named bar-service listening on port 80.
We will now look at the most common issues we face related to Kubernetes networking and how to troubleshoot in those scenarios.
The Kubernetes Gateway API represents a significant improvement in how we manage external access to services within our clusters. As a more sophisticated successor to Ingress, it provides enhanced traffic routing capabilities along with dynamic infrastructure provisioning.
GatewayClass: Defines the type of load balancing implementation and is managed by a controller that implements the class
Gateway: Represents the actual load balancer instance that accepts and handles the traffic
HTTPRoute: Defines actual routing rules for mapping traffic from a Gateway to the backend network endpoints (services)
Gateways can be implemented by various controllers, each with distinct configurations. A Gateway must refer to a GatewayClass that specifies the controller’s name implementing that class.
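A minimal GatewayClass sketch for the scenario described next (the class name is a placeholder; the controller name follows the text):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-class
spec:
  controllerName: example.com/gateway-controller
```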
In this example, a controller that has implemented the Gateway API is configured to manage GatewayClasses with the controller name example.com/gateway-controller.
A gateway is a front door of your application hosted on Kubernetes that is responsible for receiving the traffic and further filtering, balancing, splitting, and forwarding it to backends such as services. It could act as a cloud load balancer or an in-cluster proxy server.
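A Gateway sketch matching the next description; the name production-gateway is reused later by the HTTPRoute example:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: production-gateway
spec:
  gatewayClassName: example-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```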
In this example, a Gateway resource has been created to listen for HTTP traffic on port 80. The implementation’s controller will be responsible for assigning the address or hostname to the gateway.
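And an HTTPRoute sketch matching the routing rule described next (host, path, and service are taken from that description):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: production-route
spec:
  parentRefs:
  - name: production-gateway
  hostnames:
  - "www.production.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: example-svc
      port: 8080
```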
In this example, HTTP traffic originating from the Gateway named production-gateway, which has the Host:header set to www.production.com and a request path of /api, will be directed to the service example-svc on port 8080.

Gateway resource in Kubernetes
In this example, the client sends an HTTP request to the host http://www.production.com. After the DNS resolution happens, the Gateway accepts the incoming request and uses the Host:header to match the configuration from the Gateway and attached HTTPRoute. The request will then be forwarded to one or more backend pods through the service.
Kubernetes networking enables communication across various layers, such as container-to-container, pod-to-pod, pod-to-service, and external-to-service.
Key components responsible for IP allocation: Kube-API Server, which assigns IPs to services; Kubelet/Cloud Controller Manager, which assigns IPs to nodes; and CNI Plugin, which manages IP allocation for pods.
CNI plugins facilitate secure communication between pods using an internal virtual network.
Various CNI plugins (e.g., Flannel, Calico, Cilium) are available, each catering to different network and security needs.
CNI defines standards for network plugins, ensuring interoperability and extensibility.
CoreDNS is the default DNS service (replacing kube-dns since Kubernetes 1.21).
CoreDNS maps service hostnames to IP addresses, simplifying internal communication without needing manual /etc/hosts entries.
Ingress manages HTTP/HTTPS traffic routing from external clients to services within the cluster.
Preferred over Load Balancer services due to less dependency on cloud providers, cost efficiency by eliminating the need for multiple external Load Balancer resources, and high security and customizability with routing rules and SSL/TLS termination.
Ingress resources define routing rules; Ingress controllers (e.g., Nginx) enforce them.
Ingress requires both an Ingress resource (defines routing rules) and an Ingress controller (enforces these rules).
The Kubernetes Gateway API represents a significant evolution in managing external traffic to cluster services. Unlike its predecessor, Ingress, the Gateway API offers a more flexible approach to traffic routing with enhanced control over infrastructure provisioning.
The Gateway API architecture consists of three stable components: GatewayClass, Gateway, and HTTPRoute.
GatewayClass acts as the blueprint for load balancer implementations, similar to how StorageClass defines storage implementations. Gateway functions as the actual load balancer instance, managing incoming traffic, while HTTPRoute defines the specific rules for routing traffic from the Gateway to backend services.
The typical traffic flow through the Gateway API follows a logical progression where an external client makes a request, DNS resolves to the Gateway’s address, the Gateway accepts and processes the incoming request based on HTTPRoute rules, and finally the traffic reaches the backend services and pods.
The Gateway API provides a more structured approach to traffic management compared to Ingress. Configuration changes in HTTPRoute automatically update Gateway routing rules, and the API supports complex routing scenarios while maintaining simplicity in basic configurations.
In Kubernetes, there are out-of-the-box resources and objects available, such as pod, deployment, configmap, secrets, and many more; however, Kubernetes gives you the ability to extend the Kubernetes API and create new resource types beyond what is available. Why would you want to create a new resource type? If you have a use case that is not covered by any of the existing resource types, you can create a new resource tailored to your specific requirements. For instance, if you need to implement GitOps within Kubernetes, you can create your own controller.
CRD (Custom Resource Definition): Defines a new API type in Kubernetes. It’s a template that enforces which fields are supported for the resource and their format.
CR (Custom Resource): An instance of the new resource type. Kubernetes validates a CR against its CRD and creates the resource if the validation passes.
Custom Controller: Manages the lifecycle of the custom resource.
When you create a new deployment, Kubernetes validates your YAML or manifest with the resource definition of deployment, whether you have used the supported fields in the correct format or not. Similarly, in the case of CR, Kubernetes will match your manifest with the CRD to validate the YAML with the template.
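As an illustration, here is the canonical CronTab example from the upstream documentation: a CRD followed by a CR that it validates (group, kind, and field names are just examples):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              replicas:
                type: integer
---
apiVersion: stable.example.com/v1
kind: CronTab
metadata:
  name: my-crontab
spec:
  cronSpec: "* * * * */5"
  replicas: 1
```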

Custom controller, custom resource, and custom resource definition in Kubernetes
A Kubernetes administrator or a DevOps engineer creates a custom resource definition to implement a custom resource. Then they implement a custom controller that watches the specific custom resource. When a user creates a custom resource, it is being validated against the CRD and being watched by a custom controller to perform certain actions and to manage the lifecycle of the CR.
Custom controllers can be created in any of the supported languages, such as Go, Python, or Java; however, Go is preferred.
Examples of such popular controllers are Prometheus, Crossplane, Fluentd, Istio, ArgoCD, and many more.
In the previous chapter, we learned how a custom controller manages a Kubernetes custom resource’s lifecycle. A Kubernetes operator is used to bundle, package, and manage your custom controllers.
For example, if you are creating a custom controller for Prometheus, a Prometheus operator is used to package the Prometheus custom controller along with its CRs and CRDs and to manage it. When a user tries to perform the Prometheus installation, they can do it via its YAML manifest or Helm chart or even through an operator.
Installation and Abstraction: A Kubernetes operator enables us to treat the application as a single object bundle exposing only the necessary adjustments instead of various Kubernetes objects that are managed separately.
Reconciliation: We have already seen Helm charts, which contain templates, charts, values files, etc., to install software packages such as Prometheus. However, Helm doesn’t provide a reconciliation mechanism, which means someone with access to the cluster can update the deployment manifest and apply the changes directly without modifying the Helm chart or its values. An operator, by contrast, has reconciliation logic that continuously watches the live objects and makes sure they match the state created by the operator, automatically scaling, updating, or even restarting the application.
Automation: A Kubernetes operator can be used to automate complex tasks that are not handled by Kubernetes itself.
Easy to Migrate: Just like Helm charts, operators can be easily installed, managed, and transported from one environment to another.
Kubernetes operators can be written in several ways, each catering to different levels of complexity and developer experience. The Go-based operator is the most powerful and flexible option, offering fine-grained control and deep integration with the Kubernetes API—making it the preferred choice for production-grade operators. The Ansible-based operator allows you to leverage existing Ansible playbooks, making it ideal for those with automation experience but limited programming background. The Helm-based operator uses Helm charts to manage application lifecycles, providing the fastest and simplest way to get started, especially for stateless or templated applications.
An admission controller is a process running in the Kubernetes cluster that intercepts API requests sent to the API Server after the request is authenticated and authorized but before the object is persisted in ETCD.

Admission controller in Kubernetes
An admission controller can be of type validating (only validates and accepts/denies the request to the API Server), mutating (makes changes to the object), or both.
In Kubernetes 1.32, the following admission plugins are enabled by default: CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, DefaultStorageClass, DefaultTolerationSeconds, LimitRanger, MutatingAdmissionWebhook, NamespaceLifecycle, PersistentVolumeClaimResize, PodSecurity, Priority, ResourceQuota, RuntimeClass, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionPolicy, ValidatingAdmissionWebhook.
Admission webhooks are HTTP callbacks that receive admission requests and perform actions on them. Webhooks can be easily called via an endpoint or a service reference.
Admission controllers are used to further restrict and manage what objects are submitted to the Kubernetes control plane. Any object submitted to the API Server goes through multiple admission controller phases.
Mutating admission controller
Validating admission controller
Mutating Admission Controller: Mutates object submissions before they are validated by the validating admission controller and before they exist in the cluster. It can modify the objects sent to the API Server to enforce certain rules. After the object modifications (mutations) are complete, validating webhooks are invoked for further validations.
For example, a mutating webhook could be called to ensure all pods have a default label called managed-by: terraform, and it mutates any pods that do not have the label defined, as per the webhook’s config.
Validating Admission Controller: Defines a validation criteria that ensures that the object is valid in the cluster and will either accept or reject requests.
For example, a validating admission controller requires that all resources have explicit requests and limits set before submission to the API Server and rejects the request if that is not the case.

Admission webhooks in Kubernetes
Allowing pulling images only from specific registries
Label validation
Adding resource requests/limits
Sidecar injection
Replica count enforcement
In the above example, the webhook pod-policy.example.com, a validating webhook, will intercept all API calls to the API Server that create pods in the namespace example-namespace.
For example: kubectl run nginx --image=nginx
The webhook (which is a deployment exposed through a service) will receive the request and validate it according to the defined rules. The webhook will then return the response with Allowed or Denied along with the HTTP error code (if any).
name: Unique identifier for the webhook
scope: Defines if it applies to namespaced or cluster-wide resources
apiGroups: Which API groups to intercept ("" for core)
apiVersions: API versions to intercept
operations: What operations to intercept (CREATE, UPDATE, DELETE)
resources: Which resources to validate
service: Kubernetes service that will receive webhook requests
caBundle: CA certificate for TLS verification
admissionReviewVersions: Supported versions of the AdmissionReview API
sideEffects: Declares if the webhook has side effects
timeoutSeconds: Maximum time to wait for webhook response
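Putting these fields together, a minimal ValidatingWebhookConfiguration for the pod-policy.example.com webhook described above might look like the following sketch (the backing service name and the caBundle value are placeholders):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
- name: pod-policy.example.com
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: example-namespace
      name: pod-policy-webhook   # assumed service name
    caBundle: <base64-encoded-CA-certificate>
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  timeoutSeconds: 5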
A webhook failure can cause object creation to fail or slow down kube-API Server operations such as getting, listing, or patching Kubernetes objects. Because calls to the failing webhook keep being retried, the API Server can become overloaded, causing further issues in the control plane.
A failed webhook should be fixed or deleted to avoid any unforeseen issues.
No endpoint (pods) available to handle the request.
Service running in front of the deployment not exposed properly.
Service referred to in the webhook config does not exist.
The firewall rule allowing the traffic from the master to the pods on the service’s target port is not configured properly.
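To investigate these failure modes, commands along the following lines are typically useful (the namespace and names are placeholders):

kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations
kubectl get svc,endpoints -n <webhook-namespace>   # does the service exist and have endpoints?
kubectl get pods -n <webhook-namespace>            # are the webhook pods running?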
Custom Resource Definitions (CRDs): Templates that define the structure and validation rules for new objects
Custom Resources (CRs): Actual instances of resources that conform to CRD specifications
Custom Controllers: Components that manage the lifecycle of custom resources
The workflow involves the Kubernetes administrator creating a CRD, implementing a custom controller (preferably in Go), and then users can create CRs that get validated against the CRD specification. Popular examples include Prometheus, Crossplane, and ArgoCD controllers.
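As an illustration, a minimal CRD and a matching custom resource could look like this (the group, kind, and field names are invented for the example):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string
---
apiVersion: example.com/v1
kind: Backup
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"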
Operators are a way to package and manage custom controllers in Kubernetes.
Operators continuously monitor resource state and automatically restore desired state if changes occur on the live objects.
They can handle complex operational tasks beyond basic Kubernetes capabilities.
Operators can be developed using Go (preferred method), Ansible, or Helm.
Admission controllers are cluster-level gatekeepers that intercept API requests after authentication but before persistence in ETCD.
Types of admission controllers: mutating admission controllers and validating admission controllers.
Mutating admission controllers modify objects before validation.
Validating admission controllers enforce validation rules and accept or reject requests based on criteria.
Admission webhooks allow for dynamic admission control through HTTP callbacks. Common use cases include image registry restrictions, label validation, resource limit enforcement, sidecar injection, etc.
Create an Ingress resource that exposes a service on example.com/hello using service port 8080.
Install an ArgoCD application using the Helm chart by disabling the CRD installation.
Migrate an existing web application from Ingress to the Gateway API; you should maintain HTTPS access.
Managing role-based access control (RBAC)
Preparing underlying infrastructure for installing a Kubernetes cluster
Creating and managing Kubernetes clusters using Kubeadm
Managing a highly available Kubernetes cluster
Provisioning underlying infrastructure to deploy a Kubernetes cluster
Performing a version upgrade on a Kubernetes cluster using Kubeadm
Implementing ETCD backup and restore
Implementing and configuring a highly available control plane
Using Helm and Kustomize to install cluster components
Understanding extension interfaces (CNI, CSI, CRI, etc.)
Understanding CRDs and installing and configuring operators
Kubernetes cluster maintenance involves various operational tasks: keeping the cluster up to date with security fixes, performing hardware/software upgrades to gain access to the latest features, keeping the cluster healthy to mitigate ongoing issues, and so on. These tasks need to be performed carefully to avoid any user or business impact. We will look into some of these critical tasks and how to execute them.
It adds a taint to the node that makes it unschedulable.
--ignore-daemonsets is important; otherwise, the drain will fail on DaemonSet-managed pods, because the DaemonSet controller would recreate those pods as soon as they are evicted.
Node Cordon: Marks a node as unschedulable to prevent new pods from being scheduled while allowing existing pods to continue running.
Drain Nodes: Safely evicts running pods from a node and marks it unschedulable for disruptive maintenance tasks, such as hardware upgrades or node deletion. Ensures workloads are safely migrated to other nodes before maintenance begins.
Uncordon: Resumes normal scheduling of new pods on a node after maintenance is complete.
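The corresponding kubectl commands are shown below (the node name is a placeholder; --delete-emptydir-data may be needed if pods use emptyDir volumes):

kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl uncordon <node-name>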
In this chapter, we will learn the process of upgrading the Kubernetes cluster created with Kubeadm from version 1.31.x to 1.32.x.
Before understanding the upgrade process, let’s have a look at some fundamentals of Kubernetes versioning and the supported process.
A Kubernetes version such as 1.31.2 consists of a major version (1), a minor version (31), which is released roughly every quarter, and a patch version (2), which is released frequently for bug fixes and minor vulnerability fixes.
You can upgrade from one minor version to the next minor version, but you cannot perform the skip version upgrade as it is not supported. For example, you can upgrade from 1.29.x to 1.30.x, but you cannot do 1.29.x to 1.31.x; in this case, you first have to upgrade from 1.29.x to 1.30.x and then from 1.30.x to 1.31.x. In short, you can upgrade only one minor version at a time.

Kubernetes supported version upgrade process
Upgrade the master node.
Upgrade additional master nodes (if you are using multiple masters).
Upgrade the worker node.
When the master is down, management operations will be paused; however, your existing pods continue to run. For instance, you will not be able to run kubectl commands as your API Server is down; if your pod crashes, a new pod will not be provisioned as your controller manager is down, etc.
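On an Ubuntu node managed with Kubeadm, the control plane upgrade typically follows a sequence like the one sketched below (the exact 1.32.x package version string is a placeholder; held packages may also need to be unheld first):

sudo apt-get update && sudo apt-get install -y kubeadm=1.32.x-*
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.32.x
kubectl drain <control-plane-node> --ignore-daemonsets
sudo apt-get install -y kubelet=1.32.x-* kubectl=1.32.x-*
sudo systemctl daemon-reload && sudo systemctl restart kubelet
kubectl uncordon <control-plane-node>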
Depending upon the CNI plugin you are using, you can follow the instructions to upgrade the provider plugin: https://kubernetes.io/docs/concepts/cluster-administration/addons/.
This step is not needed for the exam unless explicitly told.
kubectl get nodes should show the upgraded version on the control plane node.
All the nodes (control plane and workers) should show the upgraded version 1.32.x-x.
This chapter outlined the process of upgrading a Kubernetes cluster created with Kubeadm from version 1.31.x to 1.32.x (only one minor version at a time). Skipping minor versions is not allowed/supported.
Upgrade the Master Node: Update Kubeadm, apply the upgrade, and ensure the control plane components are upgraded. Management operations will pause temporarily, but existing pods will continue running.
Upgrade Additional Master Nodes: If using multiple master nodes, repeat the process for each.
Upgrade Worker Nodes: Perform similar steps as for master nodes, ensuring all nodes are updated to the new version.
The process also includes draining and uncordoning nodes during upgrades to minimize disruption and restarting components like kubelet after updates.
Once complete, all nodes (control plane and workers) should reflect the upgraded Kubernetes version.
Perform cluster upgrade from one release to another, for example, 1.30.1 to 1.31.1; upgrade control plane as well as worker nodes.
Create an additional worker node and join to the master, then drain one of the existing nodes and migrate the workload to the newer node.
Evaluating cluster and node logging
Understanding how to monitor applications
Managing container stdout and stderr logs
Troubleshooting application failure
Troubleshooting cluster component failure
Troubleshooting networking
In Chapter 10, we learned about the Metrics Server in Kubernetes. Now we will discuss in more detail how it works. The Metrics Server is a critical component in the cluster's monitoring architecture, serving as the foundation for resource metrics collection and exposure.
We have Kubelet running on each node, acting as the primary node agent that manages containers and maintains communication with the control plane. Working alongside Kubelet is cAdvisor, a specialized monitoring daemon that’s integrated directly into Kubelet. cAdvisor’s primary responsibility is to collect and aggregate real-time resource metrics from the container runtime and forward them to Kubelet. Kubelet also receives pod data from the node, such as CPU and memory details.
The Metrics Server transforms this raw data into a standardized format and exposes it through the Metrics API, making it accessible to various Kubernetes components.
The metrics also become available through kubectl commands like kubectl top node and kubectl top pod, providing administrators with quick insights into resource utilization and enabling features like HPA and VPA to work.
The main purpose of the Metrics Server is to fetch the resource metrics and node metrics, such as CPU and memory, from the kubelet and expose them in the Kubernetes API Server through the Metrics API to be used by HPA and VPA. The metrics-server calls the kubelet API to collect metrics from each node on the endpoint /metrics/resource.
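For example, once the Metrics Server is running, the same data can be pulled either through kubectl top or directly from the Metrics API:

kubectl top node
kubectl top pod -A
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes   # raw Metrics API response (JSON)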
Logs are generated by all the pods locally on the Kubernetes nodes as STDOUT (standard output) and STDERR (standard error). Kubernetes comes with limited monitoring and logging capabilities, and we generally use third-party monitoring, logging, and alerting solutions to extend its capabilities.
By default, these logs are not transferred to a third-party monitoring and logging solution such as Splunk or EFK/ELK; however, as a Kubernetes Admin/DevOps Engineer, we should be well versed with integrating an end-to-end monitoring and logging solution.
As we discussed in Chapter 2, Containerd is the default container runtime after Kubernetes 1.24, replacing Docker. With that introduction, we can no longer use Docker commands to debug applications and nodes; instead, we use a tool called crictl.
If you are running Kubernetes 1.24+, you should already have crictl installed on your Ubuntu machine, as it comes with the container runtime.
crictl works in a similar fashion to Docker commands, with some exceptions. With the Docker runtime, you use the docker ps command to check all running containers; with crictl, you use crictl ps.
If you are using Kubernetes 1.24+, Docker commands will not work, so you can also use crictl commands to troubleshoot the issue. This is also helpful in case your API Server is down, meaning the kubectl commands will not work.
In the exam sandbox environment, crictl would also be installed; however, the Docker command wouldn’t work.
Let’s have a look at some of the important commands:
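A representative selection is shown below (container and pod IDs are placeholders):

crictl ps                      # list running containers
crictl ps -a                   # include exited containers
crictl pods                    # list pod sandboxes
crictl images                  # list images on the node
crictl logs <container-id>     # view container logs
crictl inspect <container-id>  # detailed container information
crictl exec -it <container-id> sh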
The Metrics Server plays an important role in Kubernetes by collecting and exposing resource metrics through the Metrics API.
It works with Kubelet, which gathers metrics from the container runtime via cAdvisor.
The data is collected and aggregated, enabling features like HPA and VPA while providing insights into resource utilization.
Logs in Kubernetes are generated locally on nodes and are limited to STDOUT and STDERR.
To extend logging and monitoring, third-party solutions like Splunk or ELK/EFK are commonly integrated. These solutions enhance Kubernetes’ native capabilities for comprehensive monitoring and alerting.
For clusters using Containerd as the default runtime (post-Kubernetes 1.24), debugging nodes requires the use of crictl instead of Docker commands.
crictl enables administrators to manage and troubleshoot containers even if the API Server is down, ensuring operational continuity.
In this chapter, we’ll explore common application deployment issues you might encounter during the CKA exam and in real-world scenarios. We’ll cover systematic approaches to understand the issue, common causes, diagnostic steps, and common mitigation steps using kubectl commands and best practices. This list is not exhaustive; however, it contains common application failure scenarios.
ImagePullErrors occur when Kubernetes cannot retrieve the container image from the specified registry. This is one of the most common issues you’ll encounter.
Incorrect image name or tag specified in the manifest file or the kubectl command
Private registry authentication issues, incorrect credentials, secrets, etc.
Registry availability problems (server-side issues)
Check the pod status:
Examine detailed pod information and the latest events:
Resolution Examples
Fix typos in the image name and make sure that the registry name, image name, and tag are valid and exist:
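A typical diagnostic and fix flow might look like this (pod, deployment, container, and registry names are placeholders):

kubectl get pods
kubectl describe pod <pod-name>    # the Events section shows the exact pull error
kubectl set image deployment/<deployment-name> <container-name>=nginx:1.27
# for a private registry, create and reference an image pull secret
kubectl create secret docker-registry regcred --docker-server=<registry> \
  --docker-username=<user> --docker-password=<password>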
CrashLoopBackOff indicates that a pod repeatedly starts, crashes, and restarts.
Application errors
Invalid configuration
Resource constraints
Missing dependencies
Check the container startup probe:
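Useful commands for this scenario (the pod name is a placeholder):

kubectl logs <pod-name> --previous   # logs from the last crashed container
kubectl describe pod <pod-name>      # exit code, restart count, probe configuration
kubectl get pod <pod-name> -o yaml | grep -A5 startupProbe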
Pods in a pending state haven’t been scheduled to any node; they could be waiting for something or facing scheduling issues.
Common Causes
Insufficient cluster resources
Node selector constraints
PVC binding issues
Taint/toleration mismatches
Issues with Kube Scheduler
Check pod events:
Logs most likely will not be available as the pod was not scheduled yet.
Based on your above findings, you can try fixing the issue.
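Commands along these lines usually reveal why a pod is stuck in Pending (names are placeholders):

kubectl describe pod <pod-name>                           # look for FailedScheduling events
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl describe nodes | grep -i -A3 taints               # check node taints
kubectl get pvc                                           # unbound claims block scheduling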
Pods in a terminated state have completed their execution or were stopped.
Container process completed
OOMKilled (killed due to out-of-memory issues)
Pod eviction by a controller or graceful termination
Manual deletion
Review memory metrics:
Resolution Example
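For an OOMKilled container, one common mitigation is to raise the memory request/limit in the container spec, for example (the values are illustrative):

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"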
When services aren’t accessible, it’s often due to configuration issues.
Check the endpoints and make sure they point to valid pod IPs:
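For example (service name, namespace, and labels are placeholders):

kubectl get endpoints <service-name> -n <namespace>
kubectl get svc <service-name> -n <namespace> -o wide      # compare the selector with pod labels
kubectl get pods -n <namespace> -l <label-key>=<label-value> -o wide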
Connection issues often involve NetworkPolicy configurations.
Verify network policies:
Test connectivity:
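A quick way to do both (the names and port are placeholders):

kubectl get networkpolicy -A
kubectl describe networkpolicy <policy-name> -n <namespace>
kubectl run tmp --rm -it --image=busybox --restart=Never -- \
  wget -qO- -T 2 http://<service-name>.<namespace>.svc:<port>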
Service selector mismatches prevent proper pod-service binding.
Always start with kubectl get pods.
Use kubectl describe for detailed information.
Check logs with kubectl logs.
Review events with kubectl get events.
In this chapter, we will look at how to perform troubleshooting on your control plane from the CKA exam perspective.
Suppose you have been given a Kubernetes cluster managed by Kubeadm, and the cluster is in a broken state. The first command you will run is kubectl get nodes to check the node status.
If your nodes are in a ready state, that means your API Server and nodes are healthy. If you are getting a connection refused error, that means your kubectl is not able to connect to API Server, or API Server is down, or there are some issues with kubeconfig.
Check if the API Server is listed as one of the running containers; if it doesn't exist, that means your API Server is down. You can verify kube-apiserver.yaml in the /etc/kubernetes/manifests/ directory.
Now you should see the exited container for API Server.
You should see the error by running crictl logs on the exited kube-apiserver container.
You can check the API Server manifest and fix any mistakes; once the file is correct, your API Server should start again. Most of the time, errors related to the API Server are visible in the API Server container logs.
Now, if you run kubectl get nodes, the command should return results again.
If you are still facing the issue, you can verify your kubeconfig file from ~/.kube/config.
If your kubeconfig is corrupted, you can copy the default kubeconfig file from /etc/kubernetes/admin.conf to ~/.kube/config.
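The checks described above roughly translate into commands like these (the container ID is a placeholder):

crictl ps -a | grep kube-apiserver                 # is the API Server container running or exited?
crictl logs <kube-apiserver-container-id>          # inspect the reported error
vi /etc/kubernetes/manifests/kube-apiserver.yaml   # fix mistakes in the static pod manifest
cp /etc/kubernetes/admin.conf ~/.kube/config       # restore a corrupted kubeconfig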
The first thing you need to do is run kubectl describe pod <pod-name> to check the latest events. The events should show some details; if they don't, it makes sense to check the kube-scheduler, as this is the component responsible for assigning pods to nodes.
You can check the logs of the kube-scheduler pod and fix the kube-scheduler pod similar to what we did for the API Server. Once the kube-scheduler pod is healthy, your application pod should be scheduled on a node.
If you are running a deployment with multiple replicas and you delete one pod, or one of the pods crashed, the deployment controller should be able to spin up a new pod from the deployment template.
In the same way, if you are scaling your deployment and you update the replicas from two to four, two new pods should be created.
If it is not happening, the issue is likely with the kube controller manager, as this is the control plane component responsible for managing all the controllers, such as the deployment controller. You can check the controller manager logs and events. After fixing the issue, check if your pods are starting as part of the deployment now.
If you run kubectl get nodes and you see your nodes in a NotReady state, then the issue could be at the node level or at the CNI plugin level.
You can also check all the running pods and ensure that the pod specific to your CNI plugin is up and running. If you see any pods in a pending/error state, you can debug those using the steps we have followed earlier.
Go to the latest logs and look for the errors reported by Kubelet. If there are any issues with the Kubelet configuration, they should be reported in the logs.
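On a systemd-based node, this usually means commands like the following:

systemctl status kubelet
journalctl -u kubelet -f              # follow the latest kubelet logs
systemctl restart kubelet
cat /var/lib/kubelet/config.yaml      # review the kubelet configuration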
In this chapter, we focused on identifying and resolving issues with Kubernetes control plane components from a Certified Kubernetes Administrator (CKA) exam perspective (most common issues).
API Server Troubleshooting: If the API Server is down, kubectl commands may fail with connection errors. This usually involves checking the kube-API Server container logs, the latest events, and configuration files for errors. Common issues include misconfigurations in the API Server manifest or API Server container itself.
Kubeconfig Troubleshooting: Problems with the kubeconfig file can lead to connection errors. Verifying and replacing corrupted kubeconfig files with default configurations often resolves these issues.
Kube-Scheduler Troubleshooting: If pods are not being assigned to nodes, the issue may lie with the kube-scheduler. Checking its logs and resolving configuration errors can restore proper scheduling functionality.
Kube-Controller Manager Troubleshooting: If deployments fail to scale or recover from pod failures, the kube-controller manager may be at fault. Reviewing its logs and addressing any issues ensures proper functioning of controllers like the deployment controller.
Verify the CNI Plugin: Check the plugin configuration files in the /etc/cni/net.d directory and ensure the CNI-related pods are running and debug any in pending or error states.
Inspect the Kubelet: Confirm the Kubelet service on the worker node is running. If the Kubelet is not running, start it and review its logs for errors. Address any configuration issues in the Kubelet’s config.yaml file.
When you interact with the kube-API Server by executing a command such as kubectl get nodes, it returns, by default, a JSON response with a huge amount of information and metadata. kubectl intercepts that JSON payload and converts it into a human-readable format.

Health status of the cluster
There are instances when we have to fetch additional details from the API Server; for that, we use JSONPath, which queries the JSON payload returned by the kubectl command.
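For example, a query of this shape (reconstructed here purely for illustration):

kubectl get pods -o=jsonpath='{.items[0].metadata.labels}'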
This command will return the labels of the pod at the 0th index of items. Items is a list that holds the metadata of the pods returned by kubectl get pods, and we can select the pod to query by its index.
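To see the full payload that such queries operate on, the output can be switched to JSON, for instance:

kubectl get pods -o json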
This will print the JSON equivalent of the query, and we can apply filters as per our requirements, like how we did with labels.
Let us look at a few more examples.
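One such example, reconstructed here for illustration, combines node names and capacity in a single query:

kubectl get nodes -o=jsonpath="{.items[*]['metadata.name', 'status.capacity']}"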
In this query, we have used [,], the union operator, to combine multiple fields such as metadata.name and status.capacity, and the wildcard [*] to select all the objects.
You can further improve the readability by adding the header to the query in a specific format by using custom columns.
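A custom-columns query of this kind might look like the following (reconstructed for illustration):

kubectl get nodes -o=custom-columns=DATA:.metadata.name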
where DATA is the column name.
Sorting the Result
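For instance (reconstructed for illustration):

kubectl get pods --sort-by=.metadata.name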
In the above command, we sorted the results based on the name of the pods.
JSONPath is a query language used in Kubernetes to extract specific information from the JSON payload returned by the kube-API Server.
By default, commands like kubectl get return a simplified, human-readable format. To access the underlying JSON structure, the -o=json flag can be used.
JSONPath allows users to query, filter, and format this JSON data effectively.
Querying specific fields within the JSON hierarchy using expressions like {.items[0].metadata.labels}
Using wildcards ([*]) to target multiple objects and union operators ([,]) to combine multiple fields
Iterating through lists with range loops to format output, enhancing readability with tabs (\t) and newlines (\n)
Escaping special characters and applying filters with conditions like ?() to retrieve data meeting specific criteria
Custom columns (-o=custom-columns) further improve readability by adding headers to output and allowing multiple, comma-separated fields.
Additionally, sorting results by specific fields can be achieved using the --sort-by flag.
SSH to your worker nodes and restart kubelet; check kubelet logs and its related configs.
Restart the API Server by moving the API Server manifest to a /tmp directory and restoring it back.
Create a pod and exec into it using crictl, and check its logs and status using crictl.
SSH into the worker nodes and ensure you are able to run kubectl commands; if you are getting the error, copy the kubeconfig from the master node and try again.
Write the kubectl command to return all running pods sorted by the creation timestamp.
Monitor the logs of a pod and look for error-not-found and redirect the message to a file.
The following strategies helped me ace the exam multiple times, so I thought I’d share them with you as well.
Tackle the easiest and quickest tasks first. Bookmark the rest for later.
Then, move on to time-consuming but straightforward tasks like cluster upgrades, node maintenance, role/rolebindings, etc.
Finally, attempt the complex and time-intensive tasks.
Utilize the preconfigured kubectl alias k.
Leverage bash auto-completion for faster command typing.
i: Enter insert mode
esc: Exit insert mode
:wq!: Save and quit the file
:q!: Quit without saving
shift +A: Enter insert mode at line end
:n: Go to nth line
shift + G: Go to end of file
x: Delete a character
dd: Delete entire line
Set Context: Always use the provided command to set the context before starting each task. It’s critical! Remember to set the context, else you will end up performing the task in a different cluster and lose your marks. You can copy the command given on top of each question and paste that in the terminal.
Copy Commands: Click on any command in the question to copy it easily. Click, copy, conquer!
Elevated Access: Use sudo -i (when instructed) for tasks requiring elevated privileges. Superuser mode activated!
Read Carefully: Please make sure you complete all tasks within each question. Read twice, do once!
Exiting SSH/sudo: After SSHing into a node or using sudo -i, use exit to return to the original user. Be cautious, as the terminal session might close otherwise. Exit safely!
Prioritize kubectl Commands: Use imperative kubectl commands whenever possible to save time.
Kubernetes Quick Reference Guide: The kubectl cheat sheet provides a quick reference for commands. You can also use kubectl command --help for specific command details.
https://kubernetes.io/docs/reference/kubectl/quick-reference/
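For example, imperative one-liners like these are usually much faster than writing YAML from scratch:

kubectl create deployment web --image=nginx --replicas=3
kubectl expose deployment web --port=80 --type=NodePort
kubectl run busybox --image=busybox --dry-run=client -o yaml > pod.yaml   # generate a manifest to edit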
Check the number of schedulable nodes excluding tainted (NoSchedule) and write the number to a file.
Scale the deployment to four replicas.
Create a network policy that allows access only from the nginx pod in the dev namespace to the redis pod in the test namespace.
Expose the deployment as a NodePort service on port 8080.
Monitor the logs of a pod and look for error-not-found and redirect the message to a file.
Check for the pods that have the label env=xyz and redirect the pod name with the highest CPU utilization to a file.
Create a multi-container pod with the images of Redis and Memcached.
Edit a pod and add an init container with a busybox image and a command such as sleep 50.
Given an unhealthy cluster with a worker node in a NotReady state, fix the cluster by SSHing into the worker node. Make sure the changes are permanent.
Create a cluster role, a cluster role binding, and a service account that allow Deployments, Services, and DaemonSets to be created in a test namespace.
Make the node unschedulable and move the traffic to other healthy nodes.
Create a pod and schedule it on node worker01.
Create an Ingress resource task and set up path-based routing rules.
Create a PV with 1Gi capacity, access mode ReadWriteOnce, and no StorageClass; create a PVC with 500Mi storage and access mode ReadWriteOnce that binds to the PV. Create a pod that uses this PVC with a mount path of /data.
Set up cri-dockerd as the container runtime.
Create a new HorizontalPodAutoscaler (HPA) named apache-server in the autoscale namespace. This HPA must target the existing deployment called apache-server in the autoscale namespace. Set the HPA to aim for 50% CPU usage per pod. Configure it to have at least 1 pod and no more than 4 pods. Also, set the downscale stabilization window to 30 seconds.
Install and set up a Container Network Interface (CNI) that supports network policy enforcement.
Rollback a deployment to a previous revision.