CKA Study notes - Monitoring applications
Continuing with my Certified Kubernetes Administrator exam preparations, I'm now going to take a look at the Troubleshooting objective. I've split this into three posts: Application Monitoring (this post), Logging, and Troubleshooting.
The Troubleshooting objective counts for 30% of the exam, so based on weight it's the most important objective and worth spending some time studying. The Kubernetes Documentation is, as always, the place to go, as it is available during the exam. Troubleshooting is often very situation specific, and oftentimes we need to combine multiple troubleshooting techniques. These posts will be fairly generic and, again, based on studying for the CKA exam.
One of the sub-objectives of Troubleshooting is Understand how to monitor applications, and this post will touch upon a few pointers in that respect.
Note #1: I'm using documentation for version 1.19 in my references below as this is the version used in the current (Jan 2021) CKA exam. Please check the version applicable to your use case and/or environment.
Note #2: This post covers my study notes from preparing for the CKA exam and reflects my understanding of the topic and what I have focused on during my preparations.
Metrics
Kubernetes Documentation reference
Let's first take a look at metrics.
To be able to pull resource metrics (CPU and memory) from a Kubernetes cluster with kubectl top, we need to install the metrics server. In addition, the Kubernetes system components themselves expose metrics through a /metrics endpoint formatted in the Prometheus text format.
Examples of components that expose metrics (see the example query after the list):
- kube-controller-manager
- kube-proxy
- kube-apiserver
- kube-scheduler
- kubelet
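As an illustration of the format, the API server's metrics endpoint can be queried directly through kubectl. This is just a quick check, not something the metrics server requires:

# Print the first lines of the API server's metrics (Prometheus text format)
kubectl get --raw /metrics | head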
The metrics server can be installed from https://github.com/kubernetes-sigs/metrics-server
Metrics server is also used for horizontal autoscaling based on CPU/Memory usage.
I'm not sure if the metrics server will be used in the CKA exam, but if it is, I think (or at least hope) it will be pre-installed and ready to use.
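If the metrics server is installed and running, resource usage for nodes and pods can be checked with kubectl top, for example:

# Show CPU/memory usage per node
kubectl top nodes
# Show CPU/memory usage per pod in the kube-system namespace
kubectl top pods -n kube-system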
Monitoring applications
Application monitoring can be done with Liveness probes, Readiness probes and Startup probes.
A liveness probe is what the kubelet uses to determine when to restart a container. Liveness probes effectively report whether the container is alive or not. The definition of alive is obviously different based on the app, but the app can for example report that it's not alive anymore if it has encountered some kind of deadlock.
A readiness probe is used by the kubelet to know when the container is ready to accept traffic. If a Pod consists of multiple containers, all containers must be ready before the Pod reports that it's ready. Readiness probes can also be used for removing pods from a Service.
A startup probe is used for slow-starting containers, preventing the kubelet from kicking off the liveness or readiness checks too soon.
More information on container probes can be found in the documentation.
Liveness probes
Kubernetes Documentation reference
The liveness probe can be set up to either run a command, perform an HTTP GET request or do a TCP check.
When configuring a liveness probe we set what kind of probe to use and the corresponding command or request, as well as the parameters initialDelaySeconds, which specifies how long the probe should wait before starting the checks, and periodSeconds, which specifies how often the check is run.
The commands and requests run by the probe expect a successful result, which the application developer needs to provide, for example an HTTP endpoint returning a success response. For HTTP requests, any response code greater than or equal to 200 and less than 400 is considered a success.
An example of a liveness probe executing an HTTP request:
apiVersion: v1
kind: Pod
metadata:
  name: pod-name
spec:
  containers:
  - image: my-image
    name: my-name
    livenessProbe:
      httpGet:
        path: /health #Or some other path
        port: 80
If the HTTP GET request doesn't return a response code greater than or equal to 200 and less than 400, the probe fails.
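A liveness probe can also run a command inside the container instead of an HTTP request. A minimal sketch, where the image name and the checked file are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: pod-exec-liveness # placeholder name
spec:
  containers:
  - image: my-image # placeholder image
    name: my-name
    livenessProbe:
      exec:
        command: # probe succeeds as long as the command exits with code 0
        - cat
        - /tmp/healthy # placeholder file the application is expected to maintain
      initialDelaySeconds: 5
      periodSeconds: 10

Here the probe succeeds as long as the cat command exits with code 0.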
Startup probes
Kubernetes Documentation reference
A startup probe is often used together with a liveness probe for applications that have longer startup times. These could be difficult to monitor effectively with a liveness probe alone, as its timing parameters would have to be set so high that the probe would be slow to detect real failures.
A startup probe fixes this with a failureThreshold parameter used together with periodSeconds. The failureThreshold is multiplied with periodSeconds to give the startup time allowed before a failure should be reported.
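A minimal sketch of a startup probe protecting a liveness probe, where the image name, path and port are placeholders. With failureThreshold: 30 and periodSeconds: 10 the application gets up to 300 seconds to start before the liveness probe takes over:

apiVersion: v1
kind: Pod
metadata:
  name: pod-slow-start # placeholder name
spec:
  containers:
  - image: my-image # placeholder image
    name: my-name
    startupProbe:
      httpGet:
        path: /health # placeholder path
        port: 80
      failureThreshold: 30 # allow 30 x 10 = 300 seconds for startup
      periodSeconds: 10
    livenessProbe: # only starts once the startup probe has succeeded
      httpGet:
        path: /health
        port: 80
      periodSeconds: 10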
Readiness probes
Kubernetes Documentation reference
A readiness probe is configured similarly to a liveness probe. The difference between them is that a failing liveness probe will restart the container to try to fix the issue, whereas a failing readiness probe will stop sending the Pod any traffic by removing it from the Service. This is useful if you want to troubleshoot the application.
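A minimal sketch of a readiness probe, again with placeholder image, path and port:

apiVersion: v1
kind: Pod
metadata:
  name: pod-readiness # placeholder name
spec:
  containers:
  - image: my-image # placeholder image
    name: my-name
    readinessProbe:
      httpGet:
        path: /ready # placeholder path
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10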
Summary
Monitoring applications is obviously very specific to the individual application and the infrastructure it runs in. The techniques covered above show how Kubernetes can easily determine if there is an issue with an application.