CKA Study notes - Monitoring applications

Continuing with my Certified Kubernetes Administrator exam preparations, I'm now going to take a look at the Troubleshooting objective. I've split this into three posts: Application Monitoring (this post), Logging, and Troubleshooting.

The Troubleshooting objective counts for 30%, so based on weight it's the most important objective in the exam. Be sure to spend some time studying it. The Kubernetes Documentation is, as always, the place to go, as this is available during the exam. Troubleshooting is often very situation specific, and oftentimes we need to combine multiple troubleshooting techniques. These posts will be fairly generic and, again, based on studying for the CKA exam.

One of the sub-objectives of Troubleshooting is "Understand how to monitor applications". This post will touch upon a few pointers in that respect.

Note #1: I'm using documentation for version 1.19 in my references below as this is the version used in the current (Jan 2021) CKA exam. Please check the version applicable to your use case and/or environment.

Note #2: This is a post covering my study notes preparing for the CKA exam and reflects my understanding of the topic, and what I have focused on during my preparations.

Metrics

Kubernetes Documentation reference

Let's first take a look at Metrics

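Kubernetes system components expose their metrics through a /metrics endpoint in the Prometheus format. To be able to pull resource usage metrics (CPU and memory) for nodes and Pods, for example with kubectl top, we need to install the metrics server, which collects these metrics from the kubelets and exposes them through the Metrics API.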

Examples of components that expose metrics:

  • kube-controller-manager
  • kube-proxy
  • kube-apiserver
  • kube-scheduler
  • kubelet

The metrics server can be installed from https://github.com/kubernetes-sigs/metrics-server

Metrics server is also used for horizontal autoscaling based on CPU/Memory usage.
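To illustrate the autoscaling use case, here is a minimal sketch of a HorizontalPodAutoscaler that scales on CPU usage reported by the metrics server. The Deployment name my-app, the replica counts and the 50% target are hypothetical values chosen for the example, and the target Deployment's containers would need CPU requests set for the utilization percentage to be calculated.

```yaml
# Sketch only: "my-app" is a placeholder Deployment, not something from this post.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  # Add replicas when average CPU usage goes above 50% of the requested CPU.
  targetCPUUtilizationPercentage: 50
```

The autoscaling/v1 API only supports a CPU target. Scaling on memory would need one of the newer autoscaling API versions.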

I'm not sure if metrics-server will be used in the CKA exam, but if it is I think (or at least hope) it will be pre-installed and ready to use.

Monitoring applications

Application monitoring can be done with Liveness probes, Readiness probes and Startup probes

A liveness probe is what the kubelet uses to determine when to restart a container. Liveness probes effectively report if the container is alive or not. The definition of alive obviously differs from app to app, but the app can, for example, report that it's not alive anymore if it has encountered some kind of deadlock.

A readiness probe is used by the kubelet to know when the container is ready to accept traffic. If a Pod consists of multiple containers, all containers must be ready before the Pod reports that it's ready. Readiness probes can also be used for removing pods from a Service.

A startup probe is used for slow-starting containers. While it is configured, the liveness and readiness checks are held back until it succeeds, which prevents the kubelet from kicking them off too soon.

More information on container probes can be found in the documentation

Liveness probes

Kubernetes Documentation reference

The liveness probe can either be set up to run a command, an HTTP GET request or a TCP check.

When configuring a liveness probe we set what kind of probe to use and the corresponding command or request, as well as the parameters initialDelaySeconds which specifies how long the probe should wait before starting the checks, and periodSeconds which specifies how often the check is run.
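As a sketch of the command variant with both timing parameters set, the Pod below runs cat against a file every 5 seconds after an initial 5 second delay. The Pod name, the busybox image, the container arguments and the /tmp/healthy path are assumptions made for the example and follow the pattern used in the Kubernetes documentation.

```yaml
# Sketch only: names, image and file path are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
    - name: liveness
      image: busybox
      args:
        - /bin/sh
        - -c
        # Create the file, then remove it after 30 seconds so the probe starts failing.
        - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
      livenessProbe:
        exec:
          command:
            - cat
            - /tmp/healthy
        initialDelaySeconds: 5   # wait 5 seconds before the first check
        periodSeconds: 5         # then check every 5 seconds
```

The probe succeeds as long as the command exits with status 0. Once the file is removed, cat returns a non-zero exit code and the kubelet restarts the container after the failure threshold is reached.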

The commands and requests run by the probe expect a successful result, which the application developer needs to provide, for example with an HTTP endpoint returning a success response. For HTTP requests, any status code greater than or equal to 200 and less than 400 is considered a success.

An example of a liveness probe executing an HTTP request:

apiVersion: v1
kind: Pod
metadata:
  name: pod-name
spec:
  containers:
    - image: my-image
      name: my-name
      livenessProbe:
        httpGet:
          path: /health # Or some other path
          port: 80

If an HTTP GET request doesn't return a status code greater than or equal to 200 and less than 400, the probe fails.
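For completeness, the TCP variant mentioned at the start of this section could look something like the sketch below. The kubelet tries to open a socket on the given port, and the probe succeeds if the connection can be established. The Pod name, image name, port 8080 and the timing values are hypothetical.

```yaml
# Sketch only: image, port and timings are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
spec:
  containers:
    - name: my-name
      image: my-image
      ports:
        - containerPort: 8080
      livenessProbe:
        tcpSocket:
          port: 8080             # probe succeeds if a TCP connection can be opened
        initialDelaySeconds: 15
        periodSeconds: 20
```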

Startup probes

Kubernetes Documentation reference

A startup probe is often used together with a liveness probe for applications that have longer startup times. These could be difficult to monitor effectively with a liveness probe as the periodSeconds would have to be set too high.

A startup probe fixes this with a failureThreshold together with the periodSeconds parameter. The failureThreshold is multiplied by the periodSeconds to give the maximum startup time allowed before a failure is reported.
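A minimal sketch of how the two probes can be combined: with a failureThreshold of 30 and a periodSeconds of 10, the application gets up to 30 * 10 = 300 seconds to start before the container is restarted. The Pod name, image, path, port and the specific numbers are assumptions picked for the example.

```yaml
# Sketch only: image, path, port and thresholds are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: slow-starter
spec:
  containers:
    - name: my-name
      image: my-image
      startupProbe:
        httpGet:
          path: /health
          port: 80
        failureThreshold: 30     # up to 30 failed checks...
        periodSeconds: 10        # ...10 seconds apart = 300 seconds to start
      livenessProbe:
        httpGet:
          path: /health
          port: 80
        periodSeconds: 10        # takes over once the startup probe has succeeded
```

This keeps the liveness probe's own period short, so a deadlock after startup is still detected quickly.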

Readiness probes

Kubernetes Documentation reference

Readiness probes are configured similarly to liveness probes. The difference between them is that a failing liveness probe makes the kubelet restart the container to try to fix the issue, whereas a failing readiness probe leaves the container running and stops traffic from being sent to the Pod by removing it from the Service endpoints. This is useful if you want to troubleshoot the application.
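A readiness probe could be sketched like this. It has the same shape as a liveness probe, only the field name differs. The Pod name, image, the /ready path, the port and the timing values are hypothetical.

```yaml
# Sketch only: image, path, port and timings are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
spec:
  containers:
    - name: my-name
      image: my-image
      readinessProbe:
        httpGet:
          path: /ready
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 10        # while failing, the Pod receives no Service traffic
```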

Summary

Monitoring applications is obviously very specific to the individual application and the whole of the infrastructure it runs in. The techniques covered above show how Kubernetes can easily determine if there is an issue with an application.

This page was modified on January 14, 2021: Changed draft status