Kubernetes
Kubernetes (aka. k8s) is an open-source system for automating the deployment, scaling, and management of containerized applications.
A k8s cluster consists of its control-plane components and node components (each representing one or more host machines running a container runtime and kubelet.service).
Installation
When manually creating a Kubernetes cluster, install etcd (from the AUR) and the package groups kubernetes-control-plane (for a control-plane node) and kubernetes-node (for a worker node).
When creating a Kubernetes cluster with the help of kubeadm, install kubeadm and kubelet on each node.
Both control-plane and regular worker nodes require a container runtime for their kubelet instances, which is used for hosting containers.
Install either containerd or cri-o to meet this dependency.
To control a Kubernetes cluster, install kubectl on the control-plane hosts and on any external host that is supposed to be able to interact with the cluster.
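As a sketch, a kubeadm-managed control-plane node using containerd as its container runtime could be set up with the following (worker nodes can omit kubectl; substitute cri-o for containerd if preferred):
# pacman -S kubeadm kubelet kubectl containerd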
Configuration
All nodes in a cluster (control-plane and worker) require a running instance of kubelet.service.
Configure the components described in the following subsections before starting kubelet.service or using kubeadm; kubelet.service will otherwise fail to start.
A kubelet.service hosted on a btrfs drive or subvolume and running a Kubernetes version prior to 1.20.4 may be affected by a kubelet error like:
Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 25 in cached partitions map
This affects both nested and un-nested setups. It is unrelated to CRI-O#Storage.
If you cannot upgrade to version 1.20.4+, a workaround for this bug is to create an explicit mountpoint (added to fstab) for either the entire /var/lib/ or just /var/lib/kubelet/ and /var/lib/containers/, like this:
btrfs subvolume create /var/lib/kubelet
btrfs subvolume create /var/lib/containers
echo "/dev/vda2 /var/lib/kubelet btrfs subvol=/var/lib/kubelet 0 0" >>/etc/fstab
echo "/dev/vda2 /var/lib/containers btrfs subvol=/var/lib/containers 0 0" >>/etc/fstab
# mount -t btrfs -o subvol=/var/lib/kubelet /dev/vda2 /var/lib/kubelet/
# mount -t btrfs -o subvol=/var/lib/containers /dev/vda2 /var/lib/containers/
Beware that kubeadm reset undoes this! For more information check out #95826, #65204, and #94335.
All provided systemd services accept CLI overrides in environment files:
- kubelet.service: /etc/kubernetes/kubelet.env
- kube-apiserver.service: /etc/kubernetes/kube-apiserver.env
- kube-controller-manager.service: /etc/kubernetes/kube-controller-manager.env
- kube-proxy.service: /etc/kubernetes/kube-proxy.env
- kube-scheduler.service: /etc/kubernetes/kube-scheduler.env
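To see which environment file and variable a particular unit actually reads, inspect the unit, e.g.:
$ systemctl cat kubelet.service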
Networking
The networking setup for the cluster has to be configured for the respective container runtime. This can be done using cni-plugins.
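For example, the standard CNI plugins can be installed with:
# pacman -S cni-plugins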
Pass the virtual network's CIDR to kubeadm init with e.g. --pod-network-cidr='10.85.0.0/16'.
Container runtime
The container runtime has to be configured and started before kubelet.service can make use of it.
CRI-O
When using CRI-O as container runtime, it is required to provide kubeadm init or kubeadm join with its CRI endpoint: --cri-socket='unix:///run/crio/crio.sock'
CRI-O by default uses systemd as its cgroup_manager (see /etc/crio/crio.conf). This is not compatible with kubelet's default (cgroupfs).
Change kubelet's default by appending --cgroup-driver='systemd' to the KUBELET_ARGS environment variable in /etc/kubernetes/kubelet.env upon first start (i.e. before using kubeadm init).
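A sketch of the resulting entry (any flags already present in KUBELET_ARGS should be kept, with the new flag appended):
/etc/kubernetes/kubelet.env
KUBELET_ARGS="--cgroup-driver='systemd'"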
Note that the KUBELET_EXTRA_ARGS variable, used by older versions, is no longer read by /usr/lib/systemd/system/kubelet.service!
When upgrading kubeadm from 1.19.x to 1.20.x, it should be possible to use a kubeadm config file (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file), as explained at https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-control-plane-node and shown in https://github.com/cri-o/cri-o/pull/4440/files, instead of the above. (TBC, untested.)
After the node has been configured, the CLI flag could (but does not have to) be replaced by a configuration entry for kubelet:
/var/lib/kubelet/config.yaml
cgroupDriver: 'systemd'
Running
Before creating a new Kubernetes cluster with kubeadm, start and enable kubelet.service.
kubelet.service will fail (but restart) until configuration for it is present.
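For example, to start and enable it in one step:
# systemctl enable --now kubelet.service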
Setup
When creating a new Kubernetes cluster with kubeadm, a control-plane has to be created before further worker nodes can join it.
Control-plane
- If the cluster is supposed to be turned into a high availability cluster (a stacked etcd topology) later on, kubeadm init needs to be provided with --control-plane-endpoint=<IP or domain> (it is not possible to do this retroactively!).
- It is possible to use a config file for kubeadm init instead of a set of parameters.
Use kubeadm init to initialize a control-plane on a host machine:
# kubeadm init --node-name=<name_of_the_node> --pod-network-cidr=<CIDR> --cri-socket=<SOCKET>
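A filled-in sketch (the node name is arbitrary; the CIDR and CRI socket are the example values used elsewhere on this page and assume CRI-O):
# kubeadm init --node-name=cp-01 --pod-network-cidr='10.85.0.0/16' --cri-socket='unix:///run/crio/crio.sock'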
If run successfully, kubeadm init will have generated configurations for the kubelet and various control-plane components below /etc/kubernetes and /var/lib/kubelet/.
Finally, it will output commands ready to be copied and pasted to set up kubectl and make a worker node join the cluster (based on a token, valid for 24 hours).
To use kubectl with the freshly created control-plane node, set up the configuration (either as root or as a normal user):
$ mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config
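Afterwards the control-plane should be reachable, which can be verified with e.g.:
$ kubectl get nodes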
To install a pod network such as calico, follow the upstream documentation.
Worker node
With the token information generated in #Control-plane it is possible to make a node machine join an existing cluster:
# kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash> --node-name=<name_of_the_node> --cri-socket=<SOCKET>
When using CRI-O as the container runtime, again provide its CRI endpoint (see #CRI-O) as <SOCKET>.
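If the token from kubeadm init has expired (it is valid for 24 hours), a fresh join command can be printed on the control-plane with:
# kubeadm token create --print-join-command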
Tips and tricks
Tear down a cluster
When it is necessary to start from scratch, use kubectl to tear down a cluster.
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
Here <node name> is the name of the node that should be drained and reset. Use kubectl get node -A to list all nodes.
Then reset the node:
# kubeadm reset
Operating from behind a proxy
kubeadm reads the https_proxy, http_proxy, and no_proxy environment variables. Kubernetes internal networking should be included in the last of these, for example
export no_proxy="192.168.122.0/24,10.96.0.0/12,192.168.123.0/24"
where the second entry (10.96.0.0/12) is the default service network CIDR.
Troubleshooting
Failed to get container stats
If kubelet.service emits
Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
it is necessary to add configuration for the kubelet (see relevant upstream ticket).
/var/lib/kubelet/config.yaml
systemCgroups: '/systemd/system.slice'
kubeletCgroups: '/systemd/system.slice'
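Presumably kubelet.service then needs to be restarted for the change to take effect:
# systemctl restart kubelet.service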
Pods cannot communicate when using Flannel CNI and systemd-networkd
See upstream bug report.
systemd-networkd assigns a persistent MAC address to every link. This policy is defined in its shipped configuration file /usr/lib/systemd/network/99-default.link. However, Flannel relies on being able to pick its own MAC address. To override systemd-networkd's behaviour for flannel* interfaces, create the following configuration file:
/etc/systemd/network/50-flannel.link
[Match]
OriginalName=flannel*

[Link]
MACAddressPolicy=none
Then restart systemd-networkd.service.
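For example:
# systemctl restart systemd-networkd.service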
If the cluster is already running, you might need to manually delete the flannel.1 interface and the kube-flannel-ds-* pod on each node, including the master. The pods will be recreated immediately and they themselves will recreate the flannel.1 interfaces.
Delete the interface flannel.1:
# ip link delete flannel.1
Delete the kube-flannel-ds-* pod. Use the following command to delete all kube-flannel-ds-* pods on all nodes:
$ kubectl -n kube-system delete pod -l="app=flannel"
See also
- Kubernetes Documentation - The upstream documentation
- Kubernetes Cluster with Kubeadm - Upstream documentation on how to set up a Kubernetes cluster using kubeadm
- Kubernetes Glossary - The official glossary explaining all Kubernetes specific terminology
- Kubernetes Addons - A list of third-party addons
- Kubelet Config File - Documentation on the Kubelet configuration file
- Taints and Tolerations - Documentation on node affinities and taints