Antisocial worker nodes, and maybe 2GB of memory is not enough
I'm almost done: once I spin up my two worker nodes, I'll have my own private Kubernetes cluster. I breeze through the first few steps of chapter 9, until I get to the part about disabling swap (by default, the kubelet refuses to start while swap is enabled).
This should be easy: identify the swap partition or file, comment it out of /etc/fstab, and reboot. But swap is no longer defined there:
root@node-0:~# swapon --show
NAME      TYPE      SIZE USED PRIO
/dev/sda5 partition 975M 524K   -2
root@node-0:~# grep sda5 /etc/fstab
# swap was on /dev/sda5 during installation
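For reference, the classic entry I expected to find and comment out would have looked something like this (a sketch; the exact mount options vary by install):

# the traditional fstab swap entry (sketch):
/dev/sda5  none  swap  sw  0  0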
I'm all in favor of systemd: over the years I've been bitten too many times by the shortcomings of the classic SysV init/rc system. Even so, there are times when its all-encompassing nature irks me.
Turning off swap the systemd way:
root@node-0:~# systemctl --type swap
UNIT                                                                      LOAD   ACTIVE SUB
dev-disk-by\x2duuid-8ad8b7b3\x2d87ae\x2d423a\x2daa1c\x2d67b82c1a8812.swap loaded active active
Are those backslashes escapes or literal backslashes? One way to find out:
root@node-0:~# systemctl mask dev-disk-by\x2duuid-8ad8b7b3\x2d87ae\x2d423a\x2daa1c\x2d67b82c1a8812.swap
Unit dev-disk-byx2duuid-8ad8b7b3x2d87aex2d423ax2daa1cx2d67b82c1a8812.swap does not exist, proceeding anyway.
Created symlink /etc/systemd/system/dev-disk-byx2duuid-8ad8b7b3x2d87aex2d423ax2daa1cx2d67b82c1a8812.swap → /dev/null.
root@node-0:~# systemctl mask 'dev-disk-by\x2duuid-8ad8b7b3\x2d87ae\x2d423a\x2daa1c\x2d67b82c1a8812.swap'
Created symlink /etc/systemd/system/dev-disk-by\x2duuid-8ad8b7b3\x2d87ae\x2d423a\x2daa1c\x2d67b82c1a8812.swap → /dev/null.
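So they're escapes as far as the shell is concerned: unquoted, the backslashes are eaten before systemd ever sees the name, which is why the first attempt masked a unit that doesn't exist. Rather than guessing at the quoting, systemd-escape will generate the exact unit name; a sketch, using the by-uuid path that unit points at:

root@node-0:~# systemd-escape --path --suffix=swap /dev/disk/by-uuid/8ad8b7b3-87ae-423a-aa1c-67b82c1a8812
dev-disk-by\x2duuid-8ad8b7b3\x2d87ae\x2d423a\x2daa1c\x2d67b82c1a8812.swap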
Then turn off any swap still in use:
root@node-0:~# swapoff -a
root@node-0:~# systemctl --type swap
UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
Reboot, and check our work:
root@node-0:~# systemctl --type swap --all
UNIT                                                                      LOAD   ACTIVE   SUB
dev-disk-by\x2duuid-56340219\x2d2c91\x2d4837\x2db4d2\x2d82e9823cbea9.swap loaded inactive dead
I finish up the rest of node-0's configuration, start containerd and the Kubernetes services, and verify everything's running:
root@node-0:~# systemctl|grep kube
kube-proxy.service        loaded active running Kubernetes Kube Proxy
kubelet.service           loaded active running Kubernetes Kubelet
kubepods-besteffort.slice loaded active active  libcontainer container kubepods-besteffort.slice
kubepods-burstable.slice  loaded active active  libcontainer container kubepods-burstable.slice
kubepods.slice            loaded active active  libcontainer container kubepods.slice
I repeat these steps on node-1, then do the final verification from jumpbox:
root@jumpbox:~/kubernetes-the-hard-way# ssh root@server \
    "kubectl get nodes \
    --kubeconfig admin.kubeconfig"
Or I try to: it takes a very long time just to get a shell prompt, and then the ssh command hangs. OK, I'll run it from server instead.
I'm finally able to get the worker node status, though it takes a minute or so for the command to complete:
root@server:~# kubectl get nodes --kubeconfig admin.kubeconfig
NAME     STATUS     ROLES    AGE     VERSION
node-0   NotReady   <none>   7m31s   v1.32.3
node-1   NotReady   <none>   9m27s   v1.32.3
Not good, and the painful sluggishness of my VMs is making this hard to debug. I pause the jumpbox and node-1 VMs and start poking around node-0.
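With more patience I might have started with kubectl describe node, whose Conditions section gives a reason for Ready being False:

root@server:~# kubectl describe node node-0 --kubeconfig admin.kubeconfig

But the logs on the node itself seem more promising.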
journalctl shows me (lightly edited):
E0522 20:04:54.979171 3938 kubelet.go:3002] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
W0522 20:04:57.927081 3938 transport.go:356] Unable to cancel request for *otelhttp.Transport
E0522 20:04:57.927232 3938 controller.go:195] "Failed to update lease" err="Put \"https://server.kubernetes.local:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node-0?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
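That first error points straight at CNI. The kubelet loads its pod network config from /etc/cni/net.d, so one sanity check is whether the tutorial's config files actually landed there; a sketch (10-bridge.conf is the tutorial's file name and may differ):

root@node-0:~# ls /etc/cni/net.d/
root@node-0:~# cat /etc/cni/net.d/10-bridge.conf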
Is there a bridge network adapter?
root@node-0:~# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether bc:24:11:af:19:5a brd ff:ff:ff:ff:ff:ff
    altname enp0s18
    inet 10.1.1.226/24 brd 10.1.1.255 scope global dynamic ens18
       valid_lft 151706sec preferred_lft 151706sec
    inet6 fe80::be24:11ff:feaf:195a/64 scope link
       valid_lft forever preferred_lft forever
Doesn't look like it, but I haven't worked much with bridges under Linux; maybe ip doesn't show them?
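It turns out ip does show bridges alongside everything else, but there are also ways to ask for them explicitly:

root@node-0:~# ip link show type bridge
root@node-0:~# bridge link show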
Did the bridge driver not get properly configured? It appears to have been loaded:
root@node-0:~# lsmod|grep bridge
bridge                311296  1 br_netfilter
stp                    16384  1 bridge
llc                    16384  2 bridge,stp
I search the web for the error messages from journald. I discover a lot of cool stuff about CNI plugins and pod networking, but very little relevant to my problem.
It's almost Memorial Day weekend in the US, so I'm ready to take a break. My plan for the weekend is:
- See if I can find any compatible memory priced cheaply. It's painful to work on the machine right now, and there's a very slight chance that my problems are due to memory exhaustion and timeouts; a quick check is sketched after this list.
- Verify that the Dell's Xeon processor supports the "virtualization in a VM" VT-x extensions required by containerd. This is the worst-case scenario: that my machine is simply too old to run this. (The sketch below covers this, too.)
- Learn more about CNI, and how to debug it.
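The first two items are quick to check from inside a worker VM; a minimal sketch:

root@node-0:~# free -h                           # how much of the 2GB is actually free?
root@node-0:~# vmstat 5 5                        # a consistently high wa column means the VM is I/O-starved
root@node-0:~# grep -cE 'vmx|svm' /proc/cpuinfo  # nonzero means VT-x/AMD-V flags are exposed to the guest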
First, though, I'm going to kick back for the weekend.