"RTFM" also means "Follow the Fine Manual"
Now that I have my VMs, it's time to provision Kubernetes. I'll work through chapters 2-8 of Kubernetes The Hard Way, setting up etcd and the Kubernetes control plane.
I enable root ssh login and distribute ssh keys to the other VMs. While I'm at it, I enable sudo for the non-root user I created at install time. Having to always su to root gives me that uncomfortable, driving-without-a-seatbelt feeling.
I know some distros are moving away from sudo, so I check the convention for Debian and find I need to add this user to the sudo group, rather than the traditional wheel.
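Concretely, that boils down to something like this - a sketch, where the user name is a placeholder and the exact sshd_config edit depends on your defaults:
# on each VM: allow root logins over ssh, then restart the daemon
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
systemctl restart ssh
# from the jumpbox: push the ssh key to each VM
ssh-copy-id root@server
# on each VM: give the regular user sudo (Debian uses the sudo group, not wheel)
usermod -aG sudo myuser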
Next it's time to set up the hosts file on the VMs - but do I really need to do this?
I had set the VMs' hostnames at install time, and dnsmasq, the DNS/DHCP server on my home network, lets clients auto-register their hostnames and IP addresses. That means I can resolve these names even when they're not in /etc/hosts.
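A quick way to convince yourself of that: getent hosts goes through the same nsswitch lookup path as everything else (files, then DNS), so if dnsmasq has registered the name it resolves even with nothing in /etc/hosts. The address shown here is made up:
getent hosts server
192.168.4.31    server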
Looks like I can skip this step.
Narrator's voice: he could not, in fact, skip this step.
I breeze through the setup instructions in chapters 4-6, then stumble briefly in chapter 7, when I'm told to:
Set the etcd name to match the hostname of the current compute instance
I decide to ignore this, and start etcd, which comes up without any errors.
root@server:~# etcdctl member list
6702b0a34e2cfd39, started, controller, http://127.0.0.1:2380, http://127.0.0.1:2379, false
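For the record, that member name comes from etcd's --name flag (and the matching key in --initial-cluster) in the systemd unit. Matching it to the hostname would be something like this - a sketch, assuming the stock unit hard-codes controller, which the member list above suggests:
# a fresh member picks its name up from the unit; one that has already started
# may keep the old name unless its data dir is cleared first
sed -i "s/controller/$(hostname -s)/g" /etc/systemd/system/etcd.service
systemctl daemon-reload && systemctl restart etcd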
Now on to the control plane. Chapter 8 is lengthy but goes quickly - it's almost all mv commands. I start the Kubernetes services and check their status:
root@server:~# systemctl|grep kube
kube-apiserver.service loaded active running Kubernetes API Server
kube-controller-manager.service loaded active running Kubernetes Controller Manager
kube-scheduler.service loaded activating auto-restart Kubernetes Scheduler
That kube-scheduler status looks odd. journalctl --since "5 minutes ago" tells me:
err="stat /var/lib/kubernetes/kube-scheduler.kubeconfig: no such file or dir
Looks like I missed a step. I move the missing kube-scheduler.kubeconfig into /var/lib/kubernetes and bounce the Kubernetes services. Things look much better now:
root@server:~# systemctl|grep kube
kube-apiserver.service loaded active running Kubernetes API Server
kube-controller-manager.service loaded active running Kubernetes Controller Manager
kube-scheduler.service loaded active running Kubernetes Scheduler
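For reference, the fix amounts to something like this, run from wherever the kube-scheduler kubeconfig ended up after the earlier copy steps (a sketch):
mv kube-scheduler.kubeconfig /var/lib/kubernetes/
systemctl restart kube-apiserver kube-controller-manager kube-scheduler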
I make a final check with journalctl:
...Get "https://server.larock.nu:6443/api/v1/persistentvolumes?limit=500&resourceVersion=0":
tls: failed to verify certificate: x509: certificate is valid for kubernetes, kubernetes.default,
kubernetes.default.svc, kubernetes.default.svc.cluster, kubernetes.svc.cluster.local,
server.kubernetes.local, api-server.kubernetes.local, not server.larock.nu
That's not good.
The cert doesn't list my domain among the names it's valid for, and there might also be an issue with my domain not ending in .local (there are nuances to how certificate names and chains get validated that I'm not expert on).
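openssl can show exactly which names a cert covers; the path and file name below are my guess at where this setup keeps the API server cert:
openssl x509 -in /var/lib/kubernetes/kube-api-server.crt -noout -text | grep -A1 'Subject Alternative Name'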
I shut down the Kubernetes services and etcd, and clear any data associated with them:
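# stop the units first (a sketch; unit names per the systemctl output above)
systemctl stop kube-scheduler kube-controller-manager kube-apiserver etcd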
rm -r /etc/kubernetes/config /var/lib/kubernetes
rm -r /etc/etcd /var/lib/etcd/*
Then it's back to chapter 3: I do the hosts file work I had skipped before, then repeat the configuration steps.
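That chapter 3 step amounts to giving every machine the same set of entries, keyed on the FQDNs the certs are issued for - a sketch, with made-up addresses and the worker names as placeholders:
# appended to /etc/hosts on the jumpbox and each VM
192.168.4.31 server.kubernetes.local server
192.168.4.32 node-0.kubernetes.local node-0
192.168.4.33 node-1.kubernetes.local node-1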
I start everything up on server, don't see any errors in journalctl, and probe Kubernetes:
root@server:~# kubectl cluster-info --kubeconfig admin.kubeconfig
Kubernetes control plane is running at https://127.0.0.1:6443
Then a test from the jumpbox:
root@jumpbox:~/kubernetes-the-hard-way# curl --cacert ca.crt \
https://server.kubernetes.local:6443/version
{
"major": "1",
"minor": "32",
"gitVersion": "v1.32.3",
"gitCommit": "32cc146f75aad04beaaa245a7157eb35063a9f99",
"gitTreeState": "clean",
"buildDate": "2025-03-11T19:52:21Z",
"goVersion": "go1.23.6",
"compiler": "gc",
"platform": "linux/amd64"
}
I have a working control plane.
Next: provisioning the workers, and final steps.