Hello dear all!
In short, the problem is that the upgrade hangs on "waiting for kubelet to update".
Installation type: Docker, single master, a few workers. (Yes, this is a crappy method, but it works quite stably, does not fail unexpectedly, and most importantly, it’s dead simple.)
Ubuntu 22.04.5.
Certificates are ok.
Internet - full access.
RKE2, on-prem.
Age: 2 years. As far as I remember, the initial version was 2.7.1, then 2.7.7, then an update to 2.8.0 along with Kubernetes, which went successfully.
Now updated to Rancher 2.9.3 with v1.27.16+rke2r2, more or less successfully. (That rodeo lasted 2 days, facepalm, and was not without adventures either.)
2.10.2 doesn’t work either; I don’t remember the error and haven’t bothered to figure it out yet. For now, let’s talk about 2.9.3.
One day I went looking for trouble and tried to upgrade Kubernetes from v1.27.16+rke2r2 to, for example, v1.28.15+rke2r1. And what do you think, did everything work on the first try?
NOPE!!!
It is just stuck on one step: [INFO] [planner] rkecluster fleet-default/clustename: configuring bootstrap node(s) custom-a2e9e76ac87f: waiting for kubelet to update.
custom-a2e9e76ac87f is the first node, the master in my case.
Restarting the agent does not help.
Restarting the server does not help.
Rebooting does not help.
Deleting some pods with crictl, for lack of knowing what else to delete (kube-controller-manager-, cloud-controller-manager-), does not help; if anything, this action is a little bit fatal.
I don't see anything critical in any of the following logs. In each of them it looks like it is working, downloading something, doing something, and then it just hangs:
/var/lib/rancher/rke2/agent/containerd/containerd.log
/var/lib/rancher/rke2/agent/logs/kubelet.log
/var/log/syslog
Docker console logs (these hang on "waiting for kubelet to update")
That said, it is possible that the error is logged as a warning or info message. But the logs are not small, and I have no idea what to look for in them. The wall of text there is unrealistically huge.
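One way to cut logs of this size down to something readable might be a small filter that keeps only warning/error/fatal lines and groups repeats. Below is a minimal sketch in Python, not a verified diagnostic for this particular hang. It assumes that kubelet.log uses the klog prefix (a severity letter I/W/E/F followed by the date, e.g. "E0102 15:04:05.123456") and that containerd.log carries a level=... field. The function names (severity, normalize, summarize) are made up for illustration, and /var/log/syslog is left out because its lines do not follow either format.

```python
import re
import sys
from collections import Counter

# klog prefix (assumed for kubelet.log): severity letter, MMDD, timestamp,
# e.g. "E0102 15:04:05.123456 ..."
KLOG_RE = re.compile(r"^([IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d+")
# level=... field (assumed for containerd.log), e.g. level=error
LEVEL_RE = re.compile(r"\blevel=(\w+)")

KLOG_LEVELS = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}
INTERESTING = {"warning", "error", "fatal"}

# Log paths taken from the post; pass other paths as arguments if needed.
DEFAULT_PATHS = [
    "/var/lib/rancher/rke2/agent/logs/kubelet.log",
    "/var/lib/rancher/rke2/agent/containerd/containerd.log",
]


def severity(line):
    """Return 'info', 'warning', 'error', 'fatal', another level name, or None."""
    m = KLOG_RE.match(line)
    if m:
        return KLOG_LEVELS[m.group(1)]
    m = LEVEL_RE.search(line)
    if m:
        level = m.group(1).lower()
        return "warning" if level == "warn" else level
    return None


def normalize(line):
    """Replace digit runs with 'N' so repeated messages group together."""
    return re.sub(r"\d+", "N", line).strip()[:200]


def summarize(lines, top=20):
    """Count warning/error/fatal lines, most frequent first."""
    counts = Counter()
    for line in lines:
        if severity(line) in INTERESTING:
            counts[normalize(line)] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    for path in sys.argv[1:] or DEFAULT_PATHS:
        print(f"== {path}")
        try:
            with open(path, errors="replace") as fh:
                for message, count in summarize(fh):
                    print(f"{count:6d}  {message}")
        except OSError as exc:
            print(f"cannot read: {exc}")
```

Run it on the stuck node, optionally passing other log paths as arguments. The most frequent warning and error messages end up at the top, which at least gives something concrete to search for or to paste into the thread.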
It’s like some job is stuck.
Maybe someone has already gone through this, not necessarily with these exact versions; I tried different versions and got the same thing.
Maybe someone can tell me, and not only me, what can be done, deleted, restarted, or edited?