Rancher 2.9.3 update from v1.27.16+rke2r2 to v1.28.15+rke2r1 or v1.30.9+rke2r1 quest

Hello dear all!

In short, the problem is: waiting for kubelet to update.

Installation type - Docker, single master, a few workers. (Yes, this is a crappy method, but it works quite stably, doesn't fail unexpectedly, and most importantly it's dead simple.)
Ubuntu 22.04.5.
Certificates are ok.
Internet - full access.
RKE2, on-prem.
Age - 2 years. Initial version, as far as I remember: 2.7.1 > 2.7.7 > updated to 2.8.0 together with Kubernetes successfully.
Now updated to Rancher 2.9.3 with v1.27.16+rke2r2, conditionally successful. (That rodeo lasted 2 days. Facepalm. Also not without adventures.)
2.10.2 doesn't work either; I don't remember the error and haven't bothered to figure it out yet. For now we'll talk about 2.9.3.

One day I decided to go looking for trouble and tried to update Kubernetes from v1.27.16+rke2r2 to v1.28.15+rke2r1, for example. And what do you think, did everything work out in one go?
NOPE!!!
It just gets stuck on one action: [INFO] [planner] rkecluster fleet-default/clustername: configuring bootstrap node(s) custom-a2e9e76ac87f: waiting for kubelet to update.
a2e9e76ac87f is the first node, the master in my case.
Restarting the agent doesn't help.
Restarting the server doesn't help.
Rebooting doesn't help.
Deleting some pods with crictl, out of not knowing what else to delete (kube-controller-manager-, cloud-controller-manager-), doesn't help either; if anything, that move is slightly fatal.
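
To see what the planner actually thinks is going on, you can also query the provisioning objects directly. A rough sketch, assuming you exec into the Rancher container and substitute your own object names (mine are shown):

kubectl -n fleet-default get clusters.provisioning.cattle.io            # overall cluster conditions
kubectl -n fleet-default get machines.cluster.x-k8s.io                  # per-node Machine objects and their phases
kubectl -n fleet-default get machine custom-a2e9e76ac87f -o yaml        # full status of the stuck bootstrap node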

In /var/lib/rancher/rke2/agent/containerd/containerd.log I don't see anything critical; it looks like it's working, downloading something, doing something, and then just hangs.
In /var/lib/rancher/rke2/agent/logs/kubelet.log I don't see anything critical; it looks like it's working, downloading something, doing something, and then just hangs.
In /var/log/syslog I don't see anything critical; it looks like it's working, downloading something, doing something, and then just hangs.

In the Docker console logs of the Rancher container I don't see anything critical either; it looks like it's working, downloading something, doing something, and then just hangs on "waiting for kubelet to update".

Although it is possible that the error shows up in the log as a warning or info line. But the logs are not small, and I have no idea what to look for in there. The wall of text is unrealistically huge.
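
If you want to thin those logs out, something like this grep can help narrow it down (just a sketch; adjust paths and the container name to your setup):

grep -iE "error|fatal|fail" /var/lib/rancher/rke2/agent/logs/kubelet.log | tail -n 50
grep -iE "error|fatal|fail" /var/lib/rancher/rke2/agent/containerd/containerd.log | tail -n 50
grep -i rke2 /var/log/syslog | grep -iE "error|fatal|fail" | tail -n 50
docker logs rancher_container_2.9.3 2>&1 | grep -iE "\[planner\]|error" | tail -n 50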

It’s like some job is stuck.

Maybe someone has already been through this, not necessarily with these exact versions; I tried it on different versions with the same result.
Maybe someone can tell me (and not only me) what can be done, deleted, restarted, or edited?

Updated to 2.10.2, but the Kubernetes version is still not updating.

It hangs on this on the first node:

[INFO] [planner] rkecluster fleet-default/clustername: configuring bootstrap node(s) custom-a2e9e76ac87f: waiting for probes: calico

[INFO] [planner] rkecluster fleet-default/clustername: configuring bootstrap node(s) custom-a2e9e76ac87f: waiting for kubelet to update

OK, I'll answer my own quest (it hasn't even been a year). In general, you need to upgrade Kubernetes as far as it will go while on Rancher 2.9.3, then switch to 2.10.3 and upgrade Kubernetes on it, then switch to 2.11.3 and upgrade Kubernetes on it, then switch to 2.12.0 and upgrade Kubernetes. This is how it turned out for me.

Let's say you are stuck on the un-updatable version right now; see the first post.
Roll back to the Rancher version that still works; in my case, from 2.10.2 back to 2.9.3.
Now you have something like 2.9.3 & v1-27-16-rke2r2 & 1.7.22-k3s1 & 5.15.0-152-generic.
Click to update Kubernetes here from v1-27-16-rke2r2 to v1.28.15+rke2r1.
The master gets updated to v1.28.15+rke2r1, and everything else goes to hell, see the first post (waiting for kubelet to update, Provisioning, reconciling & fail-whale & HTTP Error 500 & blah blah blah; just wait here).
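
You can check what the master actually runs at this point straight from the node itself; a small sketch assuming the default RKE2 paths:

# on the master node, using the rke2-generated kubeconfig and bundled kubectl (default paths)
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl get nodes -o wide   # shows kubelet version, container runtime and kernel per node

Those columns are exactly the values you will need for the nodeInfo patch below.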

Here is something like a solution:

docker exec -it rancher_container_2.9.3 sh
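
Inside the container, before patching anything, it's worth checking what nodeInfo the Machine currently reports, something along these lines (machine name is from my setup, use yours):

kubectl -n fleet-default get machine custom-a2e9e76ac87f -o jsonpath='{.status.nodeInfo}'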

1) Patch ONLY nodeInfo on the Machine (subresource=status)

kubectl -n fleet-default patch machine custom-a2e9e76ac87f \
  --type=merge --subresource=status -p '{
  "status":{
    "nodeInfo":{
      "kubeletVersion":"v1.28.15+rke2r1",
      "kubeProxyVersion":"v1.28.15+rke2r1",
      "containerRuntimeVersion":"containerd://1.7.22-k3s1",
      "kernelVersion":"5.15.0-152-generic"
    }
  }
}'

2) Bulk patch of all Machines for cluster youclaternamehere (without jq)

NS=fleet-default
CLUSTER=youclaternamehere
KVER="v1.28.15+rke2r1"
KPROXY="v1.28.15+rke2r1"

# If you want to hard-set containerd, specify it here; otherwise the current value from the Machine is used (if present)
DEFAULT_CR="containerd://1.7.22-k3s1"

for MACH in $(kubectl -n "$NS" get machine \
    -l "cluster.x-k8s.io/cluster-name=$CLUSTER" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{end}'); do

  # grab the current values so we don't overwrite them with nothing
  CUR_CR=$(kubectl -n "$NS" get machine "$MACH" -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}')
  CUR_KN=$(kubectl -n "$NS" get machine "$MACH" -o jsonpath='{.status.nodeInfo.kernelVersion}')

  # if the Machine has no value, fall back to the default
  [ -z "$CUR_CR" ] && CR="$DEFAULT_CR" || CR="$CUR_CR"

  # kernel is better left as-is; if it's missing, simply don't touch the field
  echo "Patching Machine/$MACH → kubelet=$KVER kubeProxy=$KPROXY CR=$CR kernel=${CUR_KN:-(keep as-is/omit)}"

  if [ -n "$CUR_KN" ]; then
    kubectl -n "$NS" patch machine "$MACH" --type=merge --subresource=status \
      -p "{\"status\":{\"nodeInfo\":{\"kubeletVersion\":\"$KVER\",\"kubeProxyVersion\":\"$KPROXY\",\"containerRuntimeVersion\":\"$CR\",\"kernelVersion\":\"$CUR_KN\"}}}"
  else
    kubectl -n "$NS" patch machine "$MACH" --type=merge --subresource=status \
      -p "{\"status\":{\"nodeInfo\":{\"kubeletVersion\":\"$KVER\",\"kubeProxyVersion\":\"$KPROXY\",\"containerRuntimeVersion\":\"$CR\"}}}"
  fi
done
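
To sanity-check the bulk patch, you can list what each Machine now reports; just a quick check in the same shell, not a required step:

kubectl -n "$NS" get machine -l "cluster.x-k8s.io/cluster-name=$CLUSTER" \
  -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion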

Substitute your values, and maybe you will be able to fix everything too.
After that, updating to 1.29 and onward goes normally.
So far, I have managed to pull the cluster out of the update/upgrade doom.
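
If you want to watch it settle, something like this from inside the Rancher container shows the cluster object going back to Active (just how I kept an eye on it; your cluster name will differ):

kubectl -n fleet-default get clusters.provisioning.cattle.io -w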

After updating Rancher to 2.10.3, repeat from scratch. And so on. Don't forget to substitute the versions for every upgrade and Rancher release, for example:
"kubeletVersion":"v1.31.11+rke2r1",
"kubeProxyVersion":"v1.31.11+rke2r1",
"containerRuntimeVersion":"containerd://2.0.5-k3s2",
and so on.

2.11.3 for some reason does not start, so I went straight to 2.12.1 and v1.33.3+rke2r1.

Age of cluster - almost 1000 days. Initial version, as far as I remember: 2.7.1 > 2.7.7 > updated to 2.8.0 with Kubernetes successfully, then > 2.9.3 > 2.10.3 > [2.11.3 did not start] > 2.12.1 successful.