I couldn't fully recover the RancherOS node that hosted the rancher/rancher instance.
What I ended up doing was taking a full backup of the data volume, booting a new machine (this time running Ubuntu instead of RancherOS), and copying the data volume there.
I then installed Docker fresh and started a new container to replace the old one with the following command:
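For anyone wanting to do the same, here is a rough sketch of how the backup and restore could look with plain tar. The function names are made up for illustration, and it assumes the data lives in a host directory such as /opt/rancher (the path from the docker run command); adjust it if your volume lives elsewhere.

```shell
# Archive a directory, preserving permissions.
# Run as root so file ownership is kept intact as well.
backup_dir() {
  local src="$1" archive="$2"
  tar -C "$(dirname "$src")" -czpf "$archive" "$(basename "$src")"
}

# Unpack an archive made by backup_dir underneath the given parent directory.
restore_dir() {
  local archive="$1" parent="$2"
  mkdir -p "$parent" && tar -C "$parent" -xzpf "$archive"
}
```

On the old node this would be something like `backup_dir /opt/rancher /tmp/rancher-backup.tgz`, then copy the archive to the new machine and run `restore_dir /tmp/rancher-backup.tgz /opt` there. It is probably safest to stop the old container first so the data is not changing while it is archived.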
sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 -v /opt/rancher:/var/lib/rancher --privileged rancher/rancher:latest --acme-domain
(Keep IPs in mind: I don't know what the impact of changing the IP is, so I booted the new machine with the same IP as the old one and stopped the old one to prevent IP collisions.)
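To confirm the new container is actually up before moving on, something like the following could help. This is only a sketch: the function names are hypothetical, and it assumes the container was started from the rancher/rancher:latest image as in the command above.

```shell
# Print the ID of the first running container started from the given image.
rancher_container_id() {
  local image="${1:-rancher/rancher:latest}"
  docker ps --filter "ancestor=${image}" --format '{{.ID}}' | head -n 1
}

# Show the last 50 log lines of that container, or fail if none is running.
rancher_recent_logs() {
  local id
  id="$(rancher_container_id "$@")"
  if [ -z "$id" ]; then
    echo "no running rancher/rancher container found" >&2
    return 1
  fi
  docker logs --tail 50 "$id"
}
```

Running `rancher_recent_logs` (with sudo if your user is not in the docker group) should show whether Rancher is starting cleanly or complaining about certificates.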
With this I had a working container, but Rancher still wasn't showing up in the browser, so I then followed this from within the rancher/rancher container: Expired K3s certificates are not automatically rotated causing connection issues
After that things got a bit better, but I was still getting errors in the console and couldn't connect to the other clusters.
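If you want to check whether expired certificates are really what you are hitting, a small helper around openssl can list them. This is a sketch, not part of either linked procedure: the function names are made up, and the directory you point it at is up to you.

```shell
# Succeed if the certificate expires within the given number of seconds
# (0 means "already expired"). An unreadable file also counts as expiring.
cert_expires_within() {
  local cert="$1" seconds="${2:-0}"
  ! openssl x509 -checkend "$seconds" -noout -in "$cert" >/dev/null
}

# Print every *.crt under a directory that expires within the given window.
# Only the first certificate in each file is inspected.
list_expiring_certs() {
  local dir="$1" seconds="${2:-0}" cert
  find "$dir" -name '*.crt' -type f | while read -r cert; do
    if cert_expires_within "$cert" "$seconds"; then
      echo "$cert: $(openssl x509 -enddate -noout -in "$cert")"
    fi
  done
}
```

For example, `list_expiring_certs /opt/rancher/k3s/server/tls 0` would list the already expired ones. That path is my assumption about where the K3s certificates end up on the host with the bind mount from the docker run command, so check where they actually are on your system.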
Finally I found this: Rancher Docs: Rotation of Expired Webhook Certificates
Running that, also from within the rancher/rancher container, solved the issue.
!! For some reason this last step removed my local cluster from the UI !!
Apart from the local cluster disappearing from the UI, everything now seems to be working nicely. In my case the local cluster didn't have anything in it anyway, so that was fine by me.
Hope this helps others.
Thanks