Questions About Rolling Upgrades of Rancher and Availability

I manage several Rancher instances at my organization, and I have noticed many times in the past, during random errors or even upgrades, that the entire Rancher UI becomes unavailable and no longer reachable. While I can understand this in some scenarios, what baffles me is why, when I have 3 replicas of Rancher running in my local cluster and one goes down, I start getting really flaky availability, almost like requests are still being routed to the bad pod out of the three. I almost always see this same issue during rolling upgrades of the Rancher system as well.

Has anyone else experienced this? I have looked at nearly everything I can think of, from NGINX Ingress configuration to the Rancher deployment's readiness checks, and it seems baffling to me that the system doesn't keep functioning when a replica is lost.
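For reference, these are the knobs I have been looking at. This is only a rough sketch of the relevant parts of the Deployment, not my exact manifest, and the probe path and port are my assumptions based on the standard Rancher Helm chart rather than something I have confirmed:

```yaml
# Sketch of the Deployment fields that should govern this behavior.
# Values are illustrative, not my actual configuration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rancher
  namespace: cattle-system
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # how many pods may be down at once during an upgrade
      maxSurge: 1
  template:
    spec:
      containers:
        - name: rancher
          readinessProbe:
            httpGet:
              path: /healthz   # assumed probe path; check the chart's actual values
              port: 80
            periodSeconds: 5    # how quickly a failing pod should be detected
            failureThreshold: 3 # probes that must fail before the pod is pulled
```

My understanding is that once the readiness probe fails, the pod should be removed from the Service endpoints and the ingress should stop sending traffic to it, so the flakiness makes me wonder if there is a propagation delay between the endpoint update and the ingress controller picking it up.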