Hello,
I need help with an imported downstream RKE2 cluster that does not stay connected in Rancher.
Setup:
- Rancher server version: v2.14.0
- Rancher URL: https://mycompany-rancher
- Rancher ingress class: traefik
- Rancher ingress version: rancher/mirrored-library-traefik:3.6.10
- Downstream cluster: RKE2, 3 nodes
- Downstream RKE2 version: rke2 version v1.34.6+rke2r3
- Downstream CNI: rke2-canal
- Downstream ingress version: rancher/hardened-traefik:v3.6.10-build20260309
- Downstream nodes:
  - node1: 192.168.5.10
  - node2: 192.168.5.11
  - node3: 192.168.5.12
Observed behavior:
- cattle-cluster-agent starts normally.
- It can access https://mycompany-rancher/ping
- It resolves the Rancher hostname correctly.
- It reads /v3/settings/cacerts successfully.
- The agent logs show "Connected to proxy".
- On the Rancher side I then get:
  [INFO] Handling backend connection request [c-node]
  [ERROR] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
  [ERROR] Error during subscribe websocket: close sent
What I already tested:
- Strict CA / valid certificate path
- Agent restart / rollout restart
- Clearing HTTP_PROXY / HTTPS_PROXY / NO_PROXY on cattle-cluster-agent
- Changing dnsPolicy from ClusterFirst to Default
- Verifying UFW is inactive on all downstream nodes
- Verifying UDP 8472 is listening on all downstream nodes
- Confirming rke2-canal pods are healthy on all downstream nodes
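One check that is not in the list above is whether the Traefik ingress passes the websocket upgrade headers through unchanged. The following is a minimal sketch of that check, using only the Python standard library. It builds an upgrade request and validates a raw response against the handshake rules in RFC 6455. The function names are hypothetical, and sending the bytes over a TLS socket is left out. The real agent endpoint presumably expects agent credentials that this sketch does not send, so it is only useful for spotting stripped or altered upgrade headers, not for reproducing the agent's session.

```python
import base64
import hashlib
from typing import List

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server must send for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> bytes:
    """Build a bare websocket upgrade request (no auth headers; hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def check_upgrade_response(raw: bytes, key: str) -> List[str]:
    """Return a list of handshake problems; an empty list means it looks valid."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    problems = []

    parts = status_line.split(" ", 2)
    if len(parts) < 2 or parts[1] != "101":
        problems.append(f"expected status 101, got: {status_line!r}")

    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    if headers.get("upgrade", "").lower() != "websocket":
        problems.append("missing 'Upgrade: websocket' header")
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        problems.append("missing 'Connection: Upgrade' header")
    if headers.get("sec-websocket-accept") != expected_accept(key):
        problems.append("Sec-WebSocket-Accept missing or does not match the key")
    return problems
```

Since the agent already logs "Connected to proxy", the handshake most likely succeeds on the agent's path. This would mainly help compare behavior with and without the ingress in between.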
Relevant logs:
Agent:
Connecting to wss://mycompany-rancher/v3/connect/register
Connected to proxy
Rancher:
Handling backend connection request [c-node]
error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
Error during subscribe websocket: close sent
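To see how often the tunnel drops and whether the close code ever changes, the Rancher-side lines can be tallied with a small script. This is a rough sketch that assumes the error lines always have the shape quoted above. The regular expression and the names `parse_tunnel_close` and `summarize` are my own, not part of Rancher or remotedialer.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Matches lines shaped like the Rancher error quoted above, for example:
#   error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
CLOSE_RE = re.compile(
    r"error in remotedialer server \[(?P<status>\d{3})\]: "
    r"websocket: close (?P<code>\d{4}) \((?P<reason>[^)]*)\)"
    r"(?:: (?P<detail>.*))?$"
)


class TunnelClose(NamedTuple):
    http_status: int
    close_code: int
    reason: str
    detail: str


def parse_tunnel_close(line: str) -> Optional[TunnelClose]:
    """Extract status, close code, reason and detail, or None if the line differs."""
    m = CLOSE_RE.search(line.strip())
    if m is None:
        return None
    return TunnelClose(int(m["status"]), int(m["code"]), m["reason"], m["detail"] or "")


def summarize(lines: Iterable[str]) -> Counter:
    """Count occurrences of each (http_status, close_code) pair in a log stream."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_tunnel_close(line)
        if parsed is not None:
            counts[(parsed.http_status, parsed.close_code)] += 1
    return counts
```

Feeding it the Rancher pod logs line by line should show whether every disconnect is the same 400 / 1006 pair or whether other close codes appear.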
Additional details:
- Rancher accepts the backend connection request, but the websocket tunnel then drops: Rancher/remotedialer logs close code 1006 (abnormal closure) with unexpected EOF.
- DNS resolution and HTTPS reachability from the downstream agent pod appear to be working correctly.
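A note on reading the close code: under RFC 6455, 1006 is never sent on the wire. It is a value the local websocket library reports when the connection ended without a Close frame. So the Rancher log by itself does not say which side, or which hop in between, dropped the connection. The sketch below encodes that distinction. The helper names are hypothetical and the table covers only the codes that RFC 6455 reserves for local reporting.

```python
# Close codes that RFC 6455 (section 7.4.1) reserves for local reporting only;
# an endpoint must not put them in a Close frame.
LOCAL_ONLY_CODES = {
    1005: "no status code was present in the Close frame",
    1006: "connection ended without a Close frame (abnormal closure)",
    1015: "TLS handshake failure",
}


def peer_sent_close_frame(code: int) -> bool:
    """True if this code could have arrived in a Close frame from the peer."""
    return code not in LOCAL_ONLY_CODES and 1000 <= code <= 4999


def describe_close(code: int) -> str:
    """Give a short, human-readable reading of a websocket close code."""
    if code in LOCAL_ONLY_CODES:
        return f"{code}: reported locally, not sent by the peer: {LOCAL_ONLY_CODES[code]}"
    if code == 1000:
        return "1000: normal closure, received in a Close frame"
    if peer_sent_close_frame(code):
        return f"{code}: received in a Close frame from the peer"
    return f"{code}: outside the range defined by RFC 6455"


print(describe_close(1006))
```

If that reading is right, the 1006 here points at the underlying connection being cut (by the agent side, the ingress, or something on the network path) rather than at an orderly websocket close.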
Question:
Has anyone seen this with Rancher 2.14.0 and Traefik ingress on the Rancher cluster?
Is there a known incompatibility or recommended workaround for imported cluster websocket/remotedialer disconnects after the initial successful connection? How can this be resolved?