FWIW, we had a similar issue with two preexisting Ubuntu machines on which we ran the `rancher/agent` container. The containers on those two machines simply could not communicate with each other.
We then deployed two new machines from within Rancher and found that the new machines worked!
The investigation went something like this:
- Our test case was as follows: click “Execute shell” on the “Network Agent” container on machine1 and type `ping <ip-of-network-agent-on-machine-2>`. We knew this had to work, but it simply didn’t.
- The `iptables` rules on both machines were fine; we verified this by running `nc -ul 501` on machine1 and `echo hello | nc -u localhost 501` on machine2.
- On machine1, list the IPsec configurations: `swanctl --list-conns`. Note that there is a configuration for machine2 (so the rancher agent was doing its job).
- Then, list the active IPsec tunnels: `swanctl --list-sas`. We noted that there was no active tunnel (see image below).
- Next we checked the IPsec logs: `cat /var/log/rancher-net.log`. I don’t have them at hand now, but whenever we saw an attempt to establish the tunnel, it quickly errored out with “No such file or directory”.
- That error message isn’t particularly useful, but a Google search led us to believe it was a missing kernel module on the host machine itself. Sure enough, we ssh’ed to the host (machine1) and tried a quick modprobe: `modprobe authenc`. We got back a “No such file or directory” error.
- We then tried `depmod -a` and still got `depmod: FATAL: could not search modules: No such file or directory`.
- At this point it was clear that we had an outdated kernel and our modules had probably been cleaned up (Ubuntu machine).
- The final solution was a simple upgrade of `linux-image` and a quick reboot.
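If you want to check for the same condition before digging through the IPsec logs, something like the following might help. It is only a rough sketch: the function name `modules_dir_present` is made up for illustration, and it assumes the usual layout where modules for the running kernel live under `/lib/modules/<kernel release>`, which is where `depmod` and `modprobe` look by default. It does not prove that `authenc` itself can be loaded, only that the module tree for the running kernel is still on disk.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rancher or any distro tooling):
# succeeds if the module tree for a given kernel release exists.
#   $1 = kernel release (defaults to the running kernel, `uname -r`)
#   $2 = modules root   (defaults to /lib/modules)
modules_dir_present() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    [ -d "$root/$release" ]
}

# Report on the running kernel of the host.
if modules_dir_present; then
    echo "module tree found for kernel $(uname -r)"
else
    echo "no module tree for kernel $(uname -r); modules may have been removed"
fi
```

Run it on the host itself (not inside the Network Agent container). If it reports a missing module tree, you are probably in the same situation we were, and `modprobe authenc` will most likely fail the same way.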
While it may not be the same problem you’re experiencing, I’m writing here as our investigation may help others to diagnose their issues.
