Cross-host intercontainer communication trouble

FWIW, we had a similar issue with two preexisting Ubuntu machines on which we ran the rancher/agent container. The containers on those two machines simply could not communicate with each other.

We then deployed two new machines from within Rancher and found that the new machines worked!

The investigation went something like:

  • Our test case was as follows: click “Execute shell” on the “Network Agent” container on machine1 and type ping <ip-of-network-agent-on-machine-2>. We knew this had to work, but it simply didn’t.
  • The iptables rules on both machines were fine. We verified this by running nc -ul 501 on machine1 and echo hello | nc -u <ip-of-machine1> 501 on machine2.
  • On machine1, we listed the ipsec configurations: swanctl --list-conns. There was a configuration for machine2, so the rancher agent was doing its job.
  • Then we listed the active ipsec tunnels: swanctl --list-sas. There was no active tunnel (see image below).
  • Next we checked the ipsec logs: cat /var/log/rancher-net.log. I don’t have them at hand now, but every attempt to establish the tunnel quickly errored out with: “No such file or directory”.
  • That error message isn’t particularly useful, but a Google search led us to believe the cause was a missing kernel module on the host machine itself. Sure enough, we ssh’ed to the host (machine1) and tried a quick modprobe: modprobe authenc. We got back a “No such file or directory” error.
  • We then tried: depmod -a and still got: depmod: FATAL: could not search modules: No such file or directory.
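  • At this point it was clear that we were running an outdated kernel whose modules had probably been cleaned up from disk (something that can happen on an Ubuntu machine), so no module could be loaded for the running kernel.
  • The final solution was a simple upgrade of linux-image followed by a quick reboot.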
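
The host-side check from the last few steps can be sketched as a small shell function. This is only a rough sketch, not something we ran at the time: the function name check_module_tree is made up for illustration, and it assumes the modules for a kernel live under /lib/modules/<release>, which is the usual layout on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): report whether the module
# directory for a kernel release exists on disk.
# Assumption: modules live under /lib/modules/<release>.
# Usage: check_module_tree [release] [modules-root]
check_module_tree() {
    release="${1:-$(uname -r)}"   # default: the currently running kernel
    root="${2:-/lib/modules}"     # default: the usual modules root
    if [ -d "$root/$release" ]; then
        echo "ok: $root/$release exists"
        return 0
    fi
    echo "missing: $root/$release not found"
    return 1
}

# Example, run on the host itself (not inside a container):
#   check_module_tree && modprobe authenc
```

If the function prints "missing", the running kernel has no modules on disk, which would match the modprobe and depmod errors we saw above.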

While it may not be the same problem you’re experiencing, I’m writing here as our investigation may help others to diagnose their issues.