nova-api metadata service



jmozdzen
25-Aug-2015, 16:41
Hi *,

I'm helping out with network setups for a fresh SUSE OpenStack Cloud 5 setup.

An instance is trying to reach the nova-api metadata service by contacting 169.254.169.254 port 80, but no response is received. We can reproduce this by running e.g. "wget http://169.254.169.254:80" from within the instance.

The VM receives a fixed address via DHCP (192.168.123.x), and is communicating successfully via (in our case) VLAN 115 ("linuxbridge" setup, no OVS). I can see the instance's connection request to the metadata service via tcpdump on the network node, by listening on bond0.115.

From what I could gather from the OpenStack documentation, I expected to see a DNAT rule that translates requests for 169.254.169.254 port 80 into requests to the actual metadata service, listening on the control node (192.168.124.81, port 8775). There is no such rule in iptables: not on the control node, not on the compute node, and not inside the instance.

Once I add a corresponding DNAT rule on the control node ("iptables -t nat -I nova-api-PREROUTING -d 169.254.169.254 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.124.81:8775"), I see the corresponding replies to the instance (again tcpdump'ing on the control node). But as there is no route on the control node that directs traffic for 192.168.123.0/24 to the corresponding VLAN, all those replies are sent to the external default router (on its specific VLAN, *not* VLAN 115). These are the routing tables set by OpenStack Cloud on the control node:

root@d0c-c4-7a-06-72-96:~ # ip route list table local
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.2 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
broadcast 192.168.124.0 dev bond0 proto kernel scope link src 192.168.124.81
local 192.168.124.81 dev bond0 proto kernel scope host src 192.168.124.81
broadcast 192.168.124.255 dev bond0 proto kernel scope link src 192.168.124.81
broadcast 192.168.126.0 dev brqc0d4b1ba-e6 proto kernel scope link src 192.168.126.2
local 192.168.126.2 dev brqc0d4b1ba-e6 proto kernel scope host src 192.168.126.2
broadcast 192.168.126.255 dev brqc0d4b1ba-e6 proto kernel scope link src 192.168.126.2
root@d0c-c4-7a-06-72-96:~ # ip route list table main
default via 192.168.126.1 dev brqc0d4b1ba-e6 metric 100
127.0.0.0/8 dev lo scope link
192.168.124.0/24 dev bond0 proto kernel scope link src 192.168.124.81
192.168.126.0/24 dev brqc0d4b1ba-e6 proto kernel scope link src 192.168.126.2
root@d0c-c4-7a-06-72-96:~ # ip route list table default
root@d0c-c4-7a-06-72-96:~ #

Is using the metadata service supposed to work at all, and would we need to add some specific setup in order to make this work? Or should this work "out of the box"? The SUSE Cloud documentation is rather unspecific on this.

Any feedback (even "works for me out of the box") would be appreciated.

Regards,
Jens

jmozdzen
27-Aug-2015, 18:14
Hi *,

so I started to dig deeper, and noticed that SUSE Cloud, being built on OpenStack, uses Linux network namespaces - so my earlier approach was plain wrong. But as it still doesn't work, here's the current state of affairs:

- still using linuxbridge
- fixed addresses network is 192.168.123.0, mapped to VLAN 115 on the (real) Ethernet
- the instance can reach 192.168.123.1 perfectly

I've traced the traffic flow for our instance's access to the metadata service, here's what I found:

- instance wants to send a request for 169.254.169.254:80
- instance sends ARP for 169.254.169.254
- instance receives ARP response with MAC2 ("arp reply 169.254.169.254 is-at fa:16:3e:13:89:6b")
- instance sends IP packet for 169.254.169.254 to MAC2

That's where it currently ends.

The situation on the control node is:

- the control node has multiple network name spaces
- one of these netns hosts the vnif with the IP 192.168.123.1, which has MAC1 (*not MAC2* as seen in the ARP reply). It has the "metadata proxy" service listening on port 9697 (on any interface) and an iptables rule to redirect 169.254.169.254:80 to port 9697

root@control # ip netns exec qrouter-e938ca30-295c-405a-8790-94253ad288c2 iptables -t nat -L -vn
[...]
Chain neutron-l3-agent-PREROUTING (1 references)
pkts bytes target prot opt in out source destination
0 0 REDIRECT tcp -- * * 0.0.0.0/0 169.254.169.254 tcp dpt:80 redir ports 9697
[...]
root@control # ip netns exec qrouter-e938ca30-295c-405a-8790-94253ad288c2 ip addr list
18: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
21: qr-3850d83f-6c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:7f:d1:51 brd ff:ff:ff:ff:ff:ff
inet 192.168.123.1/24 brd 192.168.123.255 scope global qr-3850d83f-6c
inet6 fe80::f816:3eff:fe7f:d151/64 scope link
valid_lft forever preferred_lft forever
25: qg-8c43ded2-75: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:37:a7:9b brd ff:ff:ff:ff:ff:ff
inet 192.168.126.129/24 brd 192.168.126.255 scope global qg-8c43ded2-75
inet 192.168.126.138/32 brd 192.168.126.138 scope global qg-8c43ded2-75
inet 192.168.126.145/32 brd 192.168.126.145 scope global qg-8c43ded2-75
inet 192.168.126.146/32 brd 192.168.126.146 scope global qg-8c43ded2-75
inet 192.168.126.147/32 brd 192.168.126.147 scope global qg-8c43ded2-75
inet6 fe80::f816:3eff:fe37:a79b/64 scope link
valid_lft forever preferred_lft forever

root@control # ip netns exec qrouter-e938ca30-295c-405a-8790-94253ad288c2 lsof -i4 -P |grep 9697
neutron-n 12435 root 8u IPv4 36404 0t0 TCP *:9697 (LISTEN)
root@control #


- another of these netns hosts a vnif with the IP addresses 192.168.123.2 and 169.254.169.254, with MAC2, but has only the dnsmasq service listening (and no iptables rules for any remapping)


root@control # ip netns exec qdhcp-e423e6c2-54e1-4e41-99bb-231427cc7c8f ip addr list
7: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
9: ns-4ee87c22-5a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:13:89:6b brd ff:ff:ff:ff:ff:ff
inet 192.168.123.2/24 brd 192.168.123.255 scope global ns-4ee87c22-5a
inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-4ee87c22-5a
inet6 fe80::f816:3eff:fe13:896b/64 scope link
valid_lft forever preferred_lft forever

root@control # ip netns exec qdhcp-e423e6c2-54e1-4e41-99bb-231427cc7c8f lsof -i4 -P -n
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dnsmasq 8575 dnsmasq 3u IPv4 19146 0t0 UDP *:67
dnsmasq 8575 dnsmasq 5u IPv4 19149 0t0 UDP 169.254.169.254:53
dnsmasq 8575 dnsmasq 6u IPv4 19150 0t0 TCP 169.254.169.254:53 (LISTEN)
dnsmasq 8575 dnsmasq 7u IPv4 19151 0t0 UDP 192.168.123.2:53
dnsmasq 8575 dnsmasq 8u IPv4 19152 0t0 TCP 192.168.123.2:53 (LISTEN)
root@control # ps ax|grep 8575
2680 pts/2 S+ 0:00 grep 8575
8575 ? S 0:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=ns-4ee87c22-5a --except-interface=lo --pid-file=/var/lib/neutron/dhcp/e423e6c2-54e1-4e41-99bb-231427cc7c8f/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/e423e6c2-54e1-4e41-99bb-231427cc7c8f/host --addn-hosts=/var/lib/neutron/dhcp/e423e6c2-54e1-4e41-99bb-231427cc7c8f/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/e423e6c2-54e1-4e41-99bb-231427cc7c8f/opts --leasefile-ro --dhcp-range=set:tag0,192.168.123.0,static,86400s --dhcp-lease-max=256 --conf-file= --server=192.168.103.4 --domain=openstack.local
root@control # ip netns exec qdhcp-e423e6c2-54e1-4e41-99bb-231427cc7c8f iptables -t nat -L -vn
Chain PREROUTING (policy ACCEPT 176 packets, 23776 bytes)
pkts bytes target prot opt in out source destination

Chain INPUT (policy ACCEPT 176 packets, 23776 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 154 packets, 46727 bytes)
pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 154 packets, 46727 bytes)
pkts bytes target prot opt in out source destination
root@control #


So it's clear why the instance's requests don't reach the metadata proxy: They're sent to a different netns.

Now the question is either why are there these two distinct netns for these two services (dnsmasq and metadata proxy, router), or why is the metadata proxy listening in that other namespace, which doesn't have the 169.254.169.254 address active?

Is it a configuration error on our behalf, or might this be a product issue?

Regards,
Jens

vuntz
02-Sep-2015, 14:11
Have you tried with the cirros image? That's generally the first thing to try (to ensure that the issue is not coming from the image that is being used).

Also, we have an update coming (if it's not already out) that includes https://build.opensuse.org/package/view_file/Cloud:OpenStack:Juno/openstack-neutron/0002-Add-non-isolated-network-metadata-server-route.patch?expand=1 -- this might be relevant here.

jmozdzen
02-Sep-2015, 14:56
Hi vuntz,

thank you for looking into this.


Have you tried with the cirros image? That's generally the first thing to try (to ensure that the issue is not coming from the image that is being used).

no, we haven't - but being a network professional, I was able to follow the actual packet flow and could tell that the VM is sending proper requests on the proper interface/VLAN to the proper router.



Also, we have an update coming (if it's not already out) that includes https://build.opensuse.org/package/view_file/Cloud:OpenStack:Juno/openstack-neutron/0002-Add-non-isolated-network-metadata-server-route.patch?expand=1 -- this might be relevant here.

If I understand the comments correctly, that patch will not be relevant to our situation. I did trace the packet right to the control node, and it's sent to where it should go network-wise (to the netns that hosts 169.254.169.254). It's just that on the control node, the metadata proxy is placed in a different netns!

- one netns has the default router (192.168.123.1) and the metadata proxy (listening on *:9697). That netns has a rule to map metadata requests (169.254.169.254:80) to the proxy port

- a *different* netns runs the dnsmasq service and has an interface with the 169.254.169.254 address.

So network-wise, the packet flow is correct: From the VM to the control node's interface with the 169.254.169.254 address. Unfortunately, in that interface's netns, no-one is listening for the request.

I cannot tell how it *should* be set up - but it seems either the metadata proxy service and the iptables rule belong in the other netns (the one where dnsmasq runs and 169.254.169.254 is configured), or the IP address 169.254.169.254 is configured in the wrong netns...

Regards,
Jens

jmozdzen
03-Sep-2015, 12:24
Hi vuntz,

Have you tried with the cirros image? That's generally the first thing to try (to ensure that the issue is not coming from the image that is being used).

I'm currently off-site, but my co-worker (who's in charge of our Cloud installation) has tested using CirrOS... yes, that one works.

So while I declare this issue resolved ;), I'm more than eager to find out how the CirrOS image works differently. But that'll have to wait until I'm back in the office...

Thank you for poking with the CirrOS stick ;).

Regards,
Jens

rsblendido
03-Sep-2015, 14:44
So it's clear why the instance's requests don't reach the metadata proxy: They're sent to a different netns.

Now the question is either why are there these two distinct netns for these two services (dnsmasq and metadata proxy, router), or why is the metadata proxy listening in that other namespace, which doesn't have the 169.254.169.254 address active?

Is it a configuration error on our behalf, or might this be a product issue?



You discovered most of the info just by looking at the setup, very smart! Anyway, here is the explanation...
The metadata proxy can be spawned either by the L3 agent or by the DHCP agent. Usually it's spawned by the L3 agent, but if a network is not connected to a router (isolated) and enable_isolated_metadata is set in the DHCP agent configuration (the default is true for SUSE Cloud), the proxy is spawned by the DHCP agent.
In the first case it lives in the router namespace, the proxy listens on port 9697, and there are iptables rules to redirect 169.254.169.254:80 to port 9697.
In the second case a route is injected into the instance through DHCP option 121, setting the DHCP port IP (in your setup 192.168.123.2) as next hop for the metadata server. In this case the metadata proxy listens on port 80 inside the DHCP namespace, and no iptables rule is needed.
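
To make the two cases easier to compare, here is a minimal Python sketch of the decision as described above. It is not neutron code: the function name, the `ProxyPlacement` type and the "qrouter"/"qdhcp" labels are made up for illustration (the labels follow the namespace prefixes seen earlier in this thread), and it only encodes what the explanation says.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProxyPlacement:
    spawned_by: str       # agent that starts the metadata proxy
    namespace: str        # namespace prefix the proxy is expected in
    listen_port: int      # port the proxy listens on
    needs_redirect: bool  # True if an iptables REDIRECT from port 80 is needed


def metadata_proxy_placement(has_router: bool,
                             enable_isolated_metadata: bool) -> Optional[ProxyPlacement]:
    """Hypothetical helper: where the proxy is expected for a given network.

    Assumption: this mirrors the two cases described in the post,
    not the actual agent implementation.
    """
    if has_router:
        # Case 1: proxy in the router namespace, reached via REDIRECT 80 -> 9697.
        return ProxyPlacement("l3-agent", "qrouter", 9697, True)
    if enable_isolated_metadata:
        # Case 2: isolated network, proxy listens on port 80 in the DHCP namespace.
        return ProxyPlacement("dhcp-agent", "qdhcp", 80, False)
    # Neither case applies: no proxy is expected for this network.
    return None


print(metadata_proxy_placement(has_router=True, enable_isolated_metadata=True))
print(metadata_proxy_placement(has_router=False, enable_isolated_metadata=True))
```

For a network that is connected to a router, as in this thread, the sketch expects the proxy only in the router namespace, which matches the lsof output posted earlier.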

From what you report it seems that you have both modes on, which is weird. I assume the network is connected to a router, so the instance shouldn't get 192.168.123.2 as next hop for the metadata server.
I assume the metadata proxy is not running in the dhcp namespace, otherwise it would work...can you please check just "ps aux | grep metadata | grep <network_id>"
What did you do exactly to create the network? Did you change some configuration? It seems something got stuck and wasn't cleaned up properly.

Also...



root@control # ip netns exec qdhcp-e423e6c2-54e1-4e41-99bb-231427cc7c8f ip addr list
7: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
9: ns-4ee87c22-5a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether fa:16:3e:13:89:6b brd ff:ff:ff:ff:ff:ff
inet 192.168.123.2/24 brd 192.168.123.255 scope global ns-4ee87c22-5a
inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-4ee87c22-5a
inet6 fe80::f816:3eff:fe13:896b/64 scope link
valid_lft forever preferred_lft forever



"inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-4ee87c22-5a": that's weird. Is there a subnet with that IP range? Did you modify something manually?

jmozdzen
03-Sep-2015, 16:06
Hi rsblendido,

anyway here is the explanation...

thank you for those details, these help a lot to understand what we should be seeing.


From what you report it seems that you have both modes on, which is weird. I assume the network is connected to a router, so the instance shouldn't get 192.168.123.2 as next hop for the metadata server.

I'm off-site ATM, but I remember the VM's default route pointed to 192.168.123.1. I couldn't see traffic for 192.168.123.2 either, but that may have been because of filtering during tcpdump.


I assume the metadata proxy is not running in the dhcp namespace, otherwise it would work...can you please check just "ps aux | grep metadata | grep <network_id>"

You're right, "lsof" was confirming this: Only in the router NS there was the proxy listening on that port, while in the DHCP namespace, only dnsmasq had open IPv4 ports at all.


What did you do exactly to create the network? Did you change some configuration? It seems something got stuck and wasn't cleaned up properly.

Left-overs may be an option... we've had more than our fair share of problems getting things to run on SLES12 compute nodes, so there were plenty of reinstalls. But I'm pretty sure we never touched 169.* networks anywhere. At least not knowingly :)

I got feedback from my co-worker earlier today that a CirrOS instance does indeed obtain metadata (see the other branch of this thread). We'll be tracing this (also with another instance of the original image) once I'm back on-site, to see why the CirrOS VM can receive metadata at all. Maybe it's falling back to some other mechanism, I'll find out and report back!

Thank you for taking your time to look into this issue, we really appreciate it.

Regards,
Jens

aspiers
04-Sep-2015, 12:06
Hi Jens, as you can see you're in very safe hands with Rossella :-) I just wanted to ask which OS is running in the guest instance you are seeing these issues with? And can you provide us with the routing table within the instance? If the guest is SLE12, I'm wondering if this could be related to an issue which one of our partners discovered: neutron's dhcp server violated RFC3442:

https://bugs.launchpad.net/neutron/+bug/1317935

Most OSes are tolerant of this violation, but SLE12 uses wicked, which currently is not, and that can result in the guest getting the wrong routes. Having said that, the fix for this bug is in Juno, so it may already be in SUSE Cloud 5 ...
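
For reference, this is roughly what an RFC 3442 classless static routes payload (DHCP option 121) looks like on the wire. It is a minimal Python sketch: the function name is made up, and the two routes are example values built from addresses in this thread, not what neutron actually sends. RFC 3442 says a client that receives this option must ignore the Router option, so the default route has to be included in the option itself.

```python
import ipaddress


def encode_classless_routes(routes):
    """Hypothetical helper: encode (destination CIDR, gateway) pairs as option 121 data.

    Per RFC 3442 each route is: one octet prefix length, then only the
    significant octets of the destination, then the four-octet router address.
    """
    payload = bytearray()
    for cidr, gateway in routes:
        net = ipaddress.ip_network(cidr)
        significant = (net.prefixlen + 7) // 8  # number of destination octets to send
        payload.append(net.prefixlen)
        payload += net.network_address.packed[:significant]
        payload += ipaddress.ip_address(gateway).packed
    return bytes(payload)


# Assumed example routes: a host route for the metadata address via the DHCP
# port, plus the default route via the router.
routes = [
    ("169.254.169.254/32", "192.168.123.2"),
    ("0.0.0.0/0", "192.168.123.1"),
]
print(encode_classless_routes(routes).hex())
```

Comparing such a payload with what the guest's DHCP client logs can help to tell whether the server sends unexpected routes or the client misreads them.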

Here's another wild guess: I wonder if this change is related somehow? https://build.opensuse.org/package/view_file/Cloud:OpenStack:Juno/openstack-neutron/0002-Add-non-isolated-network-metadata-server-route.patch?expand=1

jmozdzen
04-Sep-2015, 16:46
Oh boy,

I should get rid of that coffee... or get more of it. 25 years of experience obviously don't prevent one from making a fool of oneself.

vuntz was correct, right in the first place. It was all right before my eyes, but I wouldn't see it: There is absolutely nothing wrong with the metadata service in our cloud installation. And that patch (mentioned by aspiers, too) will most likely be helpful in our situation.



- instance wants to send a request for 169.254.169.254:80
- instance sends ARP for 169.254.169.254
- instance receives ARP response with MAC2 ("arp reply 169.254.169.254 is-at fa:16:3e:13:89:6b")
- instance sends IP packet for 169.254.169.254 to MAC2


Why on earth would a host ever send an ARP request for 169.254.169.254, unless it has an explicit (or implicit) route defined for a network containing that IP, pointing straight to the local LAN? What I was expecting and didn't get was an ARP request for the default router (192.168.123.1). This should have put me on the right track, right there and then.

The instances in question (openSUSE 13.1) do have a route for 169.254.0.0/16:


host-192-168-123-194:~ # netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.123.1 0.0.0.0 UG 0 0 0 eth0
127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
192.168.123.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
host-192-168-123-194:~ #


Once I remove that route, the requests to the metadata proxy work as expected. CirrOS works because it does not have that route in the first place.
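
To illustrate why that one route makes the difference, here is a small Python sketch of a longest-prefix route lookup over the table shown above. The function name is made up, and the lookup is a simplification of what the kernel does (no metrics, no policy routing); it only answers which IP address the instance would send an ARP request for.

```python
import ipaddress

# Routes copied from the netstat output above: (destination, gateway, interface).
# A gateway of None means the destination is on-link, so the host ARPs for the
# destination itself instead of for a router.
ROUTES = [
    ("0.0.0.0/0", "192.168.123.1", "eth0"),
    ("127.0.0.0/8", None, "lo"),
    ("169.254.0.0/16", None, "eth0"),
    ("192.168.123.0/24", None, "eth0"),
]


def arp_target(dest, routes):
    """Hypothetical helper: return the IP the host would ARP for to reach dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), gateway)
        for net, gateway, _iface in routes
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        return None  # no route to host
    # Longest prefix wins, as in a normal routing table lookup.
    _net, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return gateway if gateway is not None else dest


# With the 169.254.0.0/16 route, the instance ARPs for the metadata IP itself.
print(arp_target("169.254.169.254", ROUTES))

# Without it, the request goes to the default router.
without_linklocal = [r for r in ROUTES if r[0] != "169.254.0.0/16"]
print(arp_target("169.254.169.254", without_linklocal))
```

The first lookup returns 169.254.169.254 (answered by the interface in the DHCP namespace, where nothing listens on port 80), the second returns 192.168.123.1 (the router namespace, where the REDIRECT rule and the proxy live).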

Rossella, thank you for spending your time educating me on some of the innards of OpenStack, this still was very helpful. Since everything works once we alter the instance's routing (or possibly after applying 0002-Add-non-isolated-network-metadata-server-route.patch, which I haven't tested), I guess that it is actually set up as expected by the OpenStack developers. OTOH, I'll still call it confusing to have a setup where one interface does have the target IP address, but you rely on a certain routing setup so that the packets will *not* reach that interface, but instead go to the default router where some network voodoo ;) is used to NAT the request to "the local IP address, port 9697".

If you believe that what we're seeing here is some form of error, please let me know. We're open to assist any debugging, although we're running mostly a default setup and you should see the same on any Cloud5 installation.

Everyone, thank you for your attention and assistance with this issue. And I apologize for stirring up dust for mostly nothing :o.

Best regards,
Jens