This post illustrates how a packet sent from an OpenStack VM travels until it reaches the external network. In this exercise, we use vanilla OpenStack with Open vSwitch as the Neutron plugin.
We will use the following diagrams from (https://docs.openstack.org/liberty/networking-guide/scenario-classic-ovs.html) as our reference.
- Compute node network components
- Network node network components
Basic info gathering
First things first, let’s collect some information about our lab setup. For simplicity, let’s focus on a single VM and then find out how the connection is built.
So, let’s pick a VM named “c-private”. For now we are going to focus on north-south traffic.
- First, let’s find which compute node hosts this VM

    controller# nova show c-private | egrep "instance|hypervis|name"
    | OS-EXT-SRV-ATTR:hostname             | c-private         |
    | OS-EXT-SRV-ATTR:hypervisor_hostname  | compute01         |
    | OS-EXT-SRV-ATTR:instance_name        | instance-00000003 |
    | OS-EXT-SRV-ATTR:root_device_name     | /dev/vda          |
    | name                                 | c-private         |
- Then, find the IP and MAC address of this VM

    controller# nova interface-list c-private
    +------------+--------------------------------------+--------------------------------------+--------------+-------------------+
    | Port State | Port ID                              | Net ID                               | IP addresses | MAC Addr          |
    +------------+--------------------------------------+--------------------------------------+--------------+-------------------+
    | ACTIVE     | f7eae624-3488-4ccb-962a-d94f9b86aeb7 | 1763ddbb-7689-4266-919d-e250e7577749 | 172.19.1.3   | fa:16:3e:3b:53:26 |
    +------------+--------------------------------------+--------------------------------------+--------------+-------------------+
- Summary: here is what we know so far
  - VM name: c-private
  - VM IP: 172.19.1.3
  - VM MAC address: fa:16:3e:3b:53:26
  - compute node: compute01
  - instance name: instance-00000003
- Now we are going to trace the path of the packet from the VM itself until it reaches the external network.
Packet walk @ Compute Node
Let’s start from the VM. We first need to SSH into the compute node.
- First, we know that the VM is connected to the host via a TAP interface. This TAP interface carries the outgoing traffic from the VM, so we need to find which TAP interface is connected to this VM.
- To find the TAP interface associated with this VM's NIC, the only way that I know is to use the classic libvirt virsh command on that particular VM instance. From the previous step, we know that the VM instance name is “instance-00000003”

    compute01# virsh dumpxml instance-00000003 | egrep "mac|tap"
    <mac address='fa:16:3e:3b:53:26'/>
    <target dev='tapf7eae624-34'/>

- OK, now we know that VM “c-private” is connected via tapf7eae624-34.
- According to the compute node network diagram at the beginning of this post, this TAP interface should first be connected to a Linux bridge, where firewall and/or QoS policy is implemented, before it is actually connected to Open vSwitch.
- Let’s check the Linux bridge table on the compute node

    compute-node# brctl show
    bridge name       bridge id           STP enabled   interfaces
    qbr1d939c00-e4    8000.762df08bd21e   no            qvb1d939c00-e4
                                                        tap1d939c00-e4
    qbrf7eae624-34    8000.c6fbe09bd3ee   no            qvbf7eae624-34
                                                        tapf7eae624-34

- Great! The output above shows that “tapf7eae624-34” is connected to a Linux bridge named qbrf7eae624-34, and this bridge has another virtual interface named qvbf7eae624-34.
- So, to review, our packet flow so far is

    c-private VM vNIC -- tapf7eae624-34 -- bridge qbrf7eae624-34 -- qvbf7eae624-34
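
In the outputs of this lab, the tap, qbr, qvb and qvo device names all end with f7eae624-34, which is the first 11 characters of the Neutron port ID f7eae624-3488-4ccb-962a-d94f9b86aeb7. Assuming that pattern holds in your setup too, a small sketch like the one below can predict the device names from a port ID. The function name `device_names` is made up for this illustration and is not part of any OpenStack tool.

```python
# Hypothetical helper: derive the device names observed in this lab
# from a Neutron port ID. Assumption: each name is a fixed prefix
# followed by the first 11 characters of the port ID.
PREFIXES = ("tap", "qbr", "qvb", "qvo")
SUFFIX_LEN = 11


def device_names(port_id: str) -> dict:
    """Return a mapping of prefix -> expected device name."""
    suffix = port_id[:SUFFIX_LEN]
    return {prefix: prefix + suffix for prefix in PREFIXES}


names = device_names("f7eae624-3488-4ccb-962a-d94f9b86aeb7")
for prefix, name in names.items():
    print(prefix, "->", name)
```

The virsh command remains the authoritative check; the sketch is only a shortcut for guessing which names to grep for.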
- Based on the documentation again, the next component is Open vSwitch itself.
- Now, we check the Open vSwitch configuration to see how this tap interface is connected

    compute01# ovs-vsctl show
    9b36fd19-21cb-4d25-b590-1f822c18d373
        Manager "ptcp:6640:127.0.0.1"
            is_connected: true
        Bridge br-tun
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "vxlan-c0a8010c"
                Interface "vxlan-c0a8010c"
                    type: vxlan
                    options: {df_default="true", in_key=flow, local_ip="192.168.1.13", out_key=flow, remote_ip="192.168.1.12"}
            Port br-tun
                Interface br-tun
                    type: internal
            Port patch-int
                Interface patch-int
                    type: patch
                    options: {peer=patch-tun}
        Bridge br-int
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "qvo1d939c00-e4"
                tag: 1
                Interface "qvo1d939c00-e4"
            Port br-int
                Interface br-int
                    type: internal
            Port patch-tun
                Interface patch-tun
                    type: patch
                    options: {peer=patch-int}
            Port "qvof7eae624-34"
                tag: 2
                Interface "qvof7eae624-34"
- What can we learn from the output above?
  - The virtual interface qvof7eae624-34 is connected to the Open vSwitch br-int bridge and has internal tag=2 assigned by OVS. It is the other end of the veth pair whose first end, qvbf7eae624-34, sits on the Linux bridge.
  - After that, the packet is forwarded to the br-tun bridge via the patch interface.
  - This means an untagged packet coming from the VM is received via qvof7eae624-34 and then sent out to the br-tun bridge with an additional internal tag.
- According to the compute node network components diagram, for north-south traffic br-tun forwards the packet to the network node through a VXLAN tunnel.
- Unfortunately, the output of “ovs-vsctl show” does not tell us the mapping between the internal tag and the VXLAN VNI.
- Fortunately, I got an answer from ask.openstack.org on how to find the mapping.
- So, let’s check the flow table inside Open vSwitch on the compute node. For this, we need to know the VNI (segmentation ID) that OpenStack assigned to the virtual network the VM is connected to.
- Here is the virtual network that VM “c-private” is connected to

    controller# nova show c-private | grep network
    | net1 network                         | 172.19.1.3, 172.24.4.3 |
- OK, we know the VM is connected to net1. Next step: find out the VXLAN VNI for this virtual network

    controller# neutron net-show net1
    +---------------------------+--------------------------------------+
    | Field                     | Value                                |
    +---------------------------+--------------------------------------+
    | admin_state_up            | True                                 |
    | availability_zone_hints   |                                      |
    | availability_zones        | nova                                 |
    | created_at                | 2017-05-30T01:14:03Z                 |
    | description               |                                      |
    | id                        | 1763ddbb-7689-4266-919d-e250e7577749 |
    | ipv4_address_scope        |                                      |
    | ipv6_address_scope        |                                      |
    | mtu                       | 1450                                 |
    | name                      | net1                                 |
    | project_id                | 0283565b977d4bdaaa57fa8bdf2e0159     |
    | provider:network_type     | vxlan                                |
    | provider:physical_network |                                      |
    | provider:segmentation_id  | 90                                   |
    | revision_number           | 11                                   |
    | router:external           | False                                |
    | shared                    | True                                 |
    | status                    | ACTIVE                               |
    | subnets                   | 0685edba-9a91-4bc8-badc-5427e716693a |
    | tags                      |                                      |
    | tenant_id                 | 0283565b977d4bdaaa57fa8bdf2e0159     |
    | updated_at                | 2017-05-31T06:44:28Z                 |
    +---------------------------+--------------------------------------+

- From the output above, we know that the VXLAN ID is 90 in decimal (0x5a in hexadecimal)
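
Neutron shows the segmentation ID in decimal, while the OVS flow dumps show the tunnel ID in hexadecimal, so it helps to convert before grepping. A minimal sketch of the conversion, using the value from this lab:

```python
# Segmentation ID of net1 as reported by "neutron net-show net1".
segmentation_id = 90

# OVS flow dumps print the tunnel ID in hexadecimal.
tun_id = hex(segmentation_id)
print(tun_id)

# And back again, to read a tun_id found in a flow dump.
print(int(tun_id, 16))
```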
- Let’s also find the MAC address of the default gateway of VM ‘c-private’ by searching for the Neutron ports that belong to the net1 subnet “0685edba-9a91-4bc8-badc-5427e716693a” above.

    controller# neutron port-list | grep 0685edba-9a91-4bc8-badc-5427e716693a
    | 2c628d36-765c-420b-b008-b6311a0ed17a | | 0283565b977d4bdaaa57fa8bdf2e0159 | fa:16:3e:ca:72:d8 | {"subnet_id": "0685edba-9a91-4bc8-badc-5427e716693a", "ip_address": "172.19.1.1"} |
    | 3775bfe7-6ed2-4a95-883e-701a6d04e6a2 | | 0283565b977d4bdaaa57fa8bdf2e0159 | fa:16:3e:65:8c:df | {"subnet_id": "0685edba-9a91-4bc8-badc-5427e716693a", "ip_address": "172.19.1.2"} |
    | f7eae624-3488-4ccb-962a-d94f9b86aeb7 | | 0283565b977d4bdaaa57fa8bdf2e0159 | fa:16:3e:3b:53:26 | {"subnet_id": "0685edba-9a91-4bc8-badc-5427e716693a", "ip_address": "172.19.1.3"} |

  The port with IP 172.19.1.1 is the default gateway of net1, and its MAC address is fa:16:3e:ca:72:d8.
- Before checking the OVS tables, we should generate some traffic to make sure OVS has entries in them

    c-private$ ping 172.19.1.1
    PING 172.19.1.1 (172.19.1.1): 56 data bytes
    64 bytes from 172.19.1.1: seq=0 ttl=64 time=1.254 ms
    64 bytes from 172.19.1.1: seq=1 ttl=64 time=0.847 ms
    64 bytes from 172.19.1.1: seq=2 ttl=64 time=1.539 ms

    --- 172.19.1.1 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.847/1.213/1.539 ms
- Now, let’s also dump the MAC address table on the compute node OVS

    compute01# ovs-appctl fdb/show br-tun
     port  VLAN  MAC                Age
    compute01# ovs-appctl fdb/show br-int
     port  VLAN  MAC                Age
        2     2  fa:16:3e:3b:53:26    0   --> mac address of VM c-private
        1     2  fa:16:3e:ca:72:d8    0   --> mac address of net1 default gateway
- And also list the OVS ports

    compute01# ovs-ofctl show br-tun
    ...
     1(patch-int): addr:8e:ac:86:71:04:ce
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     2(vxlan-c0a8010c): addr:ba:d7:23:6c:d0:06
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     LOCAL(br-tun): addr:7a:d1:cc:0f:f3:45
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
    OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

    compute01# ovs-ofctl show br-int
    ...
     1(patch-tun): addr:66:b0:4e:2c:a7:a5
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     2(qvof7eae624-34): addr:d6:bb:95:ae:c9:a3
         config:     0
         state:      0
         current:    10GB-FD COPPER
         speed: 10000 Mbps now, 0 Mbps max
     3(qvo1d939c00-e4): addr:f6:7a:d2:4b:a0:ea
         config:     0
         state:      0
         current:    10GB-FD COPPER
         speed: 10000 Mbps now, 0 Mbps max
     LOCAL(br-int): addr:5e:2a:4c:35:97:4d
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
    OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
- From the port list and MAC address table output above, we learn that:
  - qvof7eae624-34 is port 2 of OVS bridge br-int
  - the VXLAN tunnel to the network node, vxlan-c0a8010c, is port 2 of OVS bridge br-tun
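
A side note on the tunnel port name: in the ovs-vsctl output of this lab, the port vxlan-c0a8010c has remote_ip="192.168.1.12", and c0a8010c is exactly that address written in hexadecimal. Assuming the name always encodes the remote IPv4 address this way, the sketch below decodes it. The helper name `tunnel_remote_ip` is hypothetical.

```python
import ipaddress


def tunnel_remote_ip(port_name: str) -> str:
    """Decode the hex suffix of a tunnel port name into a dotted IPv4 address.

    Assumption: the name looks like "vxlan-<8 hex digits>" and the hex part
    is the remote tunnel endpoint, as observed in this lab.
    """
    hex_part = port_name.split("-", 1)[1]
    return str(ipaddress.IPv4Address(int(hex_part, 16)))


print(tunnel_remote_ip("vxlan-c0a8010c"))
```

The options line in "ovs-vsctl show" is still the reliable source for the tunnel endpoints; decoding the name is only a convenience when reading flow dumps.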
- OK, now we go to the flow tables

    compute01# ovs-ofctl dump-flows br-int
    ...
     cookie=0xb950e065fe76cc7b, duration=114016.867s, table=0, n_packets=142979, n_bytes=14018035, idle_age=0, hard_age=65534, priority=9,in_port=2 actions=resubmit(,25)
     cookie=0xb950e065fe76cc7b, duration=114016.870s, table=25, n_packets=153695, n_bytes=14468051, idle_age=0, hard_age=65534, priority=2,in_port=2,dl_src=fa:16:3e:3b:53:26 actions=NORMAL
    ...

    compute01# ovs-ofctl dump-flows br-tun
    ...
     cookie=0x876248e21ebb5890, duration=345233.029s, table=0, n_packets=155284, n_bytes=14620597, idle_age=0, hard_age=65534, priority=1,in_port=1 actions=resubmit(,2)
     cookie=0x876248e21ebb5890, duration=345233.026s, table=2, n_packets=155246, n_bytes=14617083, idle_age=0, hard_age=65534, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
     cookie=0x876248e21ebb5890, duration=142911.162s, table=20, n_packets=153436, n_bytes=14442734, hard_timeout=300, idle_age=0, hard_age=0, priority=1,vlan_tci=0x0002/0x0fff,dl_dst=fa:16:3e:ca:72:d8 actions=load:0->NXM_OF_VLAN_TCI[],load:0x5a->NXM_NX_TUN_ID[],output:2
     cookie=0x876248e21ebb5890, duration=14.744s, table=20, n_packets=4, n_bytes=336, hard_timeout=300, idle_age=9, hard_age=9, priority=1,vlan_tci=0x0002/0x0fff,dl_dst=fa:16:3e:65:8c:df actions=load:0->NXM_OF_VLAN_TCI[],load:0x5a->NXM_NX_TUN_ID[],output:2
     cookie=0x876248e21ebb5890, duration=222938.295s, table=22, n_packets=17, n_bytes=1528, idle_age=14, hard_age=65534, priority=1,dl_vlan=2 actions=strip_vlan,load:0x5a->NXM_NX_TUN_ID[],output:2
    ...
- From the flow tables above we can see that
  - A packet arriving from the VM on OVS bridge br-int port 2 is matched in table=0 and redirected to table=25. Additionally, from the config we also know that every packet arriving on br-int port 2 is assigned internal tag = 2.
  - In table=25, there is a rule that matches the VM's source MAC address. This rule has action=NORMAL, which means OVS does a normal MAC address lookup.
  - Based on the MAC address table, a packet sent from the VM to the default gateway 172.19.1.1/fa:16:3e:ca:72:d8 is forwarded to br-int port 1, which is the patch interface to br-tun.
  - Inside the br-tun bridge, in table=0, a packet coming from the br-int patch interface (br-tun port 1) is redirected to table 2.
  - The packet then hits a rule in table=2 that matches unicast destination MAC addresses, and OVS redirects it again to table=20.
  - In table=20 the packet hits one of the rules, selected by internal tag and destination MAC address.
    - For this example, we focus on the packet going to the default gateway 172.19.1.1/fa:16:3e:ca:72:d8.
    - OVS removes the internal OVS tag (load:0->NXM_OF_VLAN_TCI[]) and sets the VXLAN VNI to 0x5a (90 in decimal).
  - The packet is sent to the network node inside the VXLAN tunnel (output:2, which is vxlan-c0a8010c).
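
The mapping between the internal tag and the VNI can also be pulled out of a table=20 flow mechanically. The sketch below parses one flow line with regular expressions. The `FLOW` string is an abbreviated copy of the table=20 line above (cookie, duration and counters removed), and `vlan_to_vni` is a made-up helper name, so treat this as an illustration under those assumptions rather than a general flow parser.

```python
import re

# Abbreviated copy of the table=20 flow from the br-tun dump above.
FLOW = (
    "table=20, priority=1,vlan_tci=0x0002/0x0fff,dl_dst=fa:16:3e:ca:72:d8 "
    "actions=load:0->NXM_OF_VLAN_TCI[],load:0x5a->NXM_NX_TUN_ID[],output:2"
)


def vlan_to_vni(flow: str):
    """Return (internal_vlan, vni) for a flow shaped like the one above, else None."""
    vlan = re.search(r"vlan_tci=(0x[0-9a-f]+)/0x0fff", flow)
    tun = re.search(r"load:(0x[0-9a-f]+)->NXM_NX_TUN_ID\[\]", flow)
    if not (vlan and tun):
        return None
    # The 0x0fff mask keeps only the 12-bit VLAN ID part of the TCI.
    return int(vlan.group(1), 16) & 0x0FFF, int(tun.group(1), 16)


print(vlan_to_vni(FLOW))
```

For the flow above this gives internal tag 2 and VNI 90, which matches the tag on port qvof7eae624-34 and the segmentation ID of net1.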
- So, to review, our packet flow so far is

    c-private VM vNIC -- tapf7eae624-34 -- bridge qbrf7eae624-34 -- qvbf7eae624-34 -- qvof7eae624-34 -- ovs br-int -- patch-tun -- patch-int -- ovs br-tun -- vxlan-c0a8010c -- vxlan vni 90
Packet walk @ network node
At the end of the previous step, the packet was sent to the network node inside the VXLAN tunnel. This section covers the packet walk on the network node. For clarity, I removed some non-relevant parts of the outputs.
- Let’s check the OVS config first

    network-node# ovs-vsctl show
    7b14d048-c73e-4c8a-af43-cf568cf79d6d
        Manager "ptcp:6640:127.0.0.1"
            is_connected: true
        Bridge br-ex
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port br-ex
                Interface br-ex
                    type: internal
            Port phy-br-ex
                Interface phy-br-ex
                    type: patch
                    options: {peer=int-br-ex}
        Bridge br-int
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port "tap2b56c518-fb"
                tag: 1
                Interface "tap2b56c518-fb"
                    type: internal
            Port "tap3775bfe7-6e"
                tag: 3
                Interface "tap3775bfe7-6e"
                    type: internal
            Port "qg-421b407a-81"
                tag: 4
                Interface "qg-421b407a-81"
                    type: internal
            Port "qg-ede3b53d-da"
                tag: 4
                Interface "qg-ede3b53d-da"
                    type: internal
            Port int-br-ex
                Interface int-br-ex
                    type: patch
                    options: {peer=phy-br-ex}
            Port "qg-a1b3323f-35"
                tag: 4
                Interface "qg-a1b3323f-35"
                    type: internal
            Port "qr-2c628d36-76"
                tag: 3
                Interface "qr-2c628d36-76"
                    type: internal
            Port "qr-b9a4d300-68"
                tag: 2
                Interface "qr-b9a4d300-68"
                    type: internal
            Port "tap80ac2f6b-87"
                tag: 2
                Interface "tap80ac2f6b-87"
                    type: internal
            Port br-int
                Interface br-int
                    type: internal
            Port "qr-f80a6a75-e6"
                tag: 1
                Interface "qr-f80a6a75-e6"
                    type: internal
            Port patch-tun
                Interface patch-tun
                    type: patch
                    options: {peer=patch-int}
        Bridge br-tun
            Controller "tcp:127.0.0.1:6633"
                is_connected: true
            fail_mode: secure
            Port br-tun
                Interface br-tun
                    type: internal
            Port "vxlan-c0a8010d"
                Interface "vxlan-c0a8010d"
                    type: vxlan
                    options: {df_default="true", in_key=flow, local_ip="192.168.1.12", out_key=flow, remote_ip="192.168.1.13"}
            Port patch-int
                Interface patch-int
                    type: patch
                    options: {peer=patch-tun}
        ovs_version: "2.6.1"
- Same as before, the OVS config doesn’t tell us the mapping between the VXLAN ID and the internal tag, so we need to check the flow tables again on the network node
- MAC address table

    network-node# ovs-appctl fdb/show br-tun
     port  VLAN  MAC                Age
    network-node# ovs-appctl fdb/show br-int
     port  VLAN  MAC                Age
        3     3  fa:16:3e:3b:53:26    1
        8     3  fa:16:3e:ca:72:d8    1
        1     4  ae:05:ca:a5:1e:4d    1
       11     4  fa:16:3e:94:5d:70    1
- br-tun port list

    network-node# ovs-ofctl show br-tun
    ...
     1(patch-int): addr:c6:ca:7a:8a:b7:09
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     2(vxlan-c0a8010d): addr:42:ae:83:de:45:87
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     LOCAL(br-tun): addr:2a:40:69:cb:0b:4a
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
    ...

- br-int port list

    network-node# ovs-ofctl show br-int
    ...
     1(int-br-ex): addr:de:88:5e:32:fc:16
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     2(tap80ac2f6b-87): addr:00:00:00:00:02:00
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     3(patch-tun): addr:fe:97:fd:ad:dd:17
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     4(tap2b56c518-fb): addr:00:00:00:00:20:9d
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     5(tap3775bfe7-6e): addr:00:00:00:00:00:0a
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     6(qr-b9a4d300-68): addr:00:00:00:00:f0:c6
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     7(qr-f80a6a75-e6): addr:00:00:00:00:02:00
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     8(qr-2c628d36-76): addr:00:00:00:00:c0:b3
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     9(qg-421b407a-81): addr:00:00:00:00:90:78
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     10(qg-ede3b53d-da): addr:00:00:00:00:b0:fb
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     11(qg-a1b3323f-35): addr:00:00:00:00:60:ca
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max
     LOCAL(br-int): addr:ce:89:0c:a6:87:4d
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
    ...
- br-tun flow table

    network-node# ovs-ofctl dump-flows br-tun
    ...
     cookie=0xb5b1b0a3b000ddbe, duration=223464.064s, table=0, n_packets=155744, n_bytes=14660419, idle_age=0, hard_age=65534, priority=1,in_port=2 actions=resubmit(,4)
     cookie=0xb5b1b0a3b000ddbe, duration=223376.366s, table=4, n_packets=154264, n_bytes=14518735, idle_age=0, hard_age=65534, priority=1,tun_id=0x5a actions=mod_vlan_vid:3,resubmit(,10)
     cookie=0xb5b1b0a3b000ddbe, duration=223465.975s, table=10, n_packets=155744, n_bytes=14660419, idle_age=0, hard_age=65534, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xb5b1b0a3b000ddbe,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1

- br-int flow table

    network-node# ovs-ofctl dump-flows br-int
    ...
     cookie=0x86954af3f33dce38, duration=223464.386s, table=0, n_packets=463806, n_bytes=43847428, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
    ...
- So, what do we get here:
  - The packet from the compute node is received on br-tun port vxlan-c0a8010d.
  - vxlan-c0a8010d has port ID = 2.
  - In br-tun flow table=0, any packet received on port 2 is redirected to table 4.
  - In table 4, a packet received with VXLAN VNI 0x5a (90) gets internal VLAN tag = 3 and is redirected to table 10.
  - In table 10, the learn action installs a matching return flow in table 20, and then every packet is sent to port 1.
  - br-tun port 1 is the patch interface to br-int, so now the packet is in the br-int bridge.
  - Inside br-int table 0, it hits the generic rule with action NORMAL, which means OVS finds the outgoing port based on the MAC address table.
  - From the br-int MAC address table we see that the default gateway 172.19.1.1/fa:16:3e:ca:72:d8 is on br-int port 8, which is qr-2c628d36-76.
  - According to the documentation, this qr-2c628d36-76 is connected to a router namespace.
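
The NORMAL lookup on br-int can be reproduced by hand from the fdb/show output. The sketch below parses the br-int table of this lab and looks up the outgoing port for a given VLAN and destination MAC address. The function names `parse_fdb` and `lookup` are invented for this example, and the sketch simply returns None for an unknown MAC address instead of modelling what a real switch does in that case.

```python
# br-int MAC address table, copied from the network node output above.
FDB_BR_INT = """\
 port  VLAN  MAC                Age
    3     3  fa:16:3e:3b:53:26    1
    8     3  fa:16:3e:ca:72:d8    1
    1     4  ae:05:ca:a5:1e:4d    1
   11     4  fa:16:3e:94:5d:70    1
"""


def parse_fdb(text: str) -> dict:
    """Map (vlan, mac) -> port from the text of 'ovs-appctl fdb/show'."""
    table = {}
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 4:
            continue
        port, vlan, mac, _age = fields
        # The port stays a string because it can also be a name such as LOCAL.
        table[(int(vlan), mac.lower())] = port
    return table


def lookup(table: dict, vlan: int, mac: str):
    """Return the learned port for (vlan, mac), or None if it is unknown."""
    return table.get((vlan, mac.lower()))


fdb = parse_fdb(FDB_BR_INT)
print(lookup(fdb, 3, "fa:16:3e:ca:72:d8"))
```

With internal VLAN 3 and the gateway MAC address, the lookup gives port 8, which the port list identifies as qr-2c628d36-76.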
- There can be multiple Neutron router instances

    # neutron router-list
    +------------------------+---------+------------------------+-------------------------+-------------+-------+
    | id                     | name    | tenant_id              | external_gateway_info   | distributed | ha    |
    +------------------------+---------+------------------------+-------------------------+-------------+-------+
    | 0b0cb65f-3739-4ca8     | r2      | 0283565b977d4bdaaa57fa | {"network_id": "d478a8b | False       | False |
    | -a68b-271545681aa7     |         | 8bdf2e0159             | 8-96fa-4996-a472-938215 |             |       |
    |                        |         |                        | b0d1e2", "enable_snat": |             |       |
    |                        |         |                        | false,                  |             |       |
    |                        |         |                        | "external_fixed_ips":   |             |       |
    |                        |         |                        | [{"subnet_id":          |             |       |
    |                        |         |                        | "b193dc42-77ac-44cb-862 |             |       |
    |                        |         |                        | 2-f15331cf6319",        |             |       |
    |                        |         |                        | "ip_address":           |             |       |
    |                        |         |                        | "172.24.4.2"}]}         |             |       |
    | 58731090-1c82-4aa5-b31 | router1 | d67ee95c357642549d48c2 | {"network_id": "d478a8b | False       | False |
    | 9-1598c7e6ae06         |         | ea36250947             | 8-96fa-4996-a472-938215 |             |       |
    |                        |         |                        | b0d1e2", "enable_snat": |             |       |
    |                        |         |                        | true,                   |             |       |
    |                        |         |                        | "external_fixed_ips":   |             |       |
    |                        |         |                        | [{"subnet_id":          |             |       |
    |                        |         |                        | "b193dc42-77ac-44cb-862 |             |       |
    |                        |         |                        | 2-f15331cf6319",        |             |       |
    |                        |         |                        | "ip_address":           |             |       |
    |                        |         |                        | "172.24.4.9"}]}         |             |       |
    | bfc87e1e-6b93-446f-a30 | r1      | 0283565b977d4bdaaa57fa | {"network_id": "d478a8b | False       | False |
    | 5-6dc6153c1198         |         | 8bdf2e0159             | 8-96fa-4996-a472-938215 |             |       |
    |                        |         |                        | b0d1e2", "enable_snat": |             |       |
    |                        |         |                        | true,                   |             |       |
    |                        |         |                        | "external_fixed_ips":   |             |       |
    |                        |         |                        | [{"subnet_id":          |             |       |
    |                        |         |                        | "b193dc42-77ac-44cb-862 |             |       |
    |                        |         |                        | 2-f15331cf6319",        |             |       |
    |                        |         |                        | "ip_address":           |             |       |
    |                        |         |                        | "172.24.4.12"}]}        |             |       |
    +------------------------+---------+------------------------+-------------------------+-------------+-------+
- And each router instance is associated with a single network namespace

    network-node # ip netns list
    qrouter-0b0cb65f-3739-4ca8-a68b-271545681aa7
    qrouter-bfc87e1e-6b93-446f-a305-6dc6153c1198
    qrouter-58731090-1c82-4aa5-b319-1598c7e6ae06
    qdhcp-c218c762-e820-464b-b5b9-cfeb9eb51f89
    qdhcp-1763ddbb-7689-4266-919d-e250e7577749
    qdhcp-ab481d2f-05ac-4dea-a37d-0e0c8ab38444
- Unfortunately, the list output above does not contain any information about the internal network ID or internal subnet, so we have to find it by running the neutron router-port-list command on each router instance. For clarity, I only show the matching router instance below.

    # neutron router-port-list bfc87e1e-6b93-446f-a305-6dc6153c1198
    +--------------------------+------+--------------------------+-------------------+--------------------------+
    | id                       | name | tenant_id                | mac_address       | fixed_ips                |
    +--------------------------+------+--------------------------+-------------------+--------------------------+
    | 2c628d36-765c-           |      | 0283565b977d4bdaaa57fa8b | fa:16:3e:ca:72:d8 | {"subnet_id": "0685edba- |
    | 420b-b008-b6311a0ed17a   |      | df2e0159                 |                   | 9a91-4bc8-badc-          |
    |                          |      |                          |                   | 5427e716693a",           |
    |                          |      |                          |                   | "ip_address":            |
    |                          |      |                          |                   | "172.19.1.1"}            |
    | a1b3323f-35c5-44b8-9e6d- |      |                          | fa:16:3e:94:5d:70 | {"subnet_id": "b193dc42  |
    | be1b06fc6892             |      |                          |                   | -77ac-                   |
    |                          |      |                          |                   | 44cb-8622-f15331cf6319", |
    |                          |      |                          |                   | "ip_address":            |
    |                          |      |                          |                   | "172.24.4.12"}           |
    +--------------------------+------+--------------------------+-------------------+--------------------------+

- From the output above, we know that the ID of the router instance that owns the net1 default gateway is bfc87e1e-6b93-446f-a305-6dc6153c1198
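
Comparing the router IDs with the "ip netns list" output suggests a simple naming rule: a router namespace is "qrouter-" plus the router ID, and a DHCP namespace is "qdhcp-" plus the network ID. The sketch below builds the names under that assumption and checks them against the namespaces listed in this lab. The helper names are hypothetical.

```python
# Namespaces reported by "ip netns list" on the network node of this lab.
NAMESPACES = [
    "qrouter-0b0cb65f-3739-4ca8-a68b-271545681aa7",
    "qrouter-bfc87e1e-6b93-446f-a305-6dc6153c1198",
    "qrouter-58731090-1c82-4aa5-b319-1598c7e6ae06",
    "qdhcp-c218c762-e820-464b-b5b9-cfeb9eb51f89",
    "qdhcp-1763ddbb-7689-4266-919d-e250e7577749",
    "qdhcp-ab481d2f-05ac-4dea-a37d-0e0c8ab38444",
]


def router_namespace(router_id: str) -> str:
    """Assumed naming rule: 'qrouter-' followed by the Neutron router ID."""
    return "qrouter-" + router_id


def dhcp_namespace(network_id: str) -> str:
    """Assumed naming rule: 'qdhcp-' followed by the Neutron network ID."""
    return "qdhcp-" + network_id


ns = router_namespace("bfc87e1e-6b93-446f-a305-6dc6153c1198")
print(ns, ns in NAMESPACES)
print(dhcp_namespace("1763ddbb-7689-4266-919d-e250e7577749") in NAMESPACES)
```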
- Let’s check the interface list and routing table in the matching network namespace
- interface list

    network-node# ip netns exec qrouter-bfc87e1e-6b93-446f-a305-6dc6153c1198 ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    13: qr-2c628d36-76: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
        link/ether fa:16:3e:ca:72:d8 brd ff:ff:ff:ff:ff:ff
        inet 172.19.1.1/24 brd 172.19.1.255 scope global qr-2c628d36-76
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:feca:72d8/64 scope link
           valid_lft forever preferred_lft forever
    16: qg-a1b3323f-35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
        link/ether fa:16:3e:94:5d:70 brd ff:ff:ff:ff:ff:ff
        inet 172.24.4.12/24 brd 172.24.4.255 scope global qg-a1b3323f-35
           valid_lft forever preferred_lft forever
        inet 172.24.4.3/32 brd 172.24.4.3 scope global qg-a1b3323f-35
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:fe94:5d70/64 scope link
           valid_lft forever preferred_lft forever
- routing table

    # ip netns exec qrouter-bfc87e1e-6b93-446f-a305-6dc6153c1198 ip r
    default via 172.24.4.1 dev qg-a1b3323f-35
    172.19.1.0/24 dev qr-2c628d36-76  proto kernel  scope link  src 172.19.1.1
    172.24.4.0/24 dev qg-a1b3323f-35  proto kernel  scope link  src 172.24.4.12
- NAT table

    network-node# ip netns exec qrouter-bfc87e1e-6b93-446f-a305-6dc6153c1198 iptables -L -t nat
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    neutron-l3-agent-PREROUTING  all  --  anywhere             anywhere

    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination

    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination
    neutron-l3-agent-OUTPUT  all  --  anywhere             anywhere

    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    neutron-l3-agent-POSTROUTING  all  --  anywhere             anywhere
    neutron-postrouting-bottom  all  --  anywhere             anywhere

    Chain neutron-l3-agent-OUTPUT (1 references)
    target     prot opt source               destination
    DNAT       all  --  anywhere             172.24.4.3           to:172.19.1.3

    Chain neutron-l3-agent-POSTROUTING (1 references)
    target     prot opt source               destination
    ACCEPT     all  --  anywhere             anywhere             ! ctstate DNAT

    Chain neutron-l3-agent-PREROUTING (1 references)
    target     prot opt source               destination
    DNAT       all  --  anywhere             172.24.4.3           to:172.19.1.3
    REDIRECT   tcp  --  anywhere             169.254.169.254      tcp dpt:http redir ports 9697

    Chain neutron-l3-agent-float-snat (1 references)
    target     prot opt source               destination
    SNAT       all  --  172.19.1.3           anywhere             to:172.24.4.3

    Chain neutron-l3-agent-snat (1 references)
    target     prot opt source               destination
    neutron-l3-agent-float-snat  all  --  anywhere             anywhere
    SNAT       all  --  anywhere             anywhere             to:172.24.4.12
    SNAT       all  --  anywhere             anywhere             mark match ! 0x2/0xffff ctstate DNAT to:172.24.4.12

    Chain neutron-postrouting-bottom (1 references)
    target     prot opt source               destination
    neutron-l3-agent-snat  all  --  anywhere             anywhere             /* Perform source NAT on outgoing traffic.*/
- From the output above we know that
  - After the packet enters the namespace via interface qr-2c628d36-76, it is routed to the external default gateway via interface qg-a1b3323f-35.
  - The Neutron router in this case also performs NAT:
    - A packet from the VM with IP 172.19.1.3 gets static SNAT to 172.24.4.3, which is the floating IP associated with VM ‘c-private’.
    - Packets from other VMs get many-to-one NAT to 172.24.4.12.
  Note: it is possible to disable NAT completely on the Neutron router. In that case no NAT is performed and we can assign a public IP directly to the VM.
- Great! The source IP of our packet of interest is now translated to a “public IP”.
  - Note: ‘public’ here means this IP is known/reachable from outside OpenStack.
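
The effect of the two SNAT chains can be summarised as a lookup with a fallback: a source address that has a floating IP is rewritten to it, and everything else is rewritten to the router's own external address. The sketch below models only that outcome with the addresses from this lab. It does not reproduce iptables rule evaluation, and 172.19.1.4 is a made-up address standing in for another VM on net1.

```python
# Per-VM rule from chain neutron-l3-agent-float-snat: fixed IP -> floating IP.
FLOATING_SNAT = {"172.19.1.3": "172.24.4.3"}

# Catch-all rule from chain neutron-l3-agent-snat: the router's external IP.
DEFAULT_SNAT = "172.24.4.12"


def translate_source(src_ip: str) -> str:
    """Return the source address a packet carries after leaving the router."""
    return FLOATING_SNAT.get(src_ip, DEFAULT_SNAT)


print(translate_source("172.19.1.3"))  # VM c-private, which has a floating IP
print(translate_source("172.19.1.4"))  # hypothetical VM without a floating IP
```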
- One last step: find out how the packet is sent to the external network.
- The packet is sent out of the Neutron router namespace via interface qg-a1b3323f-35.
- This interface is connected back to the Open vSwitch bridge br-int, but with a different internal VLAN tag. For clarity, I copy the relevant section of the ovs-vsctl output above again.
- ovs-vsctl output

    Bridge br-ex
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-ex
            Interface br-ex
                type: internal
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
    Bridge br-int
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qg-a1b3323f-35"
            tag: 4
            Interface "qg-a1b3323f-35"
                type: internal
- br-int MAC address table

    network-node# ovs-appctl fdb/show br-int
     port  VLAN  MAC                Age
        3     3  fa:16:3e:3b:53:26    1
        8     3  fa:16:3e:ca:72:d8    1
        1     4  ae:05:ca:a5:1e:4d    1
       11     4  fa:16:3e:94:5d:70    1
- And here is some additional information related to the br-ex bridge and the external gateway
- ovs-ofctl show output

    network-node# ovs-ofctl show br-int
    ...
     1(int-br-ex): addr:de:88:5e:32:fc:16
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     11(qg-a1b3323f-35): addr:00:00:00:00:60:ca
         config:     PORT_DOWN
         state:      LINK_DOWN
         speed: 0 Mbps now, 0 Mbps max

    network-node# ovs-ofctl show br-ex
     1(phy-br-ex): addr:c2:0f:89:c0:78:90
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
     LOCAL(br-ex): addr:ae:05:ca:a5:1e:4d
         config:     0
         state:      0
         speed: 0 Mbps now, 0 Mbps max
- OVS flow dump on br-int

    network-node# ovs-ofctl dump-flows br-int
     cookie=0x86954af3f33dce38, duration=223373.954s, table=0, n_packets=152182, n_bytes=14515209, idle_age=0, hard_age=65534, priority=3,in_port=1,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:4,NORMAL
     cookie=0x86954af3f33dce38, duration=223464.136s, table=0, n_packets=29, n_bytes=1530, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop
     cookie=0x86954af3f33dce38, duration=223464.386s, table=0, n_packets=463806, n_bytes=43847428, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
- OVS flow dump on br-ex

    network-node# ovs-ofctl dump-flows br-ex
     cookie=0x97910c7dc69d33fb, duration=265698.006s, table=0, n_packets=179268, n_bytes=17096370, idle_age=19, hard_age=65534, priority=0 actions=NORMAL
- br-ex MAC address table

    network-node# ovs-appctl fdb/show br-ex
     port  VLAN  MAC                Age
        1     0  fa:16:3e:94:5d:70   29
    LOCAL     0  ae:05:ca:a5:1e:4d   29
- And here are some additional details specific to this lab
  - External default gateway: 172.24.4.1
  - MAC address of external default gateway: ae:05:ca:a5:1e:4d
- OK, what can we learn from the output above?
  - A packet received on OVS br-int port qg-a1b3323f-35 (port 11) hits the last rule in table 0, which has action NORMAL.
    - From the config we also know that this packet gets internal VLAN tag = 4.
  - OVS does a MAC address lookup on br-int for the default gateway (ae:05:ca:a5:1e:4d) and finds that the outgoing port is OVS port 1, int-br-ex, which is the patch interface to the br-ex bridge.
  - On br-ex table 0, OVS simply uses the MAC address table to decide where to forward the packet.
  - In this case, the outgoing port is LOCAL.
    - This is a special case for my setup. In most cases the outgoing interface would be the real physical NIC on the network node server.
    - In my setup, I only have one NIC on the network node, so I route the traffic from br-ex to the server's eth0 interface. Basically I am using my network node as a router that connects OpenStack and the outside world.
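
To see why a packet entering on port 11 falls through to the last rule, it helps to evaluate the three br-int rules by hand: rules are tried from highest to lowest priority, and a match such as vlan_tci=0x0000/0x1fff means "the TCI bits selected by the mask 0x1fff must equal 0x0000", which is how an untagged packet is matched. The sketch below is a simplified model of that evaluation for these three rules only. It is not how OVS is implemented, and the `match` function and the rule tuples are invented for this illustration.

```python
# Simplified copy of the three table=0 rules on the network node's br-int.
# Each rule: (priority, in_port or None, (tci_value, tci_mask) or None, actions)
RULES = [
    (3, 1, (0x0000, 0x1FFF), "mod_vlan_vid:4,NORMAL"),
    (2, 1, None, "drop"),
    (0, None, None, "NORMAL"),
]


def match(in_port: int, vlan_tci: int):
    """Return the actions of the highest-priority rule that matches."""
    for _priority, port, tci, actions in sorted(RULES, key=lambda r: -r[0]):
        if port is not None and port != in_port:
            continue  # rule is restricted to another input port
        if tci is not None and (vlan_tci & tci[1]) != tci[0]:
            continue  # masked TCI bits differ from the required value
        return actions
    return None


print(match(11, 0x0000))  # packet from qg-a1b3323f-35 (port 11)
print(match(1, 0x0000))   # untagged packet arriving from br-ex on port 1
print(match(1, 0x1004))   # tagged packet (VLAN-present bit set) on port 1
```

The first call shows the outbound case discussed above. The other two show the return direction: untagged traffic coming back from br-ex is tagged with VLAN 4, and anything already tagged on that port is dropped.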
- So, to summarize, our packet flow is now complete

    @compute node
    c-private VM vNIC -- tapf7eae624-34 -- bridge qbrf7eae624-34 -- qvbf7eae624-34 -- qvof7eae624-34 -- ovs br-int -- patch-tun -- patch-int -- ovs br-tun -- vxlan-c0a8010c -- vxlan vni 90

    @network node
    vxlan vni 90 -- vxlan-c0a8010d -- ovs br-tun -- patch-int -- patch-tun -- ovs br-int -- qr-2c628d36-76 -- namespace qrouter-bfc87e1e-6b93-446f-a305-6dc6153c1198 -- qg-a1b3323f-35 -- ovs br-int -- int-br-ex -- phy-br-ex -- ovs br-ex -- external NIC (in normal setup, not in this lab)
So, that’s it for the north-south traffic from the VM to the external network. The same flow applies to traffic from the external network to the VM, but in the reverse direction.
For east-west traffic the concept is similar. The only difference is that the packet is sent directly to the other compute node if both source and destination are in the same virtual network. Otherwise, the packet is sent to the network node for layer 3 routing.
Please note that this flow is specific to Neutron with the Open vSwitch plugin. Other plugins may have a completely different flow.