Part 2: NSX-T routing deep dive - How a stateful service drastically changes routing


Quick recap of what we have covered so far:
In Part 1 we looked at two topologies:

We looked at how northbound traffic changed before and after instantiating a T1-SR on our Edge Node cluster.

Now in Part 2 we will investigate what a southbound flow looks like with and without a T1 stateful service, and examine how this affects routing paths.

We will start by looking at southbound traffic flows when there is no T1 stateful service. Refer to Topology A below.

Topology A

Topology A: T0 in ECMP and a T1 router with no stateful services (T1 connected to T0 but the T1 is NOT associated with an Edge Cluster).

Let’s take a look at the BGP routing tables on the TORs to see how they are forwarding traffic to our T0-SRs across our Edges.

TOR1 BGP Routing table:
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
                x best-external, a additional-path, c RIB-compressed,
  Origin codes: i - IGP, e - EGP, ? - incomplete
  RPKI validation codes: V valid, I invalid, N Not found
   Network          Next Hop            Metric LocPrf Weight Path
  *>i 0.0.0.0          192.168.1.100            0    100      0 ?
  *m  10.10.0.0/19     10.10.11.11                             0 200 ?  <-- Edge-A T0-SR    
  *>                   10.10.11.12                             0 200 ?  <-- Edge-B T0-SR

TOR2 BGP Routing table:
 Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
                x best-external, a additional-path, c RIB-compressed,
  Origin codes: i - IGP, e - EGP, ? - incomplete
  RPKI validation codes: V valid, I invalid, N Not found
   Network          Next Hop            Metric LocPrf Weight Path
  *>i 0.0.0.0          192.168.1.100            0    100      0 ?
  *m  10.10.0.0/19     10.10.12.11                            0 200 ?  <-- Edge-A T0-SR
  *>                   10.10.12.12                            0 200 ?  <-- Edge-B T0-SR 

We can see in the routing tables above that the TOR switches are installing multiple paths into their RIB to get to 10.10.0.0/19. Each TOR is peered with a T0-SR instance on Edge-A and Edge-B.
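
As a rough illustration of reading this output programmatically, the Python sketch below groups next hops by prefix. It assumes the row layout shown for TOR2, where a row carrying only a next hop continues the prefix on the row above it. The function name `next_hops_by_prefix` is made up for this example and is not part of any switch or NSX-T tooling.

```python
import re
from collections import defaultdict

# Route rows copied from the TOR2 table above (annotations stripped).
TOR2_ROUTES = """\
*>i 0.0.0.0          192.168.1.100            0    100      0 ?
*m  10.10.0.0/19     10.10.12.11                            0 200 ?
*>                   10.10.12.12                            0 200 ?
"""

# Matches a dotted-quad IPv4 address with an optional /length.
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}(?:/\d{1,2})?")


def next_hops_by_prefix(table: str) -> dict[str, list[str]]:
    """Group next hops by prefix (assumed layout: prefix then next hop)."""
    hops = defaultdict(list)
    current = None
    for line in table.splitlines():
        addrs = IPV4.findall(line)
        if len(addrs) == 2:
            # Full row: network followed by next hop.
            current, hop = addrs
        elif len(addrs) == 1 and current is not None:
            # Continuation row: extra path for the previous prefix.
            hop = addrs[0]
        else:
            continue
        hops[current].append(hop)
    return dict(hops)


for prefix, hops in next_hops_by_prefix(TOR2_ROUTES).items():
    print(prefix, hops)
```

Two next hops against 10.10.0.0/19 is what we expect when the TOR has installed both T0-SR peers as equal-cost paths.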

Keep in mind, these T0-SR/DR instances make up a single Tier-0 Gateway. See Figure A below.

Figure A

Now that we have determined that the TOR switches will distribute southbound traffic across their T0-SR peers, let’s take a look at how the T0-SR forwards traffic.

Edge-A T0-SR Forwarding Table
Logical Router
 UUID                                   VRF    LR-ID  Name                              Type
 33aa3f1f-c5cc-4f4f-9f34-37e800b0bbbd   3      8194   SR-T0-carrot                      SERVICE_ROUTER_TIER0
 IPv4 Forwarding Table
 IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC
 0.0.0.0/0          10.10.11.1      route    f3a89727-7799-45ed-8ce3-19d4a0220911   00:50:56:90:27:5f
                    10.10.12.2               d8c6e18b-cdc0-4a86-8133-8ef432ef3f3b   00:50:56:90:ab:34
 10.10.5.0/24       100.64.16.3     route    b8b36c5d-c85e-4cb3-9d63-1f69b97cc397

In the forwarding table above, the route we are interested in is the 10.10.5.0/24 entry. This is a route to a segment on a T1 router which does not have a T1-SR. To understand exactly what this route is doing, we need the interface information for the T0 and T1. See the interfaces below:

Edge-A T0-DR Interface:
 Interface     : b8b36c5d-c85e-4cb3-9d63-1f69b97cc397
     Ifuid         : 395
     Name          : T0-carrot-T1-palpatine-t0_lrp
     Internal name : linked-395
     Mode          : lif
     IP/Mask       : 100.64.16.2/31;fc25:93fe:3ac1:c801::1/64;fe80::50:56ff:fe56:4452/64
     MAC           : 02:50:56:56:44:52
     VNI           : 71695
     LS port       : 8c925627-3de6-4cba-b099-f24d64bd446b
     Urpf-mode     : PORT_CHECK
     Admin         : up
     Op_state      : up
     MTU           : 1500 

Edge-A T1-DR Interface:
  Interface     : a58925eb-6130-4557-9d12-fa8582244971
     Ifuid         : 398
     Name          : T0-carrot-T1-palpatine-t1_lrp
     Mode          : lif
     IP/Mask       : 100.64.16.3/31;fe80::50:56ff:fe56:4455/64;fc25:93fe:3ac1:c801::2/64
     MAC           : 02:50:56:56:44:55
     VNI           : 71695
     LS port       : 6d89b610-87ee-466c-b8d7-c307f6342813
     Urpf-mode     : NONE
     Admin         : up
     Op_state      : up
     MTU           : 1500 

For the particular route we are interested in:

 10.10.5.0/24       100.64.16.3     route    b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 

Based on the forwarding table information, we can see that when the traffic is destined for the 10.10.5.0/24 network, the T0-SR will forward traffic to 100.64.16.3 via Interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397
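
The lookup described above is a longest-prefix match. A minimal Python sketch, using only the rows shown in the Edge-A T0-SR table, is below. The list structure, the `lookup` helper and the 10.10.5.20 destination are assumptions made for illustration; this is not how the Edge datapath is implemented.

```python
import ipaddress

# (prefix, gateway, interface UUID) rows from the Edge-A T0-SR table above.
# Only the first next hop of the ECMP default route is kept for brevity.
T0_SR_FIB = [
    ("0.0.0.0/0", "10.10.11.1", "f3a89727-7799-45ed-8ce3-19d4a0220911"),
    ("10.10.5.0/24", "100.64.16.3", "b8b36c5d-c85e-4cb3-9d63-1f69b97cc397"),
]


def lookup(fib, destination):
    """Return the matching row with the longest prefix, or None."""
    dst = ipaddress.ip_address(destination)
    matches = [row for row in fib if dst in ipaddress.ip_network(row[0])]
    return max(
        matches,
        key=lambda row: ipaddress.ip_network(row[0]).prefixlen,
        default=None,
    )


# 10.10.5.20 is a hypothetical VM address on the segment.
prefix, gateway, interface = lookup(T0_SR_FIB, "10.10.5.20")
print(prefix, gateway, interface)
```

Any destination inside 10.10.5.0/24 resolves to gateway 100.64.16.3 via interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397, while everything else falls through to the default route.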

This is interesting behaviour because interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 is the downlink on the T0-DR. This means the T0-SR is aware that to reach 100.64.16.x (the inter-tier transit network), the traffic must first be forwarded to the T0-DR. The traffic is therefore forwarded out of the T0-SR’s bp-sr0-port, across the intra-tier transit link, to the bp-dr-port on the T0-DR. Once received, the T0-DR forwards it out of interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 to the destination IP 100.64.16.3. As we can see above, this IP belongs to the inter-tier LIF of the T1-DR (Interface: a58925eb-6130-4557-9d12-fa8582244971).
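
To sanity-check the addressing, the short Python sketch below confirms that the T0-DR downlink and the T1-DR uplink sit on the same /31, and that the route's gateway IP is the T1 side of that link. The addresses are taken from the interface output above; the variable names are only for this example.

```python
import ipaddress

# IPv4 addresses from the two interface dumps above.
t0_dr_downlink = ipaddress.ip_interface("100.64.16.2/31")
t1_dr_uplink = ipaddress.ip_interface("100.64.16.3/31")

# Gateway IP of the 10.10.5.0/24 route on the T0-SR.
gateway = ipaddress.ip_address("100.64.16.3")

# Both LIFs must belong to the same point-to-point network.
same_link = t0_dr_downlink.network == t1_dr_uplink.network

# The gateway must be reachable on that link and be the T1 end of it.
gateway_on_link = gateway in t0_dr_downlink.network
gateway_is_t1 = gateway == t1_dr_uplink.ip

print(t0_dr_downlink.network, same_link, gateway_on_link, gateway_is_t1)
```

Both interfaces also report VNI 71695, which is consistent with them sharing one transit segment.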

Figure B

Now we understand how the traffic is getting to the T1-DR within the Edge Node. Let’s look at how the traffic gets to its final destination.

Edge-A T1-DR Forwarding Table:
 Logical Router
 UUID                                   VRF    LR-ID  Name                              Type
 3e7f938b-8947-4b4a-a5a6-98b8c5dd1a89   9      15371  DR-T1-palpatine                   DISTRIBUTED_ROUTER_TIER1
 IPv4 Forwarding Table
 IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC
 0.0.0.0/0          100.64.16.2     route    a58925eb-6130-4557-9d12-fa8582244971
 10.10.5.0/24                       route    1abf24d6-97af-4d1d-a764-048dd35a7aa1
 10.10.5.1/32                       route    afa06616-5c43-5047-afb8-f41b18eac5bc

Edge-A T1-DR Interfaces:
Logical Router
 UUID                                   VRF    LR-ID  Name                              Type
 3e7f938b-8947-4b4a-a5a6-98b8c5dd1a89   9      15371  DR-T1-palpatine                   DISTRIBUTED_ROUTER_TIER1
 Interfaces
     Interface     : 1abf24d6-97af-4d1d-a764-048dd35a7aa1
     Ifuid         : 408
     Name          : infra-P-Segment-5-dlrp
     Mode          : lif
     IP/Mask       : 10.10.5.1/24
     MAC           : 02:50:56:56:44:52
     VNI           : 71697
     LS port       : c8baa8e5-99b6-4bfd-91a4-387dd9982df7
     Urpf-mode     : STRICT_MODE
     Admin         : up
     Op_state      : up
     MTU           : 1500

We can see in the forwarding table above that the T1-DR has a directly connected interface on the destination network (10.10.5.0/24), so when the traffic is received, the T1-DR will switch it onto this network via interface 1abf24d6-97af-4d1d-a764-048dd35a7aa1. The traffic is then encapsulated and sent out of the Edge Node VM’s TEP to the specific Transport Node where the destination VM is running.

Now we understand North to South traffic when there is no stateful service. Let’s look at how a stateful service on a T1 changes the routing behaviour. See Topology B below.

Topology B

Now we have met the following conditions on our T1:

– T1 must be connected to a T0
– T1 must be associated with an Edge Cluster
A T1 Service Router has been created in Active/Standby. The active T1-SR is running on Edge-B and the standby on Edge-A.

Even with the above conditions met, traffic from the TOR switches to the T0-SRs on our Edge Nodes is still routed in the manner described at the start of this post (ECMP distribution across the T0-SRs when heading southbound).

Let’s look at what will happen when the traffic comes in via the T0-SR on Edge-B:

Edge-B T0-SR Forwarding Table:
UUID                                   VRF    LR-ID  Name                              Type
 5c57b700-f54c-4182-a638-0b13346dc2ff   1      11266  SR-T0-carrot                      SERVICE_ROUTER_TIER0
 IPv4 Forwarding Table
 IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC
 0.0.0.0/0          10.10.11.1      route    edf37e34-b6ac-4443-b033-7376d81217ad   00:50:56:90:27:5f
                    10.10.12.2               ea488f49-8a9e-42f1-8602-2a001ea48dd1   00:50:56:90:ab:34
 10.10.5.0/24       100.64.16.3     route    b8b36c5d-c85e-4cb3-9d63-1f69b97cc397   02:50:56:56:44:55

Edge-B T0-DR Interface
Interface     : b8b36c5d-c85e-4cb3-9d63-1f69b97cc397
     Ifuid         : 404
     Name          : T0-carrot-T1-palpatine-t0_lrp
     Internal name : linked-404
     Mode          : lif
     IP/Mask       : 100.64.16.2/31;fc25:93fe:3ac1:c801::1/64;fe80::50:56ff:fe56:4452/64
     MAC           : 02:50:56:56:44:52
     VNI           : 71695
     LS port       : c347a17a-0868-42e0-beb0-134e5b3031b4
     Urpf-mode     : PORT_CHECK
     Admin         : up
     Op_state      : up
     MTU           : 1500

The T0-SR’s route to 10.10.5.0/24 goes via 100.64.16.3, which is an IP on the inter-tier transit network between the T0-DR and the T1-SR. To get there, it will first forward the traffic out of its intra-tier backplane port (named bp-sr1-port in the output below) to the T0-DR’s bp-dr-port. See the information for these two interfaces below:

Edge-B T0-SR bp-sr-port:
Interface     : 55efebf5-a9fc-4216-b346-5f685178bbea
     Ifuid         : 302
     Name          : bp-sr1-port
     Mode          : lif
     IP/Mask       : 169.254.0.4/25;169.254.0.3/25;fe80::50:56ff:fe56:5302/64;fe80::50:56ff:fe56:5301/64
     MAC           : 02:50:56:56:53:01
     VNI           : 71687
     LS port       : 257e8903-6126-4290-9dc5-41aa9a04de3e
     Urpf-mode     : NONE
     Admin         : up
     Op_state      : up
     MTU           : 1500

Edge-B T0-DR bp-dr-port:
Interface     : 105bca47-9671-4079-89cb-e00936764916
     Ifuid         : 305
     Name          : bp-dr-port
     Mode          : lif
     IP/Mask       : 169.254.0.1/25;fe80::50:56ff:fe56:4452/64
     MAC           : 02:50:56:56:44:52
     VNI           : 71687
     LS port       : 8d7dd4fc-b670-4448-a01f-2cd5c66a1ed1
     Urpf-mode     : PORT_CHECK
     Admin         : up
     Op_state      : up
     MTU           : 1500
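
One way to confirm that these two backplane ports can talk directly is to compare the IPv4 networks in their IP/Mask fields. The Python sketch below does that with the values shown above; the `ipv4_networks` helper is a made-up name, and the only assumption about the field is that entries are separated by semicolons, as they are in the output.

```python
import ipaddress

# "IP/Mask" fields copied from the bp-sr1-port and bp-dr-port output above.
T0_SR_BP = (
    "169.254.0.4/25;169.254.0.3/25;"
    "fe80::50:56ff:fe56:5302/64;fe80::50:56ff:fe56:5301/64"
)
T0_DR_BP = "169.254.0.1/25;fe80::50:56ff:fe56:4452/64"


def ipv4_networks(ip_mask_field: str) -> set:
    """Return the set of IPv4 networks named in an IP/Mask field."""
    interfaces = (
        ipaddress.ip_interface(item) for item in ip_mask_field.split(";")
    )
    return {iface.network for iface in interfaces if iface.version == 4}


# Networks present on both ports.
shared = ipv4_networks(T0_SR_BP) & ipv4_networks(T0_DR_BP)
print(shared)
```

Both ports land in 169.254.0.0/25 and both report VNI 71687, so the T0-SR can hand traffic to the T0-DR over the intra-tier transit link.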

Once the traffic is received from the T0-SR, the T0-DR will forward it out of its interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397, over the inter-tier transit network, to the T1-SR’s LIF IP: 100.64.16.3.
We can see the interface information for the destination T1-SR below.

Edge-B T1-SR Interface: 
Logical Router
 UUID                                   VRF    LR-ID  Name                              Type
 1e6a02ea-08ec-47a8-aeb4-3ee61b6b40b3   14     15373  SR-T1-palpatine                   SERVICE_ROUTER_TIER1
 Interfaces
     Interface     : a58925eb-6130-4557-9d12-fa8582244971
     Ifuid         : 407
     Name          : T0-carrot-T1-palpatine-t1_lrp
     Mode          : lif
     IP/Mask       : 100.64.16.3/31;fe80::50:56ff:fe56:4455/64;fc25:93fe:3ac1:c801::2/64
     MAC           : 02:50:56:56:44:55
     VNI           : 71695
     LS port       : 6d89b610-87ee-466c-b8d7-c307f6342813
     Urpf-mode     : NONE
     Admin         : up
     Op_state      : up
     MTU           : 1500 

The traffic has been received on the T1-SR’s a58925eb-6130-4557-9d12-fa8582244971 interface. Let’s look at the T1-SR’s forwarding table below to see how it will handle the next hop:

Edge-B T1-SR Forwarding Table:
Logical Router
 UUID                                   VRF    LR-ID  Name                              Type
 1e6a02ea-08ec-47a8-aeb4-3ee61b6b40b3   14     15373  SR-T1-palpatine                   SERVICE_ROUTER_TIER1
 IPv4 Forwarding Table
 IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC
 0.0.0.0/0          100.64.16.2     route    a58925eb-6130-4557-9d12-fa8582244971   02:50:56:56:44:52
 10.10.5.0/24                       route    1abf24d6-97af-4d1d-a764-048dd35a7aa1
 10.10.5.1/32                       route    afa06616-5c43-5047-afb8-f41b18eac5bc
 100.64.16.2/31                     route    a58925eb-6130-4557-9d12-fa8582244971
 100.64.16.3/32                     route    54eb6c65-1d20-5930-a1a0-e168a47873b6
 127.0.0.1/32                       route    588f2add-9d3e-4555-9739-69261609fb50
 169.254.0.0/28                     route    e17fef2f-e383-4303-b619-beaed7bfe946
 169.254.0.1/32                     route    afa06616-5c43-5047-afb8-f41b18eac5bc
 169.254.0.2/32                     route    54eb6c65-1d20-5930-a1a0-e168a47873b6

Edge-B T1-DR Interface
Interface     : 1abf24d6-97af-4d1d-a764-048dd35a7aa1
     Ifuid         : 417
     Name          : infra-P-Segment-5-dlrp
     Mode          : lif
     IP/Mask       : 10.10.5.1/24
     MAC           : 02:50:56:56:44:52
     VNI           : 71697
     LS port       : c8baa8e5-99b6-4bfd-91a4-387dd9982df7
     Urpf-mode     : STRICT_MODE
     Admin         : up
     Op_state      : up
     MTU           : 1500

Edge-B T1-SR Intra Tier Transit Interface:
Interface     : e17fef2f-e383-4303-b619-beaed7bfe946
     Ifuid         : 443
     Name          : bp-sr0-port
     Mode          : lif
     IP/Mask       : 169.254.0.2/28;fe80::50:56ff:fe56:5300/64

Edge-B T1-DR Intra Tier Transit Interface
Interface     : aca94d8e-684e-44cc-bc59-ac612a74a400
     Ifuid         : 408
     Name          : bp-dr-port
     Mode          : lif
     IP/Mask       : 169.254.0.1/28;fe80::50:56ff:fe56:4452/64

The T1-SR will forward the traffic out of its local bp-sr0-port across the intra-tier transit network, and the T1-DR will receive the traffic on its bp-dr-port. The T1-DR will switch this traffic onto its LIF Interface: 1abf24d6-97af-4d1d-a764-048dd35a7aa1 (directly connected interface on the 10.10.5.0/24 network). The traffic is then encapsulated and sent across the appropriate Geneve Tunnel to the specific transport node where the destination VM resides. See Topology B - Path B below for visualisation.

Transport Nodes (including Edge VMs) hold a MAC table per segment. When traffic is to be forwarded to another Transport Node, a MAC table lookup determines which tunnel the traffic will be forwarded across. I dumped an example MAC table below:

Edge-B Logical Switch "M-Seg-2" MAC-Table:
     MAC         : 00:50:56:90:3d:05  <--- MAC address of the destination virtual machine
         Tunnel      : a426178f-be9a-5c45-abe3-0840f6fd6205 <-- Unique Tunnel ID
         IFUID       : 350
         LOCAL       : 10.10.9.60 <--- Edge-B TEP IP
         REMOTE      : 10.10.9.56 <--- Destination Transport Node TEP IP
         ENCAP       : GENEVE

Topology B - Path B

Now we understand what will happen when southbound traffic is received via the Edge Node with the active T1-SR instance. Let’s look at how this changes when southbound traffic is received on Edge-A while the active T1-SR instance resides on Edge-B.

We will start by looking at the forwarding table of the T0-SR on Edge-A:

Edge-A T0-SR
Logical Router
 UUID                                   VRF    LR-ID  Name                              Type
 33aa3f1f-c5cc-4f4f-9f34-37e800b0bbbd   3      8194   SR-T0-carrot                      SERVICE_ROUTER_TIER0
 IPv4 Forwarding Table
 IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC
 0.0.0.0/0          10.10.11.1      route    f3a89727-7799-45ed-8ce3-19d4a0220911   00:50:56:90:27:5f
                    10.10.12.2               d8c6e18b-cdc0-4a86-8133-8ef432ef3f3b   00:50:56:90:ab:34
 10.10.5.0/24       100.64.16.3     route    b8b36c5d-c85e-4cb3-9d63-1f69b97cc397

In the forwarding table above, the T0-SR is forwarding traffic via interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 to IP 100.64.16.3 to get to the 10.10.5.0/24 network. Let’s take a look at the interfaces below:

Edge-A T0-DR Interface: 
Interface     : b8b36c5d-c85e-4cb3-9d63-1f69b97cc397
     Ifuid         : 395
     Name          : T0-carrot-T1-palpatine-t0_lrp
     Internal name : linked-395
     Mode          : lif
     IP/Mask       : 100.64.16.2/31;fc25:93fe:3ac1:c801::1/64;fe80::50:56ff:fe56:4452/64
     MAC           : 02:50:56:56:44:52
     VNI           : 71695
     LS port       : c347a17a-0868-42e0-beb0-134e5b3031b4
     Urpf-mode     : PORT_CHECK
     Admin         : up
     Op_state      : up
     MTU           : 1500

Edge-B T1-SR (Active)
Interface     : a58925eb-6130-4557-9d12-fa8582244971
     Ifuid         : 407
     Name          : T0-carrot-T1-palpatine-t1_lrp
     Mode          : lif
     IP/Mask       : 100.64.16.3/31;fe80::50:56ff:fe56:4455/64;fc25:93fe:3ac1:c801::2/64
     MAC           : 02:50:56:56:44:55
     VNI           : 71695
     LS port       : 6d89b610-87ee-466c-b8d7-c307f6342813
     Urpf-mode     : NONE
     Admin         : up
     Op_state      : up
     MTU           : 1500

From the information above, we know that the T0-SR’s traffic leaves via the T0-DR interface b8b36c5d-c85e-4cb3-9d63-1f69b97cc397 to reach IP 100.64.16.3 (the gateway IP for this particular route). To do this, the T0-SR forwards the traffic out of its bp-sr0-port across the intra-tier transit network, and the T0-DR receives the traffic on its bp-dr-port.

Once received, traffic is then forwarded out of the interface mentioned above (b8b36c5d-c85e-4cb3-9d63-1f69b97cc397), across the inter-tier transit network to the active T1-SR on Edge-B with LIF interface IP: 100.64.16.3.

Let’s take a look at the T1-SR forwarding table below to see how it will handle the traffic it has just received:

Edge-B T1-SR (Active)
Logical Router
 UUID                                   VRF    LR-ID  Name                              Type
 1e6a02ea-08ec-47a8-aeb4-3ee61b6b40b3   14     15373  SR-T1-palpatine                   SERVICE_ROUTER_TIER1
 IPv4 Forwarding Table
 IP Prefix          Gateway IP      Type     UUID                                   Gateway MAC
 0.0.0.0/0          100.64.16.2     route    a58925eb-6130-4557-9d12-fa8582244971
 10.10.5.0/24                       route    1abf24d6-97af-4d1d-a764-048dd35a7aa1
 10.10.5.1/32                       route    afa06616-5c43-5047-afb8-f41b18eac5bc

The T1-SR’s forwarding table has a route to 10.10.5.0/24 via interface 1abf24d6-97af-4d1d-a764-048dd35a7aa1. If we look at the interface details below, we can see that this is the LIF on the T1-DR which connects to the 10.10.5.0/24 segment where our destination virtual machine resides.

Edge-B T1-DR
Interface     : 1abf24d6-97af-4d1d-a764-048dd35a7aa1
     Ifuid         : 417
     Name          : infra-P-Segment-5-dlrp
     Mode          : lif
     IP/Mask       : 10.10.5.1/24
     MAC           : 02:50:56:56:44:52
     VNI           : 71697
     LS port       : c8baa8e5-99b6-4bfd-91a4-387dd9982df7
     Urpf-mode     : STRICT_MODE
     Admin         : up
     Op_state      : up
     MTU           : 1500

The T1-SR will forward the traffic out of its backplane interface (bp-sr0-port) to the T1-DR’s bp-dr-port. T1-DR will then switch the traffic onto the 10.10.5.0/24 network as it has a directly connected interface there.

The MAC table lookup then occurs and determines which Geneve tunnel the traffic will be sent across to reach the destination Transport Node. The figure below (Topology B - Path A) visualises the traffic flow we have just walked through.

Topology B - Path A

TL;DR Summary of our investigation in Part 2:

Topology A - Path A and B: Southbound flows were distributed from the TOR switches across the T0-SR instances on the Edge Nodes. The routing for the T0 and T1 occurred locally within the edge on which the traffic was received. Traffic was then forwarded to the Transport Node where the destination VM was running. Visualisation below.

Topology A - Path A and B

Topology B - Path B: Southbound traffic was received on the Edge Node where the active T1-SR was running (Edge-B). The T0 and T1 routing all occurred locally within the edge where the traffic was received. It was then forwarded to the transport node where the destination VM was running. Visualisation below.

Topology B - Path B

Topology B - Path A: Southbound traffic was received on the Edge Node where the standby T1-SR was running (Edge-A). The T0 performed the appropriate routing and the traffic was then forwarded to the active T1-SR running on Edge-B. A route lookup occurred, it was forwarded to the T1-DR, switched onto the appropriate segment and then forwarded to the Transport Node where the destination VM was running. Visualisation below.

Topology B - Path A
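
The three cases in this summary can be condensed into a small Python model. It is a simplification under stated assumptions: it only tracks which Edge Node each routing component runs on, and the function names `southbound_path` and `crosses_edges` are invented for this sketch.

```python
def southbound_path(ingress_edge, active_t1_sr_edge=None):
    """Return (edge, component) hops for a southbound packet.

    active_t1_sr_edge is None when the T1 has no SR (Topology A).
    """
    hops = [(ingress_edge, "T0-SR"), (ingress_edge, "T0-DR")]
    if active_t1_sr_edge is None:
        # No T1-SR: the T1-DR on the ingress edge routes locally.
        hops.append((ingress_edge, "T1-DR"))
    else:
        # Stateful T1: traffic must pass through the active T1-SR.
        hops.append((active_t1_sr_edge, "T1-SR"))
        hops.append((active_t1_sr_edge, "T1-DR"))
    return hops


def crosses_edges(hops):
    """True if the path touches more than one Edge Node."""
    return len({edge for edge, _ in hops}) > 1


for ingress, active in [("Edge-A", None), ("Edge-B", "Edge-B"), ("Edge-A", "Edge-B")]:
    path = southbound_path(ingress, active)
    print(ingress, active, crosses_edges(path), path)
```

Only the last case, ingress on Edge-A with the active T1-SR on Edge-B, involves a second Edge Node before the traffic is sent to the destination Transport Node.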

In Part 3 we will take a look at an alternate design where T1-SRs are in use at scale.