Practical SDN: L3 Forwarding in NSX, DFA, and ACI

Author
Peter Welcher
Architect, Operations Technical Advisor

How does L3 forwarding work in NSX, DFA, and ACI? That’s the topic for this blog, #3 in a series contrasting the behaviors of the three technologies. So let’s jump right in and see what’s happening at Layer 3. (And if you’re still reading: the last two blogs took a while to write and ended up rather long; I’m trying to avoid those pitfalls here.)

[Figure: router icon. Source: Wikipedia, Creative Commons license; graphic by Tosaka. http://upload.wikimedia.org/wikipedia/commons/2/2c/Router_mark.PNG]

Prior blogs in this series:

Common Elements (NSX, DFA, ACI)

As noted in the prior blog, all of these SDN 2.0 technologies attempt to distribute the routing, to minimize latency and distribute workload.

Each of NSX, DFA, and ACI virtualizes or distributes the default gateway, so that the local hypervisor or Leaf switch can act as a virtual L3 default gateway. No more HSRP, VRRP, or GLBP! Anycast FHRP was an interesting idea from Cisco; this approach ought to scale even better!

Exercise for the reader: It seems the default gateway ARP response would have to have a common virtual MAC address (VMAC), to support vMotion between hypervisors on different Leaf switches. Verify this and post a comment with your findings.
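Here’s a minimal sketch of the idea (Python, hypothetical names; not any vendor’s implementation): every Leaf or hypervisor answers gateway ARP with the same shared vMAC, so a VM’s ARP cache survives vMotion to a host behind a different Leaf.

```python
# Sketch: a shared virtual MAC (vMAC) for the distributed default gateway.
# Hypothetical model only -- not NSX/DFA/ACI code.

GATEWAY_VMAC = "02:00:de:ad:be:ef"  # same vMAC configured on every leaf/hypervisor

class Leaf:
    def __init__(self, name, gateway_ips):
        self.name = name
        self.gateway_ips = set(gateway_ips)  # gateway IPs this leaf answers for

    def handle_arp_request(self, target_ip):
        """Intercept ARP for the default gateway and reply with the shared vMAC."""
        if target_ip in self.gateway_ips:
            return GATEWAY_VMAC  # identical answer regardless of which leaf replies
        return None  # not the gateway: normal ARP handling happens elsewhere

leaf1 = Leaf("leaf1", ["10.1.1.254"])
leaf2 = Leaf("leaf2", ["10.1.1.254"])

# A VM ARPs behind leaf1 and caches the vMAC; after vMotion behind leaf2,
# the cached entry is still correct because leaf2 would give the same answer.
cached_mac = leaf1.handle_arp_request("10.1.1.254")
assert cached_mac == leaf2.handle_arp_request("10.1.1.254")
```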

Layer 3 Forwarding and NSX

NSX has both distributed and edge routing functionality.

NSX for Multi-Hypervisor supports routing between VXLANs via L3 gateway nodes (active/passive for high availability). Each logical router is like a VRF, with its own routing domain. It can be connected to any VXLAN, and one uplink to the physical network is supported. NSX for Multi-Hypervisor also supports distributed logical routers with connected routes, static routes, and a default route to the L3 gateway.
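As a rough mental model (my sketch, hypothetical names; not NSX’s actual API or data structures), each logical router amounts to a per-tenant route table: connected routes for its VXLANs, static routes, and a default route pointing at the L3 gateway node.

```python
import ipaddress

# Sketch of one logical router's routing state (hypothetical model).
class LogicalRouter:
    def __init__(self, name):
        self.name = name
        self.routes = []  # (prefix, next_hop) pairs

    def add_connected(self, prefix):
        self.routes.append((ipaddress.ip_network(prefix), "connected"))

    def add_static(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, dest_ip):
        """Longest-prefix match; the 0.0.0.0/0 static acts as the default."""
        ip = ipaddress.ip_address(dest_ip)
        matches = [(p, nh) for p, nh in self.routes if ip in p]
        return max(matches, key=lambda m: m[0].prefixlen, default=None)

lr = LogicalRouter("tenant-red")               # its own routing domain, like a VRF
lr.add_connected("10.1.1.0/24")                # VXLAN segment 1
lr.add_connected("10.1.2.0/24")                # VXLAN segment 2
lr.add_static("0.0.0.0/0", "l3-gateway-node")  # single uplink to the physical world

print(lr.lookup("10.1.2.9"))   # connected: route directly between VXLANs
print(lr.lookup("192.0.2.7"))  # no specific route: hand off to the L3 gateway node
```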

NSX for vSphere supports distributed routing between VXLANs and VLANs, i.e. the physical world too. Outbound traffic to the physical world is distributed. Inbound goes via the Designated Instance (DI).

The following diagram shows how L3 forwarding between VMs works.

[Figure: 20140124-fig01]

Ivan Pepelnjak (@ioshints) covers this in a video in his NSX Architecture slides: see his "Layer 3 Gateways" at http://demo.ipspace.net/get/4.2%20-%20Layer-3%20Gateways.mp4. I’m relying on his presentation, as the other published sources I have are less clear. It has a great walkthrough of the L2 tunneling encapsulation process. L3 forwarding is more or less the same, except that the distributed router does a forwarding lookup and header rewrite before tunneling, as he describes.
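To make the order of operations concrete, here’s a hedged sketch (hypothetical names and toy tables, my reconstruction rather than VMware’s code) of that data path: L3 lookup in the source hypervisor, MAC header rewrite, then VXLAN tunneling just as in the L2 case.

```python
# Sketch of the distributed-router data path: lookup, rewrite, then tunnel.
# Toy exact-match tables stand in for real FIB/ARP/VTEP state.

def route_and_tunnel(pkt, route_table, arp_table, vtep_table, router_mac):
    """pkt arrives with DMAC = the distributed router's MAC (the gateway)."""
    vni = route_table[pkt["dst_ip"]]           # which egress logical switch (VXLAN)
    pkt["src_mac"] = router_mac                # rewrite: router is now the L2 sender
    pkt["dst_mac"] = arp_table[pkt["dst_ip"]]  # destination VM's real MAC
    outer_dst = vtep_table[pkt["dst_mac"]]     # hypervisor hosting that VM
    return {"outer_dst": outer_dst, "vni": vni, "inner": pkt}

frame = route_and_tunnel(
    {"dst_ip": "10.1.2.9", "src_mac": "vm1-mac", "dst_mac": "router-mac"},
    route_table={"10.1.2.9": 5002},
    arp_table={"10.1.2.9": "vm2-mac"},
    vtep_table={"vm2-mac": "192.168.50.2"},
    router_mac="router-mac",
)
print(frame)  # routed and rewritten locally, then tunneled: no hairpin to a gateway
```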

Brad Hedlund (@bradhedlund) walks through the Designated Instance (DI) ARP and routing behavior for NSX for vSphere distributed routing in his blog at http://bradhedlund.com/2013/11/20/distributed-virtual-and-physical-routing-in-vmware-nsx-for-vsphere/. With a diagram, even! (Thanks, Brad!) That covers distributed routing between a VM and a physical device.

There are restrictions on running L2 and L3 on the same gateway (at least for NSX for Multi-Hypervisor). I’ve heard that rule compared to the rules for Cisco OTV: a VDC with a VLAN’s SVI in it can’t do OTV, so we use a dedicated OTV VDC. I imagine the rule is there to keep things simple. I recall the logic of Cisco IOS IRB (Integrated Routing and Bridging) always bothered people.

Anyway, see Ivan’s VMware NSX Gateway Questions. I’m going to duck the topic of other rules and constraints, since they may evolve over time and would be Too Much Information. What I would like to see is more discussion of how one can and cannot use NSX for vSphere L2 and L3 gateways, and of typical use cases, which VMware will likely publish over time as they document NSX for a broader audience.

NSX for vSphere does or will run BGP and OSPF, using the controller as proxy for the distributed virtual routing functionality (i.e. one OSPF neighbor to the DI edge routing function, not many). The typical use case may be BGP to the physical world, and OSPF internally.

Summary: in NSX, you can assemble somewhat basic logical switches and routers. The most evolution appears to be happening in NSX for vSphere. The distributed logical router is neat stuff for the virtual world. L2 gatewaying appears useful for P2V migration. L3 edge gatewaying appears simpler for creating a virtual application pod or pods and routing from the physical to the virtual side of things.

When a VM routes to an external device via the edge gateway, the traffic tunnels to the edge router, which then acts like a physical router, forwarding to the routing next hop.
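A sketch of that two-stage path (again with hypothetical names): the hypervisor tunnels everything external to the edge node, and only the edge consults the physical routing table.

```python
# Sketch: VM-to-physical forwarding via the edge gateway (hypothetical model).

def host_stage(pkt, edge_vtep):
    """Source hypervisor: external destinations are tunneled to the edge node."""
    return {"outer_dst": edge_vtep, "inner": pkt}

def edge_stage(tunneled, physical_fib):
    """Edge node: de-tunnel, then forward like an ordinary physical router."""
    pkt = tunneled["inner"]
    next_hop = physical_fib[pkt["dst_prefix"]]  # toy exact-prefix FIB
    return next_hop, pkt

t = host_stage({"dst_prefix": "198.51.100.0/24", "payload": "..."}, "192.168.50.9")
print(edge_stage(t, {"198.51.100.0/24": "upstream-router-1"}))
```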

Layer 3 Forwarding in DFA

With DFA and ACI, L3 forwarding isn’t that much different from the L2 forwarding described in the prior blog.

  • When a host ARPs for a VLAN default gateway, the Leaf switch intercepts the request and returns the virtual MAC address (vMAC) for that gateway.
  • The host L2 encapsulates with DMAC = vMAC, and sends the frame out. The Leaf switch sees it.
  • If the destination IP is known, the Leaf switch tunnels the packet using FabricPath to the Leaf switch the destination IP is connected to. The MAC header inside uses the MAC addresses of the two Leaf switches.
  • Just as with L2 FabricPath, the receiving switch de-tunnels the frame. It also removes the L2 header inside.
  • It then does L3 lookup and forwarding with SMAC = virtual default gateway MAC and DMAC = end system MAC.

All of this is VRF-aware. The sketch below walks through the same steps.
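Here is a minimal Python sketch of those steps (hypothetical names and toy tables; real DFA does this in FabricPath hardware, and I’m glossing over the encapsulation details):

```python
# Sketch of the DFA leaf-to-leaf L3 path, following the bullet steps above.

GW_VMAC = "02:00:0a:0a:0a:0a"  # virtual default-gateway MAC, the same on every leaf

def ingress_leaf(frame, host_location, my_switch_id):
    """Host sent DMAC = gateway vMAC; tunnel toward the leaf that owns the dst IP."""
    assert frame["dst_mac"] == GW_VMAC
    egress = host_location[(frame["vrf"], frame["dst_ip"])]  # VRF-aware lookup
    return {"fp_src": my_switch_id, "fp_dst": egress, "inner": frame}

def egress_leaf(fp_frame, arp_table):
    """De-tunnel, drop the inner L2 header, then route: SMAC = vMAC, DMAC = host."""
    pkt = dict(fp_frame["inner"])
    pkt["src_mac"] = GW_VMAC
    pkt["dst_mac"] = arp_table[(pkt["vrf"], pkt["dst_ip"])]
    return pkt

f = {"vrf": "tenant1", "dst_ip": "10.1.2.9", "src_mac": "h1-mac", "dst_mac": GW_VMAC}
fp = ingress_leaf(f, {("tenant1", "10.1.2.9"): "leaf2"}, "leaf1")
print(egress_leaf(fp, {("tenant1", "10.1.2.9"): "h2-mac"}))
```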

Diagram:

[Figure: 20140124-fig02]

Layer 3 Forwarding in ACI

The ACI materials make a point that an IP address has no location requirement; it can be anywhere. I suspect that ACI is really tracking location as {IP, tenant} type information, to allow for different tenants with duplicated IP addresses. It’s not clear what the capabilities and constraints are when {IP1, tenant1} wants to talk to {IP2, tenant2}. How might such routing work? I would imagine that a global interconnect “tenant” would be needed, just as we currently use the Internet to talk between entities. Some stock firms’ private back-office interconnects are moving to assigned public address space used privately, because the interconnections have become twisty little mazes of static routes and NAT.

If a host in tenant1 wishes to reach, say, 10.1.1.1, which is in use by an attached host in tenant1 but also by one in tenant2, there would need to be some way to disambiguate. In the physical world we would probably NAT to public address space, at least for connecting two entities over the Internet, and sometimes for VPN connections. I’ll have to chalk that up as a detail TBL (to be learned).
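A toy sketch of why the lookup key matters (hypothetical structure, not ACI internals): keyed by {tenant, IP}, duplicate addresses coexist fine within the fabric, but a bare IP is ambiguous the moment traffic crosses tenants.

```python
# Toy sketch: endpoint state keyed by (tenant, IP) lets tenants reuse addresses.

endpoints = {
    ("tenant1", "10.1.1.1"): "leaf3/port7",
    ("tenant2", "10.1.1.1"): "leaf5/port2",  # same IP, different tenant: no clash
}

def locate(tenant, ip):
    return endpoints.get((tenant, ip))

print(locate("tenant1", "10.1.1.1"))  # leaf3/port7
print(locate("tenant2", "10.1.1.1"))  # leaf5/port2

# Cross-tenant traffic is the hard part: "10.1.1.1" alone matches two entries,
# so tenant1 -> tenant2 needs NAT or a shared interconnect VRF with unique ranges.
```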

Diagram:

[Figure: 20140124-fig03]


Twitter: @pjwelcher

