New Nexus 9K Items
VXLAN, or Virtual Extensible LAN, is a recently proposed standard (IETF draft) from VMware and Cisco. It extends the concept of a VLAN in a manner that scales well for multi-tenant environments, at least in the sense of having a 24-bit segment identifier rather than a 12-bit VLAN ID. It also allows VXLAN deployment in various locations of a data center, separated by a multicast-capable Layer 3 network.
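A quick sanity check on those identifier sizes (Python here purely for the arithmetic):

```python
# A 12-bit VLAN ID vs. a 24-bit VXLAN segment ID:
vlan_ids = 2 ** 12    # 4,096 possible VLANs (a few values reserved)
vxlan_ids = 2 ** 24   # 16,777,216 possible VXLAN segments

print(vlan_ids, vxlan_ids)  # 4096 16777216
```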
There’s a lot already written about VXLAN, so I’ll try to stick to the basics and some comments, then refer you to the good articles and blogs that I’ve encountered.
I’ve previously given a quick description of VXLAN as a sort of OTV for the 1000v or VMware. Both provide mechanisms to “stretch” a Layer 2 VLAN.
That comparison is not quite fair to OTV, which uses routing to track reachability, and uses ARP caching to reduce broadcast traffic. VXLAN is intended for use within a single datacenter, and therefore can afford to be a bit more promiscuous with BUM (Broadcast, Unknown Unicast, and Multicast) propagation, tunneling them inside multicast across Layer 3. VXLAN can also perhaps support more “pockets” of the VXLAN separated by L3 than might be wise to do with OTV. (I’m thinking BUM radiation to all “pockets” via IPmc within a data center is less nasty than doing something similar across a L3 WAN between a comparable large number of data centers.)
To sum that up, here’s a diagram showing what VXLAN does:
This is discussed in a little more detail below. The orange shows the UDP tunnel between two VTEPs carrying the same VXLAN (same segment ID). The blue line shows external access to that VXLAN via the vShield Edge (VSE).
OTV and VXLAN do have similar header formats, which may simplify gatewaying between them in the future.
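For the curious, here’s a sketch of the VXLAN header layout as described in the draft: 8 bytes, consisting of a flags field with the I bit set (valid segment ID), 24 reserved bits, the 24-bit segment ID, and 8 more reserved bits. The Python packing below is mine, purely illustrative:

```python
import struct

def vxlan_header(segment_id):
    """Pack the 8-byte VXLAN header per the draft: 8 flag bits (I bit set
    to indicate a valid segment ID), 24 reserved bits, 24-bit segment ID,
    8 reserved bits."""
    flags = 0x08 << 24                       # I flag set, reserved bits zero
    return struct.pack("!II", flags, segment_id << 8)

hdr = vxlan_header(5000)
print(len(hdr), int.from_bytes(hdr[4:7], "big"))  # 8 5000
```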
To me, understanding the basics of both technologies comes down to the old punch line “how does it know?” (Thermos, hot vs. cold.) In this case, how does an edge device know which peer to tunnel traffic to? In OTV, the devices are called Edge Devices. In VXLAN, they are VXLAN Tunnel End Points (VTEPs).
In the case of OTV, IS-IS is used to track reachability and which OTV Edge Device a given MAC is reached via. ARP is still used to tie IP to MAC, but is cached. So in OTV, the Edge Device learns a MAC-to-remote-IP association via the OTV routing.
In the case of VXLAN, the ARP broadcast is sent as multicast, and when the reply comes back, the VTEP learns the MAC-to-remote-IP association. Subsequent traffic to that MAC address is unicast IP encapsulated; IP multicast is only used for BUM traffic.
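That flood-and-learn behavior can be sketched in a few lines of Python (class and method names are mine, purely illustrative):

```python
class Vtep:
    """Toy sketch of VXLAN flood-and-learn forwarding at one VTEP."""

    def __init__(self, segment_id, mcast_group):
        self.segment_id = segment_id
        self.mcast_group = mcast_group
        self.mac_table = {}               # inner MAC -> remote VTEP IP

    def learn(self, src_mac, remote_vtep_ip):
        # A frame arrived from a remote VTEP: remember which VTEP has this MAC.
        self.mac_table[src_mac] = remote_vtep_ip

    def outer_destination(self, dst_mac):
        # Known MAC: unicast IP to the owning VTEP. Unknown (BUM) traffic:
        # flood to the segment's multicast group.
        return self.mac_table.get(dst_mac, self.mcast_group)

vtep = Vtep(segment_id=5000, mcast_group="239.1.1.1")
print(vtep.outer_destination("aa:bb:cc:00:00:01"))  # 239.1.1.1 (flooded)
vtep.learn("aa:bb:cc:00:00:01", "10.0.0.2")
print(vtep.outer_destination("aa:bb:cc:00:00:01"))  # 10.0.0.2 (unicast)
```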
For a little more detail (but not overwhelming), see Omar Sultan’s blog article Digging Deeper into VXLAN, Part 1, at http://blogs.cisco.com/datacenter/digging-deeper-into-vxlan/. And sequels (see References below).
Note that each VXLAN is tied to a different IP multicast group. The VTEP of course joins that IPmc group to receive relevant multicasts. Since there are more VXLAN numbers than IPmc groups available, there is potential overlap. The packets contain a VXLAN ID so a receiving VTEP can “tune out” information it doesn’t care about (other VXLANs not locally active). The multicast group is administratively assigned, e.g. configured on a Cisco Nexus 1000v (where the VXLAN is referred to as “segment id”). The draft RFC notes that one might want to use BiDir PIM to handle situations with many sources which are also receivers.
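The overlap works out roughly like this (illustrative Python; the modulo stands in for whatever administrative assignment is actually used):

```python
def group_for_segment(segment_id, group_pool):
    # Stand-in for administrative assignment: ~16M segment IDs must share a
    # much smaller pool of multicast groups, so overlap is unavoidable.
    return group_pool[segment_id % len(group_pool)]

pool = ["239.1.1.1", "239.1.1.2"]
# Two different segments can land on the same group...
assert group_for_segment(5000, pool) == group_for_segment(5002, pool)

def vtep_accepts(packet_segment_id, local_segments):
    # ...so the receiving VTEP checks the VXLAN ID in the packet and
    # "tunes out" segments that aren't locally active.
    return packet_segment_id in local_segments

print(vtep_accepts(5000, {5000, 6000}))  # True
print(vtep_accepts(5002, {5000, 6000}))  # False
```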
One of the touted benefits of VXLAN is that you can in effect extend VMware VLANs without changing transport infrastructure: no VTP, no requesting VLANs from the network staff. Another way of saying that is that your VMware server admins can go off and do VXLANs without talking to you. It might be better to talk, bearing in mind that if you say “no”, they can go ahead and do it anyway. Educating server admins about BUM radiation and troubleshooting complexity, as against the convenience, may be useful.
All this leads me to think about a conversation I recently had about VTP:
Me: “You realize with VTP server active, your VLANs could all vanish in a flash due to a mistake.”
Other person: “Yes, but we need it since it makes adding VLANs easy.”
Me: “And that’s a good thing?”
I regard VLAN sprawl as risky and undesirable, so of course I think VTP is bad, although more for the “poof, your data center VLANs just vanished” risk. Having a little hassle adding VLANs makes you think about it and plan, and it also tends to naturally limit where VLANs sprawl to, assuming you do manual VLAN pruning on trunks (which I also consider a best practice; rumor says that when using VTP server, automatic pruning can make things worse when an STP loop hits).
One needs something to interconnect a VXLAN to a VLAN, some sort of gatewaying (bridging or routing) function. Right now, the ways to do that are via VMware vShield Edge or (announced) the ASA 1000v. One suspects the 1000v may do it in the future. Having the virtual CSR bridge between VXLAN and OTV also seems like it would be useful, to give a sane (lower traffic) way to connect a VXLAN to a VXLAN in another datacenter. (I suspect that at this point some readers may be thinking “Or is that extending the madness?”)
One might like a robust (redundant, highly available) gateway between VXLAN and VLAN. VMware’s vShield Edge (VSE) is not currently capable of redundancy. One suspects the ASA 1000v will be able to provide a High Availability active/passive or maybe cluster solution fairly soon. One might even want multiple diverse gateways, but then one has added the need for some sort of Spanning Tree protection. I’m not going to hold my breath on that one; it’s complicated, and it just may not be necessary within a single data center.
In short, gatewaying VXLAN is still in the early stages of evolving.
See also Scott Lowe’s blog about this, at http://blog.scottlowe.org/2011/12/07/revisiting-vxlan-and-layer-3-connectivity/.
Note that for multi-tenant, one might want to have a vApp or AppPod (my term) consisting of many VMs and virtual appliances. And just clone it as a new VXLAN (possibly with distributed components). VSE could then provide edge NAT for the web servers or whatever components need to be publicly reachable. I.e. for virtual datacenter and cloud automation, VXLAN could indeed be rather useful. Having a single NAT point fits well with the above discussion about gateway redundancy.
We haven’t discussed optimal routing and VXLAN (see similar discussions re OTV). My short answer: within a data center, it doesn’t matter that much. If and when multiple gateways are available, then LISP might be another answer — but would be adding complexity where the extra amount of low latency probably doesn’t matter. YMMV.
VMware just acquired Nicira, which apparently supports various kinds of tunnels. That would seem to lend momentum to VXLAN as a protocol, although they may continue to be somewhat agnostic as to tunneling method. See also the Ivan P blogs at http://blog.ioshints.info/2011/10/what-is-nicira-really-up-to.html and http://blog.ioshints.info/2012/07/vmware-buys-nicira-hypervisor-vendor.html.