VXLAN, or Virtual Extensible LAN, is a recently proposed standard technology from VMware and Cisco. It extends the concept of a VLAN in a manner that scales well for multi-tenant environments, at least in the sense of having a 24-bit LAN identifier rather than a 12-bit VLAN ID. It also allows a VXLAN to be deployed in various locations within a data center, separated by a multicast-capable Layer 3 network.
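To make the encapsulation concrete, here's a minimal Python sketch of the 8-byte VXLAN header that sits after the outer UDP header (the field layout is from the VXLAN draft; the function name and the example VNI of 5000 are just my illustration):

```python
import struct

def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags(1) + reserved(3) + VNI(3) + reserved(1).
    Flags byte 0x08 sets the 'I' bit, meaning the VNI field is valid.
    A 24-bit VNI allows 2**24 = 16,777,216 segments vs. 4,094 usable VLAN IDs."""
    assert 0 <= vni < 2**24, "VNI is a 24-bit field"
    flags = 0x08
    return struct.pack("!B3s3sB", flags, b"\x00\x00\x00", vni.to_bytes(3, "big"), 0)

hdr = build_vxlan_header(5000)
assert len(hdr) == 8
```

The original Ethernet frame then follows this header, so the whole thing rides inside an outer MAC/IP/UDP envelope across the Layer 3 network.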
There’s a lot already written about VXLAN, so I’ll try to stick to the basics and some comments, then refer you to the good articles and blogs that I’ve encountered.
I’ve previously given a quick description of VXLAN as a sort of OTV for the Nexus 1000v or VMware. They both provide mechanisms to “stretch” a Layer 2 VLAN.
That comparison is not quite fair to OTV, which uses routing to track reachability, and uses ARP caching to reduce broadcast traffic. VXLAN is intended for use within a single datacenter, and therefore can afford to be a bit more promiscuous with BUM (Broadcast, Unknown Unicast, and Multicast) propagation, tunneling it inside multicast across Layer 3. VXLAN can also perhaps support more L3-separated “pockets” of a VXLAN than might be wise with OTV. (I’m thinking BUM radiation to all “pockets” via IPmc within a data center is less nasty than doing something similar across a L3 WAN between a comparably large number of data centers.)
To sum that up, here’s a diagram showing what VXLAN does:
This is discussed in a little more detail below. The orange shows the UDP tunnel between two VXLANs (same segment ID). The blue line shows external access to that VXLAN via the vShield Edge (VSE).
OTV and VXLAN do have similar header formats, which may simplify gatewaying between them in the future.
To me, understanding both technologies’ basics comes down to the old punch line “how does it know?” (Thermos, hot vs. cold.) In this case, how does an edge device know which peer to tunnel traffic to? In OTV, yes, they’re Edge Devices. In VXLAN, each is a VXLAN Tunnel End Point (VTEP).
In the case of OTV, ISIS is used to track reachability and which other OTV device (Edge Device) a given MAC is reached via. ARP is still used to tie IP to MAC, but is cached. So in OTV, the Edge Device learns a MAC to remote IP association via the OTV routing.
In the case of VXLAN, ARP broadcast is sent as multicast, and when a reply comes back, the VTEP learns the MAC to remote IP association. Subsequent traffic to that MAC address is unicast IP encapsulated — IP multicast is only used for BUM traffic.
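That learning-and-forwarding behavior can be sketched in a few lines of Python. This is a conceptual model, not any vendor's implementation; the class and method names are mine:

```python
class Vtep:
    """Toy model of a VTEP's forwarding decision for one VXLAN segment."""

    def __init__(self, vni: int, mcast_group: str):
        self.vni = vni
        self.mcast_group = mcast_group   # IPmc group assigned to this VXLAN
        self.mac_to_vtep = {}            # learned: inner MAC -> remote VTEP IP

    def learn(self, src_mac: str, outer_src_ip: str) -> None:
        # On receiving an encapsulated frame (e.g. the ARP reply), associate
        # the inner source MAC with the outer source IP (the remote VTEP).
        self.mac_to_vtep[src_mac] = outer_src_ip

    def outer_destination(self, dst_mac: str) -> str:
        # BUM traffic (broadcast / unknown unicast / multicast) floods via
        # the multicast group; known unicast tunnels directly to the VTEP.
        if dst_mac == "ff:ff:ff:ff:ff:ff" or dst_mac not in self.mac_to_vtep:
            return self.mcast_group
        return self.mac_to_vtep[dst_mac]
```

So the first ARP goes to the group, the reply teaches the VTEP where that MAC lives, and everything after is plain unicast IP.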
For a little more detail (but not overwhelming), see Omar Sultan’s blog article Digging Deeper into VXLAN, Part 1, at http://blogs.cisco.com/datacenter/digging-deeper-into-vxlan/. And sequels (see References below).
Note that each VXLAN is tied to a different IP multicast group. The VTEP of course joins that IPmc group to receive relevant multicasts. Since there are more VXLAN numbers than IPmc groups available, there is potential overlap. The packets contain a VXLAN ID so a receiving VTEP can “tune out” information it doesn’t care about (other VXLANs not locally active). The multicast group is administratively assigned, e.g. configured on a Cisco Nexus 1000v (where the VXLAN is referred to as “segment id”). The draft RFC notes that one might want to use BiDir PIM to handle situations with many sources which are also receivers.
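A quick sketch of that overlap-and-filter idea, again purely illustrative (in practice the group is administratively configured, not hashed; the 16-group pool and 239.1.1.x addresses here are made up):

```python
NUM_GROUPS = 16  # hypothetical small pool of IPmc groups, far fewer than 2**24 VNIs

def group_for_vni(vni: int) -> str:
    # Many VNIs inevitably map onto the same group when groups are scarce.
    return f"239.1.1.{vni % NUM_GROUPS}"

def accept(packet_vni: int, local_vnis: set) -> bool:
    # The receiving VTEP checks the VXLAN ID in the header and "tunes out"
    # frames for VNIs that share the group but aren't locally active.
    return packet_vni in local_vnis
```

The cost of overlap is wasted delivery and filtering work at the VTEP, not incorrect forwarding, since the VXLAN ID in the header disambiguates.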
One of the touted benefits of VXLAN is that you can in effect extend VMware VLANs without changing transport infrastructure, no VTP, no requesting VLANs from the network staff. Another way of saying that is that your server VMware admins can go off and do VXLANs without talking to you. It might be better to talk, bearing in mind that if you say “no”, they can go ahead and do it anyway. Educating server admins about BUM radiation versus convenience, also troubleshooting complexity, may be useful.
All this leads me to think about a conversation I recently had about VTP:
Me: “You realize with VTP server active, your VLANs could all vanish in a flash due to a mistake.”
Other person: “Yes, but we need it since it makes adding VLANs easy.”
Me: “And that’s a good thing?”
I regard VLAN sprawl as risky and undesirable. So I of course think VTP is bad, although more for the “poof, your datacenter VLANs just vanished” risk. Having a little hassle adding VLANs makes you think about it and plan, and also tends to naturally limit where they sprawl to, assuming you do manual VLAN pruning on trunks (which I also consider a best practice — rumor says that when using VTP server, automatic pruning can make things worse when an STP loop hits).
The Outside World
One needs something to interconnect a VXLAN to a VLAN, some sort of gatewaying (bridging or routing) function. Right now, the ways to do that are via VMware vShield Edge or (announced) the ASR 1000v. One suspects the 1000v may do it in the future. Having the virtual CSR bridge between VXLAN and OTV also seems like it would be useful, to give a sane (lower traffic) way to connect a VXLAN to a VXLAN in another datacenter. (I suspect that at this point some readers may be thinking “Or is that extending the madness?”)
One might like a robust (redundant, highly available) gateway between VXLAN and VLAN. VMware’s vShield Edge (VSE) is not currently capable of redundancy. One suspects the ASA 1000v will be able to provide a High Availability active/passive or maybe cluster solution fairly soon. One might even want multiple diverse gateways, but then one has added the need for some sort of Spanning Tree protection. I’m not going to hold my breath on that one; it’s complicated, and just may not be necessary within a single data center.
In short, gatewaying VXLAN is still in the early stages of evolving.
See also Scott Lowe’s blog about this, at http://blog.scottlowe.org/2011/12/07/revisiting-vxlan-and-layer-3-connectivity/.
Note that for multi-tenant use, one might want a vApp or AppPod (my term) consisting of many VMs and virtual appliances, and then just clone it as a new VXLAN (possibly with distributed components). VSE could then provide edge NAT for the web servers or whatever components need to be publicly reachable. That is, for virtual datacenter and cloud automation, VXLAN could indeed be rather useful. Having a single NAT point also fits well with the above discussion about gateway redundancy.
We haven’t discussed optimal routing and VXLAN (see similar discussions re OTV). My short answer: within a data center, it doesn’t matter that much. If and when multiple gateways are available, then LISP might be another answer — but would be adding complexity where the extra amount of low latency probably doesn’t matter. YMMV.
VMware just acquired Nicira, which apparently supports various kinds of tunnels. That would seem to lend momentum to VXLAN as a protocol, although they may continue to be somewhat agnostic as to tunneling method. See also the Ivan P blogs at http://blog.ioshints.info/2011/10/what-is-nicira-really-up-to.html and http://blog.ioshints.info/2012/07/vmware-buys-nicira-hypervisor-vendor.html.