This article continues and builds upon my prior blog Understanding Layer 2 over Layer 3 (Part 1), which sets the necessary context and background. (And provides a bunch of good links!)
Let’s say we’re doing Layer 2 (L2) over a routed network, to limit the size of Layer 2 failure domains. We are probably doing Layer 2 over Layer 3 transport because:
- We’re doing EoMPLS to migrate servers (physical or virtual) past the routed core in a moderately large data center.
- We’re doing EoMPLS, VPLS, or OTV for Data Center Interconnect (DCI) between two or more data centers, over a routed WAN or MAN network.
The following diagram tries to suggest this. The colored arrow is intended to indicate Layer 2 connectivity over the Layer 3 routed network (LAN, MAN, or WAN) in the middle, possibly using OTV (Overlay Transport Virtualization) or EoMPLS (Ethernet over MPLS) as the underlying technology for the L2 connection.
As soon as you do something like this, you have a trunk or VLAN between the two sites. If it is a trunk, we can understand the implications by analyzing a representative VLAN carried on the trunk.
From the Layer 3 perspective, that VLAN is a subnet, one that happens to “be” in two places. I like to think of this as VLANs on switches at the two ends, connected by a very long Ethernet cable, one which happens to be virtual. The slanted lines at the bottom of the diagram are supposed to suggest the virtual cabling connecting the two parts of the VLAN.
The L2 over L3 technology would handle any within-VLAN traffic. (For EoMPLS, this probably means some use of “loopback cabling”, to allow MAC switching logic to be used.)
From the routing perspective, the two locations (four switches shown) all happen to be connected to one (logical, virtual) piece of wire: the VLAN and its subnet. That assumes you put IP addresses on the VLAN interfaces at both ends and advertise the subnet in your routing protocol.
Let’s examine what happens if you do not. Suppose only the left-side switches have IP address(es) on the VLAN interfaces in question. Then you would get a traffic pattern as in the following diagram: all traffic to the VLAN would go via the left router(s). Even the router at middle right might well route across the L3 network to get to the VLAN. That’s not very attractive if the L3 network is a WAN!
Let’s assume the switches at the left side are also doing a FHRP (First Hop Redundancy Protocol): HSRP, VRRP, or GLBP. Then the servers on the VLAN, at both sites, would route outbound traffic via their default gateway, as indicated by the blue arrows. That is also somewhat sub-optimal for servers at the bottom right of the diagram.
Well, that makes it fairly clear we want both sides advertising connectivity to the subnet on the VLAN.
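As a sketch of what that looks like, here is Cisco IOS-style configuration for one site’s switch (hypothetical addresses, VLAN number, and OSPF process; the other site would be configured equivalently, with its own address on the same subnet):

```
! Site A distribution switch. Site B gets a similar config,
! with its own address on the same 10.1.100.0/24 subnet.
interface Vlan100
 description Extended VLAN carried over the L2-over-L3 transport
 ip address 10.1.100.2 255.255.255.0
!
router ospf 1
 ! Advertise the VLAN 100 subnet so inbound traffic can enter at either site
 network 10.1.100.0 0.0.0.255 area 0
```

With equivalent configuration at both sites, the routing protocol sees two attachments to the same subnet.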
In that case, inbound traffic will go to the closest connection to the subnet, which is suggested in the following diagram.
That may or may not be optimal. If you’re doing clustering or hitting a load balancer that (somehow) knows which servers are actually at the site, that’s good. If the inbound traffic is going to a server that happens to be at the other site, then the traffic gets to go across the L2 connection (EoMPLS, OTV, etc.), incurring more delay and more chances for packet loss. If you’re doing VMotion with something like VMware DRS (Distributed Resource Scheduler) for VMware High Availability (HA), there is no way for routing to align with which site the Virtual Machine (VM) in question is at. (Some sort of advertisement of a /32 host route would be rather unscalable.) The best routing can do is dump the packet into the VLAN. Then your L2 technology (EoMPLS with loopback cabling, VPLS, or OTV) would have to do MAC-based location-tracking, and deliver the Ethernet frame at L2 to the right site.
If you’re doing DCI for multiple sites, the same applies. Routing gets inbound packets to some site with the subnet present. It is then up to your L2 technology to get the frame the rest of the way to the actual server. This is inherent in routing, which attains scalability by lumping together or aggregating information, to track subnets but not individual hosts.
For outbound traffic, the issue is “where is the default gateway?” If the four switches shown are doing HSRP (or VRRP), then traffic will still go outbound via the active HSRP device. With GLBP, outbound traffic could go either way, rather randomly. So all the First Hop Redundancy Protocols (FHRPs) can be sub-optimal in this setting.
You could run two FHRP groups, one active on the left, the other active on the right, and configure the servers with the appropriate virtual IP for the site they are located at. That doesn’t work with VMotion: if a VM moves, its address and default gateway remain unchanged. If VMware DRS has a way to make a server prefer one side to the other, then occasional sub-optimality might be acceptable, when loading or failover caused a VM to VMotion to “the wrong side”. I don’t know right now of a way to do that, however. If you are manually moving servers or VMs between the two sides, then sub-optimal outbound traffic might be OK. Or you could fix the default gateway on the fly. (That sounds like trading an operational pain for a sub-optimal traffic situation.)
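The two-FHRP-group idea might look like this in Cisco IOS style (hypothetical addresses; HSRP group 1 active on the left, group 2 active on the right):

```
! Left-side switch: active for group 1 (VIP .1), standby for group 2 (VIP .254)
interface Vlan100
 ip address 10.1.100.2 255.255.255.0
 standby 1 ip 10.1.100.1
 standby 1 priority 110
 standby 1 preempt
 standby 2 ip 10.1.100.254
 standby 2 priority 90
!
! The right-side switch mirrors the priorities, so it is active for group 2
```

Servers at the left site would then be configured with 10.1.100.1 as their default gateway, and servers at the right site with 10.1.100.254. As noted, this breaks down when VMotion moves a VM without changing its configured gateway.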
I have read that OTV can filter FHRP traffic. The implication is that both sides can run a FHRP with the same virtual IP. In that case, a server or VM would auto-magically find the local active HSRP router, and outbound traffic would be optimally handled. Neat stuff! I suspect one might cleverly try to do something like that with one of the other L2 over L3 approaches. Testing that is a lab exercise for the reader. (Please do append a comment with what you find out!)
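The Cisco documentation describes one way to do this HSRP localization on Nexus 7000 OTV edge devices: a VACL that drops HSRP hellos before they cross the overlay. Roughly (details may vary by platform and software release):

```
! Match HSRPv1/v2 hellos (UDP port 1985 to 224.0.0.2 or 224.0.0.102)
ip access-list HSRP_IP
  10 permit udp any 224.0.0.2/32 eq 1985
  20 permit udp any 224.0.0.102/32 eq 1985
ip access-list ALL_IPs
  10 permit ip any any
!
! Drop HSRP hellos; forward everything else on the extended VLAN
vlan access-map HSRP_Localization 10
  match ip address HSRP_IP
  action drop
vlan access-map HSRP_Localization 20
  match ip address ALL_IPs
  action forward
vlan filter HSRP_Localization vlan-list 100
```

With the hellos filtered, each site elects its own active HSRP router for the same virtual IP. Filtering of the shared HSRP virtual MAC across the overlay may also be needed; check the OTV documentation for your release.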
There is one other factor to consider: how the SAN and SAN extension might interact with all this. Performance can degrade when your server or VM is at one site and the NAS or SAN is at the other, due to added per-packet latency. As we all know, latency (and packet loss) impact TCP throughput. See my prior blog, TCP/IP Performance Factors.
I’ve seen this in the field, and read about it in the Cisco documents about geoclusters. But it’s a complex topic, so we’ll save it for another time, another blog.
Coming soon: I hope to explain how all this might be relevant in a server relocation and re-addressing setting, with VMware / virtualization P2V (physical to virtual) conversion thrown into the mix.
Cisco Data Center Interconnect (DCI) Links
Blog by Craig Huitema: http://blogs.cisco.com/datacenter/comments/data_center_interconnect/
Cisco page on DCI (with more good links): http://www.cisco.com/en/US/netsol/ns975/index.html
Cisco article on L2 extension: http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/white_paper_c11_493718.html
DCI Design and Implementation guide: http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns949/ns304/ns975/data_center_interconnect_design_guide.pdf
Cisco DCI and Geo-Cluster Links
Data Center High-Availability Clusters Design Guide: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/HA_Clusters/HA_Clusters.html (contains a geoclusters chapter)
2 responses to “Understanding Layer 2 over Layer 3 (Part 2)”
Interesting link, thanks Dave! (And I of course greatly appreciate anybody who says good things about my blogs and articles. Will write for praise!)
A blog will follow with details. Cisco has upgrades to GSS / ACE that’ll do something similar, and it looks like it all hooks into VMware too. Which one is smoother, dunno. The key is to have different per-data-center VIPs, and swap so that DNS resolves to the one appropriate for the data center the VM is currently living in.
F5 pushes their WAN de-dupe and compression as allowing you to do Storage VMotion as well, addressing my concern about VM at one site, SAN / storage or DB front end at the other (to some extent). [Note: you have to do the server VMotion and the Storage VMotion as two separate steps, which might be automated….]
See the new posting at http://www.netcraftsmen.net/resources/blogs/cisco-overlay-transport-virtualization-otv.html.