Understanding Layer 2 over Layer 3 (Part 2)

Peter Welcher
Architect, Operations Technical Advisor

This article continues and builds upon my prior blog Understanding Layer 2 over Layer 3 (Part 1), which sets the necessary context and background. (And provides a bunch of good links!)

Let’s say we’re doing Layer 2 (L2) over a routed network, to limit the size of Layer 2 failure domains. We are probably doing Layer 2 over Layer 3 transport because:

  1. We’re doing EoMPLS to migrate servers (physical or virtual) across the routed core in a moderately large data center.
  2. We’re doing EoMPLS, VPLS, or OTV for Data Center Interconnect (DCI) between two or more data centers, over a routed WAN or MAN network.

The following diagram tries to suggest this. The colored arrow is intended to indicate Layer 2 connectivity over the Layer 3 routed network (LAN, MAN, or WAN) in the middle, possibly using OTV (Overlay Transport Virtualization) or EoMPLS (Ethernet over MPLS) as the underlying technology for the L2 connection.

As soon as you do something like this, you have a trunk or VLAN between the two sites. If it is a trunk, we can understand the implications by analyzing a representative VLAN carried on the trunk.

From the Layer 3 perspective, that VLAN is a subnet, one that happens to “be” in two places. I like to think of this as VLANs on switches at the two ends, connected by a very long Ethernet cable, one which happens to be virtual. The slanted lines at the bottom of the diagram are supposed to suggest the virtual cabling connecting the two parts of the VLAN.

The L2 over L3 technology would handle any within-VLAN traffic. (For EoMPLS, this probably means some use of “loopback cabling”, to allow MAC switching logic to be used.)

From the routing perspective, the two locations (four switches shown) are all connected to one (logical, virtual) piece of wire: the VLAN and its subnet. That assumes you put IP addresses on the VLAN interfaces at both ends and advertise them in your routing protocol.
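As a concrete sketch of what that might look like, here is a hypothetical IOS configuration for one of the switches, using made-up addressing (VLAN 100, subnet 10.1.100.0/24, OSPF as the IGP). Each switch would get its own unique address on the shared subnet, and each site would advertise the subnet:

```text
! Hypothetical sketch: SVI on the extended VLAN, advertised into the IGP
interface Vlan100
 description Subnet extended across the L2-over-L3 connection
 ip address 10.1.100.2 255.255.255.0   ! unique per switch, same subnet at both sites
!
router ospf 1
 network 10.1.100.0 0.0.0.255 area 0   ! advertise the shared subnet from both sites
```

With the equivalent configuration at both sites, both advertise reachability to the subnet, which is the situation discussed below.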

Let’s examine what happens if you do not. Suppose only the left-side switches have IP address(es) on the VLAN interfaces in question. You would then get a traffic pattern as in the following diagram: all traffic to the VLAN would go via the left router(s). Even the router at middle right might well route across the L3 network to reach the VLAN. That’s not very attractive if the L3 network is a WAN!

Let’s assume the switches at the left side are also doing a FHRP (First Hop Redundancy Protocol): HSRP, VRRP, or GLBP. Then the servers on the VLAN, at both sites, would route outbound traffic via their default gateway, as indicated by the blue arrows. That is also somewhat sub-optimal for servers at the bottom right of the diagram.
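For concreteness, a minimal HSRP configuration on the left-side switch might look like the following (hypothetical addressing, continuing the VLAN 100 example; the peer switch would be configured the same way with a lower priority):

```text
! Hypothetical sketch: left-side switch as the active HSRP gateway for VLAN 100
interface Vlan100
 ip address 10.1.100.2 255.255.255.0
 standby 1 ip 10.1.100.1        ! virtual IP the servers use as their default gateway
 standby 1 priority 110         ! higher priority makes this switch the active router
 standby 1 preempt
```

The point is that there is exactly one active gateway for the VLAN, so servers at the far site send outbound traffic across the L2 extension to reach it.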

Well, that makes it fairly clear we want both sides advertising connectivity to the subnet on the VLAN.

In that case, inbound traffic will go to the closest connection to the subnet, which is suggested in the following diagram.

That may or may not be optimal. If you’re doing clustering or hitting a load balancer that (somehow) knows which servers are actually at the site, that’s good. If the inbound traffic is going to a server that happens to be at the other site, then the traffic gets to go across the L2 connection (EoMPLS, OTV, etc.), incurring more delay and more chances for packet loss. If you’re doing VMotion with something like VMware DRS (Distributed Resource Scheduler) for VMware High Availability (HA), there is no way for routing to align with the site the Virtual Machine (VM) in question is at. (Some sort of advertisement of a /32 host route would be rather unscalable.) The best routing can do is dump the packet into the VLAN. Then your L2 technology (EoMPLS with loopback cabling, VPLS, or OTV) has to do MAC-based location tracking and deliver the Ethernet frame at L2 to the right site.

If you’re doing DCI for multiple sites, the same applies. Routing gets inbound packets to some site with the subnet present. It is then up to your L2 technology to get the frame the rest of the way to the actual server. This is inherent in routing, which attains scalability by lumping together or aggregating information, to track subnets but not individual hosts.

For outbound traffic, the issue is “where is the default gateway?” If the four switches shown are doing HSRP (or VRRP), then traffic will still go outbound via the active HSRP device. With GLBP, outbound traffic could go either way, rather randomly. So all of the First Hop Redundancy Protocols (FHRPs) can be sub-optimal in this setting.

You could run two FHRP groups, one with the active primary on the left, the other on the right, and configure the servers with the appropriate virtual IP for the site they are located at. That doesn’t work with VMotion: if a VM moves, its address and default gateway remain unchanged. If VMware DRS has a way to make a server prefer one side over the other, then occasional sub-optimality might be acceptable, for the times when loading or failover causes a VM to VMotion to “the wrong side”. I don’t currently know of a way to do that, however. If you are manually moving servers or VMs between the two sides, then sub-optimal outbound traffic might be OK. Or fixing the default gateway on the fly. (Sounds like a tradeoff of operational pain versus sub-optimal traffic.)
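The two-FHRP-group idea can be sketched as follows, again with hypothetical addressing. Each switch pair runs two HSRP groups on the same VLAN, with the priorities arranged so each site is active for “its” virtual IP:

```text
! Hypothetical sketch: left-side switch, two HSRP groups on one extended VLAN
interface Vlan100
 ip address 10.1.100.2 255.255.255.0
 standby 1 ip 10.1.100.1        ! VIP for left-site servers; this switch is active
 standby 1 priority 110
 standby 1 preempt
 standby 2 ip 10.1.100.4        ! VIP for right-site servers; right switch is active
 standby 2 priority 90
! The right-side switch mirrors this, with the group priorities reversed.
```

Left-site servers point at 10.1.100.1 and right-site servers at 10.1.100.4, so each exits locally. As noted above, this only stays optimal as long as servers (or VMs) actually stay at the site whose VIP they are configured with.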

I have read that OTV can filter FHRP traffic. The implication is that both sides can run a FHRP with the same virtual IP. In that case, a server or VM would auto-magically find the local active HSRP router, and outbound traffic would be optimally handled. Neat stuff! I suspect one might cleverly try to do something like that with one of the other L2 over L3 approaches. Testing that is a lab exercise for the reader. (Please do append a comment with what you find out!)
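As a rough idea of what that FHRP filtering might look like on an OTV edge device (NX-OS), the approach I have seen described uses a VACL to keep HSRP hellos from crossing the overlay, so each site elects its own active router for the same virtual IP. Treat the following as a hedged sketch rather than a verified configuration; command details and sequence numbers may vary by platform and release:

```text
! Rough NX-OS sketch of HSRP localization for an OTV-extended VLAN 100
ip access-list HSRP_HELLOS
  10 permit udp any 224.0.0.2/32 eq 1985     ! HSRPv1 hello traffic
  20 permit udp any 224.0.0.102/32 eq 1985   ! HSRPv2 hello traffic
mac access-list ALL_MACS
  10 permit any any
vlan access-map HSRP_LOCALIZE 10
  match ip address HSRP_HELLOS
  action drop                                ! keep HSRP hellos site-local
vlan access-map HSRP_LOCALIZE 20
  match mac address ALL_MACS
  action forward                             ! everything else passes
vlan filter HSRP_LOCALIZE vlan-list 100
```

Cisco’s OTV material also describes filtering the HSRP virtual MAC from being advertised across the overlay; consult the current OTV configuration guide for the exact, supported commands.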

There is one other factor to consider: how the SAN and SAN extension might interact with all this. Performance can degrade when your server or VM is at one site and the NAS or SAN is at the other, due to added per-packet latency. And as we all know, latency (and packet loss) impacts TCP throughput. See my prior blog, TCP/IP Performance Factors.

I’ve seen this in the field, and read about it in the Cisco documents about geoclusters. But it’s a complex topic, so we’ll save it for another time, another blog.

Coming soon: I hope to explain how all this might be relevant in a server relocation and re-addressing setting, with VMware / virtualization P2V (physical to virtual) conversion thrown into the mix.

Cisco Data Center Interconnect (DCI) Links

Blog by Craig Huitema: http://blogs.cisco.com/datacenter/comments/data_center_interconnect/

Cisco page on DCI (with more good links): http://www.cisco.com/en/US/netsol/ns975/index.html

Cisco article on L2 extension: http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/white_paper_c11_493718.html

DCI Design and Implementation guide: http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns949/ns304/ns975/data_center_interconnect_design_guide.pdf

Cisco DCI and Geo-Cluster Links

Data Center High-Availability Clusters Design Guide: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/HA_Clusters/HA_Clusters.html (contains a geoclusters chapter)

2 responses to “Understanding Layer 2 over Layer 3 (Part 2)”

  1. Interesting link, thanks Dave! (And I of course greatly appreciate anybody who says good things about my blogs and articles — will write for praise!)

    A blog will follow with details — Cisco has upgrades to GSS / ACE that’ll do something similar. It looks like it all hooks into VMware too. Which one is smoother, dunno. The key is to have different per-data-center VIPs, and swap so that DNS resolves to the one appropriate for the data center the VM is currently living in.

    F5 pushes their WAN de-dupe and compression as allowing you to do Storage VMotion as well, addressing my concern about VM at one site, SAN / storage or DB front end at the other (to some extent). [Note: you have to do the server VMotion and the Storage VMotion as two separate steps, which might be automated….]

  2. See the new posting at http://www.netcraftsmen.net/resources/blogs/cisco-overlay-transport-virtualization-otv.html.
