Exploiting Layer 2 over Layer 3

Peter Welcher
Architect, Operations Technical Advisor

This is part 3 of a series about Layer 2 over Layer 3, although I changed the titling somewhat. This blog takes a look at a very practical application of some of the Layer 2 (L2) over Layer 3 (L3) techniques discussed in prior articles. Specifically, it looks at how L2 over L3 can help with server relocation and re-addressing, especially in VMotion form.

I’ve been working with a customer where servers are fairly randomly scattered within a data center. The original move-in did rows by vendor and chassis size, which is considered rather undesirable according to some sources (a debate for another time / article). Servers belonging to a customer application / service team or project group tend to be scattered around the data center floor. This is at best inconvenient. It also makes project-based or business unit / mission-related security challenging. 

One group at this site wishes to consolidate their servers into one server zone (a couple of rows of racks). My guess is that when done, it may actually be more like a few large blade server racks running VMware ESX. Other motivations are physical consolidation to prepare for enhanced security or a move to an external cloud, and simplified operational management.

We’ll take a look at how L2 over L3 techniques might help. I’ve been involved to the extent of some discussions with a group figuring out what is needed and technically how best to accomplish it. This is solely based on conceptual discussions so far — try this at your own risk. And if you have a better way, please let me know (email me, or comment on this article). 

Technical Background

Some L2 over L3 background and context, and lots of good links, can be found in Part 1 of this series, Understanding Layer 2 over Layer 3 (Part 1).

Pictures and discussions about traffic flows, routing considerations, and First Hop Routing Protocol (FHRP) considerations, are covered in Part 2 of this series, Understanding Layer 2 over Layer 3 (Part 2).

Tackling the Challenge

We rapidly established that the short term goal is physical location consolidation and isolation. Most of the servers are currently connected to VLANs based on which zone (rows) they are in. Most sites do this, and it works fine until you start thinking about isolating groups of servers from each other, to harden against spread of a compromise from one server to a more critical server. Unless you really like host-specific ACLs (or probably Cisco VACLs for within-VLAN packet filtering), you want VLANs that represent logical groupings. But for most sites that now realize that, it’s far too late. Mention “re-addressing” around a server admin and they’ll probably throw something at you, or start frothing at the mouth.

In this case, the assumption is that some free space can be located, which will provide a “target zone” to move servers to. To ease the burden, some large ESX servers will be used. This assumes the data center crew will approve the use of the space, and that the very tight power ceiling at the site will not prevent installing them, despite a net reduction in power consumption. But those are Layer 0 (Facilities) and Layer 9 (Political Layer) issues.

To gain logical separation from the other servers in the target zone, a new VLAN and subnet will be needed. Thus all the servers for this group will need re-addressing. No thrown objects or mouth froth apply, since there is no real alternative. And since the changes are viewed as some local changes and a lot of between-zone moves (known to require re-addressing), this probably matches expectations. 

In some cases, the servers that need to be moved may be Virtual Machines (VMs). In other cases, they are physical chassis.

To facilitate the move, and presumably due to known suitable loading and other server characteristics, the team also is going to do physical to virtual (P2V) conversion, which will enable VMotion to be used, rather than physically moving server chassis around. One can either re-address the physical server, then P2V it, then halt the physical server and boot the VM. Or one can do P2V, shut down the physical server, boot and re-address the resulting VM via the console. DNS changes would have to occur in tandem with this, either way. 

The former approach has the minor drawback that you can’t just halt the VM and restart the physical server if there are problems during the change window. 

The latter approach has the virtue that you could have both the VM and physical server running, using DNS to swing customers between them, assuming any back end databases or whatever don’t object to that. Also, if you aren’t moving VMs via VMotion as well, you might not need any of the EoMPLS discussed below! (As far as I know, you can P2V to a different subnet; it’s just that the VM will probably be mis-addressed if you do that.)

The former approach was preferred, so as to minimize the changes between the physical server and the VM, and because the server administrators were more comfortable with it. 

The problem that then arises is how to have a server in a subnet that it is separated from by the L3 core switches. That’s precisely the situation in the diagram used in a prior blog (repeated here with minor changes for convenience).

Since OTV is new and not available for Cisco 6500 model switches yet, the approach planned for this use of L2 over L3 is to use EoMPLS between the target zone and one other server zone at a time. Re-address and move the necessary servers within that zone, tear down the EoMPLS tunnel, repeat as needed for other server zones. That scales a lot better, and minimizes troubleshooting complexity. It does require coordination with the network team. The second approach mentioned above does not seem to need L2 over L3 at all — less coordination. 
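The one-zone-at-a-time sequence can be sketched as a small planning script. This is only an illustration: the server names, zone names, and the `plan_moves` helper are hypothetical, and the "steps" are just text, not commands for any real device.

```python
from collections import defaultdict

def plan_moves(servers):
    """Return an ordered list of steps, handling one source zone at a time.

    `servers` is a list of (server_name, zone_name) tuples taken from a
    hypothetical inventory. Only one EoMPLS tunnel exists at any point.
    """
    by_zone = defaultdict(list)
    for name, zone in servers:
        by_zone[zone].append(name)

    steps = []
    for zone in sorted(by_zone):
        # Build the tunnel, do every move for this zone, then tear it down.
        steps.append(f"build EoMPLS: target-zone <-> {zone}")
        for name in sorted(by_zone[zone]):
            steps.append(f"re-address and move {name}")
        steps.append(f"tear down EoMPLS: target-zone <-> {zone}")
    return steps

# Made-up inventory for illustration.
inventory = [("web1", "zone-b"), ("db1", "zone-a"), ("web2", "zone-b")]
plan = plan_moves(inventory)
for step in plan:
    print(step)
```

Each tunnel is torn down before the next is built, so at any moment there is at most one stretched VLAN to troubleshoot.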

EoMPLS to the Rescue!

One could do this temporary EoMPLS with multiple VLANs, or with bridging two VLANs together. 

If you do it with multiple VLANs, you would be re-addressing the physical server and changing the VLAN (on the switch, since the servers are just in an access VLAN). What I like about that is that it is a fairly straightforward networking approach. EoMPLS causes the target VLAN to be present in two places, so to speak, but if only the L3 switch at the target end has an IP address on the VLAN, routing will behave pretty much as one might expect. And only re-addressed but un-P2V’d servers that are not yet running in VM form would be reached via the EoMPLS.

The diagram below tries to show the multiple VLANs approach. The blue represents the “old” VLAN and subnet addressing, the purple (lavender?) represents the new VLAN and subnet. A server would be connected to one or the other, not both.

One consideration would be that for a physical server, the access VLAN on the ports it connects to would need to be changed in conjunction with the re-addressing. If you can do the work without changing the port VLAN, there is less frequent network team involvement required to do the per-server port VLAN changes. Set up the EoMPLS, leave it up for a few weeks while the server team does its job, then tear down and build another. 

That’s where the approach of bridging two VLANs together comes in. In this case, port-based EoMPLS plus a local loopback cable at one end can interconnect the target VLAN in the target zone to the move-candidate VLAN in the second server zone. (The cable is plugged into two non-EoMPLS ports, one for each VLAN, to cross-connect the VLANs. This is normally a really bad thing to do unless you’re trying to come up with a challenging lab troubleshooting problem.) You could also use VLAN-based EoMPLS and interconnect two different VLANs. That is not as useful with where we are going to end up below, however.

The following diagram suggests what this looks like:

I certainly do not wish to interconnect VLANs and different subnets all around a data center. Talk about painful to troubleshoot! One server zone and minimal cross-connecting at a time, please!

A trick that can be used with this second bridging-two-VLANs approach is to use a variant of what I call “poor man’s VLANs”, where you have a Cisco secondary address on the routed interface. That’s two subnets on one VLAN. By doing that, traffic between the subnets has to go via a L3 router. Not very efficient, usually a sign of poor design or planning. It might be useful here — bear with me. (Or it might be a really lousy idea, in which case you can throw stones at me, virtually of course, via rude comments to this blog. Hint: I get to approve which ones become visible, so be polite.)
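To make the “two subnets on one VLAN” point concrete, here is a minimal sketch of the decision each host makes. The subnets and addresses are made up for illustration, and `needs_router` is a hypothetical helper, not part of any product.

```python
import ipaddress

# Hypothetical addressing: both subnets live on the same bridged VLAN.
OLD_SUBNET = ipaddress.ip_network("10.1.10.0/24")
NEW_SUBNET = ipaddress.ip_network("10.1.99.0/24")

def needs_router(src, dst, src_prefixlen=24):
    """Return True if src treats dst as off-link.

    A host compares the destination with its own subnet. If the destination
    is outside that subnet, the packet goes to the default gateway, even
    when both hosts share the same L2 segment.
    """
    src_net = ipaddress.ip_interface(f"{src}/{src_prefixlen}").network
    return ipaddress.ip_address(dst) not in src_net

# Same subnet: hosts talk directly at L2.
print(needs_router("10.1.10.5", "10.1.10.6"))
# Different subnets on the same VLAN: traffic hairpins through the router.
print(needs_router("10.1.10.5", "10.1.99.7"))
```

The second case is the inefficiency mentioned above: the two hosts are L2 neighbors, but every packet between them still takes a trip through an L3 switch.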

Suppose you only have the target subnet IP addresses on the target VLAN interface on the L3 switches connecting to the target zone, and you only have the original subnet and its associated IP addresses on the to-be-moved VLAN’s L3 switches. Then inbound traffic to the target subnet will go via the target zone. If a server has been re-addressed but not P2V’d, then EoMPLS gets traffic to it in its original location. Outbound, its default gateway is the HSRP VIP provided only by the L3 switches in the target zone. Once the VM is up and running, it is directly connected. Meanwhile, servers not yet re-addressed, or ones that are not being moved, route just as before. The following diagram shows this.
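The inbound behavior just described can be modeled in a few lines. This is a toy model under stated assumptions: the subnets, zone names, and the `Server` and `inbound_path` names are all hypothetical, and the only rule encoded is “the zone whose L3 switches own the subnet attracts the traffic.”

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical mapping: which zone's L3 switches carry each subnet.
SUBNET_ZONE = {
    ipaddress.ip_network("10.1.10.0/24"): "old-zone",
    ipaddress.ip_network("10.1.99.0/24"): "target-zone",
}

@dataclass
class Server:
    name: str
    ip: str
    location: str  # physical location: "old-zone" or "target-zone"

def inbound_path(server):
    """Return the hops inbound traffic takes to reach the server."""
    addr = ipaddress.ip_address(server.ip)
    for subnet, zone in SUBNET_ZONE.items():
        if addr in subnet:
            hops = [f"{zone} L3 switches"]
            # The address lives in one zone, the box in another:
            # the bridged VLAN over EoMPLS carries the last leg.
            if server.location != zone:
                hops.append("EoMPLS")
            hops.append(server.name)
            return hops
    raise ValueError(f"{server.ip} is not in a known subnet")

# Re-addressed but not yet P2V'd: reached across the tunnel.
print(inbound_path(Server("app1", "10.1.99.20", "old-zone")))
# After P2V and VMotion: directly connected in the target zone.
print(inbound_path(Server("app1", "10.1.99.20", "target-zone")))
# Not being moved: routes just as before.
print(inbound_path(Server("app2", "10.1.10.30", "old-zone")))
```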

Our first EoMPLS approach (re-address the server and change the VLAN its ports are in) behaves similarly, except that the original server VLAN is purely local, i.e. minimal possible impact within that VLAN on the existing unchanged servers during the move. This might be mildly lower complexity and risk, but it does require more network admin involvement for the port changes. (And that costs money and time, if they have to be present at night or during a change window weekend, just to wake up once every hour or two to do some port changes.)

What about the VMs?

Note also that having Layer 2 connectivity between the server zones allows use of VMotion, which requires L2 adjacency. This is especially useful if some of the servers to move are physical and some are VMs running on ESX servers “in the wrong physical server zone”, which we did mention when we described the customer situation.

Each VM will have to be re-addressed. With the bridged VLAN approach, the re-addressing could happen before or after the VMotion, at the price of some potential traffic flow inefficiency in the short term. An ESX server might commonly use Console/Kernel, Data, and VMotion VLANs (at least). But you are probably only re-addressing the VMs, not the ESX chassis, so you might only need to work with Data and VMotion VLANs, plus any other per-VM port groups you’re using. The single EoMPLS tunnel would have to trunk all 2, 3, or however many such VLANs you have in use. The picture would be like above, except there would be several bridged VLANs, with old identity at one end and new subnetting on the target zone switch. This would work reasonably well for re-addressing one VM at a time, and the VLAN changes would be purely internal to the ESX server.

One detail: the new VLANs would have to be allowed on the trunks from the ESX server to the switch(es). Another detail: the loopback cable would have to trunk connect the old and the new VLANs. (Or use 3 cables, one per VLAN pair?)
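A quick pre-check of the trunk allowed lists is easy to script. The sketch below uses made-up VLAN numbers and a hypothetical `missing_on_trunk` helper; it only compares lists you supply and does not talk to any switch or ESX host.

```python
# Hypothetical old/new VLAN pairs, one pair per ESX port group being bridged.
VLAN_PAIRS = {
    "data": (110, 210),
    "vmotion": (120, 220),
}

def missing_on_trunk(allowed_vlans, pairs):
    """Return the VLAN IDs from the pairs that the trunk does not carry."""
    needed = {vlan for pair in pairs.values() for vlan in pair}
    return sorted(needed - set(allowed_vlans))

# Allowed list as read off a trunk port (made-up values).
esx_trunk_allowed = [110, 120, 210]
print(missing_on_trunk(esx_trunk_allowed, VLAN_PAIRS))
```

An empty result means the trunk carries every old and new VLAN in the pairs; anything listed needs to be added before the change window.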

Alternatively, EoMPLS could connect the three target VLANs to the other (“old”) server switch and zone, and only the changed VMs or servers would be in the target VLANs. This might be fairly simple for the VMs — just change the VLAN connectivity within the ESX server. But, as we saw earlier, that would require port access VLAN changes for the physical servers. 

Conclusion: pick what looks like might work best for you. And consider trying it in the lab before doing it to a production server under time pressure!

4 responses to “Exploiting Layer 2 over Layer 3”

  1. I need to span a VLAN between 2 data centers and I need to use only Layer 2.
    Both centers are connected via Ethernet. What is the best way to do it? Is BVI a good solution?


  2. I’m not understanding. If you have Ethernet, you can run a VLAN across it. That puts both data centers at risk. If you want to make it a routed link, then you get into the DCI (Data Center Interconnect) topics, including OTV, discussed in various of my blogs. That’s a complex design issue, so I have to refer you to those blogs, and to what Cisco has written.

    BVI is process switched AFAIK, hence a really Bad Thing to do to your router CPU. Very low performance. It dates back to the 10 Mbps router days. Generally BVI is router-only as well.

  3. Peter,
    I too need to span multiple VLANs between 2 data centers for a short time as one data center migrates to the other. We have server clusters that we will break apart for the physical move, moving one of a pair, turning down the other and re-using IP space. Each site will have MPLS WAN Ethernet connectivity. So instead of running EoMPLS, would it be better (easier) to add the required VLANs to the MPLS Ethernet connections at each end?
    Thank you,

  4. I’d need a little more info to answer that. It sounds like you have L2 over MPLS service between datacenters. If so, you could run a VLAN across — but the risk is shared fate. That is, STP meltdown that affects both datacenters. Certainly you’d want to use traffic storm control on a Sup720 if you do that.

    You could also do EoMPLS. The main virtue there might be that a STP meltdown will clobber the EoMPLS device, thus rather drastically limiting the spread of the problem in the worst case. Disadvantage: arguably a bit more complex. It’s the only obvious answer if you’ve got L3 MPLS in the middle.

    If you have Nexus 7000’s, OTV is pretty clearly the way to go (modulo lack of support for non-multicast WAN right now, 12/1/10 — supposedly coming soon). Yes, I drank the Cisco Kool-Aid on that. It seems to address a number of the concerns that arise with L2 between datacenters.
