OTV Best Practices have come to the forefront lately. Various sites are starting to implement OTV. The ones I’m aware of to date know they are taking a minor risk (immature technology). They have chosen to go ahead anyway because they are migrating to new data centers, and OTV is potentially very helpful in doing so. You don’t need to be doing live VMotion to see the benefit. Even if you are moving physical servers, OTV can be helpful. So if you’re going to be doing OTV, doing it the best way is obviously the way to go. This blog assumes you know how OTV works, and focuses on best practices (Cisco’s, mine, lessons learned).
Remind Me About OTV
For those who somehow missed all the Cisco press and my prior blogs: OTV is a way to transport Layer 2 between data centers over any (sufficiently high speed) Layer 3 IP network. I consider it the best of the Data Center Interconnect (DCI) approaches Cisco provides. OTV includes technology to reduce WAN ARP broadcast traffic, isolate STP instances to each data center, etc. That does not make it perfect: any DCI technique necessarily allows BUM (Broadcast, Unknown Unicast, Multicast) traffic to slosh between sites — it has to, or various protocols and applications would break. OTV reduces broadcast traffic by doing ARP caching and filtering. Apparently further tools to filter broadcast or BUM traffic may appear in future releases, but are not yet available.
See the OTV Technology Introduction and Deployment Considerations document, at http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DCI/whitepaper/DCI_1.html. It contains lots of good design and other information. I intend to summarize main points from it and from my brain, to give you somewhat of a checklist you can use. Being a consultant, of course I also highly recommend professional design advice and review!
The Hidden OTV Best Practices
I’m going to start with the part that isn’t in the above document, i.e. “hidden” OTV best practices.
From recent experience, there is one best practice you will want to incorporate in your action plan, one that is hinted at but not really spelled out in the above document. With any Data Center Interconnect technique, you really should implement the various functions that limit broadcast traffic and mitigate its consequences: traffic storm control, hardware rate limiting, and CoPP (Control Plane Policing).
See also my blog about Traffic Storm Control
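On Nexus 7000 hardware, those three safety nets look roughly like the following. This is a hedged sketch: the interface name, thresholds, and rate-limiter value are illustrative, and CoPP profiles and hardware rate-limiter classes vary by NX-OS release, so verify against yours.

```
! Traffic storm control on L2 trunks facing the aggregation layer
interface Ethernet1/1
  storm-control broadcast level 1.00
  storm-control multicast level 5.00

! Control Plane Policing: apply one of the built-in profiles (recent NX-OS)
copp profile strict

! Hardware rate limiters ('show hardware rate-limiter' lists the classes)
hardware rate-limiter layer-2 storm-control 500
```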
By the way, you really also should implement all the other STP loop defenses that everyone knows about but nobody deploys (at least not until burned): BPDU guard, Root Guard, Loop Guard, and UDLD or Bridge Assurance. If you don’t have a STP loop in the first place, then you won’t need traffic storm control, hardware rate limiting, or CoPP to bail you out.
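For reference, on NX-OS those defenses map to commands roughly as follows (a sketch; the interfaces and scopes are illustrative, and some of these features have side effects worth reading up on before deployment):

```
! Host-facing edge ports: err-disable if a BPDU ever arrives
interface Ethernet1/10
  spanning-tree port type edge
  spanning-tree bpduguard enable

! Downstream ports that must never offer a better root
interface Ethernet1/20
  spanning-tree guard root

! Loop Guard globally (blocks non-edge ports that stop receiving BPDUs)
spanning-tree loopguard default

! UDLD for fiber links, or Bridge Assurance on switch-to-switch links
feature udld
interface Ethernet1/21
  spanning-tree port type network
```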
Mainstream OTV Best Practices
- Note that VLAN SVI’s cannot co-exist on the same router or VDC as OTV transport of those VLANs. The solution is either to do OTV in another router (Nexus 7K or ASR 1000), or to do “OTV-on-a-stick”. The latter is where you use a trunk for L2 and L3 connectivity from the core router (or another router) to an OTV device or VDC. Or use two links, one at L3 and one at L2. L3 usually ends up on the L3 WAN or site core. L2 may end up there or at the distribution layer — however high in the hierarchical design your server VLANs extend.
(I feel the lack of a diagram here … see the Cisco document above for many diagrams.)
- Configure the site ID. It can’t hurt. And it’s now mandatory (per TFM). As to why, see the next item.
- If you have two OTV devices at a site, make sure they can route to each other. OTV now requires both site VLAN heartbeat and OTV-side adjacency to operate. I prefer to have a fairly direct connection for this, i.e. if the L3 side of the OTV device or VDC connects to your site or WAN core, then make sure the two WAN cores are connected to each other at L3. Routing via a site L2 core and crosslink puts a lot of devices, links, and some routing in between the OTV devices, and thus is more likely to suffer from an outage or problem.
- If you have dual dark fiber connections and run OTV, you may have to use VLANs and SVI’s rather than routed interfaces for the two site OTV devices to be able to route “directly” to each other. I’ve considered several alternatives for this, and how they interact with one link failing. They’re all a bit klugey, so I’m not going to show drawings of them in public!
- Even if you only have one OTV device at a site, you must configure the site VLAN before OTV will come up. (Required the last time I tested it, anyway.)
- When doing unicast-based OTV, depending on your High Availability requirements, strongly consider having two adjacency servers.
- When doing multicast-based OTV, it might be a good idea to (a) verify all OTV site pairs in your WAN support multicast routing and (b) actually verify that multicast gets to the other end (in each direction) with a low packet loss rate. I.e. catch any IPmc problems in your WAN before you end up troubleshooting OTV instability or other odd behavior.
- Some customers and I are torn between unicast- and multicast-based OTV. For many sites, multicast-based OTV has clear benefits. On the other hand, many of us (well, me and a couple others I’ve talked to?) feel that IPmc code in general is less mature and likely to be less robust, and that it adds complexity, suggesting there is less overall risk in doing unicast-based OTV in the absence of any factors making IPmc-based OTV more attractive, such as “many” datacenter sites or a need to transport IPmc flows.
- We have been setting the OTV devices up with each having a virtual port channel to the L2 Aggregation switches, and no cross-link between them (no real value to having one). If you cross-connect the OTV pair, you probably want “wide” vPC from them to the Aggregation pair, assuming the latter are Nexus / Nexus VDC and doing vPC.
- The OTV VDC or device can use static routing. Dynamic routing has the virtue of logging adjacency loss when the link goes down, which is usually more conspicuous: links bouncing is normal, but routing adjacency issues tend to get noticed.
- Practice troubleshooting in the lab, before your first outage or problem. (This really applies to any technology. Practicing a STP loop can be — enlightening, motivating! And save a lot of time when you experience your first STP loop.)
- Do think about inbound and outbound routing when considering HSRP / FHRP filtering between OTV-connected datacenters. If you have stateful firewalls or load balancers, you may well need to exercise control over default routing, in which case you likely won’t always be able to do optimal outbound routing.
- Note the OTV ARP timer should be far less than the MAC aging timer, so that re-ARPing refreshes MAC table entries before they age out. You want this in general on switches, to avoid / reduce unicast flooding.
- Limit which VLANs cross the OTV tunnel(s). Yeah, every-VLAN-everywhere in one datacenter (“VLAN entropy”) may eventually mean every-VLAN-everywhere-in-every-datacenter. It strikes me as worthwhile risk reduction to fight the battle to limit VLAN scopes as much as possible. It may be a rear-guard action, but still is worthwhile. YMMV.
- Check the current Cisco documentation as to recommended (tested) OTV scalability limits. Exceeding them may bite you. I personally see the MAC address maximum (which is across all extended VLANs) as the limit I think most sites will hit first.
- If you run PIM on the OTV join interface, make sure it is not PIM DR: adjust the DR priority to ensure this.
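Tying several of the bullets above together (site ID, site VLAN, a deliberately limited extend-VLAN range, PIM DR priority), a minimal multicast-mode OTV edge configuration looks roughly like this. All the numbers here (VLANs, addresses, groups, interfaces) are made-up examples; cross-check the commands against the Cisco deployment guide for your NX-OS release.

```
feature otv

otv site-identifier 0x101          ! now mandatory; unique per data center
otv site-vlan 99                   ! required even with a single OTV device

interface Ethernet1/1
  description OTV join interface
  ip address 10.1.1.1/30
  ip igmp version 3                ! join interface needs IGMPv3 in multicast mode
  ip pim sparse-mode               ! only if PIM is needed here...
  ip pim dr-priority 1             ! ...and if so, make sure a neighbor wins DR
  no shutdown

interface Overlay1
  otv join-interface Ethernet1/1
  otv control-group 239.1.1.1
  otv data-group 232.1.1.0/28
  otv extend-vlan 100-110          ! extend only the VLANs you truly must
  no shutdown
```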
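For unicast-mode OTV with the two adjacency servers suggested above, the overlay side changes roughly as follows (again a sketch with illustrative addresses):

```
! Primary adjacency server
interface Overlay1
  otv join-interface Ethernet1/1
  otv adjacency-server unicast-only
  otv extend-vlan 100-110

! Secondary adjacency server (points at the primary)
interface Overlay1
  otv join-interface Ethernet1/1
  otv adjacency-server unicast-only
  otv use-adjacency-server 10.1.1.1 unicast-only
  otv extend-vlan 100-110

! All other edge devices list both servers
interface Overlay1
  otv join-interface Ethernet1/1
  otv use-adjacency-server 10.1.1.1 10.2.1.1 unicast-only
  otv extend-vlan 100-110
```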
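And for the HSRP / FHRP filtering mentioned above, the commonly published approach keeps the FHRP virtual MACs out of the OTV control plane with a MAC list and route-map. This sketch covers the HSRPv1/v2 virtual MAC ranges only; a companion VACL on the extended VLANs is also needed to drop FHRP hellos, and the full recipe is in the Cisco DCI whitepaper.

```
! Deny HSRPv1 and HSRPv2 virtual MACs, permit everything else
mac-list OTV_HSRP_VMAC_deny seq 10 deny 0000.0c07.ac00 ffff.ffff.ff00
mac-list OTV_HSRP_VMAC_deny seq 20 deny 0000.0c9f.f000 ffff.ffff.f000
mac-list OTV_HSRP_VMAC_deny seq 30 permit 0000.0000.0000 0000.0000.0000

route-map OTV_HSRP_filter permit 10
  match mac-list OTV_HSRP_VMAC_deny

! Keep the filtered MACs out of OTV IS-IS advertisements
otv-isis default
  vpn Overlay1
    redistribute filter route-map OTV_HSRP_filter
```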
DCI / OTV and STP Loops
The OTV AED (Authoritative Edge Device) behavior should prevent OTV itself from causing a STP loop.
However, there is still risk of “spillover”. That is, if either data center experiences a STP loop, the flood of BUM traffic will spill over across your OTV link. If the “old data center” has older gear with a weaker CPU, it may experience problems under a sufficiently severe load.
This is why I recommend the traffic storm control and hardware rate limiting above.
Future Best Practices
(Hey, it’s not often my crystal ball coughs up a Best Practice that’s back to the future!)
If / when Cisco ASA and/or ACE allow stateful clustering, do you cluster across OTV? I personally think it a risk or Worst Practice: if the stateful replication gets messed up, both datacenters could be adversely affected or off the air. (I’ve seen it happen in a single data center with CheckPoint firewalls.)
You can use the comments capability below to provide your own opinion. I’d love to hear what people think!