I hope you are not thinking, “What’s this about OTV and DCI needing defenses?” But if that question puzzles you, this blog is for you. Its purpose is to make sure everyone who reads it is aware that Data Center Interconnect (DCI) techniques, and OTV in particular, do not protect your network from cross-data center STP (Spanning Tree Protocol) problems.
The older DCI techniques and recommended designs go to some (complicated) lengths to prevent an STP loop when there are two or more DCI links or virtual links. OTV goes further: the AED (Authoritative Edge Device) solves the potential loop issue simply, and OTV inherently does not extend the STP (BPDU) domain between data centers. STP isolation is good, since the bigger the STP domain, the less stable it tends to be. (See also “root bridge war”.)
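To make the AED idea concrete, here is a minimal Python sketch of per-VLAN forwarder selection at one site. It assumes two edge devices that split the VLANs by even/odd VLAN ID, which is the commonly described OTV behavior; ranking the devices by name and the device names themselves are hypothetical stand-ins, and the real election details depend on platform and code release.

```python
"""Simplified model of per-VLAN AED (Authoritative Edge Device) selection.

Assumption: edge devices at one site are ranked by a hypothetical ordinal
(here, simply their sorted names), and VLANs are split among them by
VLAN ID modulo the number of devices. With two devices this gives the
even/odd split often described for OTV; real election logic is
platform- and release-specific.
"""

from typing import Dict, List


def elect_aed(edge_devices: List[str], vlans: List[int]) -> Dict[int, str]:
    """Map each VLAN ID to the single edge device allowed to forward it."""
    if not edge_devices:
        raise ValueError("a site needs at least one edge device")
    ranked = sorted(edge_devices)  # stand-in for the real ordinal assignment
    # Exactly one device per VLAN forwards onto the overlay, so a VLAN never
    # has two parallel L2 paths out of the site.
    return {vlan: ranked[vlan % len(ranked)] for vlan in vlans}


if __name__ == "__main__":
    aeds = elect_aed(["otv-edge-b", "otv-edge-a"], [10, 11, 20, 21])
    for vlan, device in sorted(aeds.items()):
        print(f"VLAN {vlan}: AED is {device}")
```

Because each VLAN has exactly one forwarder per site, there is no second path for a loop to form across the overlay, which is why OTV does not need the elaborate designs older DCI techniques used.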
BUT: Just because you are running OTV does not mean you’re safe from STP impacts!!!
Besides DCI/OTV design, you also need to think about safety measures: defenses. If (when?) an STP loop happens in one datacenter, what protects the other one?
I’ve run into some people who think you need to be part of the loop to experience the ill effects of an STP loop. Not so! The looped links generate the major torrent of BUM (Broadcast, Multicast, Unknown Unicast) traffic, but that traffic floods everywhere within its VLAN. If that VLAN extends to your other datacenter, say via OTV, whammo! Your other data center also experiences massive traffic.
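The spillover point can be shown with a toy Python model: one VLAN extended to a second site, and a loop that sits entirely inside the first site. Port names, sites, and frame counts are hypothetical, and every flooded frame is treated like a broadcast; real OTV handles broadcast, multicast, and unknown unicast somewhat differently.

```python
"""Toy model of BUM "spillover" across an L2 DCI such as OTV.

Assumptions: ports are labeled with a site and a VLAN (hypothetical
names); every flooded frame is treated like a broadcast; and the DCI
carries flooded frames for every VLAN it extends. STP state is not
modeled: the point is that DC2 receives the flood without being part
of the loop.
"""

from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Port:
    name: str
    site: str
    vlan: int


def flood_targets(ports: List[Port], ingress: Port, extended_vlans: Set[int]) -> List[Port]:
    """Ports that get a copy of a flooded frame arriving on `ingress`."""
    targets = []
    for port in ports:
        if port == ingress or port.vlan != ingress.vlan:
            continue  # no copy back out the ingress port; VLANs bound the flood
        if port.site == ingress.site or ingress.vlan in extended_vlans:
            targets.append(port)
    return targets


def frames_per_site(ports: List[Port], ingress: Port, extended_vlans: Set[int],
                    looped_frames: int) -> Dict[str, int]:
    """Frames delivered per site when a loop re-injects `looped_frames` frames."""
    load: Counter = Counter()
    for port in flood_targets(ports, ingress, extended_vlans):
        load[port.site] += looped_frames
    return dict(load)


if __name__ == "__main__":
    ports = [Port("dc1-a", "DC1", 100), Port("dc1-b", "DC1", 100),
             Port("dc2-a", "DC2", 100), Port("dc2-b", "DC2", 200)]
    # The loop is entirely inside DC1, yet DC2 still receives the flood.
    print(frames_per_site(ports, ports[0], {100}, 1_000_000))
    print(frames_per_site(ports, ports[0], set(), 1_000_000))
```

With VLAN 100 extended, DC2 receives as many flooded frames as DC1 even though no DC2 link is in the loop; only when the VLAN is not extended does the flood stay in DC1.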
The following figure illustrates this “spillover” effect.
If datacenter #1 has shiny new Nexus gear with 10 G NICs, and you have a 10 G dark fiber to Datacenter #2, any old Sup2-based Cisco 6500 switches in Datacenter #2 are not going to like it, in a major bad way. This is something to look out for, especially in old-to-new migration scenarios.
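A bit of arithmetic shows how much traffic a saturated 10 G link can hand to the far side. This Python sketch computes the maximum Ethernet frame rate from the standard per-frame wire overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte interframe gap). It says nothing about any particular supervisor’s limits, which are not given here.

```python
"""Back-of-the-envelope frame rate for a saturated Ethernet link."""

PREAMBLE_AND_SFD = 8   # bytes on the wire before each frame
INTERFRAME_GAP = 12    # minimum idle bytes between frames


def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames/s on a link for a given frame size (FCS included)."""
    wire_bits = (frame_bytes + PREAMBLE_AND_SFD + INTERFRAME_GAP) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    for size in (64, 512, 1518):
        pps = max_frames_per_second(10e9, size)
        print(f"{size:>5}-byte frames: {pps / 1e6:6.2f} Mpps on 10 GbE")
```

Minimum-size frames at 10 GbE work out to roughly 14.88 million frames per second. Any share of that flood that the receiving switch has to handle in software lands on its CPU.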
So don’t think that OTV “containing STP to one datacenter” is enough. Not so! Yes, that is an advantage of OTV. But it applies to STP BPDUs and topology, NOT to the spillover effects of traffic. Large-scale STP is nasty, with timing effects, so confining the STP tree topology, and BPDUs in particular, to a single datacenter makes it more robust and less likely to “lose it” or have “root bridge wars”. But the semantics (meaning, expected behavior) of a VLAN require BUM flooding.
Yes, with OTV Cisco proxies ARP to cut the BUM traffic somewhat. That might help contain looping ARP traffic, which makes up a lot of what I’ve seen in packet captures from STP loops. But even the rest of the BUM traffic can be enough to cause a real problem. So protect yourself!
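Here is a simplified Python sketch of why ARP suppression helps with some, but not all, of the flood. It assumes the edge device caches ARP replies from remote-site hosts and answers repeat requests locally. The class, method names, and addresses are hypothetical, cache aging is ignored, and none of this reflects how any particular NX-OS release implements the feature.

```python
"""Simplified OTV-style ARP suppression on an edge device.

Assumption: the edge device caches ARP replies it sees from remote-site
hosts and answers later requests for those IPs locally. Class and method
names are hypothetical; cache aging and other details are ignored.
"""

from typing import Dict, Optional


class ArpSuppressingEdge:
    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}  # remote host IP -> MAC, learned from replies
        self.sent_over_overlay = 0       # frames this edge forwarded across the DCI

    def learn_remote_reply(self, ip: str, mac: str) -> None:
        """Record an ARP reply that came from a host in the remote site."""
        self.cache[ip] = mac

    def handle_arp_request(self, target_ip: str) -> Optional[str]:
        """Answer from the cache if possible; otherwise flood over the overlay."""
        if target_ip in self.cache:
            return self.cache[target_ip]  # proxy reply; nothing crosses the DCI
        self.sent_over_overlay += 1
        return None

    def handle_other_broadcast(self) -> None:
        """Non-ARP broadcasts get no such help and still cross the DCI."""
        self.sent_over_overlay += 1


if __name__ == "__main__":
    edge = ArpSuppressingEdge()
    edge.learn_remote_reply("10.1.1.5", "00:00:5e:00:53:01")
    for _ in range(1000):  # looping ARP requests for a known remote host
        edge.handle_arp_request("10.1.1.5")
    for _ in range(1000):  # looping non-ARP broadcasts
        edge.handle_other_broadcast()
    print(edge.sent_over_overlay)  # 1000: only the non-ARP frames crossed
```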
Now that I’ve got your attention, what’s the solution? The issue isn’t STP itself, so tools like BPDU Guard aren’t relevant here. The problem is the flooding.
Tools for dealing with that include hardware and software rate limiting, particularly on the more powerful switches, and Control Plane Policing (CoPP). And yes, the Sup2 does rate limiting in software, and I’m told that by the time it kicks in, the CPU is already toast (rendered useless).
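As an illustration of rate limiting, here is a minimal Python sketch of interval-based broadcast storm control. It assumes the port counts broadcast bytes per 1-second interval and drops further broadcasts for the rest of that interval once a percentage-of-bandwidth threshold is crossed. The function name, threshold, and traffic numbers are hypothetical, and real platforms differ in how they measure and when they resume forwarding. CoPP applies a similar policing idea to traffic punted to the CPU, which this sketch does not model.

```python
"""Simplified interval-based broadcast storm control.

Assumption: loosely modeled on switches that measure broadcast traffic
over fixed intervals (1 second here) and drop further broadcasts for the
rest of an interval once a percentage-of-bandwidth threshold is crossed.
The function name and numbers are illustrative; real platforms differ.
"""

from typing import List, Tuple


def storm_control(frames: List[Tuple[float, int]], link_bps: float,
                  threshold_pct: float, interval_s: float = 1.0) -> Tuple[int, int]:
    """Return (forwarded_bytes, dropped_bytes) for (timestamp_s, size_bytes) frames."""
    budget = link_bps * interval_s * threshold_pct / 100 / 8  # bytes allowed per interval
    forwarded = dropped = 0
    current, used, suppressed = None, 0, False
    for ts, size in sorted(frames):
        interval = int(ts // interval_s)
        if interval != current:
            current, used, suppressed = interval, 0, False  # new interval: reset
        if not suppressed and used + size <= budget:
            used += size
            forwarded += size
        else:
            suppressed = True  # threshold hit: drop for the rest of this interval
            dropped += size
    return forwarded, dropped


if __name__ == "__main__":
    # 10,000 broadcast frames of 1,000 bytes within one second on a 1 Gb/s port,
    # with a hypothetical 1% threshold (1,250,000 bytes per second).
    flood = [(i / 10_000, 1_000) for i in range(10_000)]
    print(storm_control(flood, 1e9, 1.0))  # (1250000, 8750000)
```

The threshold is the design choice that matters: low enough that a loop in one datacenter cannot swamp the far side, high enough that normal broadcast bursts still pass.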
It turns out one of my Chesapeake NetCraftsmen colleagues, Augustine Traore, did some interesting lab testing to see how effective various STP defensive measures are. For the results, and also some ideas as to what you can do to protect your network, see his blog.
I and others have written about risk, and how L3 separation is a bit more robust than L2: fewer, simpler failure modes. For more about this, see Ivan Pepelnjak’s blogs at ipspace.net. If you’re still contemplating L2 DCI (or your boss is), presumably you have a business reason to do so: clusters, DR via VMware, vMotion, or datacenter migration are requiring L2 between datacenters.
If you’re doing a Data Center Interconnect design, do go ahead and think about what technique to use, which devices and code releases support it, whether it is mature enough, how to configure it, etc. And with most DCI techniques other than OTV, you’ll want to think about how to provide redundancy without creating an STP loop or having STP block one of the redundant links.
But also think defensively! And do take some precautions along the lines outlined above or in Augustine’s blog.
I googled a bit; here are some interesting articles about either preventing an STP loop or finding its cause:
Spanning Tree Loop Troubleshooting and Safeguards, at https://supportforums.cisco.com/docs/DOC-14223
The Case of the Spanning Tree Problem by Fred Baker (Cisco), at http://tcpmag.com/archives/article.asp?editorialsid=20