High Availability Networking (>5-nines)

Terry Slattery
Principal Architect

One of the best sessions I attended at CiscoLive this year was titled “BRKRST-3365, Unified HA Network Design: The Evolution of the Next Generation Network” by John Cavanaugh, Chris Cornwall, and a whole team of contributors. They talked about the High Availability (HA) network designs that they have done over the past ten years. Some of those designs have had no application-affecting downtime over a ten-year period. The presenters identified several key factors that influence high availability.

The first important factor was cross-connected dual-core networks.  They labeled the two cores as Red and Blue with cross-connections so that a single failure would not cause packets to take a much longer path around the failure, potentially impacting application performance.  Why two core networks?  Full redundancy allows one core to be taken out of service for maintenance while production continues on the other core.

Dual-core redundancy is important for companies that can no longer afford maintenance windows for performing network upgrades. One VP of network engineering at a financial firm told me that he has two maintenance windows: July 4 and Christmas. Global companies may find that those days are also unavailable because significant parts of the world economy run year-round. Being able to take half of the network out of service for software and hardware maintenance while the business runs on the other half allows prompt resolution of relatively minor network problems, as well as timely fixes for security vulnerabilities in the network infrastructure.
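To put rough numbers on the value of a second core, here is a minimal sketch in Python. It is not from the session: the function names are hypothetical, and the parallel-availability formula assumes the two cores fail independently, which real networks only approximate (shared software bugs, power, or operator error all break that assumption).

```python
# Back-of-the-envelope availability arithmetic (illustrative only).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability (0.0 to 1.0)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` redundant paths, ASSUMING independent failures.

    The service is down only when every copy is down at the same time.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    one_core = 0.999  # hypothetical "three nines" core
    two_cores = parallel_availability(one_core, copies=2)
    print(f"one core : {downtime_minutes_per_year(one_core):.1f} min/year")
    print(f"two cores: {downtime_minutes_per_year(two_cores):.2f} min/year")
    print(f"5 nines  : {downtime_minutes_per_year(0.99999):.2f} min/year")
```

Five nines works out to a bit over five minutes of downtime per year, which is why planned maintenance on a single core cannot fit inside that budget. The independence assumption is the weak point of this estimate, and it is the reason the advice about avoiding shared failure sources matters.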

The other major factor that I liked was their recommendation to reduce the size of failure domains. A simple example is to design relatively small Layer 2 domains so that when a spanning tree loop occurs, it has a smaller range of impact. I’ve heard of a 900-server data center outage that was due to the insertion of an old switch into a data center-wide spanning tree domain. The switch was old enough and slow enough that it couldn’t perform the task of the root bridge. The entire data center’s operation was affected. A smaller Layer 2 domain would have reduced the negative impact.

Another HA recommendation that I like is putting redundant servers on different subnets. Equipment on the subnets should not share common failure sources like routers, switches, power feeds, and cooling. Geographically diverse data centers help, but watch out for latency between them. Terrestrial latency is roughly 10 ms per 1000 miles, and high-latency paths between data centers may negatively affect applications whose protocols send only one packet per round-trip time.
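The latency concern can be made concrete with a small Python sketch. It takes the 10 ms per 1000 miles rule of thumb and treats it as one-way delay, which is an assumption on my part. The one-payload-per-round-trip protocol and the 1460-byte payload size are also hypothetical, chosen only for illustration.

```python
# Rough throughput ceiling for a "one packet per round trip" protocol.
MS_PER_1000_MILES = 10.0  # rule of thumb; ASSUMED here to be one-way delay


def round_trip_ms(distance_miles: float) -> float:
    """Estimated round-trip time in milliseconds for a terrestrial path."""
    if distance_miles <= 0:
        raise ValueError("distance_miles must be positive")
    one_way_ms = distance_miles / 1000.0 * MS_PER_1000_MILES
    return 2.0 * one_way_ms


def max_throughput_bytes_per_sec(distance_miles: float,
                                 payload_bytes: int = 1460) -> float:
    """Upper bound when only one payload can be in flight per round trip."""
    rtt_seconds = round_trip_ms(distance_miles) / 1000.0
    return payload_bytes / rtt_seconds


if __name__ == "__main__":
    for miles in (100, 1000, 3000):
        rtt = round_trip_ms(miles)
        rate_kb = max_throughput_bytes_per_sec(miles) / 1000.0
        print(f"{miles:>5} miles: RTT ~{rtt:.0f} ms, at most ~{rate_kb:.1f} kB/s")
```

Under these assumptions the ceiling depends only on distance and payload size, not on link bandwidth, so adding capacity between distant data centers does not help a protocol that waits for each round trip.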

I highly recommend that you take the time to look up the recording for this session.  It was definitely one of the best I attended.



Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html

