Network Availability and High Availability Networks

Author
Terry Slattery
Principal Architect

Scott Hogg of Global Technology Resources (GTRI) wrote a nice blog post for Network World back in April 2009 about High Expectations of Network Availability (http://www.networkworld.com/community/node/40827), and a slightly more recent one in May, “Forget Five-9s - Go for 100%!” (http://www.networkworld.com/community/node/42281).  Scott spends a lot of time working with customers and has a good perspective on the requirements for a smoothly running network.

Related to the topic of smoothly running networks is the CiscoLive presentation on High Availability (HA) networks (see “High Availability Networking (>5-nines)”), which offered specific advice for Cisco-based HA networks but can also apply to networks built with other products.

I mention these blog posts and the presentation because more networks are being designed and run with high availability goals.  It is important to have realistic design goals when you are designing a network for high reliability.  Too much redundancy can actually make it more difficult to know what is going on within the network and to predict how the network will react to specific failures.

I once did a consulting job where an organization had more than two paths from each site.  The problem was that the network was not engineered to handle failures.  The assumption was that failures would be infrequent and that the network would operate with slightly degraded performance when one occurred.  In practice, when a link failed, the remaining links became overloaded and had so much packet loss that the primary business application wouldn’t run correctly.  The result was that the network oscillated as traffic switched from the primary path to a backup path.  The backup path subsequently became overloaded, causing traffic to then switch to an alternate backup path.  While the network continued to operate, application delays became significant due to the overloaded paths and the resulting packet loss.  The point of this example is that an overly redundant network that is not well designed can have no downtime while the applications act as if the network were down.
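The kind of failure-case engineering that was missing in that network can be sketched as a simple capacity check.  The example below is a minimal illustration, not a real planning tool: the link names, capacities, loads, and the 80% utilization ceiling are all made-up assumptions, and it optimistically assumes that displaced traffic can be spread across the surviving links in any proportion.

```python
# Hypothetical site uplinks: capacity and normal-state load, both in Mbps.
# All names and numbers here are illustrative assumptions.
links = {
    "primary": {"capacity": 100.0, "load": 70.0},
    "backup_1": {"capacity": 50.0, "load": 20.0},
    "backup_2": {"capacity": 50.0, "load": 15.0},
}


def survives_failure(links, failed, max_utilization=0.8):
    """Return True if the surviving links can carry all of the site's traffic.

    Optimistic assumption: traffic can be balanced freely across survivors.
    Real routing rarely balances this well, so a True result is only a
    necessary condition, not a guarantee.
    """
    total_load = sum(link["load"] for link in links.values())
    surviving_capacity = sum(
        link["capacity"] for name, link in links.items() if name != failed
    )
    return total_load <= surviving_capacity * max_utilization


def single_failure_report(links, max_utilization=0.8):
    """Check every single-link failure and report which ones are survivable."""
    return {
        name: survives_failure(links, name, max_utilization) for name in links
    }


report = single_failure_report(links)
for name, ok in report.items():
    print(f"loss of {name}: {'ok' if ok else 'OVERLOADED'}")
```

With these numbers the site carries 105 Mbps in total, so losing either backup is survivable, but losing the primary leaves only 100 Mbps of raw capacity.  That is the situation described above: three paths, no outage, and an application that cannot run.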

Maintaining an HA network becomes a problem because, as Scott notes, maintenance windows are becoming more difficult to obtain.  With applications running non-stop in VM environments, I expect network maintenance windows to shrink even more.  One way to address the shrinking maintenance windows was described in the HA Network Design presentation (mentioned above) at CiscoLive by John Cavanaugh and his team.  They explained how a dual core network where the two cores are cross-connected can have nearly 100% reliability because each core can be taken down independently of the other core to perform hardware and software maintenance.  Designing a network that will operate correctly with this level of redundancy can be tricky.  You need the right levels of redundancy, appropriate bandwidth at the right places, and the proper configuration of the routing and switching protocols to make it function correctly when a failure occurs or when you need to take down one of the cores for maintenance.
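The arithmetic behind the dual core argument can be sketched in a few lines.  This is a back-of-the-envelope illustration, not a figure from the presentation: the 99.9% single-core availability is an assumed number, and the parallel formula only holds if the two cores fail independently, which shared configuration mistakes or software bugs can easily violate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability):
    """Convert an availability fraction (e.g. 0.99999) to minutes of downtime."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(a, b):
    """Availability of two components when either one alone is sufficient.

    Assumes the two components fail independently of each other.
    """
    return 1.0 - (1.0 - a) * (1.0 - b)


# Assumed figure: one core is available 99.9% of the time,
# counting both failures and planned maintenance as downtime.
single_core = 0.999
dual_core = parallel_availability(single_core, single_core)

print(f"five nines:  {downtime_minutes_per_year(0.99999):.2f} min/year")
print(f"single core: {downtime_minutes_per_year(single_core):.1f} min/year")
print(f"dual core:   {downtime_minutes_per_year(dual_core):.2f} min/year")
```

Under these assumptions a single core is down for roughly 526 minutes a year, while the pair is down for about half a minute.  The gain only materializes if each core can carry the full load alone, which is why the bandwidth and protocol design matter so much.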

My favorite topic with HA networks is network management.  A good NMS must tell you when a network failure occurs and when devices or links become overloaded.  Thanks to good design, an HA network may experience one or more failures without an outage, so you need something that alerts you to the first failure before a second failure causes an outage.  Unfortunately, there are many NMS packages out there, but very few that do all the things that are needed to monitor an HA network (I’m talking about network monitoring, not application or server monitoring, which is different in many aspects).
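The idea of alerting on lost redundancy, rather than only on outages, can be sketched as follows.  This is a hypothetical illustration and not how any particular NMS works: the `RedundancyGroup` type, the status names, and the sample groups are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RedundancyGroup:
    """A set of devices or links that back each other up (hypothetical model)."""
    name: str
    members: dict  # member name -> True if the member is currently up


def classify(group):
    """Classify a group by how much redundancy it has left."""
    up = sum(1 for is_up in group.members.values() if is_up)
    total = len(group.members)
    if up == 0:
        return "OUTAGE"
    if up == 1 and total > 1:
        return "AT_RISK"  # still working, but one more failure is an outage
    if up < total:
        return "DEGRADED"  # something failed, some redundancy remains
    return "OK"


groups = [
    RedundancyGroup("core", {"core-1": True, "core-2": False}),
    RedundancyGroup("wan", {"mpls": True, "inet-a": True, "inet-b": False}),
    RedundancyGroup("dc-uplinks", {"po1": True, "po2": True}),
]

for group in groups:
    status = classify(group)
    if status != "OK":
        print(f"ALERT {group.name}: {status}")
```

In this sample no user has noticed anything yet, but the core group is one failure away from an outage.  That is the alert that matters most in an HA network.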

I use a combination of tools, with NetMRI providing the configuration and change management functionality.  It works well for this because it has two fundamental capabilities that are required:

  1. It can analyze configurations to detect exceptions to configuration policies.  For example, are the ACLs for SSH and SNMP access consistent?
  2. Scripts can be run on network devices to execute commands.  When the ACLs for SSH and SNMP need to be updated on hundreds of devices, you want to have a system that can do it for you and create a log of the successful and unsuccessful updates.
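To make the first capability concrete, here is a minimal sketch of an ACL consistency check.  It does not use or represent the NetMRI API; the device names, the sample configurations, and the choice of the most common version of the ACL as the baseline are all assumptions made for illustration.  A real policy check would more likely compare against an approved standard.

```python
from collections import Counter


def acl_entries(config_text, acl_id):
    """Extract the lines of one numbered ACL, in order, from a config dump."""
    prefix = f"access-list {acl_id} "
    return tuple(
        line.strip()
        for line in config_text.splitlines()
        if line.strip().startswith(prefix)
    )


def find_acl_exceptions(configs, acl_id):
    """Return the devices whose ACL differs from the most common version.

    Entry order is part of the comparison because ACLs are order-sensitive.
    """
    per_device = {dev: acl_entries(text, acl_id) for dev, text in configs.items()}
    baseline, _count = Counter(per_device.values()).most_common(1)[0]
    return sorted(dev for dev, entries in per_device.items() if entries != baseline)


# Hypothetical configuration snippets for three devices.
STANDARD = (
    "access-list 10 permit 10.1.1.0 0.0.0.255\n"
    "access-list 10 deny any\n"
)
configs = {
    "rtr-a": STANDARD,
    "rtr-b": STANDARD,
    "rtr-c": (
        "access-list 10 permit 10.1.1.0 0.0.0.255\n"
        "access-list 10 permit 10.9.9.0 0.0.0.255\n"
        "access-list 10 deny any\n"
    ),
}

exceptions = find_acl_exceptions(configs, 10)
print("devices with a non-standard ACL 10:", exceptions)
```

The list of exceptions is also the natural input for the second capability: it tells you which devices the update script needs to touch, and the same check run afterwards confirms whether the update succeeded.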

These capabilities are critical to a smoothly operating network because the majority of network problems are due to configuration mistakes.  How do you make sure that your network is properly configured and that new changes that need to be rolled out are properly implemented?

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html
