Be Prepared: Handling Potential Network Failures

Terry Slattery
Principal Architect

I was at VoiceCon two weeks ago, participating in a panel where I talked about network resiliency and presented my VoIP Troubleshooting and Monitoring tutorial.  Both presentations included examples of how you should be prepared for network failures.  I’m a proponent of understanding the causes of network problems and being able to quickly diagnose failures by looking at the symptoms that they cause.

Let’s say that you want to be prepared to identify and react to a spanning tree loop.  First, you need to be able to quickly identify that a forwarding loop has formed.  Your NMS should show a CPU spike on switches in the STP domain in which the loop exists, due to processing BPDUs that are circulating.  Ports that are forwarding looping traffic will report high utilization.  A list of typical symptoms is given in the Cisco document “Troubleshooting STP on Catalyst Switches Running Cisco IOS System Software”, Document ID: 28943.  Unidirectional links and similar problems are described in “Spanning Tree Protocol Problems and Related Design Considerations”, Document ID: 10556.
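The two symptoms described above (a CPU spike plus ports running at high utilization) can be combined into a simple check.  The following is only a minimal sketch: the SwitchSample structure, the threshold values, and the sample data are all hypothetical, and a real NMS would supply these figures through its own polling or export mechanism.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds; tune them to your own baseline.
CPU_THRESHOLD = 80.0    # percent CPU
PORT_THRESHOLD = 90.0   # percent link utilization


@dataclass
class SwitchSample:
    """One polling sample for a switch (assumed shape, not an NMS API)."""
    name: str
    cpu_percent: float
    port_utilization: Dict[str, float]  # port name -> percent utilization


def suspect_loop(samples: List[SwitchSample],
                 cpu_threshold: float = CPU_THRESHOLD,
                 port_threshold: float = PORT_THRESHOLD) -> Dict[str, List[str]]:
    """Return {switch: [busy ports]} for switches showing both symptoms."""
    suspects = {}
    for sample in samples:
        busy = sorted(port for port, util in sample.port_utilization.items()
                      if util >= port_threshold)
        # Flag only when high CPU and at least one saturated port coincide.
        if sample.cpu_percent >= cpu_threshold and busy:
            suspects[sample.name] = busy
    return suspects


# Made-up sample data for illustration.
samples = [
    SwitchSample("sw-a", 97.0, {"Gi0/1": 99.0, "Gi0/2": 98.5, "Gi0/3": 4.0}),
    SwitchSample("sw-b", 95.0, {"Gi0/1": 99.2, "Gi0/2": 12.0}),
    SwitchSample("sw-c", 21.0, {"Gi0/1": 35.0}),
]
result = suspect_loop(samples)
print(result)
```

The switches and ports it flags are only candidates; the symptom lists in the Cisco documents are still needed to confirm that a loop is the actual cause.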

Links must be shut down or disconnected in order to break the loop.  This is where planning will pay off.  Examine the image below, taken from the Cisco “Troubleshooting STP” document referenced above.  A loop between the ADB switches, the ACB switches, or the AEB switches is easily broken by disconnecting any link in the loop.  I would plan to take out the AB link because that would break any of the three loops that I identified.  If that doesn’t take care of the loop, then the problem is likely due to a loop induced between VLANs or between two ports in one VLAN.  It could be due to a cabling mistake or a dual-homed server with bridging enabled between two interfaces.  In this case, you have to be prepared to isolate each switch until you find the combination that contains the loop (it may involve more than one switch).
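The reasoning for choosing the AB link can be sketched in a few lines: list the loops you have identified ahead of time, then count how many of them each link belongs to.  This is an illustrative sketch only; the function names are made up, and the three loops are the ADB, ACB, and AEB loops named above.

```python
from collections import Counter


def loop_links(loop):
    """Return the links in a loop given as an ordered sequence of switches."""
    size = len(loop)
    # Pair each switch with its neighbor, wrapping around to close the loop.
    return {frozenset((loop[i], loop[(i + 1) % size])) for i in range(size)}


def best_link_to_break(loops):
    """Return ((switch, switch), count) for the link shared by the most loops."""
    counts = Counter()
    for loop in loops:
        counts.update(loop_links(loop))
    link, hits = counts.most_common(1)[0]
    return tuple(sorted(link)), hits


# The three loops identified in the text.
loops = [("A", "D", "B"), ("A", "C", "B"), ("A", "E", "B")]
link, hits = best_link_to_break(loops)
print(link, hits)
```

Doing this on paper during planning, rather than during an outage, is the point: the link that sits in the most loops is the first one to pull.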


Now imagine an STP domain that spans ten or more switches, and you have the potential for a time-consuming troubleshooting task if you’re not well prepared.  This is one of the reasons why we at NetCraftsmen recommend that failure domains be limited in size.

If the STP loop you’re troubleshooting is serious enough, you won’t be able to use the network to access the switches.  Someone will need to physically unplug the network connections.  Having the links clearly identified by cable color, cable labels, and interface descriptions will make your troubleshooting go faster.  And be prepared to properly reconnect the links if you’ve had to physically disconnect the cables.  It doesn’t help if you quickly unplug three infrastructure links and then puzzle over which cables connect to which ports on the switch.

Now think about other common problems and how you’ll tackle the troubleshooting tasks to quickly identify the source of the problem.  If your network uses a large number of static routes, be prepared to handle a routing loop where the interaction between a static route and the dynamic routing protocol creates a loop.
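One way to prepare for this kind of routing loop is to walk the next hops toward a destination and watch for a router that appears twice.  The sketch below assumes you have already collected, per router, the next hop chosen for one prefix; the router names and the next_hops table are hypothetical, with r2 standing in for a router whose static route overrides what the dynamic protocol would have selected.

```python
def trace_route(next_hops, start, destination_router):
    """Follow next hops from start; return (path, outcome).

    outcome is "delivered", "loop", or "blackhole" (no next hop known).
    """
    path = [start]
    seen = {start}
    current = start
    while current != destination_router:
        nxt = next_hops.get(current)
        if nxt is None:
            return path, "blackhole"
        path.append(nxt)
        if nxt in seen:
            # We have returned to a router already on the path.
            return path, "loop"
        seen.add(nxt)
        current = nxt
    return path, "delivered"


# Hypothetical next hops for a single prefix attached to r4.
next_hops = {
    "r1": "r2",  # learned dynamically
    "r2": "r3",  # static route pointing away from the destination
    "r3": "r1",  # learned dynamically
}
path, outcome = trace_route(next_hops, "r1", "r4")
print(path, outcome)
```

The same walk can be done by hand with the routing tables of each hop; the value of rehearsing it is knowing in advance which routers carry static routes that could conflict with the dynamic protocol.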

In a network supporting VoIP, you should understand the process used by phones to power up, register, and operate.  You can use the OSI model to segregate problems into physical layer, data link layer, network layer, and application layer.  Knowing the types of problems at each layer allows you to quickly pick a few troubleshooting tasks that will identify the source of a problem.  An example is one-way audio; think about how you would diagnose its cause and how you might fix it.

How can you be prepared?  You need to know where and how you’ll tackle specific problems.  What diagnostic tools do you need and are they in the appropriate locations?  Do you know the actions that you need to take to isolate problems or the diagnosis that you need to perform to gather enough information to characterize and identify the source of a problem?



Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under

