Quick Network Redundancy Test

Terry Slattery
Principal Architect

Do you run a network with a high level of redundancy?  If so, how susceptible is your network to the “second failure” syndrome?  This is where a first failure occurs but goes unnoticed until a second failure takes out the redundant link or node.  Basic checks can often detect the initial failure, but they usually require a bit of “network hygiene” before they work.

Start by checking for router interfaces that are administratively up but operationally down (i.e. up/down state).  If the network is not maintained in a consistent manner, you may be facing 300 or 400 interfaces in this state, which looks like a daunting task.  You have to check each interface to determine whether it should be administratively down, then go through the change management process (you *do* have a change management process, don’t you?) to shut down the ones that should be.  I’ll bet that along the way you identify one or two interfaces in the up/down state that should be in the up/up state.
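The check itself is simple enough to sketch in a few lines of Python.  This is only an illustration, not how NetMRI implements its issue: the `Interface` record, the device names, and the sample data are all hypothetical, and in practice the admin and operational states would come from whatever your tooling collects (for example SNMP ifAdminStatus and ifOperStatus).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """Hypothetical record of one router interface and its two states."""
    device: str
    name: str
    admin_up: bool   # configured state (False means administratively shut down)
    oper_up: bool    # actual operational state


def state(intf: Interface) -> str:
    """Render the familiar admin/oper pair, e.g. 'up/down'."""
    admin = "up" if intf.admin_up else "down"
    oper = "up" if intf.oper_up else "down"
    return f"{admin}/{oper}"


def find_up_down(interfaces):
    """Return interfaces that are administratively up but operationally down."""
    return [i for i in interfaces if i.admin_up and not i.oper_up]


# Made-up sample data for illustration only.
inventory = [
    Interface("core-rtr-1", "Gi0/1", admin_up=True, oper_up=True),
    Interface("core-rtr-1", "Gi0/2", admin_up=True, oper_up=False),
    Interface("edge-rtr-7", "Gi0/3", admin_up=False, oper_up=False),
]

for intf in find_up_down(inventory):
    print(intf.device, intf.name, state(intf))
```

Each interface this returns needs a decision: either it should be shut down through change management, or it is a real failure that nobody noticed.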

You can tackle the long list by prioritizing the interfaces into three groups, organized by importance.  The critical interface list will likely be much smaller than the overall list, probably by a factor of 10 or more.  I recently saw a site in which NetMRI was reporting nearly 400 interfaces in up/down state (the Router Interface Down issue).  I was able to identify the critical interfaces by entering a common device name in the Quick Search box (see example using 10.9.10 below), reducing the list to around 40 interfaces.  That’s a much more manageable list, one that can be tackled in a couple of weeks.
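If you are working from an exported list rather than a search box, the same prioritization can be approximated by matching device names against patterns.  The sketch below assumes your device names encode their role; the group names, regular expressions, and device names are invented for the example and would need to match your own naming convention.

```python
import re
from collections import defaultdict


def prioritize(entries, rules, default_group="low"):
    """Sort (device, interface) pairs into priority groups.

    `rules` is an ordered list of (group_name, regex) pairs; the first
    pattern that matches the device name wins.  Anything that matches
    no rule lands in `default_group`.
    """
    groups = defaultdict(list)
    for device, interface in entries:
        for group, pattern in rules:
            if re.search(pattern, device):
                groups[group].append((device, interface))
                break
        else:
            groups[default_group].append((device, interface))
    return dict(groups)


# Hypothetical naming convention: core routers first, then distribution.
rules = [
    ("critical", r"^core-"),
    ("important", r"^dist-"),
]

up_down = [
    ("core-rtr-1", "Gi0/2"),
    ("dist-sw-4", "Te1/1"),
    ("access-sw-22", "Fa0/17"),
    ("core-rtr-2", "Gi0/5"),
]

groups = prioritize(up_down, rules)
for name in ("critical", "important", "low"):
    print(name, groups.get(name, []))
```

Working through the critical group first keeps the cleanup manageable while still covering the interfaces whose failure would hurt most.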


But once the critical interfaces are handled, don’t stop there.  If you take care of all the interfaces, the analysis of up/down interfaces can be the quick test for whether your redundant network is really redundant, because the most common source of redundancy failures is not noticing the first failure.

Don’t rely on identifying up/down interfaces alone.  You may have heard that the most common source of network failures is configuration errors, and that is exactly what hit one organization with a redundant network.  An interface was intentionally shut down to aid in troubleshooting a problem.  There’s nothing wrong with this action.  But the interface was never re-enabled after the original problem was corrected, so part of the network was running on a single connection.  Some time later, that remaining connection also failed, and a network outage occurred.  This configuration error would not have been identified by the NetMRI Router Interface Down issue, because an administratively shut down interface is not in the up/down state.

A NetMRI job script and related issue were subsequently created to check all critical interfaces within a device group and report any interfaces in the down/down state.  In practice it flags any critical interface that is not in the ‘up/up’ state, so an interface in the up/down state will appear in both issues.
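The logic of that check, and how it overlaps with the up/down check, can be illustrated with a short Python sketch.  This is not the NetMRI job script itself; the function names, the status table, and the list of critical interfaces are hypothetical stand-ins.

```python
def router_interface_down(status):
    """Stand-in for the up/down check: admin up, operationally down."""
    return {
        key for key, (admin, oper) in status.items()
        if admin == "up" and oper == "down"
    }


def critical_not_up_up(status, critical):
    """Flag every critical interface that is not in the up/up state.

    An interface missing from `status` is also flagged, on the
    assumption that a critical interface with no data deserves a look.
    """
    return {key for key in critical if status.get(key) != ("up", "up")}


# (device, interface) -> (admin state, oper state); made-up data.
status = {
    ("core-rtr-1", "Gi0/1"): ("up", "up"),
    ("core-rtr-1", "Gi0/2"): ("down", "down"),   # left shut after troubleshooting
    ("core-rtr-2", "Gi0/1"): ("up", "down"),     # real failure
}
critical = set(status)

print("up/down check:  ", sorted(router_interface_down(status)))
print("critical check: ", sorted(critical_not_up_up(status, critical)))
```

The interface that was left shut down shows up only in the critical check, which is the gap the new issue was created to close, while the up/down interface is reported by both.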

The key lesson was that there are some configurations that need to be identified as problems soon after they are deployed.  These faulty configurations are specific to the organization’s network, so some customization is needed.  In this case, the script could easily be applied to other devices and interfaces, but it would need to be customized to select only the devices and interfaces to which it should apply.

NetMRI is now protecting this customer’s network from accidental interface shutdown on key interfaces and is freeing their staff from having to periodically check the interfaces using manual procedures, which was the alternative approach.



Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html

