Tracking Key Interface State

Author
Terry Slattery
Principal Architect

The Baltimore-Washington area has been hit with several significant snowstorms this winter, and those storms have affected the operations of a large metropolitan area network.  Problems ranged from power outages at some sites to link problems between sites.  As a result, syslog recorded a large number of interface up/down messages.

Syslog is probably the first notification that a network team receives when an unexpected failure occurs, so watching it is a useful way to stay in touch with what is happening on the network.  When a well-connected site goes down, the neighboring devices report link failure to the down site.  The NOC then needs to determine what caused the failure.  Is it due to power or did the link die?

If you see all the links to a site go down, you can guess that it is a power problem and be right a large percentage of the time.  If at least one link stays up to the site, you know that it isn’t a power problem.  I’m assuming that you have diverse routing to a site if it has more than one link.  If the site isn’t connected by links that use diverse paths, then a single cut cable (tree falling or too much ice on an aerial line) looks the same as a power outage.
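The rule of thumb above can be written down as a small decision function.  This is only a sketch of the reasoning, not a NOC tool: the function name, the link names, and the `diverse_paths` flag are made up for illustration, and the link states are assumed to have been gathered already from the neighboring devices.

```python
def classify_site_outage(link_up, diverse_paths):
    """Guess the likely cause of a site outage from its link states.

    link_up: dict mapping a link name to True (up) or False (down),
             as reported by the neighboring devices (hypothetical input).
    diverse_paths: True if the site's links follow physically diverse routes.
    """
    if not link_up:
        return "unknown: no links recorded for this site"
    if all(link_up.values()):
        return "no outage: all links up"
    if any(link_up.values()):
        # At least one link still reaches the site, so the site has power.
        return "link failure: site still has power"
    # Every link to the site is down.
    if len(link_up) > 1 and diverse_paths:
        # Diverse paths rarely fail together, so power is the likely cause.
        return "probable power outage"
    # A single link, or links sharing one path: a cable cut looks the same.
    return "ambiguous: power outage or a single cable cut"


if __name__ == "__main__":
    print(classify_site_outage({"wan1": False, "wan2": False}, diverse_paths=True))
    print(classify_site_outage({"wan1": False, "wan2": True}, diverse_paths=True))
```

The "probable" in the result is deliberate.  The guess is right a large percentage of the time, not always.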

The interesting problem came to me after the last snow (24-36 inches).  A member of the network team asked me if there was a way to verify that all the interfaces that were operational before the snow had been restored to service.  With the redundancy built into the network, we could have a site that was running on a single connection because the backup connection was still out.  None of my tools, including NetMRI, could tell me that directly.

But I was able to figure out a way to get that information using data that NetMRI had collected.  There are two issues in NetMRI regarding down interfaces: Router Interface Down and VLAN Trunk Port Down.  The idea is that if you have cleaned up your network, any infrastructure interface that is in the up/down state (administratively up but operationally down) should be up.  An infrastructure interface is one that connects one piece of network equipment to another.  By looking at router interfaces and trunking interfaces, NetMRI identifies nearly all of the important interfaces that interconnect infrastructure devices, even if you have disabled useful protocols like CDP.

I was able to export the CSV data from each of these issues for the day before the snow and the day after the snow.  I then spent some time massaging the data with Excel and a script to generate a list of interfaces that were up on the day before the snow and were down after the snow.  I found two interfaces.  While this process took me a couple of hours, there was no way to obtain that information from any of the other data sources we had.
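The comparison itself is a set difference: interfaces that appear in the down-interface export from after the snow but not in the export from before it.  The sketch below shows that step in Python.  It is not the script used here, and the column names `Device` and `Interface` are assumptions; a real NetMRI export may label its columns differently.

```python
import csv
import io


def down_interfaces(csv_text, device_col="Device", interface_col="Interface"):
    """Return the set of (device, interface) pairs listed in one CSV export.

    The column names are assumed for illustration; adjust them to match
    the headers in the actual export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row[device_col].strip(), row[interface_col].strip())
        for row in reader
    }


def newly_down(before_csv, after_csv):
    """Interfaces reported down after the event but not before it."""
    return sorted(down_interfaces(after_csv) - down_interfaces(before_csv))


if __name__ == "__main__":
    # Made-up sample data standing in for the two exports.
    before = "Device,Interface\nrtr-a,Gi0/1\n"
    after = "Device,Interface\nrtr-a,Gi0/1\nrtr-b,Gi0/2\n"
    for device, interface in newly_down(before, after):
        print(device, interface)
```

To cover both issues, the Router Interface Down and VLAN Trunk Port Down exports for each day would be combined (for example, by taking the union of the two sets) before the comparison.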

The data we had available could show which interfaces had been up at some point in the last month, so we could use it to find interfaces that are currently down but were recently active.  This approach could still miss a few interfaces that have been down for more than 30 days without anyone noticing.  However, that’s a much better result than not knowing about a down interface.
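As a rough illustration of that check, the sketch below filters a list of currently down interfaces to those last seen up within a 30-day window.  The input shapes are hypothetical (a dict of last-seen-up timestamps and a collection of interface names), not a NetMRI data format.

```python
from datetime import datetime, timedelta


def recently_active_but_down(last_seen_up, currently_down, now, window_days=30):
    """Return interfaces that are down now but were up within the window.

    last_seen_up: dict mapping an interface name to the datetime it was
                  last observed up (hypothetical input).
    currently_down: iterable of interface names that are down now.
    """
    cutoff = now - timedelta(days=window_days)
    return sorted(
        name
        for name in currently_down
        if name in last_seen_up and last_seen_up[name] >= cutoff
    )


if __name__ == "__main__":
    now = datetime(2010, 2, 15)
    last_up = {
        "rtr-a Gi0/1": datetime(2010, 2, 5),   # up 10 days ago
        "rtr-b Gi0/2": datetime(2009, 12, 1),  # last up well over 30 days ago
    }
    print(recently_active_but_down(last_up, ["rtr-a Gi0/1", "rtr-b Gi0/2"], now))
```

An interface whose last up time falls outside the window is dropped from the result, which is exactly the blind spot described above.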

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html
