Cisco Fan Failure

Author
Terry Slattery
Principal Architect

Have you ever had a Cisco router or switch shut down because of a fan failure? While looking through NetMRI’s daily list of analysis issues, I found a Fan Failure issue (NetMRI’s Analysis page calls it “Device Fan Problem”).

It was interesting to me because a fan failure generates a syslog message, which should have been caught by the NOC; they use other tools to pick out important syslog and trap messages. Of course, the problem with syslog and SNMP traps is that they typically use UDP as their transport. UDP does not retransmit a packet that is discarded because of congestion or damaged in transit. Most network people know that UDP packets may not arrive at their destination, but because most networks are fairly reliable, we rarely see it happen.
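
To put rough numbers on “we rarely see it,” here is a small Python sketch. It assumes every datagram is lost independently with the same probability. Real loss tends to arrive in bursts during congestion, so treat the output as illustrative. The 0.1% loss rate and 1,000-message count are made-up example values, not measurements.

```python
def expected_lost(loss_rate: float, messages: int) -> float:
    """Average number of datagrams lost when each is sent once with no retransmission."""
    return loss_rate * messages


def prob_any_lost(loss_rate: float, messages: int) -> float:
    """Probability that at least one of `messages` datagrams never arrives.

    Assumes losses are independent, which bursty congestion often violates.
    """
    return 1.0 - (1.0 - loss_rate) ** messages


if __name__ == "__main__":
    # Hypothetical example: 0.1% path loss, 1,000 syslog messages in a day.
    print(f"expected lost: {expected_lost(0.001, 1000):.1f}")
    print(f"chance at least one lost: {prob_any_lost(0.001, 1000):.1%}")
```

With those inputs, about one message is lost on average, and there is roughly a 63% chance that at least one message is lost. A loss rate that low is easy to miss day to day. The catch is that nothing tells you which message was lost, and it may be the fan alarm.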

Because UDP messages may be lost in transit, what can we do about network management that depends on UDP for much of its operation? SNMP polling also runs over UDP, but because it is request/response, the poller knows when a reply fails to arrive. A good network management system will retry SNMP queries until it retrieves the data it needs. In this case, NetMRI gathered information about a fan failure that had not made it into the logs. Using SNMP polling to collect information that syslog also reports may seem redundant, but I think it is important, both to track transient values and to detect problems whose syslog messages never reached the syslog server.
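
A poller can do what a trap or syslog sender cannot: notice that a reply never arrived and ask again. The Python sketch below shows that retry loop around a fan-status query. It is an illustration, not NetMRI’s implementation. The real SNMP request is replaced by a caller-supplied `fetch` function. `PollTimeout`, `poll_with_retry`, `failed_fans`, and `make_lossy_agent` are hypothetical names. The state codes follow the CiscoEnvMonState values reported by `ciscoEnvMonFanState` in CISCO-ENVMON-MIB. Check them against the MIB for your platform.

```python
from typing import Callable, Dict, List, Optional, Tuple

# CiscoEnvMonState values as used by ciscoEnvMonFanState (CISCO-ENVMON-MIB).
FAN_STATES = {
    1: "normal",
    2: "warning",
    3: "critical",
    4: "shutdown",
    5: "notPresent",
    6: "notFunctioning",
}
# States treated as a problem; an empty fan slot (notPresent) is not.
PROBLEM_STATES = {"warning", "critical", "shutdown", "notFunctioning"}

FanTable = Dict[int, int]  # fan index -> raw state code


class PollTimeout(Exception):
    """Raised by a fetch function when no reply arrives in time (e.g. a lost UDP packet)."""


def poll_with_retry(fetch: Callable[[], FanTable], retries: int = 3) -> Optional[FanTable]:
    """Call fetch() once plus up to `retries` more times; return None if every attempt times out."""
    for _ in range(1 + retries):
        try:
            return fetch()
        except PollTimeout:
            continue  # request or reply was lost; ask again
    return None


def failed_fans(states: FanTable) -> List[Tuple[int, str]]:
    """Return (fan index, state name) for each fan in a problem or unrecognized state."""
    problems = []
    for index, code in sorted(states.items()):
        name = FAN_STATES.get(code, f"unknown({code})")
        if name in PROBLEM_STATES or code not in FAN_STATES:
            problems.append((index, name))
    return problems


def make_lossy_agent(drops: int, table: FanTable) -> Callable[[], FanTable]:
    """Hypothetical stand-in for an SNMP agent whose first `drops` replies are lost."""
    remaining = [drops]

    def fetch() -> FanTable:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise PollTimeout()
        return dict(table)

    return fetch


if __name__ == "__main__":
    agent = make_lossy_agent(drops=2, table={1: 1, 2: 6})
    states = poll_with_retry(agent, retries=3)
    if states is None:
        print("device unreachable after retries")
    else:
        print(failed_fans(states))  # [(2, 'notFunctioning')]
```

Returning `None` rather than an empty list keeps “the device never answered” distinct from “the device answered and every fan is fine.”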

When I saw the issue, I verified that the fan had actually failed. [I like to verify that my tools are operating correctly and that I can trust them; so many NMS products produce false alarms that I’ve grown accustomed to checking them.] NetMRI was correct: the device CLI reported the failed fan. A quick email to the support team let them dispatch someone to repair it before the device overheated and shut down, which could have caused an unplanned network outage.

I like to understand failure modes: how things should operate when a failure occurs, and what I can do to minimize its impact. In the case of UDP, I like to use alternate collection methods that aren’t as timely as a log message but still let me know when things break.
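
One way to act on that is to cross-check the two sources. Any fan that polling reports as failed, but that never showed up in syslog, points to a lost message, which is exactly the situation described above. The Python sketch below assumes syslog messages have already been parsed into (device, fan index) pairs. The device names, data shapes, and the function `unreported_fan_problems` are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# CiscoEnvMonState codes treated as problems: warning, critical, shutdown, notFunctioning.
PROBLEM_CODES = {2, 3, 4, 6}


def unreported_fan_problems(
    polled: Dict[str, Dict[int, int]],
    syslog_events: Iterable[Tuple[str, int]],
) -> List[Tuple[str, int]]:
    """Return (device, fan index) pairs that polling shows as failing but syslog never mentioned."""
    logged = set(syslog_events)
    missing = []
    for device, fans in sorted(polled.items()):
        for index, code in sorted(fans.items()):
            if code in PROBLEM_CODES and (device, index) not in logged:
                missing.append((device, index))
    return missing


if __name__ == "__main__":
    # Hypothetical poll results: device -> {fan index: CiscoEnvMonState code}.
    polled = {"core-sw1": {1: 1, 2: 6}, "edge-rtr3": {1: 3}}
    # Hypothetical parsed syslog: only edge-rtr3's fan alarm reached the server.
    syslog_events = [("edge-rtr3", 1)]
    print(unreported_fan_problems(polled, syslog_events))  # [('core-sw1', 2)]
```

Matching on the fan index assumes the parsed syslog message identifies which fan failed. If your messages only name the device, match on the device alone.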

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to thank Infoblox for permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html
