Event Reporting

Author
Terry Slattery
Principal Architect

I’m working on network management requirements for several customers and keep running into the same ones.  One of the key requirements is event reporting.  Events typically take two forms: syslog messages and SNMP traps.  In both cases, they are asynchronous notifications of something happening on the network.  Sometimes the notification is not important, like an unimportant edge link going up or down.  But other notifications are critical to early awareness of significant network problems.  The problem with event reporting is that event logs are often quite long, and identifying the one or two key entries in a log that is thousands of lines long is like finding a needle in a haystack.

At some sites, the network staff watches the tail of the log file, often with ‘tail -f’ on a Unix/Linux log server.  If you’re watching the log file when something important occurs and you happen to notice, then that’s good.  If you’re not watching or don’t notice it, then that’s not so good.

Other sites get a daily report on network events.  The upper section of the summary report shows a count of each type of event for the day.  The lower section shows the count of events, the device that sent them, and the event name (see below).  A quick scan of the upper section shows what events have happened, and the lower section lets you identify which devices you should investigate, based on the criticality of the message or perhaps its volume.  In the example below, devices test1.com and test4.com are experiencing a large number of EIGRP neighbor change events, which suggests a link problem on each of these devices.  Together, they account for the majority of the EIGRP neighbor change events.

Summary of GNS Cisco syslog Messages on Wed Jan 17 23:59:00 EST 2007
Cisco Messages:
 437 DUAL-5-NBRCHANGE
 353 LINEPROTO-5-UPDOWN
 114 CRYPTO-6-IKMP_MODE_FAILURE
...
Messages sorted by frequency and source device:
 346 test1.com DUAL-5-NBRCHANGE
 114 test2.com CRYPTO-6-IKMP_MODE_FAILURE
 84 test3.com LINEPROTO-5-UPDOWN Tunnel119
 67 test4.com DUAL-5-NBRCHANGE
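
A report in this style can be produced with a short script.  The following is a minimal sketch, not the script that generated the report above.  It assumes each log line starts with the sending device's hostname and contains a Cisco-style %FACILITY-SEVERITY-MNEMONIC tag; the function names `summarize` and `format_report` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Cisco-style message tag, e.g. %DUAL-5-NBRCHANGE (facility-severity-mnemonic).
TAG = re.compile(r"%([A-Z0-9_]+-\d-[A-Z0-9_]+)")


def summarize(lines):
    """Return (counts per message, counts per (device, message) pair)."""
    by_message = Counter()
    by_device = Counter()
    for line in lines:
        match = TAG.search(line)
        if not match:
            continue  # no Cisco-style tag on this line; skip it
        # Assumption: the first whitespace-separated field is the hostname.
        device = line.split()[0]
        message = match.group(1)
        by_message[message] += 1
        by_device[(device, message)] += 1
    return by_message, by_device


def format_report(by_message, by_device):
    """Render both sections, most frequent entries first."""
    out = ["Cisco Messages:"]
    out += [f"{n:5d} {msg}" for msg, n in by_message.most_common()]
    out.append("Messages sorted by frequency and source device:")
    out += [f"{n:5d} {dev} {msg}" for (dev, msg), n in by_device.most_common()]
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read the day's log file.
    sample = [
        "test1.com %DUAL-5-NBRCHANGE: neighbor change",
        "test1.com %DUAL-5-NBRCHANGE: neighbor change",
        "test2.com %CRYPTO-6-IKMP_MODE_FAILURE: mode failure",
        "test4.com %DUAL-5-NBRCHANGE: neighbor change",
    ]
    print(format_report(*summarize(sample)))
```

Real syslog lines usually carry a timestamp before the hostname, so the hostname extraction would need to be adjusted to the format your log server writes.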

Other events, such as fan or power supply failures, environmental alerts, link up/down on core links, unplanned device reboots, and security notifications, are important for many organizations.  As I noted above, watching a log file is not a reliable way to catch them.

NetMRI has an add-on NetMRI Event Analysis appliance that can collect events (including SNMP traps) and has an exception notification system that creates NetMRI issues whenever an important event is detected.  The underlying engine is Splunk, so it has excellent pattern matching and filtering capabilities.  A great addition in the NetMRI implementation is the ability to use device and interface grouping to filter events.  So an interface up/down event on a core link generates an issue while the same event on an unimportant edge interface does not generate an issue.
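
The idea of filtering events by interface group can be sketched in a few lines.  This is only an illustration of the concept, not NetMRI's or Splunk's API: the device names, the group table, the rule table, and the function `should_raise_issue` are all hypothetical.

```python
# Hypothetical interface groups; a real tool would take these from its inventory.
INTERFACE_GROUPS = {
    ("core-rtr1", "GigabitEthernet0/1"): "core",
    ("edge-sw7", "FastEthernet0/12"): "edge",
}

# Events that should create an issue, per interface group.
ISSUE_RULES = {
    "core": {"LINEPROTO-5-UPDOWN", "LINK-3-UPDOWN"},
    "edge": set(),  # up/down on edge ports is logged but raises no issue
}


def should_raise_issue(device, interface, event):
    """Return True if this event on this interface deserves an issue."""
    # Assumption: interfaces not listed in any group are treated as edge ports.
    group = INTERFACE_GROUPS.get((device, interface), "edge")
    return event in ISSUE_RULES.get(group, set())


if __name__ == "__main__":
    print(should_raise_issue("core-rtr1", "GigabitEthernet0/1", "LINEPROTO-5-UPDOWN"))
    print(should_raise_issue("edge-sw7", "FastEthernet0/12", "LINEPROTO-5-UPDOWN"))
```

The same event name gets a different outcome depending only on the group the interface belongs to, which is what keeps edge-port noise out of the issue list.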

Having issues generated allows the network staff to reduce the number of screens that need to be watched or monitored.  NetMRI also allows important events to be emailed, so a message to your phone can alert you to an important network event.

By integrating event log analysis with other network analysis, tools like NetMRI can take a big load off the network staff, and that’s what matters for a smoothly running network.

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html

