I’ve been doing some more reading about network event handling and found some interesting articles and a few facts that I’d like to share. I have my own ideas about handling network events, but am open to learning what other people do and why they prefer their approach. It helps me learn new approaches or validate the approach that I use. Sometimes I run into weird approaches to things, but that prompts me to think about alternatives and can point to a variation on an approach that I use.
I prefer using syslog over SNMP traps and just learned an interesting tidbit from a report Cisco did for a customer, in which they quoted statistics for the number of traps versus the number of syslog messages. A 6500 has about 90 traps that it can send, but it has about 6000 syslog messages. Wow, that’s more than 60 times as many messages via syslog as via traps. That puts some facts behind my impression that syslog is a much richer source of network events than traps.
I don’t mind SNMP traps. In fact, I think of both syslog and traps as asynchronous network events. Each has a different format. I prefer syslog because of its simplicity and because I can read the information without having to decode an OID. But since both function similarly and differ only in encoding (ignoring TCP for syslog and SNMP informs), I think of both as events.
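To illustrate why I find syslog easy to work with, here is a minimal Python sketch that splits a Cisco-style message into its parts. It assumes the usual %FACILITY-SEVERITY-MNEMONIC: text layout; the function name, the regular expression, and the sample line (including the hostname core-sw1) are my own illustrations, not taken from any Cisco tool.

```python
import re

# Assumed layout: %FACILITY[-SUBFACILITY]-SEVERITY-MNEMONIC: free text
CISCO_MSG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+(?:-[A-Z0-9_]+)*?)"
    r"-(?P<severity>[0-7])"
    r"-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)"
)

def parse_event(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = CISCO_MSG.search(line)
    if match is None:
        return None
    event = match.groupdict()
    event["severity"] = int(event["severity"])  # 0 = emergency ... 7 = debug
    return event

# Illustrative sample line; timestamp and hostname are made up.
sample = ("Mar  1 10:15:02 core-sw1 %LINK-3-UPDOWN: "
          "Interface GigabitEthernet1/1, changed state to down")
print(parse_event(sample))
```

Everything needed to understand the event is already in the text; there is no MIB to load and no OID to look up.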
Pete Welcher of Chesapeake Netcraftsmen and I were talking several years ago about handling syslog, and we both agreed that it is useful to filter syslog messages, removing the common messages that are unimportant. Then look at what’s left, because those are the more important and less common events. I’m thinking of things like Pinnacle or Coil ASIC errors in the 6500. Or environmental events like a power supply or fan failure. I’ve even seen a rare memory parity error on 6500s (Cisco’s message decoder says to reseat the card and, if the error persists, call TAC for a replacement card).
I decided to do some web searches this week on the topic and found a couple of interesting web pages that talk about processing events. The first is a blog post by Robert Fekete at http://lwn.net/Articles/369075/. In it, he mentions a quote from Marcus Ranum: Artificial Ignorance – a process whereby you throw away the log entries you know aren’t interesting. If there’s anything left after you’ve thrown away the stuff you know isn’t interesting, then the leftovers must be interesting.
He goes on to describe processes for handling logs.
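The Artificial Ignorance idea can be sketched in a few lines of Python. This is only my illustration of the concept, not code from the article: the pattern list is hypothetical and would be built up per site over time, and whether link up/down messages count as uninteresting depends on the device (on access ports they usually are, on core links they are not).

```python
import re

# Hypothetical "known uninteresting" patterns; every site grows its own list.
BORING = [
    re.compile(r"%SYS-5-CONFIG_I"),
    re.compile(r"%LINEPROTO-5-UPDOWN"),
    re.compile(r"%LINK-3-UPDOWN"),
]

def interesting(lines, boring=BORING):
    """Yield only the lines that match none of the boring patterns."""
    for line in lines:
        if not any(pattern.search(line) for pattern in boring):
            yield line

# Illustrative log lines, made up for this example.
log = [
    "sw1 %SYS-5-CONFIG_I: Configured from console by admin on vty0",
    "sw1 %LINK-3-UPDOWN: Interface GigabitEthernet1/1, changed state to down",
    "sw1 %C6KPWR-SP-4-PSFAIL: power supply 1 output failed",
]
for line in interesting(log):
    print(line)
```

Whenever a leftover message turns out to be harmless, its pattern goes on the list, so the output shrinks over time to the events worth reading.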
The second article, by DataCenterWorks, at http://datacenterworks.com/stories/antilog.html, is titled “Sherlock Holmes on Log Files.” In it, they use the quote from the Sherlock Holmes books: It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth.
They describe a similar process and include a script that you can use to quickly filter syslog messages. Its premise is that the log messages from a normal day of operation can be quickly identified by looking at the messages over the course of several days. Discard those common messages, and what remains are the messages that are unique to the current day. [Note: I’ve not looked at Splunk recently to see if they offer that kind of functionality. If they don’t, it would be a good feature to add.]
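To make the premise concrete, here is a rough Python sketch of the same idea. It is not the DataCenterWorks script, which I have not reproduced here; the normalization (drop everything before the first %, replace digit runs with #), the min_days threshold, and all function names are my own assumptions.

```python
import re
from collections import Counter

def signature(line):
    """Reduce a line to a comparable form (assumed normalization).

    Drops the timestamp and hostname by keeping text from the first '%',
    then replaces every run of digits with '#'.
    """
    start = line.find("%")
    body = line[start:] if start != -1 else line
    return re.sub(r"\d+", "#", body).strip()

def build_baseline(previous_days, min_days=2):
    """Return signatures seen on at least min_days of the previous days."""
    days_seen = Counter()
    for day in previous_days:
        # A set counts each signature once per day, however often it occurs.
        days_seen.update({signature(line) for line in day})
    return {sig for sig, count in days_seen.items() if count >= min_days}

def unusual(today, baseline):
    """Return today's lines whose signature is not part of the baseline."""
    return [line for line in today if signature(line) not in baseline]

# Illustrative data, made up for this example.
day1 = ["Mar  1 10:00:00 sw1 %SYS-5-CONFIG_I: Configured from console by admin on vty0",
        "Mar  1 10:01:00 sw1 %LINK-3-UPDOWN: Interface GigabitEthernet1/1, changed state to down"]
day2 = ["Mar  2 09:00:00 sw1 %SYS-5-CONFIG_I: Configured from console by admin on vty1",
        "Mar  2 09:30:00 sw1 %LINK-3-UPDOWN: Interface GigabitEthernet2/4, changed state to down"]
today = ["Mar  3 08:00:00 sw1 %LINK-3-UPDOWN: Interface GigabitEthernet1/7, changed state to down",
         "Mar  3 11:00:00 sw1 %C6KPWR-SP-4-PSFAIL: power supply 1 output failed"]

for line in unusual(today, build_baseline([day1, day2])):
    print(line)
```

One consequence of this normalization is that a message that is routine on one device is treated as routine on every device. Keeping the hostname in the signature would be stricter, at the cost of a noisier report.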
Both of these approaches match the log processing paradigm that Pete and I have discussed in the past, and both even offer tools to aid in log processing. If you’re running a Cisco infrastructure, you can use the EEM function (Tcl in IOS) to generate custom events and do a lot more than what’s already provided by Cisco. For me, event management and notification is the first and most important function in a network management framework.
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html