One of the neat things about talking with VCs is meeting really smart people. During one of our meetings we were talking with Mike O’Dell, a long-time friend and associate who is one of the smartest people I know. We were describing what NetMRI did: identify network devices, collect data from them, then analyze that data to find exceptions to industry best practices and operational problems in the network.
After much discussion, he finally leaned back in his chair (Mike’s body language signaling understanding of some import) and stated something like:
“I get it now! You’re like network fire codes vs fire alarms. Identify potential problems before there’s a fire.”
We then continued the discussion on how fire codes help prevent fires by identifying best practices that avoid problem scenarios. Examples include improper electrical wiring that can start a fire, or too many people in a room, which makes it impossible to evacuate them safely and quickly if a fire breaks out.
I’ve used that analogy since then and people understand it. I’ve always wanted a network management system that would automatically collect the data I knew needed to be collected and analyze it the way senior network engineers do, much like fire code inspectors check buildings for fire hazards.
The system I specified would identify potential problems that create or contribute to network outages and poor performance if not rectified. After looking around at existing products in late 2002, I decided that since it still didn’t exist, I’d try my hand at building one. As a result, NetMRI looks at several categories of problems:
- incorrect or incomplete network configuration and device configuration (redundancy configured in a device, but no backup device exists);
- improper deployment (device configurations that don’t match the corporate policies and configuration templates);
- operational problems (duplex mismatch on an important server interface or dropped packets in VoIP calls);
- important events buried in megabytes of syslog data, filtered out as they happen (device/interface down events, important routing protocol events, or environmental notifications like an over-temperature situation).
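To make the "fire code inspection" idea concrete, here is a minimal Python sketch of two of the checks listed above: flagging a duplex mismatch and picking important events out of syslog lines. It is only an illustration, not how NetMRI is implemented. The `Interface` record, the function names, and the keyword list are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical record of one end of a link, as collected from a device."""
    device: str
    name: str
    duplex: str           # duplex setting on this end, e.g. "full" or "half"
    neighbor_duplex: str  # duplex setting reported by the other end of the link


def duplex_mismatches(interfaces):
    """Return one finding per link whose two ends disagree on duplex."""
    return [
        f"{i.device} {i.name}: duplex {i.duplex} vs neighbor {i.neighbor_duplex}"
        for i in interfaces
        if i.duplex != i.neighbor_duplex
    ]


# Hypothetical phrases that mark an event as important; a real list would be
# tuned to the message formats of the devices in the network.
IMPORTANT_KEYWORDS = ("changed state to down", "over-temperature")


def important_events(lines, keywords=IMPORTANT_KEYWORDS):
    """Keep only the syslog lines that contain one of the keywords."""
    return [
        line for line in lines
        if any(k in line.lower() for k in keywords)
    ]
```

A rule like `duplex_mismatches` plays the role of a single fire code: it does not wait for an outage, it simply reports a condition that a senior engineer would know to look for.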
I’m interested in any comments you have regarding the fire codes analogy and the types of analysis that you’ve always wanted out of a network analysis tool.
-Terry
_____________________________________________________________________________________________
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html