SNMP is a pretty reasonable system for retrieving device data, but its implementation has a few warts. Well, it isn’t really an SNMP problem as much as it is a MIB deficiency. A few variables in some MIBs try to provide information about problems that are occurring, but the lack of additional data makes them pretty useless. Let’s take a look at a couple of examples, and I’ll suggest how future MIB designers can provide better data.
The first variable is ipOutNoRoutes, which appears in the IP-MIB. Its description says:
“The number of IP datagrams discarded because no route could be found to transmit them to their destination. Note that this counter includes any packets counted in ipForwDatagrams which meet this ‘no-route’ criterion. Note that this includes any datagrams which a host cannot route because all of its default routers are down.”
Knowing about destinations with no route could be useful in troubleshooting. But there are a couple of problems with this variable. In the Cisco code, it counts packets where there are ARP failures as well as routing failures. This isn’t necessarily a fatal flaw, as it could point to end stations that are attempting to access a device that has been moved. The real problem is that to diagnose which systems are sending the packets, you have to capture packets. The designers should have included a one-entry cache of the source and destination IP addresses of the packet that caused the counter to increment. Even if you missed a large number of values between polls, you’d capture valuable data that you could use to diagnose a variety of network problems without resorting to packet capture.
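To make the one-entry cache idea concrete, here is a minimal sketch in Python. It is purely illustrative: the class `CachedCounter`, its fields, and the sample addresses are my own hypothetical names, and nothing like it exists in the IP-MIB today. It only shows how little state the agent would need to keep alongside the counter.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CachedCounter:
    """A counter paired with a one-entry cache of the last packet that bumped it.

    Hypothetical illustration only; this is not an object defined in any MIB.
    """
    name: str
    count: int = 0
    last_src: Optional[str] = None
    last_dst: Optional[str] = None

    def record(self, src: str, dst: str) -> None:
        # Assumes a 32-bit counter, so the value wraps at 2**32.
        self.count = (self.count + 1) % 2**32
        # Overwrite the cache: only the most recent offender is kept.
        self.last_src = src
        self.last_dst = dst

    def poll(self) -> Tuple[int, Optional[str], Optional[str]]:
        # What a management station would read in a single poll.
        return self.count, self.last_src, self.last_dst


# Sample addresses are made up for the example.
no_routes = CachedCounter("ipOutNoRoutes")
no_routes.record("10.1.1.20", "192.168.50.7")
no_routes.record("10.1.1.20", "192.168.50.7")
no_routes.record("10.1.2.33", "172.16.9.1")
print(no_routes.poll())  # (3, '10.1.2.33', '172.16.9.1')
```

The poll sees three discards but only the last source/destination pair. The first two are lost, which is exactly the trade-off described above: you miss values, but each poll still hands you one real pair of addresses to chase.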
The ipFragOKs object could use similar instrumentation. I’d like to know the source/destination where fragmentation is being required. If fragmentation is being forced, something isn’t configured correctly. Maybe it is an old end node that needs to have its MTU adjusted, or perhaps it’s an app that used to work fine over T1 or Frame Relay and now requires fragmentation to run over a VPN. The various ICMP message counters (icmpOutDestUnreachs, icmpOutTimeExcds) have similar problems.
The end result of each of the above items is that the network runs slower and is less efficient. If performance-impacting configuration problems can be addressed, the cost of IT can be reduced. Note that what I’m describing may sound like pure performance problems, but the root cause is that the configuration of some network device or end node is not correct, causing problems for other devices on the network.
The above is what I call “system level analysis,” which is determining from a collection of data that a configuration or implementation problem is occurring and what its impact on the business is.
-Terry
_____________________________________________________________________________________________
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html