Any device that implements the IP MIB has an interesting SNMP counter called ipOutNoRoutes, defined as follows:
ipOutNoRoutes OBJECT-TYPE
    SYNTAX  Counter
    ACCESS  read-only
    STATUS  mandatory
    DESCRIPTION
        "The number of IP datagrams discarded because no route
        could be found to transmit them to their destination.
        Note that this counter includes any packets counted in
        ipForwDatagrams which meet this `no-route' criterion.
        Note that this includes any datagrams which a host
        cannot route because all of its default routers are
        down."
NetMRI tracks this counter and reports when a router or L3 switch records a high number of no-route discards within a 24-hour period.
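Tracking a counter like this means polling it periodically and computing deltas. Here is a minimal sketch of that idea, assuming standard 32-bit SNMP Counter semantics (the object wraps back to zero after 2^32 − 1); the function name is illustrative, not NetMRI’s actual implementation:

```python
# Sketch of how an NMS might turn raw ipOutNoRoutes samples into
# per-interval discard counts. Assumes a 32-bit SNMP Counter,
# which wraps back to 0 after reaching 2**32 - 1.

COUNTER32_MODULUS = 2**32

def counter_delta(previous: int, current: int) -> int:
    """Increments between two polls, allowing for one counter wrap."""
    if current >= previous:
        return current - previous
    # The counter wrapped between polls.
    return COUNTER32_MODULUS - previous + current

# Example: the counter wrapped between these two polls.
print(counter_delta(4294967290, 150))  # 156 discards in the interval
print(counter_delta(10, 25))           # 15 discards, no wrap
```

Summing these deltas over a day gives the 24-hour figure that a threshold report would be based on.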
I’ve investigated the actions that drive this counter because it sounds like an interesting object to track. The description implies that the counter increments when the router, or the routing function in an L3 switch, cannot forward a packet because a forwarding lookup failed. Fred Baker at Cisco tells me that it is also incremented when an ARP lookup fails. So the counter increments both when an L3 lookup fails and when L2 resolution fails. That muddies the purpose of the counter, at least to my mind: I would expect L3 counters not to include L2 event counts, and vice versa.
I’ve previously described how to improve SNMP MIBs to make it easy for an NMS to report to the network staff which endpoints are causing problems that a counter like this is reporting.
Practically speaking, getting the MIB updated, or creating a new MIB and getting it supported in network gear, is not going to happen anytime soon. It typically takes several years to reach agreement on a new MIB. Add to that the years it takes vendors to build the new functionality into their products and roll it out across their entire product lines. So how do we determine the source of the problem when we find an error counter, such as ipOutNoRoutes, icmpOutDestUnreachs, or icmpOutTimeExcds, that is exceeding normal thresholds?
NetCraftsmen co-worker Marty Adkins had a good diagnostic suggestion for Cisco devices: ‘debug ip icmp’. His rationale is that ICMP messages are typically very low volume and that enabling debug for ICMP packets would not cause CPU overload on the router. Of course, it helps to use ‘no logging console’ and ‘logging buffered’ to keep the CPU load to a minimum, as described in Cisco’s Important Information on Debug Commands.
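Putting Marty’s suggestion together with the logging precautions, the sequence on a Cisco router looks roughly like this (the buffer size is an illustrative value; adjust to taste):

```
no logging console
logging buffered 16384
debug ip icmp
! ... let it run, then inspect the captured messages ...
show logging
undebug all
```

The `show logging` output will contain the ICMP messages the router generated, including the source and destination addresses that point back at the offending hosts.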
I used Marty’s procedure at a customer site, and it works. A router (it could just as well have been an L3 switch) was reporting a high ipOutNoRoutes count every day. I enabled ‘debug ip icmp’ and found that several hosts on an attached segment were attempting to send data to 10.10.10.255, which is the local broadcast address for a 10.10.10.0/24 subnet. But the router interface was configured as 10.10.10.227/28, an address in subnet 10.10.10.224/28. The router was not configured with proxy ARP, which is rarely needed or used these days, so its only option was to drop the packets, increment the ipOutNoRoutes counter, and return an ICMP Host Unreachable message. Recognizing 10.10.10.255 as the 10.10.10.0/24 subnet broadcast address, I checked the router interface configuration, which quickly showed that no interface was in that subnet and that the interface reporting the bad packets was in the 10.10.10.224/28 subnet. Instant problem identification!
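The address arithmetic here is easy to verify with Python’s standard `ipaddress` module, using the addresses from the incident above:

```python
import ipaddress

# The router interface from the incident: 10.10.10.227/28.
iface = ipaddress.ip_interface("10.10.10.227/28")
print(iface.network)                    # 10.10.10.224/28
print(iface.network.broadcast_address)  # 10.10.10.239

# The destination the misconfigured hosts were sending to:
dest = ipaddress.ip_address("10.10.10.255")
print(dest in iface.network)            # False: not on the /28 at all

# 10.10.10.255 is the broadcast address of the /24 the hosts
# believed they were on:
wrong_mask_net = ipaddress.ip_network("10.10.10.0/24")
print(dest == wrong_mask_net.broadcast_address)  # True
```

Since 10.10.10.255 falls outside every connected subnet, the router has no route for it, which is exactly the condition ipOutNoRoutes counts.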
The misconfigured hosts were not attempting to communicate with hosts in any of the other subnets contained within 10.10.10.0/24. If they had been, it would have created an interesting troubleshooting scenario. Connectivity to the router and to the other members of the 10.10.10.224/28 subnet would have worked. But communication with any other host in the 10.10.10.0/24 range would have failed: believing those destinations to be on-link, the hosts would have ARPed for them directly instead of forwarding to the default router, the ARPs would have gone unanswered, and those destinations would have become a routing black hole for the incorrectly configured hosts. Ping tests would confirm which destinations responded and which did not, but the correct diagnosis would require someone to spot the incorrect mask configuration on the hosts. There would not have been any good clue pointing to the source of the problem, such as the failing packets to 10.10.10.255.
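To make the black-hole failure mode concrete, here is a small sketch, again using the addresses from this scenario, of which destinations a host with the wrong /24 mask would wrongly treat as on-link:

```python
import ipaddress

# What the host actually sits on vs. what its mask claims.
real_subnet = ipaddress.ip_network("10.10.10.224/28")
believed_subnet = ipaddress.ip_network("10.10.10.0/24")

# Destinations inside the believed /24 but outside the real /28:
# the host ARPs for these directly rather than using its default
# router, the ARP fails, and traffic to them is black-holed.
black_holed = [ip for ip in believed_subnet.hosts()
               if ip not in real_subnet]
print(len(black_holed))                  # 238 unreachable addresses
print(black_holed[0], black_holed[-1])   # 10.10.10.1 10.10.10.254
```

Only the 16 addresses of 10.10.10.224/28 behave normally, which is why connectivity “mostly works” and the problem is so easy to misdiagnose.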
This type of problem is difficult for most network engineers to solve without more data. I’ve occasionally heard people describe “bad” network segments that they have never been able to get working correctly. These are most often due to some easily explained problem that has never been important enough to justify the time to determine the true cause. Perhaps you know of similar problems.
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html.