Rick Burts (CCIE #4615) of Chesapeake NetCraftsmen told several of us about an interesting problem he recently encountered. Here’s his description of the problem:
This week I was called to assist [a customer] with a problem in their network. The problem was tricky and very intermittent. It impacted some users but not others. Impacted users could access certain destination addresses but could not access other destination addresses within the same subnet.
It felt like a routing issue. But since some addresses within a subnet were reachable while others within the same subnet were not reachable it was hard to see how routing was the issue. After extensive troubleshooting we diagnosed that it was a problem within an Etherchannel configured between a couple of switches. Each switch was configured with four ports in the Etherchannel. One switch showed three ports active in the Etherchannel and one port not connected. The other switch showed two ports active in the Etherchannel and two ports not connected.
So the problem was that one switch saw a port as up while the other switch saw the port as down. No one was sure how the port was marked as not connected. The problem was exacerbated by the fact that all ports in the Etherchannel were configured with mode on and by the fact that UDLD was not configured. We got the port back to an up state and connectivity was restored for all destination addresses. We also configured the ports to mode desirable and enabled UDLD. It was a very interesting problem to figure out.
The core of the problem was that the switches were not configured properly. If the customer didn’t know the proper switch configuration to use, it is unlikely that they would have built a configuration policy check that would have found the error. That’s where policies based on the Cisco SRNDs (Solution Reference Network Design guides) are useful.
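As a rough illustration of what such a policy check might look like, here is a minimal Python sketch. It assumes a simplified IOS-style configuration in which each interface stanza starts with `interface ...` and its settings are indented beneath it, and it assumes UDLD is enabled per port with a `udld port` line (a globally enabled UDLD would not be seen by this check). The function name, the sample interface names, and the wording of the findings are hypothetical.

```python
import re

def check_etherchannel_policy(config_text):
    """Return (interface, finding) tuples for EtherChannel member ports.

    Assumes a simplified IOS-style config: 'interface <name>' lines
    followed by indented settings. Only interface-level 'udld port'
    lines are recognized (an assumption of this sketch).
    """
    stanzas = {}
    current = None
    for line in config_text.splitlines():
        if line.startswith("interface "):
            current = line.split(None, 1)[1].strip()
            stanzas[current] = []
        elif current is not None and line.startswith(" "):
            stanzas[current].append(line.strip())
        else:
            current = None  # '!' or any unindented line ends the stanza

    findings = []
    for name, body in stanzas.items():
        modes = []
        for setting in body:
            match = re.match(r"channel-group \d+ mode (\S+)$", setting)
            if match:
                modes.append(match.group(1))
        if not modes:
            continue  # not an EtherChannel member port
        if "on" in modes:
            findings.append((name, "channel-group mode is 'on' (no negotiation)"))
        if not any(setting.startswith("udld port") for setting in body):
            findings.append((name, "UDLD not enabled on port"))
    return findings

# Hypothetical sample configuration for illustration only.
SAMPLE = """\
interface GigabitEthernet1/0/1
 channel-group 1 mode on
!
interface GigabitEthernet1/0/2
 channel-group 1 mode desirable
 udld port
!
interface GigabitEthernet1/0/3
 description user port
"""

for interface, finding in check_etherchannel_policy(SAMPLE):
    print(interface, "-", finding)
```

In this sample only the first port is flagged, once for mode on and once for the missing UDLD setting. The second port follows the configuration Rick applied as the fix, and the third is not a channel member.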
Now suppose that the network engineers didn’t have a system to check network configuration policy, or that they weren’t aware of the current configuration best practices for Etherchannel. A network analysis system would have identified that some links in the Etherchannel were not operational, potentially identifying the problem before the users were affected. This is an example of a common problem that senior network engineers know they should check for but never have the time to look for proactively. That’s where automated network analysis, of both configurations and operational data, is important.
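The operational side of that analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the port status for each switch has already been collected somehow (for example via SNMP or the CLI) into plain dictionaries, and the port names, status strings, and function name are hypothetical. The case worth flagging most loudly is a mismatch, where one end reports the link up and the other does not. With mode on there is no negotiation, so the switch that still sees the port as up will likely keep hashing some flows onto a link the other switch is not using, which would fit the symptom of some destinations being reachable and others not.

```python
def find_etherchannel_faults(links, status_a, status_b):
    """Compare both ends of each EtherChannel member link.

    links:    list of (port_on_a, port_on_b) pairs
    status_*: dict mapping port name -> status string ("up" or other)
    Returns (port_on_a, port_on_b, kind) tuples, where kind is
    "mismatch" if the two ends disagree or "down" if both are down.
    """
    faults = []
    for port_a, port_b in links:
        up_a = status_a.get(port_a) == "up"
        up_b = status_b.get(port_b) == "up"
        if up_a and up_b:
            continue  # healthy member link
        kind = "mismatch" if up_a != up_b else "down"
        faults.append((port_a, port_b, kind))
    return faults

# Hypothetical data shaped like the incident: switch A sees three ports
# up and one not connected, switch B sees two up and two not connected.
LINKS = [("A-1", "B-1"), ("A-2", "B-2"), ("A-3", "B-3"), ("A-4", "B-4")]
STATUS_A = {"A-1": "up", "A-2": "up", "A-3": "up", "A-4": "notconnect"}
STATUS_B = {"B-1": "up", "B-2": "up", "B-3": "notconnect", "B-4": "notconnect"}

for fault in find_etherchannel_faults(LINKS, STATUS_A, STATUS_B):
    print(fault)
```

With this data the third link is reported as a mismatch and the fourth as down on both ends, so either finding could have prompted an investigation before users noticed.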
-Terry
_____________________________________________________________________________________________
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html