I spoke on an Interop panel about network management. Eric Siegel, an analyst at The Burton Group, moderated the panel, and the topic was “The Future of Network Management: Over Niagara Falls In a Barrel.” More than 100 people were in the room, nearly standing room only, which shows how important the topic is.
It was interesting to watch the audience during the hour-long session. While we discussed general themes, people were relaxed, and only a couple of them took notes. However, when someone asked for specifics about the types of problems and I began listing the most frequent problems NetMRI finds, people became more alert and more of them started taking notes. Clearly, people want to know the specific things that make their networks more efficient and reliable.
The most important source of problems is configuration changes, which are responsible for 50% to 80% of all network problems (depending on the analyst reporting the figure), so tracking them comes first. Sometimes you need to be able to review changes from weeks ago, since a problem may show itself only when some other, seemingly unrelated, event occurs.
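As a rough illustration of reviewing a configuration change, here is a minimal sketch that compares two saved configuration snapshots with Python's standard `difflib`. The snapshot text, the labels, and the `config_diff` helper are hypothetical, and this is not a description of how NetMRI stores or compares configurations.

```python
import difflib


def config_diff(old_config: str, new_config: str,
                old_label: str = "before", new_label: str = "after") -> list[str]:
    """Return unified-diff lines between two configuration snapshots."""
    return list(difflib.unified_diff(
        old_config.splitlines(),
        new_config.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # inputs have no trailing newlines, so add none to headers
    ))


# Hypothetical snapshots of one interface, taken weeks apart.
before = "interface Gi0/1\n speed auto\n duplex auto\n"
after = "interface Gi0/1\n speed 100\n duplex full\n"

changes = config_diff(before, after)
for line in changes:
    print(line)
```

Keeping dated snapshots and diffing any two of them is what makes it possible to look back at a change from weeks ago once a seemingly unrelated event exposes it.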
The next set of problems was more technical. Switch port duplex mismatch matters most when it occurs on high-utilization interfaces. We’ve seen interfaces carrying 4,000,000 packets per day with 30% packet loss. That volume suggests a server connection, and with that many packets being lost, TCP is never able to get past its slow-start phase, so performance is terrible.
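To put numbers on that example, 30% of 4,000,000 packets is 1,200,000 lost packets per day. The sketch below shows one way such interfaces could be flagged from per-day counters. The `InterfaceStats` type, the interface names, and both thresholds are assumptions chosen for illustration, not values NetMRI uses.

```python
from dataclasses import dataclass


@dataclass
class InterfaceStats:
    name: str
    packets_per_day: int
    lost_packets_per_day: int


def loss_fraction(stats: InterfaceStats) -> float:
    """Fraction of packets lost; 0.0 for an idle interface."""
    if stats.packets_per_day == 0:
        return 0.0
    return stats.lost_packets_per_day / stats.packets_per_day


def suspect_interfaces(interfaces: list[InterfaceStats],
                       min_packets: int = 1_000_000,  # assumed "high volume" cutoff
                       min_loss: float = 0.01         # assumed loss threshold
                       ) -> list[str]:
    """Names of busy interfaces whose loss fraction meets the threshold."""
    return [
        i.name for i in interfaces
        if i.packets_per_day >= min_packets and loss_fraction(i) >= min_loss
    ]


# Hypothetical counters; the first entry mirrors the figures in the text.
sample = [
    InterfaceStats("Gi0/1", 4_000_000, 1_200_000),
    InterfaceStats("Gi0/2", 4_000_000, 40),
    InterfaceStats("Gi0/3", 500, 150),
]
print(suspect_interfaces(sample))
```

Filtering on volume as well as loss keeps the report focused on the interfaces where a mismatch hurts most, rather than on lightly used ports.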
Other things to check include: whether the root bridge in each spanning tree has its bridge priority properly specified; whether each redundancy group (HSRP or VRRP) has two routers; switch trunk ports or router interfaces that are down but configured administratively up; and changes in routing or Layer 2 neighbors.
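Two of these checks are simple enough to sketch over data that has already been collected. The dictionary layout, group names, and function names below are hypothetical. The `admin_status` and `oper_status` fields are assumed to hold values such as those reported by the standard SNMP `ifAdminStatus` and `ifOperStatus` objects.

```python
def groups_missing_peer(groups: dict[str, list[str]]) -> list[str]:
    """Redundancy groups (HSRP or VRRP) with fewer than two distinct routers."""
    return sorted(
        name for name, routers in groups.items()
        if len(set(routers)) < 2
    )


def down_but_admin_up(interfaces: list[dict]) -> list[str]:
    """Interfaces configured administratively up that are operationally down."""
    return [
        i["name"] for i in interfaces
        if i["admin_status"] == "up" and i["oper_status"] == "down"
    ]


# Hypothetical collected data.
groups = {
    "hsrp-10": ["rtr-a", "rtr-b"],
    "hsrp-20": ["rtr-c"],  # lost its partner
}
interfaces = [
    {"name": "Gi1/0/1", "admin_status": "up", "oper_status": "up"},
    {"name": "Gi1/0/2", "admin_status": "up", "oper_status": "down"},
    {"name": "Gi1/0/3", "admin_status": "down", "oper_status": "down"},
]

print(groups_missing_peer(groups))
print(down_but_admin_up(interfaces))
```

An interface that is shut down on purpose is left out of the second report, so only ports that should be carrying traffic are flagged.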
The above list is only a small sample of the types of problems that NetMRI identifies every time it analyzes the data it has collected from the network. These are things that network engineers know need to be checked regularly, but the checks are seldom done, either for lack of time or because the data that must be collected and analyzed is boring.
-Terry
_____________________________________________________________________________________________
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html