VoIP troubleshooting is a pretty interesting problem. (See my earlier blog entry regarding my talk at VoiceCon Fall 2007.) There are a large number of tools that look at packets on the wire, or look at the RTCP (RTP Control Protocol) stream from certain phones, or gather data from the call controllers. While many of these approaches can identify that a problem exists, I’ve not encountered any, other than NetMRI, that also look at the infrastructure devices (routers and switches) to try to identify the real source of the problem.
While it is somewhat useful to know that a problem exists, finding its source is still a lot of work. For example, knowing that high packet loss occurs on a particular set of voice calls is useful, but it does not say where the loss comes from. It could be due to a bad link, a duplex mismatch, the wrong codec on a call leg, or improper QoS configuration. If the call path is long, there are many points to check. At each point, manually collecting and analyzing the data needed to either validate correct operation or identify a potential source of the problem takes anywhere from several minutes to several hours. Multiply that by the number of points in the path and you have a lengthy and tedious troubleshooting session.
What’s needed is something akin to the analysis computer in modern automobiles. In the network, the infrastructure’s configuration should be regularly validated against the design policies to verify that configurations contain the correct QoS configuration and default duplex settings. You can think of this as the “seatbelt not buckled” warning.
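To make the “seatbelt” check concrete, here is a minimal sketch of validating interface configurations against a design policy. It is only an illustration, not how NetMRI does it: the sample configuration is made-up IOS-style text, and the function names and the POLICY patterns are assumptions chosen for the example.

```python
import re

def split_interfaces(config_text):
    """Group indented config lines under the interface they belong to."""
    interfaces = {}
    current = None
    for line in config_text.splitlines():
        if line.startswith("interface "):
            current = line.split(None, 1)[1].strip()
            interfaces[current] = []
        elif current is not None and line.startswith(" "):
            interfaces[current].append(line.strip())
        else:
            # Any non-indented line (such as "!") ends the interface block.
            current = None
    return interfaces

def find_policy_violations(config_text, required_patterns):
    """Return (interface, rule) pairs where no config line matches the rule."""
    violations = []
    for name, lines in split_interfaces(config_text).items():
        for rule, pattern in required_patterns.items():
            if not any(re.fullmatch(pattern, text) for text in lines):
                violations.append((name, rule))
    return violations

# Hypothetical configuration excerpt, for illustration only.
SAMPLE_CONFIG = """\
interface GigabitEthernet0/1
 duplex full
 service-policy output VOICE-QOS
!
interface GigabitEthernet0/2
 duplex full
!
"""

# Hypothetical design policy: every interface needs an explicit duplex
# setting and an outbound QoS service policy.
POLICY = {
    "duplex": r"duplex (full|auto)",
    "qos": r"service-policy output \S+",
}

violations = find_policy_violations(SAMPLE_CONFIG, POLICY)
for interface, rule in violations:
    print(f"{interface}: missing required '{rule}' configuration")
```

In this sample, GigabitEthernet0/2 has no outbound service policy, so it is the one interface that gets flagged.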
The network’s operational parameters should also be checked, for example to make sure that duplex settings were negotiated properly. Another operational check is to verify that the QoS queue bandwidth is not being exceeded, which results in dropped packets, because a high-bandwidth codec was inadvertently used or because too many concurrent calls are occurring. Warnings on operational exceptions are like the “check engine” light: something is amiss in the system’s operation.
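The queue check can be sketched with some simple arithmetic. This is a rough illustration under stated assumptions: the per-call figures are the commonly quoted IP-layer rates at 20 ms packetization (G.711 at about 80 kbps, G.729 at about 24 kbps, Layer 2 overhead excluded), and the 512 kbps queue size, the call mix, and the function names are hypothetical.

```python
# Approximate per-call bandwidth in kbps, including IP/UDP/RTP headers at
# 20 ms packetization and excluding Layer 2 overhead (an assumption).
CODEC_KBPS = {"G.711": 80, "G.729": 24}

def offered_voice_load_kbps(calls_by_codec):
    """Sum the bandwidth of all concurrent calls, grouped by codec."""
    return sum(CODEC_KBPS[codec] * count for codec, count in calls_by_codec.items())

def queue_exceeded(calls_by_codec, queue_kbps):
    """True when the concurrent calls need more than the voice queue allows."""
    return offered_voice_load_kbps(calls_by_codec) > queue_kbps

# Hypothetical voice queue sized for 20 G.729 calls (20 * 24 = 480 kbps).
QUEUE_KBPS = 512

as_designed = {"G.729": 20}
# Same 20 calls, but 5 of them inadvertently negotiated G.711.
wrong_codec = {"G.729": 15, "G.711": 5}

for label, calls in (("as designed", as_designed), ("wrong codec", wrong_codec)):
    load = offered_voice_load_kbps(calls)
    status = "EXCEEDED" if queue_exceeded(calls, QUEUE_KBPS) else "ok"
    print(f"{label}: {load} kbps offered to a {QUEUE_KBPS} kbps queue -> {status}")
```

With these numbers, the designed load of 480 kbps fits, while just five calls on the wrong codec push the load to 760 kbps and the queue starts dropping packets.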
While it may sound like an advertisement, this kind of network analysis, and the improvement in network efficiency that comes with it, is our goal in creating NetMRI. I’ve not seen anything on the market that performs the same breadth of analysis in one package as NetMRI does. If you have, please leave me a comment.
-Terry
_____________________________________________________________________________________________
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html