I just posted a blog at nojitter.com titled "Monitoring a Software Defined Network, Part 1." In it, I discuss the basic network monitoring functions that will be needed. Most of these functions are no different from those that have historically been required. I then discuss some additions that will be very useful for determining the source of a problem, or at least which systems are affected. While creating NetMRI's functionality, I often found that I could detect an error but did not have enough information to identify the systems involved or the root cause.
Most of these errors occur when a packet makes it part of the way across the network before an error is detected. They tend to be failures in the forwarding systems that cause the packet to be dropped. For example, when a packet's TTL is exceeded, the router drops the packet and sends an ICMP Time Exceeded message back to the packet's source. It would be extremely valuable for the network management system to know that this error occurred and why. Was it caused by someone running traceroute, or by a routing loop? In today's networks, we can only look at the total count of TTL Exceeded errors and guess how many errors might indicate a periodic routing loop. Being able to retrieve the header information of the failed packet would help a great deal; knowing its source and destination would be very valuable in the failure analysis.
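Some of this header information already travels inside the ICMP message itself: per RFC 792, a Time Exceeded message carries the IP header of the discarded packet plus the first 64 bits of its data. The Python sketch below assumes that a management system has somehow obtained a copy of the raw ICMP message (today it goes only to the packet's source) and extracts the source, destination, and, for TCP or UDP, the ports. The names parse_time_exceeded, DroppedPacketInfo, and likely_traceroute, and the 100-port window, are hypothetical. The traceroute check is only a rough heuristic: classic UDP traceroute probes start at destination port 33434, while ICMP-based tools such as Windows tracert would not be detected.

```python
import socket
import struct
from typing import NamedTuple, Optional

ICMP_TIME_EXCEEDED = 11        # ICMP type for Time Exceeded (RFC 792)
TRACEROUTE_BASE_PORT = 33434   # classic UDP traceroute default base port
TRACEROUTE_PORT_SPAN = 100     # hypothetical window; heuristic only


class DroppedPacketInfo(NamedTuple):
    code: int                  # 0 = TTL exceeded in transit, 1 = reassembly timeout
    protocol: int              # IP protocol of the dropped packet (6 = TCP, 17 = UDP)
    src: str
    dst: str
    sport: Optional[int]
    dport: Optional[int]


def parse_time_exceeded(icmp: bytes) -> Optional[DroppedPacketInfo]:
    """Extract the dropped packet's addresses and ports from an ICMP message.

    `icmp` starts at the ICMP header (outer IP header already removed).
    Checksums are not verified in this sketch.
    """
    if len(icmp) < 8 + 20 or icmp[0] != ICMP_TIME_EXCEEDED:
        return None
    inner = icmp[8:]                      # embedded IP header of the dropped packet
    version = inner[0] >> 4
    ihl = (inner[0] & 0x0F) * 4           # header length in bytes
    if version != 4 or ihl < 20 or len(inner) < ihl:
        return None
    proto = inner[9]
    src = socket.inet_ntoa(inner[12:16])
    dst = socket.inet_ntoa(inner[16:20])
    sport = dport = None
    # The first 64 bits of data hold the TCP/UDP source and destination ports.
    if proto in (6, 17) and len(inner) >= ihl + 4:
        sport, dport = struct.unpack("!HH", inner[ihl:ihl + 4])
    return DroppedPacketInfo(icmp[1], proto, src, dst, sport, dport)


def likely_traceroute(info: DroppedPacketInfo) -> bool:
    """Rough guess: UDP probe aimed at the classic traceroute port range."""
    return (info.protocol == 17 and info.dport is not None
            and TRACEROUTE_BASE_PORT <= info.dport
            < TRACEROUTE_BASE_PORT + TRACEROUTE_PORT_SPAN)


def build_sample() -> bytes:
    """Hypothetical Time Exceeded message for a UDP probe (checksums zeroed)."""
    ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 1, 17, 0,
                         socket.inet_aton("192.0.2.10"),
                         socket.inet_aton("198.51.100.7"))
    udp_hdr = struct.pack("!HHHH", 40000, 33435, 8, 0)
    icmp_hdr = struct.pack("!BBHI", ICMP_TIME_EXCEEDED, 0, 0, 0)
    return icmp_hdr + ip_hdr + udp_hdr


if __name__ == "__main__":
    info = parse_time_exceeded(build_sample())
    print(info, "traceroute?" , likely_traceroute(info) if info else None)
```

A management system collecting these records could then group them by source and destination: repeated drops for the same ordinary flow point toward a loop, while drops that match the traceroute pattern can be set aside.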
Since SDN is relatively new, it may still be possible to define storage for valuable diagnostic information that would make many common problems easier to troubleshoot. I would very much like us to use this opportunity to integrate useful network monitoring and management functions into SDN, so that our networks become more reliable.
-Terry