Handling NMS Performance Data, Part 4

Author
Terry Slattery
Principal Architect

In the previous posts (Handling NMS Performance Data, Part 1, Handling NMS Performance Data, Part 2, and Handling NMS Performance Data, Part 3), I’ve described how an NMS can optimize data retrieval and storage.  In this post, the last on the topic, I’ll discuss several additional optimizations that can be implemented.

Request Packing
Some devices can handle more SNMP variable requests per packet than others.  It is easy for an NMS to take a lowest-common-denominator approach and use the smallest request size that every device supports.  But that approach is inefficient for devices that can handle many more requests per packet.  For example, if a high-end router can handle 40 requests per packet and an old, low-end switch can handle only 10, using 10 requests per packet for the high-end router sends four times as many request packets as necessary.

The NMS should automatically track the number of requests that each device can handle per packet.  Some NMS implementations require the administrator to configure this value, but that is also inefficient, because the NMS is in the best position to learn the limit: it sees which request sizes a device answers with all of the requested data.  When the NMS doesn't receive the full set of requested data, it can do a binary search to quickly determine how much to request in a packet.  The NMS has another advantage: the MIB provides hints about the size of the returned data.  For example, the NMS can determine that a request is for an octet string with a maximum size of 512 bytes.  By implementing an efficient request stream, the NMS can make much more effective use of network bandwidth and request queue processing.
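The binary search and the use of MIB size hints can be sketched in Python.  This is a minimal sketch under stated assumptions, not any particular NMS's implementation: `probe` is a hypothetical callback that sends one SNMP GET carrying n variables and reports whether the device returned all n values, and success is assumed to be monotonic (if n variables fit, any smaller count fits too).  The second hypothetical helper, `pack_requests`, groups variables so that each request's worst-case response, estimated from MIB size limits, stays within an assumed byte budget; it ignores per-variable encoding overhead.

```python
from typing import Callable, List, Sequence, Tuple


def max_varbinds_per_packet(probe: Callable[[int], bool], upper: int = 64) -> int:
    """Largest n in [1, upper] for which probe(n) succeeds, or 0 if none does.

    probe(n) is a hypothetical callback: send one SNMP GET with n variables and
    return True only if all n values come back (no tooBig error, nothing dropped).
    Assumes success is monotonic: if n variables fit, any smaller count fits too.
    """
    if probe(upper):
        return upper                  # the full batch fits; no search needed
    lo, hi, best = 1, upper - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid):
            best, lo = mid, mid + 1   # mid fits; try a larger batch
        else:
            hi = mid - 1              # mid is too many; try a smaller batch
    return best


def pack_requests(var_sizes: Sequence[Tuple[str, int]], max_count: int,
                  byte_budget: int) -> List[List[str]]:
    """Greedily group OIDs into requests of at most max_count variables whose
    summed worst-case response sizes stay within byte_budget.

    var_sizes pairs each OID with a worst-case size taken from the MIB, e.g. an
    OCTET STRING (SIZE (0..512)) counts as 512 bytes. Per-variable encoding
    overhead is ignored. A variable larger than the whole budget is still sent,
    alone, in its own request.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for oid, size in var_sizes:
        if current and (len(current) == max_count or used + size > byte_budget):
            batches.append(current)   # close the full request and start a new one
            current, used = [], 0
        current.append(oid)
        used += size
    if current:
        batches.append(current)
    return batches
```

For a simulated device that handles at most 40 variables, `max_varbinds_per_packet(lambda n: n <= 40)` returns 40 after seven probes, and the NMS can cache that per-device limit and pass it to `pack_requests` as `max_count`.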

Collect Stats from Active Interfaces
Optimize data collection by collecting performance stats only from active interfaces and simply checking inactive interfaces for a state change.  In networks that have a large number of switches, it is common for many of the switch ports to be inactive.  In some networks, the ratio of inactive to active interfaces may be 1:3 or more; that is, a quarter or more of all interfaces are inactive.  The NMS can easily identify the interfaces that are operationally down (e.g., up/down or down/down).  Each interface has both a “last change” timestamp (ifLastChange) and an operational state variable (ifOperStatus) that the NMS can check efficiently on each polling cycle.  If the interface state has not changed, there is no need to retrieve the interface performance stats.  I prefer using the “last change” timestamp because it shows whether the interface changed state since the last poll, even if it went down and came back up (or vice versa) between polls, which the current operational state alone would not reveal.
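A minimal sketch of this check, assuming the NMS has already fetched ifOperStatus and ifLastChange (both IF-MIB objects) for every ifIndex on this cycle and kept the values from the previous cycle.  `InterfaceState` and `interfaces_to_poll` are hypothetical names, and the SNMP retrieval itself is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InterfaceState:
    oper_up: bool       # ifOperStatus == up(1)
    last_change: int    # ifLastChange (TimeTicks, hundredths of a second)


def interfaces_to_poll(previous: Dict[int, InterfaceState],
                       current: Dict[int, InterfaceState]) -> List[int]:
    """Return the ifIndex values whose performance counters should be fetched.

    Operationally up interfaces are always polled. An inactive interface is
    polled only if it is new or its ifLastChange value differs from the last
    poll, which also catches a flap that left ifOperStatus looking unchanged.
    """
    selected = []
    for if_index, now in current.items():
        before = previous.get(if_index)
        changed = before is None or before.last_change != now.last_change
        if now.oper_up or changed:
            selected.append(if_index)
    return sorted(selected)
```

Comparing for inequality rather than "greater than" also treats a reset of ifLastChange (for example, after the agent restarts) as a change, so the NMS errs toward collecting stats.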

Rank Interfaces by Importance
Some interfaces are more important than others.  Many network management systems require the administrator to identify important interfaces or rank interface importance, a labor-intensive manual process.  The NMS can perform much of this ranking automatically, leaving the administrator to refine it.  An interface importance ranking design might use the following rankings:

  1. Core infrastructure interfaces (core device to core device)
  2. Core layer interfaces to distribution layer devices
  3. Core infrastructure interfaces to data center infrastructure
  4. Data center infrastructure interfaces to critical servers
  5. High volume interfaces (typically a server)
  6. Interfaces to critical services (NTP, DNS, DHCP, NMS)
  7. Distribution layer interfaces to access layer devices
  8. Access layer interfaces

Critical interfaces can be identified by determining device connectivity.  CDP, LLDP, Layer 3 addressing, spanning tree tables, switch trunk interfaces, and switch forwarding tables are good sources of data that the NMS can use to automatically rank the interfaces.
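One way an NMS might turn discovered connectivity into the ranking list above, sketched under loud assumptions: each interface has already been labeled with a hypothetical role for the local device and for its neighbor (inferred from CDP/LLDP neighbors, addressing, trunks, and so on), and a separate traffic check has flagged high-volume interfaces.  The role names and `rank_interface` are illustrative, not taken from any product.

```python
from typing import Dict, Tuple

# Hypothetical role labels; ranks follow the list above (1 = most important).
RANK_BY_LINK: Dict[Tuple[str, str], int] = {
    ("core", "core"): 1,
    ("core", "distribution"): 2,
    ("core", "datacenter"): 3,
    ("datacenter", "critical_server"): 4,
    ("distribution", "access"): 7,
}
HIGH_VOLUME_RANK = 5    # busy interface, typically a server
SERVICE_RANK = 6        # link to NTP, DNS, DHCP, or NMS servers
DEFAULT_RANK = 8        # everything else, e.g. access layer ports


def rank_interface(local_role: str, neighbor_role: str,
                   high_volume: bool = False) -> int:
    """Return the best (lowest-numbered) rank that applies to one interface."""
    rank = RANK_BY_LINK.get((local_role, neighbor_role), DEFAULT_RANK)
    if neighbor_role == "service":
        rank = min(rank, SERVICE_RANK)
    if high_volume:
        rank = min(rank, HIGH_VOLUME_RANK)
    return rank
```

Taking the minimum means an interface that matches several rules keeps its most important ranking; the administrator can still override individual results.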

Critical interfaces should be polled more frequently than edge interfaces that connect to lower-ranked devices such as laptops.  The administrator can also give the NMS hints to refine the ranking, such as the CIDR blocks used in the data centers or the subnets where critical services or critical users are connected.
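Administrator hints of this kind are easy to apply with Python's standard `ipaddress` module.  In this minimal sketch, `apply_cidr_hints` is a hypothetical helper, and each hint is assumed to pair a CIDR block with the rank that interfaces inside it should have at minimum; a hint can raise an interface's importance but never lower it.

```python
import ipaddress
from typing import Iterable, Tuple


def apply_cidr_hints(rank: int, interface_address: str,
                     hints: Iterable[Tuple[str, int]]) -> int:
    """Promote an interface whose subnet falls inside an administrator hint.

    interface_address is the interface's address with prefix, e.g. "10.1.20.1/24".
    hints pairs a CIDR block (data center, critical services, critical users)
    with the rank it implies. Mixed IPv4/IPv6 pairs are skipped.
    """
    net = ipaddress.ip_network(interface_address, strict=False)
    for cidr, hinted_rank in hints:
        block = ipaddress.ip_network(cidr)
        if net.version == block.version and net.subnet_of(block):
            rank = min(rank, hinted_rank)   # lower number = more important
    return rank
```

For example, with the hint `("10.1.0.0/16", 4)` an access port on 10.1.20.0/24 moves from rank 8 to rank 4, while a core link already at rank 2 keeps rank 2.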

Use Variable Polling Periods
The vast majority of interfaces in a network are typically near the bottom of the ranking list and can be polled much less frequently than the higher-ranked interfaces.  Polling a low-utilization interface at a low frequency (e.g., every 15 or 20 minutes) still provides good visibility into network problems.  A high-ranking interface may need its stats collected every few minutes, while an edge port to a user workstation may be polled only a few times an hour.  Frequent polling of high-ranking interfaces provides better real-time visibility into network problems and bursty traffic loads.  The error stats collected from the low-ranking interfaces still show whether those interfaces are experiencing problems (e.g., a duplex mismatch).
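The effect of rank-based intervals on polling load is easy to estimate.  The interval values below are hypothetical, chosen only to match the ranges mentioned above (every few minutes for high-ranking interfaces, every 15 to 20 minutes for edge ports); `INTERVAL_BY_RANK` and `polls_per_hour` are illustrative names.

```python
from typing import Dict, Optional

# Hypothetical polling intervals in seconds, keyed by interface rank (1-8).
INTERVAL_BY_RANK: Dict[int, int] = {
    1: 120, 2: 120, 3: 120, 4: 120,   # core and data center links: every 2 min
    5: 300, 6: 300,                   # high-volume and critical-service ports: 5 min
    7: 600,                           # distribution-to-access links: 10 min
    8: 1200,                          # access/edge ports: 20 min
}


def polls_per_hour(interfaces_by_rank: Dict[int, int],
                   uniform_interval: Optional[int] = None) -> int:
    """Interface polls per hour, given how many interfaces sit at each rank.

    If uniform_interval is given, every interface uses that interval instead,
    which models polling everything at the same frequency.
    """
    total = 0
    for rank, count in interfaces_by_rank.items():
        interval = (uniform_interval if uniform_interval is not None
                    else INTERVAL_BY_RANK[rank])
        total += count * 3600 // interval
    return total
```

For a hypothetical network with 100 rank-1 interfaces and 10,000 access ports, `polls_per_hour({1: 100, 8: 10000})` gives 33,000 polls per hour, versus 303,000 with `uniform_interval=120`, roughly nine times as many.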

A variable interface polling frequency allows the NMS to efficiently handle more interfaces than if all interfaces were polled at the same frequency.  It also allows more important interfaces to be monitored more closely.  Of course, the administrator needs to be able to change the polling frequency on any interface, and some interfaces may need to be polled as often as every few seconds when collecting troubleshooting information.
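A variable-frequency poller can be driven by a priority queue keyed on each interface's next due time.  This minimal sketch assumes a single-threaded poller and uses Python's `heapq`; `PollScheduler` and its methods are hypothetical.  `set_interval` models the administrator's override and reschedules immediately, so a port switched to a few-second interval for troubleshooting does not wait out its old 20-minute interval.

```python
import heapq
from typing import Dict, List, Tuple


class PollScheduler:
    """Min-heap of (next_due, ifIndex) pairs; each interface has its own interval."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int]] = []
        self._interval: Dict[int, float] = {}
        self._next_due: Dict[int, float] = {}

    def _schedule(self, if_index: int, when: float) -> None:
        self._next_due[if_index] = when
        heapq.heappush(self._heap, (when, if_index))

    def add(self, if_index: int, interval: float, now: float = 0.0) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        self._interval[if_index] = interval
        self._schedule(if_index, now + interval)

    def set_interval(self, if_index: int, interval: float, now: float) -> None:
        """Administrator override (e.g. 5 s while troubleshooting); takes effect now."""
        self.add(if_index, interval, now)   # any older heap entry becomes stale

    def due(self, now: float) -> List[int]:
        """Return every interface due at or before `now` and reschedule each one."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            when, if_index = heapq.heappop(self._heap)
            if self._next_due.get(if_index) != when:
                continue                    # stale entry left behind by set_interval
            ready.append(if_index)
            self._schedule(if_index, now + self._interval[if_index])
        return ready
```

Stale heap entries are skipped lazily rather than removed, which keeps an override at O(log n) instead of requiring a search through the heap.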

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html

