Handling NMS Performance Data, Part 4

Terry Slattery
Principal Architect

In the previous posts (Handling NMS Performance Data, Part 1, Handling NMS Performance Data, Part 2, and Handling NMS Performance Data, Part 3), I’ve described how an NMS can optimize data retrieval and storage.  In this post, the last on the topic, I’ll discuss several additional optimizations that can be implemented.

Request Packing
Some devices can handle more SNMP variable requests per packet than others.  It is easy for an NMS to take the approach of using the least common request size, but that approach is inefficient for the devices that can handle the most requests per packet.  For example, if a high-end router can handle 40 requests per packet and an old, low-end switch can only handle 10, it is very inefficient for the NMS to use 10 requests per packet for the high-end device.  The NMS should automatically track the number of requests that each device can handle per packet.  Some NMS implementations require that the administrator make this change, but that is also inefficient, because it is the NMS that really knows how many requests can be packed per packet, based on which devices return all the data that was requested.  When the NMS doesn’t receive the full set of requested data, it can do a binary search to quickly determine how much to request in a packet.  The NMS has another thing in its favor: the MIB provides hints on the size of the returned data.  For example, it can determine that a request is for an octet string with a maximum size of 512 bytes.  By implementing an efficient request stream, the NMS can make much more effective use of network bandwidth and request queue processing.
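As a sketch of the binary-search idea, the probe below finds the largest request count a device reliably answers.  The `send_get` callback is a hypothetical stand-in for a real SNMP GET, assumed here to return the subset of OIDs the device actually answered.

```python
# Sketch: learn each device's per-packet request capacity by binary search.
# send_get(device, oids) is a hypothetical SNMP GET wrapper that returns
# the list of OIDs the device actually answered.

def probe_max_requests(device, oids, send_get, upper=40):
    """Find the largest number of OIDs the device answers in one packet."""
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2          # bias upward so the loop terminates
        answered = send_get(device, oids[:mid])
        if len(answered) == mid:          # full response: try packing more
            lo = mid
        else:                             # partial or dropped: back off
            hi = mid - 1
    return lo
```

An NMS would run a probe like this once per device (or on the first incomplete response) and cache the result, so steady-state polling always packs as many requests as the device supports.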

Collect Stats from Active Interfaces
Optimize data collection by collecting data only from active interfaces and just checking for state changes on inactive interfaces.  In networks that have a large number of switches, it is common for many of the switch ports to be inactive.  In some networks, the ratio of inactive to active interfaces may be 3:1 or more.  The NMS can easily identify the interfaces that are operationally down (e.g., up/down or down/down).  Each interface exposes both a “last change” timestamp and an operational state variable, either of which the NMS can efficiently check on each polling cycle.  If the interface state has not changed, there is no need to retrieve the interface performance stats.  I prefer using the “last change” timestamp because it can tell whether the interface changed state since the last poll.
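A minimal sketch of this check, using the standard IF-MIB objects ifOperStatus and ifLastChange; the `Interface` record is a hypothetical NMS-side cache of the values from the previous poll.

```python
# Sketch: decide whether an interface needs a full stats poll. ifOperStatus
# and ifLastChange are standard IF-MIB objects; the Interface record is a
# hypothetical NMS-side cache of the values seen on the previous poll.
from dataclasses import dataclass

OPER_UP = 1  # IF-MIB ifOperStatus value for "up"

@dataclass
class Interface:
    index: int         # ifIndex
    oper_status: int   # ifOperStatus at the last poll
    last_change: int   # ifLastChange (sysUpTime ticks) at the last poll

def needs_full_poll(iface, polled_status, polled_last_change):
    """Full stats are worth retrieving if the port is up or its state changed."""
    changed = polled_last_change != iface.last_change
    return changed or polled_status == OPER_UP
```

Interfaces that fail this check cost only two cheap scalar fetches per cycle instead of a full stats retrieval, which is where the savings on switch-heavy networks come from.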

Rank Interfaces by Importance
Some interfaces are more important than others.  Many network management systems require that the administrator identify important interfaces or rank interface importance, which is a labor-intensive manual process.  This is a task the NMS can largely perform itself, with the administrator refining the resulting ranking.  An interface importance ranking design might use the following rankings:

  1. Core infrastructure interfaces (core device to core device)
  2. Core layer interfaces to distribution layer devices
  3. Core infrastructure interfaces to data center infrastructure
  4. Data center infrastructure interfaces to critical servers
  5. High volume interfaces (typically a server)
  6. Interfaces to critical services (NTP, DNS, DHCP, NMS)
  7. Distribution layer interfaces to access layer devices
  8. Access layer interfaces

Critical interfaces can be identified by determining device connectivity.  CDP, LLDP, Layer 3 addressing, spanning tree tables, switch trunk interfaces, and switch forwarding tables are good sources of data that the NMS can use to automatically rank the interfaces.

Critical interfaces should be polled more frequently than edge interfaces that connect to lower-ranked devices such as laptops.  The administrator can provide hints, such as the CIDR blocks of the data centers, or the subnets where critical services or critical users are connected.
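A rough sketch of such a ranking, assuming the NMS has already classified device roles from CDP/LLDP and Layer 3 data; the role names, the rank table, and the critical-subnet hint are all illustrative assumptions, not a standard.

```python
# Sketch: rank an interface from neighbor roles discovered via CDP/LLDP,
# refined by an administrator-supplied CIDR hint. Role names, the rank
# table, and the example subnet are illustrative assumptions.
import ipaddress

CRITICAL_SUBNETS = [ipaddress.ip_network("10.10.0.0/16")]  # admin hint (example)

ROLE_RANKS = {
    ("core", "core"): 1,            # core-to-core infrastructure
    ("core", "distribution"): 2,
    ("core", "datacenter"): 3,
    ("datacenter", "server"): 4,
    ("distribution", "access"): 7,
    ("access", "edge"): 8,          # ports to laptops, phones, etc.
}

def rank_interface(local_role, neighbor_role, neighbor_ip=None):
    rank = ROLE_RANKS.get((local_role, neighbor_role), 8)
    # An admin CIDR hint can promote interfaces facing critical services.
    if neighbor_ip is not None:
        addr = ipaddress.ip_address(neighbor_ip)
        if any(addr in net for net in CRITICAL_SUBNETS):
            rank = min(rank, 6)     # treat as a critical-services interface
    return rank
```

The table captures the automatic ranking from topology discovery, while the subnet check shows one way the administrator's hints refine it.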

Use Variable Polling Periods
The vast majority of interfaces in a network are typically near the bottom of the ranking list and can be polled much less frequently than the higher-ranked interfaces.  Polling a low-utilization interface at a low frequency (e.g., every 15 or 20 minutes) still provides good visibility into network problems.  A high-ranking interface may need its stats collected every few minutes, while an edge port to a user workstation may be polled a few times every hour.  Frequent polling of high-ranking interfaces provides better real-time visibility into network problems and bursty traffic loads.  The error stats collected from the low-ranking interfaces still provide good visibility into whether they are experiencing problems (e.g., a duplex mismatch).

A variable interface polling frequency allows the NMS to efficiently handle more interfaces than if all interfaces were polled at the same frequency.  It also allows more important interfaces to be monitored more closely.  Of course, the administrator needs to be able to change the polling frequency on any interface, and some interfaces may need to be polled as often as every few seconds when collecting troubleshooting information.
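A simple sketch of mapping rank to polling interval, with a per-interface administrator override; the interval values are illustrative, not prescriptive.

```python
# Sketch: derive a polling interval from the importance rank, with an
# administrator override for troubleshooting. Interval values are
# illustrative; the ranks correspond to the list above.

DEFAULT_INTERVALS = {       # rank -> seconds between polls
    1: 120, 2: 120,         # core links: every couple of minutes
    3: 180, 4: 180,
    5: 300, 6: 300,
    7: 900,                 # distribution-to-access: every 15 minutes
    8: 1200,                # edge ports: every 20 minutes
}

def polling_interval(rank, override=None):
    """Seconds between polls; an override wins (e.g., 5 s while troubleshooting)."""
    if override is not None:
        return override
    return DEFAULT_INTERVALS.get(rank, 1200)
```

A scheduler built on this table polls each interface when `last_poll + polling_interval(rank)` has elapsed, so adding thousands of low-ranked edge ports adds far less load than polling everything at the core-link rate.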



Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html

