Monitoring Critical Devices & Interfaces

Author
Terry Slattery
Principal Architect

How do you monitor the critical components of your infrastructure?  Let’s say that your core consists of 75 key devices, plus the links that interconnect them.  It makes sense to monitor device status as well as the links between them.  If each device has three or four links, that means that you have to configure 300 to 375 elements (the 75 devices plus 225 to 300 link endpoints) in your NMS (Network Management System).  OK, so you do 25 per day and they are all configured in about three weeks of work.  If your NMS has a way to quickly enter the element information, then you might be able to accomplish it sooner.

Then your boss asks about all the distribution and access layer network devices, because an important person lost connectivity at some time in the recent past and he doesn’t want to have another conversation with that person about network outages.  But there are hundreds of these devices, plus their links to other network devices.  Now you’re looking at more than 1000 elements to add to your NMS.  Suddenly, the task of maintaining the management system using manual methods no longer looks so good.  At the same rate of 25 per day (after all, you’d like to do something each day other than add devices to the NMS), it would take two months of work to implement.  Maybe you could hire an intern for the summer to do the incredibly boring job and trust that the intern is diligent and gets them all right.  I doubt that the job would be accomplished without a few major errors, which would then compromise the integrity of the NMS.

Let’s look at an alternative approach to manual configuration of the NMS.  If you’re running a nearly homogeneous network infrastructure, the network devices will all know about each other and the core devices will probably have a common addressing and/or naming scheme.  A good NMS will have an auto discovery mechanism that easily identifies the core devices based on addressing or naming.  CDP or LLDP between devices can be used to identify neighbors and the links that connect them.  So the NMS should be able to automatically identify critical devices and links.  You have immediately saved yourself 2-1/2 months of NMS configuration work for a network of this size.
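As a rough check on those numbers, here is a minimal Python sketch of the arithmetic.  The 75 devices, three or four links each, 1000 additional elements, and 25 elements per day come from the scenario above; the figure of 21 working days per month is my own assumption, and the link count follows the same simple per-device counting used above.

```python
import math

RATE_PER_DAY = 25         # elements entered per working day (from the scenario)
WORKDAYS_PER_MONTH = 21   # assumption: roughly 21 working days in a month

def days_needed(elements, rate=RATE_PER_DAY):
    """Working days to enter `elements` items at `rate` per day, rounded up."""
    return math.ceil(elements / rate)

core_devices = 75
core_low = core_devices + core_devices * 3    # 75 devices + 225 link endpoints
core_high = core_devices + core_devices * 4   # 75 devices + 300 link endpoints
edge_elements = 1000                          # distribution and access elements

core_days = (days_needed(core_low), days_needed(core_high))
edge_days = days_needed(edge_elements)
total_days = (core_days[0] + edge_days, core_days[1] + edge_days)
total_months = tuple(round(d / WORKDAYS_PER_MONTH, 1) for d in total_days)

print(f"core: {core_days[0]}-{core_days[1]} days, edge: {edge_days} days")
print(f"total: {total_days[0]}-{total_days[1]} days, "
      f"about {total_months[0]}-{total_months[1]} months")
```

Under those assumptions the core takes 12 to 15 working days and the rest takes 40, which is where the figure of roughly 2-1/2 months comes from.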

With up-front savings that big, you’ve already made a big dent in the ROI needed to justify a smart NMS versus one that requires manual configuration.  The added benefit over the long run is that the smart NMS can self-maintain the list of key devices and links, saving you even more time.

How should this mechanism work?  I like NMS products that allow me to enter the IP address range for discovery.  Network discovery should work from known devices and look for CDP/LLDP neighbors or routing neighbors for Layer 3.  Layer 2 can be discovered by looking for CDP/LLDP data, trunk links, and interfaces with many MAC addresses on them, since an interface that sees many MAC addresses has a downstream switch or hub attached.  The NMS should then allow me to specify device groups by device name or address.  The address should be any address on the discovered device, just so I don’t have to have a specific interface type configured with a specific address.
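To make the idea concrete, here is a minimal Python sketch of discovery that starts from a known device, follows neighbor tables, and stays inside a configured IP address range.  It is an illustration under stated assumptions and not how any particular NMS works: the `DEVICES` table, the device names, the addresses, and the functions `discover` and `in_group` are all hypothetical, and a real product would gather the neighbor data from the devices themselves (CDP/LLDP tables, routing neighbors) instead of from a hard-coded dictionary.

```python
import ipaddress
from collections import deque
from fnmatch import fnmatch

# Hypothetical inventory: device name -> its addresses and CDP/LLDP neighbors.
# A real NMS would collect this from the network, not from a literal table.
DEVICES = {
    "core-rtr-1": {"addresses": ["10.0.0.1", "10.0.1.1"],
                   "neighbors": ["core-rtr-2", "dist-sw-1"]},
    "core-rtr-2": {"addresses": ["10.0.0.2"],
                   "neighbors": ["core-rtr-1", "dist-sw-1"]},
    "dist-sw-1":  {"addresses": ["10.1.0.1"],
                   "neighbors": ["core-rtr-1", "core-rtr-2", "acc-sw-1"]},
    "acc-sw-1":   {"addresses": ["10.2.0.1"],
                   "neighbors": ["dist-sw-1", "lab-sw-9"]},
    "lab-sw-9":   {"addresses": ["192.168.50.1"],
                   "neighbors": ["acc-sw-1"]},
}

def has_address_in(device, network):
    """True if ANY address on the device falls inside the given network."""
    net = ipaddress.ip_network(network)
    return any(ipaddress.ip_address(a) in net
               for a in DEVICES[device]["addresses"])

def discover(seeds, discovery_range):
    """Breadth-first walk of neighbor tables, staying inside discovery_range."""
    found, links = set(), set()
    queue = deque(seeds)
    while queue:
        device = queue.popleft()
        if device in found or not has_address_in(device, discovery_range):
            continue
        found.add(device)
        for neighbor in DEVICES[device]["neighbors"]:
            if has_address_in(neighbor, discovery_range):
                # frozenset makes A-B and B-A the same link
                links.add(frozenset((device, neighbor)))
                queue.append(neighbor)
    return found, links

def in_group(device, name_pattern=None, network=None):
    """Group membership by name pattern or by any address on the device."""
    if name_pattern and fnmatch(device, name_pattern):
        return True
    return bool(network) and has_address_in(device, network)

found, links = discover(["core-rtr-1"], "10.0.0.0/8")
core_group = sorted(d for d in found if in_group(d, network="10.0.0.0/24"))
print(sorted(found))
print(core_group)
```

Note that `core-rtr-1` lands in the core group because one of its addresses is in the range, even though its other address is not.  That is the "any address on the discovered device" behavior described above.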

Similarly, an interface group could be defined to be any interface that falls within a certain set of IP address ranges or that connects to key devices.  It would be really cool to specify an interface grouping to allow identification of links that connect core devices to each other and another interface group that contains interfaces that connect the core to distribution or the distribution to access.  Each group could have its own severity levels for link problems and events.  With automated management of the device and interface groups, I’ve eliminated the human error component of managing a critical part of my infrastructure.
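Here is a minimal Python sketch of that kind of interface grouping.  Everything in it is an assumption made for illustration: the naming convention (`core-`, `dist-`, `acc-` prefixes), the group names, the severity labels, and the functions `role_of`, `link_group`, and `severity_for` are hypothetical and do not come from any specific NMS.

```python
# Hypothetical naming convention: the prefix before the first "-" gives the layer.
ROLE_BY_PREFIX = {"core": "core", "dist": "distribution", "acc": "access"}

# Layers ordered from the center of the network outward.
LAYER_ORDER = ["core", "distribution", "access"]

# Assumed severity for link problems in each interface group.
GROUP_SEVERITY = {
    "core-core": "critical",
    "core-distribution": "major",
    "distribution-access": "minor",
}

def role_of(device_name):
    """Derive a device's layer from its name, or 'unknown' if no prefix matches."""
    prefix = device_name.split("-", 1)[0]
    return ROLE_BY_PREFIX.get(prefix, "unknown")

def _layer_rank(role):
    """Sort key that places unknown roles after all known layers."""
    return LAYER_ORDER.index(role) if role in LAYER_ORDER else len(LAYER_ORDER)

def link_group(dev_a, dev_b):
    """Name the group for a link from the roles of the devices at each end."""
    roles = sorted((role_of(dev_a), role_of(dev_b)), key=_layer_rank)
    return "-".join(roles)

def severity_for(dev_a, dev_b):
    """Severity for a link event; links outside the defined groups get 'info'."""
    return GROUP_SEVERITY.get(link_group(dev_a, dev_b), "info")

for a, b in [("core-rtr-1", "core-rtr-2"),
             ("dist-sw-1", "core-rtr-2"),
             ("acc-sw-1", "dist-sw-1")]:
    print(a, b, link_group(a, b), severity_for(a, b))
```

Because the group is computed from the devices at each end of the link, a newly discovered link falls into the right group without anyone editing a list by hand.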

Human error happens all the time.  I was recently at a customer site where I overheard the following conversation:

Ted: “John, don’t you have the firewalls in the NMS?”
John: “No, I was told to not do anything with the firewalls.  I thought the NOC was monitoring them.”
Ted: “I just checked with the NOC and they aren’t monitoring them.”
John: “How long has it been like this?”
Ted: “At least two months.”
John: “I’ll get them added right now.”
About 30 minutes later… John: “Phew!  They’re all added!”

I recall thinking that 30 minutes was a long time to add two devices.  Maybe they had a lot of interfaces.

When you’re looking at NMS products, ask about automatic device and interface grouping and how the NMS can make your life easier instead of consuming more of your time.

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for its permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html
