Monitoring Critical Devices & Interfaces

Author
Terry Slattery
Principal Architect

How do you monitor the critical components of your infrastructure? Let’s say that your core consists of 75 key devices, plus the links that interconnect them. It makes sense to monitor device status as well as the links between them. If each device has three or four links, you have to configure 300 to 375 elements (the 75 devices plus 225 to 300 links) in your NMS (Network Management System). OK, so you do 25 per day and they are all configured in about three weeks of work. If your NMS has a way to enter the element information quickly, you might be able to finish sooner.

Then your boss asks about all the distribution and access layer network devices, because an important person lost connectivity at some point in the recent past and he doesn’t want to have another conversation with that person about network outages. But there are hundreds of these devices, plus their links to other network devices. Now you’re looking at more than 1000 elements to add to your NMS. Suddenly, maintaining the management system by hand no longer looks so good. At the same rate of 25 per day (after all, you’d like to do something other than add devices to the NMS every day), it would take about two months of work (40 working days) to implement. Maybe you could hire an intern for the summer to do the incredibly boring job and trust that the intern is diligent and gets them all right. I doubt that the job would be finished without a few major errors, which would then compromise the integrity of the NMS.
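The effort estimates above are easy to check with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function name `days_to_configure` is made up for illustration, each device is counted once along with three or four links, and a five-day working week is assumed.

```python
import math

def days_to_configure(elements: int, per_day: int = 25) -> int:
    """Working days needed to enter `elements` items at `per_day` items per day."""
    return math.ceil(elements / per_day)

devices = 75
# Each device is one element, plus three or four links per device.
core_low = devices + devices * 3    # 300 elements
core_high = devices + devices * 4   # 375 elements

core_days = (days_to_configure(core_low), days_to_configure(core_high))
edge_days = days_to_configure(1000)  # distribution and access layers

WORKING_DAYS_PER_WEEK = 5  # assumption
print("core, working days:", core_days)
print("edge, working weeks:", edge_days / WORKING_DAYS_PER_WEEK)
```

At 25 elements per day the core takes 12 to 15 working days, and the additional 1000 elements take 40 working days, or eight working weeks.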

Let’s look at an alternative to manual configuration of the NMS. If you’re running a nearly homogeneous network infrastructure, the network devices will all know about each other and the core devices will probably have a common addressing and/or naming scheme. A good NMS will have an auto-discovery mechanism that easily identifies the core devices based on addressing or naming. CDP (Cisco Discovery Protocol) or LLDP (Link Layer Discovery Protocol) between devices can be used to identify neighbors and the links that connect them. So the NMS should be able to identify critical devices and links automatically. You’ve immediately saved yourself about two and a half months of NMS configuration work for a network of this size.
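To make the neighbor-based idea concrete, here is a minimal sketch of how discovery could walk outward from a known device. It is not how any particular NMS does it: the `NEIGHBORS` table stands in for CDP/LLDP data that a real product would collect from the devices, and the device names, interface names, and the `discover` function are all hypothetical.

```python
from collections import deque

# Hypothetical CDP/LLDP neighbor tables.
# device -> list of (local_interface, neighbor, neighbor_interface)
NEIGHBORS = {
    "core-1": [("Te1/1", "core-2", "Te1/1"), ("Te1/2", "dist-1", "Te0/1")],
    "core-2": [("Te1/1", "core-1", "Te1/1"), ("Te1/2", "dist-1", "Te0/2")],
    "dist-1": [("Te0/1", "core-1", "Te1/2"), ("Te0/2", "core-2", "Te1/2"),
               ("Gi0/1", "acc-1", "Gi0/48")],
    "acc-1": [("Gi0/48", "dist-1", "Gi0/1")],
}

def discover(seeds, get_neighbors):
    """Breadth-first walk from the seed devices; returns (devices, links)."""
    devices = set(seeds)
    links = set()
    queue = deque(seeds)
    while queue:
        device = queue.popleft()
        for local_if, neighbor, remote_if in get_neighbors(device):
            # Sort the two endpoints so a link reported from both ends is stored once.
            links.add(tuple(sorted([(device, local_if), (neighbor, remote_if)])))
            if neighbor not in devices:
                devices.add(neighbor)
                queue.append(neighbor)
    return devices, links

devices, links = discover(["core-1"], lambda d: NEIGHBORS.get(d, []))
print(sorted(devices))
print(len(links), "links")
```

Starting from a single seed, the walk finds all four devices and the four links between them without anyone typing them in.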

With up-front savings that big, you’ve already made a big dent in the ROI needed to justify a smart NMS versus one that requires manual configuration.  The added benefit over the long run is that the smart NMS can self-maintain the list of key devices and links, saving you even more time.

How should this mechanism work? I like NMS products that allow me to enter the IP address range for discovery. Network discovery should work from known devices and, for Layer 3, look for CDP/LLDP neighbors or routing neighbors. Layer 2 can be discovered by looking for CDP/LLDP data, trunk links, and interfaces with many MAC addresses on them, since many MAC addresses on one interface mean there’s a downstream switch or hub. Then allow me to specify device groups by device name or address. The address should be any address on the discovered device, so that I don’t have to have a specific interface type configured with a specific address.
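A device-group rule of this kind might look something like the sketch below. The inventory, the name pattern, and the `in_group` and `build_group` helpers are all invented for illustration. The point is only that a device joins the group if its name matches or if any one of its addresses falls inside a listed range.

```python
import fnmatch
import ipaddress

# Hypothetical discovered inventory: device name -> every address found on it.
INVENTORY = {
    "nyc-core-1": ["10.0.0.1", "10.0.1.1", "192.168.50.1"],
    "nyc-dist-1": ["10.1.0.1", "10.1.4.1"],
    "legacy-rtr-7": ["172.16.9.1", "10.0.0.77"],
}

def in_group(name, addresses, name_patterns=(), networks=()):
    """True if the name matches a pattern or ANY address is in a listed network."""
    if any(fnmatch.fnmatch(name, pattern) for pattern in name_patterns):
        return True
    nets = [ipaddress.ip_network(n) for n in networks]
    return any(ipaddress.ip_address(a) in net for a in addresses for net in nets)

def build_group(inventory, name_patterns=(), networks=()):
    """Sorted names of all devices that satisfy the group rule."""
    return sorted(
        name for name, addrs in inventory.items()
        if in_group(name, addrs, name_patterns, networks)
    )

core = build_group(INVENTORY, name_patterns=["*-core-*"], networks=["10.0.0.0/24"])
print(core)
```

Here `legacy-rtr-7` lands in the core group even though its name does not follow the scheme, because one of its interfaces has an address in the core range.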

Similarly, an interface group could be defined to be any interface that falls within a certain set of IP address ranges or that connects to key devices. It would be really cool to specify one interface group that identifies the links connecting core devices to each other, and another that contains the interfaces connecting the core to distribution or the distribution to access. Each group could have its own severity levels for link problems and events. With automated management of the device and interface groups, I’ve eliminated the human error component of managing a critical part of my infrastructure.
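One way to picture interface grouping with per-group severity is the sketch below. Everything in it is an assumption made for illustration: the `LAYER` mapping would come from device grouping, the severity names are arbitrary, and `interface_group` is a hypothetical helper rather than a feature of any specific NMS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    a_device: str
    a_interface: str
    b_device: str
    b_interface: str

# Hypothetical device-to-layer mapping, e.g. the output of device grouping.
LAYER = {"core-1": "core", "core-2": "core",
         "dist-1": "distribution", "acc-1": "access"}

# Hypothetical severity per interface group; the level names are illustrative.
SEVERITY = {
    "core-core": "critical",
    "core-distribution": "major",
    "distribution-access": "minor",
}

ORDER = ["core", "distribution", "access"]

def interface_group(link, layer=LAYER):
    """Name the group after the layers of both endpoints, highest layer first."""
    ends = sorted((layer[link.a_device], layer[link.b_device]), key=ORDER.index)
    return "-".join(ends)

links = [
    Link("core-1", "Te1/1", "core-2", "Te1/1"),
    Link("dist-1", "Te0/1", "core-1", "Te1/2"),
    Link("dist-1", "Gi0/1", "acc-1", "Gi0/48"),
]
for link in links:
    group = interface_group(link)
    print(link.a_device, link.a_interface, group, SEVERITY.get(group, "unclassified"))
```

Because the group is derived from the discovered endpoints, a new core-to-core link picks up the critical severity as soon as it is discovered, with no manual entry.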

Human error happens all the time. I was recently at a customer site where I overheard the following conversation:

Ted: “John, don’t you have the firewalls in the NMS?”
John: “No, I was told to not do anything with the firewalls.  I thought the NOC was monitoring them.”
Ted: “I just checked with the NOC and they aren’t monitoring them.”
John: “How long has it been like this?”
Ted: “At least two months.”
John: “I’ll get them added right now.”
About 30 minutes later… John: “Phew!  They’re all added!”

I recall thinking that 30 minutes was a long time to add two devices.  Maybe they had a lot of interfaces.

When you’re looking at NMS products, ask about automatic device and interface grouping and how the NMS can make your life easier instead of consuming more of your time.

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html


 
