
When using NetMRI in consulting engagements, we are often asked which NetMRI issues are the most important to track. That’s a relatively easy question to answer, and the answer doesn’t depend on whether NetMRI is in use. Regardless of the tools, we want to track the same things. Many of the issues in the lists below are obvious, so I’ll skip an explanation of those and elaborate only on the items that may not be obvious.

Most networks today support business functions that are critical to the ongoing operation of the business. The first issues to track are environmental, because environmental components are the ones that fail most frequently.

  • Power supply failure, including loss of input power on redundant supplies.
  • Fan failure, causing a device to overheat.
  • High temperature, possibly due to a fan failure, or perhaps due to an HVAC system failure.
  • Power supply voltage out of range, which can occur without an outright power supply failure, or as a result of high temperatures.
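As a rough illustration, the environmental checks above reduce to threshold comparisons on polled sensor data. The sketch below assumes a hypothetical `EnvReading` record (how the values are collected, via SNMP or otherwise, is left out) and hypothetical limits (`MAX_TEMP_C`, and a 12 V nominal supply with a ±5% tolerance); real thresholds should come from the hardware vendor’s specifications.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration; use the vendor's published specifications.
MAX_TEMP_C = 45.0
NOMINAL_VOLTS = 12.0
VOLT_TOLERANCE = 0.05  # accept +/- 5% of nominal


@dataclass
class EnvReading:
    """One poll of a device's environmental sensors (hypothetical schema)."""
    device: str
    psu_ok: list[bool]      # per power supply: False if failed or input power lost
    fans_ok: list[bool]     # per fan: False if the fan has failed
    temp_c: float           # reported chassis temperature
    psu_volts: list[float]  # per power supply output voltage


def environmental_issues(r: EnvReading) -> list[str]:
    """Return one alert string per environmental problem found in a reading."""
    issues = []
    for i, ok in enumerate(r.psu_ok):
        if not ok:
            issues.append(f"{r.device}: power supply {i} failed or lost input power")
    for i, ok in enumerate(r.fans_ok):
        if not ok:
            issues.append(f"{r.device}: fan {i} failed")
    if r.temp_c > MAX_TEMP_C:
        issues.append(f"{r.device}: temperature {r.temp_c:.1f} C exceeds {MAX_TEMP_C} C")
    low = NOMINAL_VOLTS * (1 - VOLT_TOLERANCE)
    high = NOMINAL_VOLTS * (1 + VOLT_TOLERANCE)
    for i, volts in enumerate(r.psu_volts):
        # A voltage excursion can occur without an outright supply failure.
        if not low <= volts <= high:
            issues.append(f"{r.device}: power supply {i} output {volts:.2f} V out of range")
    return issues


if __name__ == "__main__":
    reading = EnvReading("core1", psu_ok=[True, False], fans_ok=[True, True],
                         temp_c=50.0, psu_volts=[12.0, 11.0])
    for issue in environmental_issues(reading):
        print(issue)
```

Keeping each check independent means a single reading can raise several related alerts (for example, a failed fan followed by high temperature), which helps point at the root cause.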

We rarely see networks that don’t have some level of redundancy, so it is important to look for failures within the redundant systems. Router redundancy failures and key interface failures are at the top of the list here. (I mentioned these in last week’s blog post, “New Year Resolution: Run a Clean Network,” and include them here for completeness.)

  • HSRP, GLBP, or VRRP groups that contain only one router. An interface may have failed, or the redundant device may have failed. The redundant device may also never have been installed or properly configured. In any case, your intended redundancy doesn’t exist.
  • Router interface down – every router interface should be either up/up or admin down; any other state indicates a failure. This implies that any unused interface should be shut down.
  • Switch trunk ports down – as with router interfaces, trunk ports are often infrastructure interconnections and should be admin down if they are not in use.
  • Config not saved, while not specifically a redundancy issue, becomes a problem if the device dies or is rebooted; the result could be an outage until the config is rebuilt. Generating a notification whenever the running configuration differs from the startup configuration saved in NVRAM warns you that the device won’t come back up in its current operating state after a reboot. Saving the running configuration clears the issue.
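A minimal sketch of two of the checks above, assuming the data has already been collected from the devices: `single_member_groups` flags HSRP, GLBP, or VRRP groups that report only one router, and `config_not_saved` compares the running and startup configurations. The tuple format (protocol, group number, virtual IP, router) and the rule of ignoring blank and `!` comment lines are assumptions for illustration; a production comparison would also need to skip any device-generated header lines.

```python
from collections import defaultdict


def single_member_groups(entries):
    """Return first-hop redundancy groups that report only one router.

    `entries` holds hypothetical (protocol, group_number, virtual_ip, router)
    tuples, one per router participating in an HSRP, GLBP, or VRRP group.
    Groups are keyed by protocol, group number, and virtual IP because the
    member routers' own names and addresses differ.
    """
    members = defaultdict(set)
    for protocol, group, virtual_ip, router in entries:
        members[(protocol, group, virtual_ip)].add(router)
    return sorted(key for key, routers in members.items() if len(routers) < 2)


def config_not_saved(running: str, startup: str) -> bool:
    """True if the running config differs from the startup config saved in NVRAM.

    Blank lines and '!' comment lines are ignored; leading indentation is kept
    because it is significant in IOS-style configurations.
    """
    def significant(config: str) -> list[str]:
        return [line.rstrip() for line in config.splitlines()
                if line.strip() and not line.lstrip().startswith("!")]
    return significant(running) != significant(startup)


if __name__ == "__main__":
    entries = [
        ("hsrp", 10, "10.1.10.1", "core1"),
        ("hsrp", 10, "10.1.10.1", "core2"),
        ("vrrp", 20, "10.1.20.1", "core1"),  # the second router never joined
    ]
    print(single_member_groups(entries))
    print(config_not_saved("hostname core1\n", "hostname core1-old\n"))
```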

The router interface and switch trunk port down issues mentioned above are particularly important because they are easy to overlook. Most organizations don’t take the time to shut down unused interfaces or to remove old descriptions, which makes it difficult to tell whether a down interface is due to a link failure. As a result, a key interface failure is easy to miss, and the outage occurs later (often much later) when the redundant interface also goes down. The best way to manage the network is to shut down every unused router interface and switch trunk port. Then any interface or trunk port found in the up/down state indicates a failure that should be corrected.
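Once unused interfaces are shut down, the rule above becomes mechanical: an interface that is administratively up but not operationally up is a failure. The sketch below assumes interface records of (name, kind, admin status, oper status), with status strings named after the IF-MIB `ifAdminStatus` and `ifOperStatus` values; the `kind` field is a hypothetical tag used to limit the check to router interfaces and switch trunk ports.

```python
# The article's rule covers router interfaces and switch trunk ports; access
# ports are excluded from this sketch.
MONITORED_KINDS = {"router", "trunk"}


def failed_interfaces(interfaces):
    """Return names of monitored interfaces that are enabled but not up.

    `interfaces` holds (name, kind, admin_status, oper_status) tuples. Status
    strings follow IF-MIB ifAdminStatus/ifOperStatus names ("up", "down",
    "lowerLayerDown", ...). Because unused interfaces are assumed to be shut
    down, admin "down" is treated as intentional and never reported.
    """
    return [name for name, kind, admin, oper in interfaces
            if kind in MONITORED_KINDS and admin == "up" and oper != "up"]


if __name__ == "__main__":
    interfaces = [
        ("Gi0/0", "router", "up", "up"),             # healthy
        ("Gi0/1", "router", "down", "down"),         # unused and shut down
        ("Gi0/2", "router", "up", "down"),           # failure
        ("Te1/1", "trunk", "up", "lowerLayerDown"),  # failure
        ("Gi1/5", "access", "up", "down"),           # not monitored here
    ]
    print(failed_interfaces(interfaces))
```

The check only works as intended if the shutdown discipline is followed; an unused interface left admin up would be reported as a failure on every poll.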

Next, we look at performance-related issues. Performance is typically where most people start looking at networks, because tools for measuring network performance have existed for a long time. What’s often not obvious is how to identify high utilization during business hours.

  • High 95th percentile utilization. When calculated over a daily period, the 95th percentile identifies interfaces that run at the reported utilization or greater for 72 minutes of the day (5% of 1,440 minutes). If most of that high-utilization time falls during prime business hours, it may be affecting business productivity.
  • High errors or discards identify interfaces that are having problems, either due to a poorly operating link (errors) or due to network congestion (discards). The impact on the business is lower productivity.
  • Duplex mismatch, which typically produces a characteristic pattern of errors: late collisions on the half-duplex side, and FCS (CRC) errors and runts on the full-duplex side. The errors grow as the interface’s utilization rises; they start to appear at about 10% of link capacity and increase as utilization approaches 30%. As with other network errors, the result is lower business productivity.
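A sketch of the daily 95th percentile calculation, assuming 5-minute polling (288 samples per day) and the nearest-rank percentile method; other tools may interpolate and report slightly different values. `utilization_pct` converts an octet-counter delta (for example, from the IF-MIB `ifHCInOctets` counter) into percent utilization, and `business_hours_p95` restricts the calculation to a hypothetical 08:00 to 17:59 window.

```python
import math

BUSINESS_HOURS = range(8, 18)  # hypothetical 08:00 to 17:59 window; adjust per site


def utilization_pct(octets_delta: int, interval_s: int, speed_bps: int) -> float:
    """Percent utilization from an octet-counter delta over one polling interval.

    Counter wrap is not handled; 64-bit counters such as ifHCInOctets make
    wraps rare in practice.
    """
    return 100.0 * octets_delta * 8 / (interval_s * speed_bps)


def percentile_95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample such that at least
    95% of the samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def business_hours_p95(samples: list[tuple[int, float]]) -> float:
    """95th percentile over (hour_of_day, utilization_pct) samples in business hours."""
    return percentile_95([pct for hour, pct in samples if hour in BUSINESS_HOURS])


if __name__ == "__main__":
    # 1,875,000,000 octets in 300 s on a 100 Mbps link is 50% utilization.
    print(utilization_pct(1_875_000_000, 300, 100_000_000))
    # 288 daily samples: 15 busy samples (75 minutes) at 90% set the p95 to 90%,
    # while 14 busy samples (70 minutes) leave it at the 10% baseline.
    print(percentile_95([10.0] * 273 + [90.0] * 15))
    print(percentile_95([10.0] * 274 + [90.0] * 14))
```

With 288 samples, the nearest-rank 95th percentile is the 15th-highest sample, so an interface sits at or above the reported value for about 75 minutes; the 5-minute sampling granularity accounts for the difference from the nominal 72 minutes. Comparing the all-day figure with `business_hours_p95` shows whether the busy periods fall during prime business hours.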

Once the above issues are being addressed, configuration consistency is next on the list. To check configuration consistency, the network management system needs tools that identify configurations that don’t match your configuration templates. This is more than checking that a config contains certain statements. The tools must handle statements that have to appear in a specific order (think ACLs here). They must also be able to verify that some statements are present while others are absent (e.g., make sure the ACL hasn’t been extended, or make sure that an undesirable routing protocol is not configured).
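A minimal sketch of the three kinds of template checks described above, run against a configuration held as a list of lines: required statements, forbidden statements, and an ACL whose entries must match a template exactly and in order (so a reordered or extended ACL fails). It assumes Cisco IOS named extended ACL syntax, where entries are indented under `ip access-list extended NAME`; the function names and template contents are hypothetical.

```python
def missing_required(config_lines: list[str], required: list[str]) -> list[str]:
    """Required statements that do not appear anywhere in the config."""
    present = {line.strip() for line in config_lines}
    return [stmt for stmt in required if stmt not in present]


def forbidden_present(config_lines: list[str], forbidden_prefixes: list[str]) -> list[str]:
    """Config lines that start with a forbidden statement, e.g. an unwanted routing protocol."""
    return [line.strip() for line in config_lines
            if any(line.strip().startswith(p) for p in forbidden_prefixes)]


def acl_matches_template(config_lines: list[str], acl_name: str, template: list[str]) -> bool:
    """True only if the named ACL's entries equal the template exactly and in order.

    An exact, ordered comparison catches reordered entries as well as entries
    appended to (extending) the ACL. A missing ACL yields no entries and fails.
    """
    header = f"ip access-list extended {acl_name}"
    entries, in_acl = [], False
    for line in config_lines:
        if line.strip() == header:
            in_acl = True
        elif in_acl:
            if not line.startswith(" "):
                break  # the first non-indented line ends the ACL block
            entries.append(line.strip())
    return entries == [t.strip() for t in template]


if __name__ == "__main__":
    config = [
        "hostname edge1",
        "ip access-list extended MGMT",
        " permit tcp 10.0.0.0 0.0.0.255 any eq 22",
        " deny ip any any log",
        "router ospf 1",
        "router rip",
    ]
    template = ["permit tcp 10.0.0.0 0.0.0.255 any eq 22", "deny ip any any log"]
    print(acl_matches_template(config, "MGMT", template))
    print(forbidden_present(config, ["router rip", "router eigrp"]))
    print(missing_required(config, ["hostname edge1", "service password-encryption"]))
```

Presence checks use a set because order doesn’t matter for them, while the ACL check deliberately keeps a list so that order and length both count.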

With the above checks and alerts, you are well on your way to handling the majority of common network problems, making your network much more stable. In a redundant network, you’ll have the ability to correct most network problems before they cause an outage, and that’s what’s important in a smoothly operating network.

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared on the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html

Terry Slattery

Principal Architect

Terry Slattery is a Principal Architect at NetCraftsmen, an advanced network consulting firm that specializes in high-profile and challenging network consulting jobs. Terry is currently working on network management, SDN, business strategy consulting, and interesting legal cases. He is the founder of Netcordia, inventor of NetMRI, has been a successful technology innovator in networking during the past 20 years, and is co-inventor on two patents. He has a long history of network consulting and design work, including some of the first Cisco consulting and training. As a consultant to Cisco, he led the development of the current Cisco IOS command line interface. Prior to Netcordia, Terry founded Chesapeake Computer Consultants, which became a Cisco premier training and consulting partner. At Chesapeake, he co-invented and patented the v-LAB system to provide hands-on access to real hardware for the hands-on component of internetwork training classes. Terry co-authored the successful McGraw-Hill text "Advanced IP Routing in Cisco Networks," is the second CCIE (1026) awarded, and is a regular speaker at Enterprise Connect and Interop. He currently blogs at TechTarget, No Jitter and our very own NetCraftsmen.
