4/9/2015
Terry Slattery

The Network is Down! — Avoiding Network Outages

How can you avoid the words that no CEO wants to hear: “The network is down!”? The most important step is a regular network infrastructure review.

It Can Happen To You

How do you know that your network is not an accident waiting to happen? Just because your existing network has never gone down doesn’t mean that it won’t in the future. Computer networks are complex entities in which multiple protocols and many network elements need to function correctly.

John Halamka, CIO of Beth Israel Deaconess Medical Center in Boston, wasn’t aware of any problems in his network until a spanning tree problem took out the network for four days. The story was chronicled in the award-winning article “All Systems Down,” which appeared in CIO magazine in 2003. I encourage you to read it to learn what happened and how he handled it.

Ah, you say, “That was 2003! That was 12 years ago! It can’t happen anymore.” I suggest that you now go read “Our bullet-proof LAN failed. Here’s what we learned.” Paul Whimpenny, Senior Officer for IT Architecture in the IT Division of the Food and Agriculture Organization of the United Nations, describes a network outage similar to Beth Israel’s that happened very recently. Fortunately, Paul’s outage was only four hours long.

A known type of failure in a common network protocol caused both outages. Could they have been prevented? Sure. Could the outage time have been reduced? Absolutely. Both outages could have been avoided by doing a periodic network review. Think of it as similar to an audit of the financial systems. There are designs and operational best practices that lead to improved network performance and a reduction in potential network failures. Why wouldn’t you want to use them since they lead to better results? Note that implementing best practices often doesn’t incur a substantially greater cost.

Steps of a Network Infrastructure Review

What can prevent future outages like those described in the articles above? The first step is to do periodic network infrastructure reviews. They are like an annual health checkup or an audit. A review should reduce the business risk of a major network failure.

The second step is to implement the recommendations of a review — or at least the most significant findings that create risk of a major failure. At NetCraftsmen, we’ve done a number of reviews where the client then didn’t follow up to correct the most significant problems. Sometimes, the view from the technical staff is “it hasn’t happened yet.” That’s like not carrying automobile insurance because you haven’t yet been in an accident.

Many of the problems we identify in a network review are latent faults that will cause problems only when certain conditions occur. Those conditions will eventually occur. Even after a failure has happened once, as Paul noted in his article about his “bullet-proof” network, the tech staff may say that it can’t happen again. I wouldn’t bet my job on it.

Sometimes we find that the network technical staff feels threatened by an outside review. Here, company management needs to present the review as a regular event, much like a financial review or any other regulatory review. In fact, Sarbanes-Oxley compliance is often justification enough to conduct a network infrastructure review, since the federal law requires companies to maintain effective internal controls over financial reporting, which extends to the IT systems that support it.

What are the Costs?

What does it cost to have a network infrastructure review? It depends on the size and complexity of the network. A review of a small network might cost $50,000, while a review of a large, complex network might cost upwards of $150,000. An experienced network review team will be comparable in cost to a financial audit team. The result is a comprehensive report that describes the current state of the network, any vulnerabilities that were found, and the risks associated with each. The most serious or highest-risk problems should be corrected as soon as possible.

Another perspective on the cost is the value of avoiding a failure. Examine how much each hour of outage costs the company and estimate how long an outage may last, based on stories like those above. Multiplying the two gives an approximate cost of a failure to weigh against the price of a review.
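As a rough illustration of that estimate, the Python sketch below multiplies an hourly outage cost by an outage duration and compares the result with the review prices quoted earlier. The $25,000 per hour figure and the function names are assumptions made up for this example, so substitute your own numbers. The two durations, four hours and four days, come from the outages described above.

```python
# Hypothetical figure for illustration only; replace with your own estimate
# of what one hour of network downtime costs your business.
HOURLY_OUTAGE_COST = 25_000  # dollars per hour (assumed)

# Review price range quoted in this article.
REVIEW_COST_SMALL = 50_000
REVIEW_COST_LARGE = 150_000


def outage_cost(hourly_cost: float, outage_hours: float) -> float:
    """Approximate cost of an outage: hourly cost times duration."""
    return hourly_cost * outage_hours


def break_even_hours(review_cost: float, hourly_cost: float) -> float:
    """Hours of avoided outage needed to pay for a review."""
    return review_cost / hourly_cost


if __name__ == "__main__":
    # Durations taken from the two outages described in the article.
    scenarios = {"four-hour outage": 4, "four-day outage": 4 * 24}
    for name, hours in scenarios.items():
        cost = outage_cost(HOURLY_OUTAGE_COST, hours)
        print(f"{name}: ${cost:,.0f}")

    for review_cost in (REVIEW_COST_SMALL, REVIEW_COST_LARGE):
        hours = break_even_hours(review_cost, HOURLY_OUTAGE_COST)
        print(f"${review_cost:,} review pays for itself after {hours:.1f} hours of avoided outage")
```

Under these assumed numbers, even the larger review costs about as much as six hours of downtime, which is well short of a four-day outage.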

Summary

Significant network outages are preventable. Network infrastructure reviews, just like fire code reviews and financial reviews, provide the information that allows company management to understand the risks to the business. For a deeper conversation about what would be involved in a review for your organization, feel free to reach out.

A review is just the first step. It needs to be followed by a program to correct the highest-risk findings. The result is greater confidence in the ongoing health of the network and a better chance of never hearing the words you dread: The network is down!

Terry Slattery

Principal Architect

Terry Slattery is a Principal Architect at NetCraftsmen, an advanced network consulting firm that specializes in high-profile and challenging network consulting jobs. Terry is currently working on network management, SDN, business strategy consulting, and interesting legal cases. He is the founder of Netcordia, inventor of NetMRI, has been a successful technology innovator in networking during the past 20 years, and is co-inventor on two patents. He has a long history of network consulting and design work, including some of the first Cisco consulting and training. As a consultant to Cisco, he led the development of the current Cisco IOS command line interface. Prior to Netcordia, Terry founded Chesapeake Computer Consultants, which became a Cisco premier training and consulting partner. At Chesapeake, he co-invented and patented the v-LAB system to provide hands-on access to real hardware for the hands-on component of internetwork training classes. Terry co-authored the successful McGraw-Hill text "Advanced IP Routing in Cisco Networks," is the second CCIE (1026) awarded, and is a regular speaker at Enterprise Connect and Interop. He currently blogs at TechTarget, No Jitter and our very own NetCraftsmen.
