Network SLAs – Which one to use?

Terry Slattery
Principal Architect

Scott Hogg just did a post on his Network World Blog that mentioned SLAs (Service Level Agreements).  It is a very timely article for me because I’m working with a customer who wants to define network SLAs.  The task becomes one of selecting an appropriate SLA for the organization.

Good SLAs share several basic characteristics:

  1. They measure key network parameters that are important to the organization.
  2. The data needed to calculate the SLA can be collected automatically with the NMS tools already in place.
  3. They are understood by the people who manage the network.

Let’s look at an example.

Scott suggested an SLA that measures network downtime.  A five-nines SLA (99.999% availability) allows about 5.26 minutes of downtime per year.  But how is such an SLA calculated?  I can think of several measurement methodologies that produce very different figures.  Let’s use a sample network that has two data centers and twenty remote sites.
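The downtime budget implied by an availability target is simple arithmetic. Here is a quick Python sketch; the helper name is mine, not from any tool mentioned here:

```python
# Downtime budget implied by an availability SLA (illustrative helper).

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the allowed downtime in minutes per year for a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_budget_minutes(nines):.2f} min/year")
```

Each additional "nine" cuts the annual downtime budget by a factor of ten, which is why five-nines is such a demanding target.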

In the simple case, all network downtime is accumulated, even if it affects only a portion of the network.  A small remote branch outage is counted the same as an outage that takes out one of the data centers.  Another methodology averages downtime across sites, with the result that the failure of one remote branch has less of an impact on the overall SLA value.  Suddenly, the SLA metric is vastly improved simply by modifying the calculation that’s used.  Finally, a third calculation methodology would measure average device downtime.  Since the failure of a remote site is typically due to one device or link failure, averaging downtime across all devices produces an even smaller downtime figure – and therefore a higher apparent availability – than the previous two methods.
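The three methodologies can be sketched in a few lines of Python. The outage data, site names, and device counts below are invented purely to show how the same set of outages yields three different SLA numbers:

```python
# Three SLA calculations over the same hypothetical outage data for a
# network of 2 data centers and 20 remote sites.  All figures are made up.

MINUTES_PER_YEAR = 365 * 24 * 60

sites = [f"dc{i}" for i in range(1, 3)] + [f"branch{i}" for i in range(1, 21)]
devices_per_site = 5                       # assume 5 managed devices per site
total_devices = len(sites) * devices_per_site

# (site, downtime_minutes, devices_affected) for the year
outages = [("branch7", 120, 1), ("branch12", 45, 1), ("dc1", 10, 5)]

# 1. Simple: every outage counts against the whole network.
network_down = sum(minutes for _, minutes, _ in outages)
sla_simple = 100 * (1 - network_down / MINUTES_PER_YEAR)

# 2. Site-averaged: average the availability of each of the 22 sites.
site_down = {s: 0 for s in sites}
for site, minutes, _ in outages:
    site_down[site] += minutes
sla_site = sum(100 * (1 - d / MINUTES_PER_YEAR)
               for d in site_down.values()) / len(sites)

# 3. Device-averaged: spread downtime across every managed device.
device_minutes = sum(minutes * affected for _, minutes, affected in outages)
sla_device = 100 * (1 - device_minutes / (total_devices * MINUTES_PER_YEAR))

print(f"simple: {sla_simple:.5f}%  per-site: {sla_site:.5f}%  "
      f"per-device: {sla_device:.5f}%")
```

Running this shows the effect described above: the identical outages produce a progressively better-looking availability number as the calculation moves from whole-network to per-site to per-device averaging.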

Which methodology is best for an organization?  It depends on the business.  You want the measurement to reflect the factors that have the greatest impact on the business.  If all the remote sites must be connected to a data center at all times, the first methodology is best.  The second methodology is good if the overall average remote site availability is more important than having every site up all the time.  Organizations that use site availability averages often have the capability for a remote site to run in detached mode for short periods of time – until the link to the data centers can be repaired.  The third calculation might be used by an organization that has critical end systems that need high availability of their attached network devices.

Of course, there are many more SLA calculation methods and sources of data.  In a VoIP network, it would be useful to incorporate delay, jitter, and packet loss statistics into an SLA.  The result might have to be a multi-faceted SLA in which there are several reported figures, each of which focuses on how the network supports a specific part of the business.
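As a rough illustration of such a multi-faceted SLA, each VoIP metric could be measured and reported against its own target rather than folded into a single number. The thresholds below are common rules of thumb (for example, ITU-T G.114 suggests keeping one-way delay under about 150 ms), not figures from this article:

```python
# Sketch of a multi-faceted VoIP SLA: each metric is reported against its
# own target.  Thresholds are common rules of thumb, purely illustrative.

voip_targets = {
    "one_way_delay_ms": 150,   # ITU-T G.114 guidance for one-way delay
    "jitter_ms": 30,
    "packet_loss_pct": 1.0,
}

# Hypothetical measured averages for the reporting period.
measured = {"one_way_delay_ms": 92, "jitter_ms": 12, "packet_loss_pct": 0.4}

report = {
    metric: ("MET" if measured[metric] <= target else "MISSED")
    for metric, target in voip_targets.items()
}
for metric, status in report.items():
    print(f"{metric}: measured {measured[metric]} "
          f"vs target {voip_targets[metric]} -> {status}")
```

Reporting each figure separately keeps a healthy delay number from masking a packet-loss problem, which a single blended score could easily hide.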



Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under

