Digital Experience (DX) Monitoring – Intermittent BGP

NetCraftsmen®

CHALLENGE

Employees of a NetCraftsmen client reported highly variable application performance, particularly for file transfers between two global locations. The problem was intermittent: the same operation would complete quickly at some times and seem to take forever at others. Productivity suffered across all applications that operated between the two sites, and staff found it very frustrating.

A detailed analysis of the reports concluded that it was only between the two specific sites and that it affected all applications and users. The two sites were connected over an internet VPN, which suggested that the problem was likely to be somewhere in the internet. Network diagnostic tools like ping and traceroute could not identify the cause. We needed something that provided more detailed diagnostics and analysis. 

STRATEGY 

NetCraftsmen decided to implement a WAN diagnostic system that would run continuously to capture evidence of the problem. It had to analyze network performance between servers in data centers at the two sites as well as to the enterprise’s staff endpoints (laptops and tablets). The diagnostics had to capture performance data from the internet to the servers in each data center individually, so that we could identify whether the problem was affecting only one data center or both. We also needed to gather data on the performance between staff endpoints and the data centers, so that we could identify problems with those network paths.  

The diagnosis needed to collect data on paths within and between data centers, paths from the internet to each data center, and paths from staff endpoints to data centers. In particular, it needed visibility on a hop-by-hop basis. Then the data had to be correlated between the monitored paths to identify the performance problem’s location. 
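The correlation step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method Catchpoint uses: the path names, hop labels, loss figures, and the `suspect_hops` helper are all hypothetical. It assumes each monitored path is reduced to a list of hops plus an observed loss percentage, and that a faulty hop appears on every lossy path and on no healthy one.

```python
from typing import Dict, List, Set


def suspect_hops(paths: Dict[str, List[str]],
                 loss_pct: Dict[str, float],
                 threshold: float = 1.0) -> Set[str]:
    """Return hops shared by every lossy path and absent from every healthy path."""
    lossy = [set(hops) for name, hops in paths.items() if loss_pct[name] >= threshold]
    healthy = [set(hops) for name, hops in paths.items() if loss_pct[name] < threshold]
    if not lossy:
        return set()
    # Hops common to all lossy paths are candidates.
    candidates = set.intersection(*lossy)
    # A hop that also carries healthy traffic is unlikely to be the culprit.
    for hops in healthy:
        candidates -= hops
    return candidates


# Hypothetical monitored paths (hop labels are invented for illustration).
paths = {
    "internet->dc_a": ["isp1", "ix1", "upstream_x", "dc_a_edge"],
    "internet->dc_b": ["isp2", "dc_b_edge"],
    "dc_a->dc_b": ["dc_a_edge", "upstream_x", "ix1", "isp2", "dc_b_edge"],
    "laptop->dc_a": ["home_isp", "isp3", "dc_a_edge"],
    "laptop->dc_b": ["home_isp", "isp2", "dc_b_edge"],
}
# Hypothetical loss percentages per path.
loss = {
    "internet->dc_a": 4.2,
    "internet->dc_b": 0.0,
    "dc_a->dc_b": 5.1,
    "laptop->dc_a": 0.0,
    "laptop->dc_b": 0.1,
}

result = suspect_hops(paths, loss)
print(sorted(result))
```

With intermittent faults, a path can look healthy simply because it was sampled between loss events, so in practice the comparison would need to be made over matching time windows.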

SOLUTION  

NetCraftsmen worked with a number of vendors whose products are capable of decoding modern application delivery chains and chose to build a DX offering using Catchpoint instrumentation.

Catchpoint has lightweight agents that are loaded on the data center servers and staff endpoints. Catchpoint also has internet-located data collection nodes that could provide visibility into internet performance to each data center as well as with staff endpoints. A cloud-based management system makes it easy to configure all the tests and to correlate the results.  

We started the analysis on a Monday morning and let it run for 48 hours. By Wednesday morning, we had identified significant packet loss every 10-15 minutes on one link within an internet exchange carrier site. The analysis found that Border Gateway Protocol (BGP) network path information was changing periodically, and there was high packet loss whenever the path transitioned with one ISP. The client's direct ISP endpoint paths were fine. The packet loss occurred upstream of the client's ISP and, on further investigation, was tied to BGP interactions between upstream providers that affected the client's address space. Our client had no contracts with those providers and no means of enforcing an SLA.
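The link between path changes and packet loss can be checked with a simple time-window match. The sketch below is an assumption-laden illustration rather than the actual analysis: the sample data, the 5% loss threshold, the 60-second window, and the `spikes_near_changes` helper are all hypothetical. It takes loss samples as (timestamp, loss %) pairs and BGP path-change timestamps in seconds, and reports which loss spikes fall close to a path change.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def spikes_near_changes(samples: Sequence[Tuple[int, float]],
                        changes: Sequence[int],
                        loss_threshold: float = 5.0,
                        window_s: int = 60) -> Tuple[List[int], List[int]]:
    """Return (all loss spikes, spikes within window_s of a path change)."""
    ordered = sorted(changes)
    spikes = [t for t, loss in samples if loss >= loss_threshold]
    matched = []
    for t in spikes:
        # First change at or after the start of the window around this spike.
        i = bisect_left(ordered, t - window_s)
        if i < len(ordered) and ordered[i] <= t + window_s:
            matched.append(t)
    return spikes, matched


# Hypothetical one-minute loss samples over 40 minutes: mostly clean,
# with spikes roughly every 10-15 minutes.
spike_loss = {600: 22.0, 1380: 18.5, 2100: 25.0}
samples = [(t, spike_loss.get(t, 0.0)) for t in range(0, 2400, 60)]

# Hypothetical timestamps at which the observed BGP path changed.
changes = [590, 1370, 2095]

spikes, matched = spikes_near_changes(samples, changes)
print(f"{len(matched)} of {len(spikes)} loss spikes coincide with a path change")
```

If most spikes match a path change, routing instability is a strong suspect; if few do, the loss probably has another cause.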

To protect the client's privacy, the screen captures below use example data from Catchpoint. Figure 1 shows the view from a number of backbone nodes into a sample data center.

Figure 1

 

Once we had evidence of the packet drops, we used the tool to take a BGP Autonomous System (AS) view and identified the ISP whose flapping of the client's IP routes was causing the issue.

Again, for reasons of confidentiality the data shown is just for illustration. 

Figure 2
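One simple way to spot a flapping AS from a sequence of observed AS paths is to count how often each AS enters or leaves the path. The following is a minimal sketch under stated assumptions, not the view the tool produces: the `flap_counts` helper is hypothetical, and the AS numbers come from the range reserved for documentation, so they do not correspond to any real network.

```python
from collections import Counter
from typing import List, Sequence


def flap_counts(as_paths: Sequence[Sequence[int]]) -> Counter:
    """Count, per AS, how many times it appears or disappears between
    consecutive observations of the AS path to one prefix."""
    counts: Counter = Counter()
    for prev, curr in zip(as_paths, as_paths[1:]):
        # Symmetric difference: ASes present in one observation but not the other.
        counts.update(set(prev) ^ set(curr))
    return counts


# Hypothetical time-ordered AS paths from one vantage point (AS 64500)
# to the client's prefix (origin AS 64510). Documentation ASNs only.
observations: List[List[int]] = [
    [64500, 64501, 64510],
    [64500, 64502, 64510],
    [64500, 64501, 64510],
    [64500, 64502, 64510],
]

counts = flap_counts(observations)
for asn, n in counts.most_common():
    print(f"AS{asn}: {n} path changes")
```

ASes with high counts are the ones the path keeps moving onto and off, which makes them the first candidates to raise with the ISP. Stable ASes such as the vantage point and the origin never appear in the count.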

RESULT 

The evidence was clear. Our analysis provided enough information for the client's ISP to identify the problem. The ISP took about a week to achieve full resolution with its upstream peer.

Soon after, our client reported that the problem was resolved.

NetCraftsmen is adept at solving challenging and complex network problems. Bring us your lingering and challenging network problems. We’ll help you resolve them so that you can Rest Assured®.

 

Nick Kelly

Cybersecurity Engineer, Cisco

Nick has over 20 years of experience in Security Operations and Security Sales. He is an avid student of cybersecurity and regularly engages with the Infosec community at events like BSides, RVASec, Derbycon and more. The son of an FBI forensics director, Nick holds a B.S. in Criminal Justice and is one of Cisco’s Fire Jumper Elite members. When he’s not working, he writes cyberpunk and punches aliens on his Playstation.

 

Virgilio “BONG” dela Cruz Jr.

CCDP, CCNA V, CCNP, Cisco IPS Express Security for AM/EE
Field Solutions Architect, Tech Data

Virgilio “Bong” has sixteen years of professional experience in the IT industry, spanning academe, technical and customer support, pre-sales, post-sales, project management, training and enablement. He has worked in the Cisco Technical Assistance Center (TAC) as a member of the WAN and LAN Switching team. Bong now works for Tech Data as a Field Solutions Architect with a focus on Cisco Security and holds several Cisco certifications, including Fire Jumper Elite.

 

John Cavanaugh

CCIE #1066, CCDE #20070002, CCAr
Chief Technology Officer, Practice Lead Security Services, NetCraftsmen

John is our CTO and the practice lead for a talented team of consultants focused on designing and delivering scalable and secure infrastructure solutions to customers across multiple industry verticals and technologies. Previously he has held several positions including Executive Director/Chief Architect for Global Network Services at JPMorgan Chase. In that capacity, he led a team managing network architecture and services.  Prior to his role at JPMorgan Chase, John was a Distinguished Engineer at Cisco working across a number of verticals including Higher Education, Finance, Retail, Government, and Health Care.

He is an expert in working with groups to identify business needs, and align technology strategies to enable business strategies, building in agility and scalability to allow for future changes. John is experienced in the architecture and design of highly available, secure, network infrastructure and data centers, and has worked on projects worldwide. He has worked in both the business and regulatory environments for the design and deployment of complex IT infrastructures.