How to Monitor Application Performance in the Cloud

Terry Slattery
Principal Architect

As you move real-time applications to the cloud, you might relinquish some network control. However, several mechanisms are available to monitor application performance.

In general, with real-time applications, a person waits for and uses results from the application. Real-time apps include obvious things, like voice, video and screen sharing, but they also include highly interactive business processes and monitoring systems. To some extent, real-time apps also include streaming video, although buffering can help smooth out the delivery of video streams.

Real-time apps’ reaction to common network problems is dependent on the application and the data transport mechanism that is used. Interactive voice and video are relatively tolerant of up to 1% of randomly distributed packet loss. The applications use User Datagram Protocol (UDP) packet streams, and the endpoint codecs use interpolation to estimate the values of data in individual lost packets.

On the other hand, Transmission Control Protocol (TCP) applications, like most interactive business applications and streaming voice and video, are very susceptible to packet loss. More than 0.0001% packet loss has a significant effect on the throughput of an application that uses TCP as its data transport mechanism.
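To get a feel for why such a small loss rate matters, one commonly cited rule of thumb is the Mathis approximation, which bounds steady-state TCP throughput by roughly (MSS / RTT) × (C / √p), where p is the loss rate. The sketch below is an illustration only: the approximation is not taken from this article, it models classic loss-based congestion control rather than every modern TCP variant, and the 1,460-byte segment size and 50 ms round-trip time are assumed values.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Approximate ceiling on steady-state TCP throughput, in bits per second.

    Uses the Mathis approximation: rate <= (MSS / RTT) * (C / sqrt(p)).
    loss_rate is a fraction (0.000001 means 0.0001%).
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in the range (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Assumed path: 1,460-byte segments and a 50 ms round-trip time.
    for loss_pct in (0.0001, 0.01, 1.0):
        bps = mathis_throughput_bps(1460, 0.050, loss_pct / 100)
        print(f"{loss_pct}% loss -> about {bps / 1e6:.1f} Mbps per flow")
```

Under these assumptions, even 0.0001% loss keeps a single flow well below the capacity of a 1 Gbps link, and the ceiling falls by a factor of 10 for every hundredfold increase in loss.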

A burst of packet loss causes a streaming application to pause and rebuffer while the sending system retransmits the lost data.

Packet loss is typically due to link errors or congestion. Link errors indicate something needs to be investigated and corrected. Congestion is due to link speed differences or aggregation points. Link speed differences occur when data goes from a higher-speed link to a lower-speed link, such as transitioning from a 10 Gbps data center fabric link to a 1 Gbps office or WAN link.

Monitor application performance and pinpoint packet problems

Another source of congestion is at aggregation points where many lower-speed links connect to a router or switch that has one or two higher-speed uplinks. A burst of traffic from multiple end systems could all arrive at the higher-speed link at nearly the same time, potentially overrunning the interface buffers and causing a burst of packet loss.

High jitter has its greatest effect on interactive voice and video. It is caused when real-time packets are queued behind multiple large packets. The real-time traffic must wait its turn, resulting in large variations in latency.

When jitter gets too high, voice and video packets simply arrive at the receiver too late to be passed to the codec for playback at the proper time. The voice endpoints incorporate some buffering to help reduce the effect of jitter, but it is limited in its ability to handle high jitter. As a result, high jitter looks like packet loss.
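The relationship between jitter and apparent loss can be sketched in a few lines. The example below uses the running interarrival jitter estimate defined for RTP in RFC 3550 (each new delay variation is smoothed in with a gain of 1/16) and then counts packets that miss a fixed playout deadline. The function names and the fixed-deadline buffer model are assumptions made for illustration, and the timestamps are assumed to come from synchronized clocks; real endpoints use adaptive buffers and relative timing.

```python
def interarrival_jitter(send_times, recv_times):
    """Return the running jitter estimate after each packet (RFC 3550 style).

    Times may be in any consistent unit, for example milliseconds.
    """
    jitter = 0.0
    estimates = []
    prev_transit = None
    for sent, received in zip(send_times, recv_times):
        transit = received - sent
        if prev_transit is not None:
            # Variation in transit time between consecutive packets.
            d = abs(transit - prev_transit)
            # Smooth the estimate with a gain of 1/16.
            jitter += (d - jitter) / 16.0
        estimates.append(jitter)
        prev_transit = transit
    return estimates


def late_packets(send_times, recv_times, playout_delay):
    """Count packets whose transit time exceeds the playout delay.

    A late packet is useless to the codec, so it behaves like a lost packet.
    """
    return sum(
        1 for sent, received in zip(send_times, recv_times)
        if received - sent > playout_delay
    )


if __name__ == "__main__":
    # Packets sent every 20 ms; the third one is queued behind other traffic.
    sent = [0, 20, 40, 60]
    received = [50, 70, 130, 110]
    print(interarrival_jitter(sent, received))
    print(late_packets(sent, received, playout_delay=60))
```

In this example the network delivers every packet, yet one of them arrives after its playout deadline and is effectively lost to the listener.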

Because late packets are discarded, high jitter shows up as apparent packet loss. Congestion-induced loss occurs when real-time traffic competes with other applications and data flows for the same link. You can track both problems by looking for interfaces with high drop counts. Rank interfaces by the 95th percentile of drops and review the top N to identify those with significant problems. If an interface shows a persistently high number of drops, it’s an indication the link is oversubscribed and needs less traffic or more bandwidth.

A high level of errors is easy to track by looking at interface statistics and could indicate a physical layer problem.

In summary, a monitoring system for real-time traffic should look for several sources of impairment: link errors, high jitter and congestion-induced packet loss caused by link speed differences or aggregation points.

You don’t have access to the physical interfaces in the cloud, so it’s not possible to monitor interface errors or interface drops. Instead, you must look for application impairment using other mechanisms.

Passive monitoring of application performance

Cloud infrastructure providers may provide packet capture (pcap) mechanisms. An alternative is to determine if your virtual appliances, such as firewalls and switches, provide pcap technology. Look for the ability to export pcap files, allowing you to examine packet traces using a variety of tools.

Some applications provide good internal diagnostics that can help identify the network problems that are affecting their operation. For example, voice and video endpoints can use the Real-Time Transport Control Protocol (RTCP) to report packet loss, jitter and round-trip times during the call. This information can be used to discern whether a problem is with a specific endpoint, a group of endpoints, a region or systemwide. A bit of sleuthing may be required to identify the cloud network infrastructure that is causing a problem.
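As an illustration of what an endpoint reports, the sketch below decodes one reception report block from an RTCP sender or receiver report, following the 24-byte layout in RFC 3550. The function name and the returned dictionary keys are invented for this example, and the 8,000 Hz clock rate is an assumption that fits narrowband audio; video and wideband codecs use other rates.

```python
import struct


def parse_report_block(block, clock_rate_hz=8000):
    """Decode one 24-byte RTCP reception report block (RFC 3550 layout)."""
    if len(block) != 24:
        raise ValueError("a reception report block is exactly 24 bytes")
    ssrc, lost_word, ext_seq, jitter, lsr, dlsr = struct.unpack("!IIIIII", block)

    # Top 8 bits: fraction of packets lost since the last report, scaled by 256.
    fraction_lost = (lost_word >> 24) / 256.0

    # Low 24 bits: cumulative packets lost, a signed 24-bit value.
    cumulative_lost = lost_word & 0xFFFFFF
    if cumulative_lost & 0x800000:
        cumulative_lost -= 1 << 24

    return {
        "ssrc": ssrc,
        "loss_pct": 100.0 * fraction_lost,
        "cumulative_lost": cumulative_lost,
        "highest_seq": ext_seq,
        # Jitter is reported in RTP timestamp units; convert to milliseconds.
        "jitter_ms": 1000.0 * jitter / clock_rate_hz,
        # Delay since last sender report is in units of 1/65536 second.
        "dlsr_s": dlsr / 65536.0,
    }
```

Collecting these values per call and grouping them by endpoint, site or region is one way to do the sleuthing described above.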

Alternatively, you can monitor application performance if parts of the application traffic flows are in a spot where you can place a physical or virtual appliance. These systems become more powerful as the breadth of the monitored infrastructure increases. Ideally, all tiers of a multi-tier application will be monitored, allowing the tool to identify both network and non-network problems that affect application performance. Some of these tools can import pcap files for their analysis.

Testing your network infrastructure

Active path testing tools have certain advantages that make them attractive. There are several types of active path testing:

  1. Synthetic transactions. Create real transactions. For example, place a call between specific endpoints to ensure the call controller is functioning correctly, as well as validating the data path between endpoints.
  2. Simulate application traffic. Test probes exchange packets that match the application, but carry a diagnostic payload, like a packet counter and timestamp, to measure path characteristics. This requires probes to be distributed throughout the infrastructure, but it doesn’t add load to the application systems.
  3. Standard network diagnostics. No special capabilities are needed. Traceroute may provide path information that is not visible via other tools. Test probes often provide this capability, in addition to the other two types of tests.
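The diagnostic payload described in the second item can be sketched as follows. This is a hypothetical probe format, not the format of any particular product: a 32-bit packet counter and a send timestamp, padded to an assumed application-like size of 160 bytes. The sketch covers only building and analyzing payloads; sending them between probes (for example, over UDP) is left out, and the one-way delay figure assumes the two probes have synchronized clocks.

```python
import struct
import time

# Hypothetical layout: 32-bit sequence number, 64-bit float send time.
PROBE_FORMAT = "!Id"
PROBE_SIZE = struct.calcsize(PROBE_FORMAT)


def build_probe(seq, send_time=None, pad_to=160):
    """Build a probe payload padded with zero bytes to pad_to bytes."""
    if send_time is None:
        send_time = time.time()
    header = struct.pack(PROBE_FORMAT, seq, send_time)
    return header.ljust(pad_to, b"\x00")


def parse_probe(payload):
    """Return (sequence number, send time) from a probe payload."""
    return struct.unpack(PROBE_FORMAT, payload[:PROBE_SIZE])


def summarize(received, sent_count):
    """Summarize loss and delay from (payload, receive_time) pairs."""
    delays = {}
    for payload, recv_time in received:
        seq, send_time = parse_probe(payload)
        # Keep the first copy of each sequence number; ignore duplicates.
        delays.setdefault(seq, recv_time - send_time)
    lost = sent_count - len(delays)
    return {
        "sent": sent_count,
        "received": len(delays),
        "loss_pct": 100.0 * lost / sent_count if sent_count else 0.0,
        "avg_delay_s": sum(delays.values()) / len(delays) if delays else None,
    }
```

Because each probe carries its own counter and timestamp, the receiving side can compute loss and delay without touching the application servers.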

Active path testing allows the network infrastructure to be tested when the critical applications are not running or when they are not being used. It is useful for early problem identification and for collecting information about intermittent problems. The combination of standard network diagnostics with synthetic transactions or simulated traffic can provide visibility into the infrastructure that is not available with other tools.

Sorting through the data

Network management tools can provide an overwhelming amount of data. Averaging data over hours or a day can hide problems, because long periods of low values dilute short bursts. Instead, use sorting functions like the 95th percentile to identify items that have bursts of problems. This approach is especially useful for filtering interfaces and links that have bursts of high packet loss. A concise report of the top 10 instances in each problem category allows you to focus on the most problematic instances.
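A small sketch shows the difference between averaging and a 95th percentile ranking. The interface names and drop counts below are made up for illustration, and the percentile uses the simple nearest-rank method; a real tool may interpolate or use a different definition.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def top_n_by_p95(samples_by_interface, n=10):
    """Return up to n (name, p95, average) rows, worst 95th percentile first."""
    rows = [
        (name, percentile(samples, 95), mean(samples))
        for name, samples in samples_by_interface.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


if __name__ == "__main__":
    # Hypothetical drops per polling interval for two interfaces.
    drops = {
        "steady-link": [60] * 20,              # constant low-level drops
        "bursty-link": [0] * 18 + [500, 500],  # quiet, with two large bursts
    }
    for name, p95, avg in top_n_by_p95(drops):
        print(f"{name}: p95={p95} avg={avg}")
```

In this made-up data set, the bursty link has the lower average but the much higher 95th percentile, so a report sorted by average would rank it below the steady link and hide the bursts.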

You don’t need all of the above tools to get started. Use what is available. Think about the complete application infrastructure, what tools you have available and what you can get from those tools to monitor application performance. A little resourcefulness will go a long way.

To read the original blog post, view TechTarget’s post here.

