I’ve been working with Steve Meyer at NetCraftsmen on QoS policies for a customer that has voice, interactive video (i.e., TelePresence), video conferencing, streaming video, and business applications. They also have what we call “streaming entertainment” traffic: music from Pandora.com and videos from Akamai and Limelight Networks. That’s quite a mix of traffic requirements.
Looking specifically at the requirements for interactive video, we noted that the Cisco TelePresence QoS Network Systems Design Guide recommends putting TelePresence into the LLQ (priority) queue alongside voice (see the “Branch QoS Design for TelePresence” section). At first, this didn’t seem right: why put high-bandwidth traffic into the same class as low-bandwidth, loss-sensitive voice? While TelePresence is also loss- and jitter-sensitive, it is high volume, so why put it into the priority queue? Wouldn’t it suffice in a high-priority, class-based queue? And would putting interactive video into the voice queue affect voice?
After thinking about it, we realized that what hurts voice traffic is something called instantaneous buffer congestion. TCP wants to use as much bandwidth as it can, using slow start to ramp up its utilization until congestion loss begins to occur. This ramp-up makes TCP data transfers very bursty and is the origin of instantaneous buffer congestion. The bursts are normal and expected; however, the momentary buffer congestion they create is what causes the loss that hurts voice.
Even though there is some variation in video, due to motion and the periodic intra-coded frames (I-frames), TelePresence is much more steady-state than TCP. The traffic volume for a 1080p TelePresence CTS-3000 system is 15 Mbps, which is a significant flow, but one that doesn’t burst to use all available bandwidth. The point is that since TelePresence doesn’t burst, its traffic can be classified into the low-latency queue (LLQ) without impacting voice, as long as there is sufficient bandwidth for concurrent transmission of both traffic types.
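To make this concrete, the design can be sketched as a Modular QoS CLI policy with two priority classes. This is an illustrative fragment only: the class names, DSCP markings (EF for voice, CS4 for TelePresence), and bandwidth figures are assumptions, not the customer’s actual configuration.

```
! Classify on assumed DSCP markings (EF = voice, CS4 = TelePresence).
class-map match-all VOICE
 match dscp ef
class-map match-all TELEPRESENCE
 match dscp cs4
!
! Both classes receive priority (LLQ) treatment; each priority value
! must cover the concurrent calls, e.g. 15 Mbps for one CTS-3000.
policy-map WAN-EDGE
 class VOICE
  priority 1024
 class TELEPRESENCE
  priority 15000
 class class-default
  fair-queue
!
interface Serial0/0
 service-policy output WAN-EDGE
```

Because each priority class is policed to its configured rate during congestion, the steady 15 Mbps TelePresence flow cannot starve the voice class even though both share LLQ treatment.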
When designing QoS for TelePresence and voice, it is important to make sure that there is sufficient bandwidth for all the potential concurrent sessions. If the combination of video, voice, and data congests the link regularly, then more link bandwidth is needed. QoS is not a replacement for bandwidth; using QoS in these situations hurts the data applications when congestion exists. There is one possible exception to this scenario: if some of the data applications are streaming entertainment, such as Pandora.com, then allocating them to the “less than best-effort” queue might provide enough relief that the other data applications can run at acceptable performance levels during congestion periods. Use the NMS to generate alerts when links experience congestion for longer than a few seconds. The figure below is an example of a link that is congested for most of the day.
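In Cisco terms, “less than best-effort” is usually implemented as a scavenger class. A minimal sketch, assuming the streaming-entertainment traffic has been marked CS1 at the network edge (the class name and percentage are illustrative):

```
! Scavenger: traffic marked CS1 gets only a minimal guarantee and is
! the first to be squeezed when the link congests.
class-map match-all SCAVENGER
 match dscp cs1
!
policy-map WAN-EDGE
 class SCAVENGER
  bandwidth percent 1
 class class-default
  fair-queue
```

During congestion the scavenger class is held to its small guarantee, which is what frees capacity for the business data applications in class-default.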
Another factor to consider is how much bandwidth should be reserved for data. Cisco’s recommendation is to allocate a maximum of 33% of link bandwidth to the LLQ. If this threshold is exceeded, the data applications suffer. The impact on data depends on the application and how much data needs to transit a link that’s also handling TelePresence and voice. The Cisco document referenced above explains this scenario. Running TelePresence over the link shown below would require QoS, because the link is running above 66% utilization during most of the working day.
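A quick sanity check on the 33% guideline (the figures here are illustrative): on a 45 Mbps T3, 33% is about 15 Mbps, which a single CTS-3000 call consumes by itself before any voice is added; on a 100 Mbps link, 33% is 33 Mbps, enough for one CTS-3000 call plus roughly 18 Mbps of voice. Percentage-based priority commands make the ceiling explicit:

```
! Assumed example: total priority allocation capped at 33% of the link.
policy-map FAST-ETHERNET-EDGE
 class VOICE
  priority percent 10
 class TELEPRESENCE
  priority percent 23   ! ~23 Mbps on a 100 Mbps link; covers one 15 Mbps call
 class class-default
  fair-queue
```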
How do we measure the traffic volume in each traffic class? Well, the Cisco equipment doesn’t have a good set of show commands to display the traffic volumes. There is a workaround, however, which James Ventre documented on his blog: http://networking.ventrefamily.com/2010/09/6500-dscp-trust.html. The trick is to define classification policies on an adjacent device and use the show commands there to report on ingress traffic.
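The workaround amounts to attaching a classification-only policy to the neighboring device’s ingress interface and reading the per-class counters. A sketch, with assumed class names and DSCP values:

```
! Classes exist only to count matching packets; no queuing actions.
class-map match-all MEASURE-EF
 match dscp ef
class-map match-all MEASURE-CS4
 match dscp cs4
!
policy-map MEASURE-INGRESS
 class MEASURE-EF
 class MEASURE-CS4
!
interface GigabitEthernet0/1
 service-policy input MEASURE-INGRESS
```

The per-class packet and byte counts then appear in the output of `show policy-map interface GigabitEthernet0/1 input`, giving a view of how much traffic each class is actually carrying.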
The summary of all of this is that TelePresence can run in the priority (LLQ) queue without hurting voice. As always, there are several considerations to take into account. The tools for monitoring QoS within network devices are still quite crude, making it difficult to track what is happening; the reports from the phones and the TelePresence systems probably provide as good a view as any into how well the network supports their traffic flows.

As an added note, some vendors say that with wire-speed devices there is no need for QoS. The description above is about a network device where the forwarding fabric is not the limiting factor and the egress interface is where congestion occurs. Think about a T3 link (45 Mbps) that must carry a TelePresence session; then think about carrying two TelePresence sessions on it. The Cisco documentation contains a good example of this scenario. There are also people doing TelePresence within a campus who argue that they have 1G and 10G links everywhere, so there’s no congestion. I point them back to instantaneous buffer congestion: it would take only eleven 1G links bursting at the same time to overwhelm a 10G uplink, causing congestion there. Applying QoS consistently across the network makes it easier to manage, because there’s no need to think about where QoS is implemented and where it isn’t.
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article, which originally appeared in the Applied Infrastructure blog at http://www.infoblox.com/en/communities/blogs.html.