Container-Based WAN Monitoring

Author
Peter Welcher
Architect, Operations Technical Advisor

I’ve been waiting to see some good ideas for things you might run in a container on a Cisco router or switch. What might you do with that, either home-grown or as a product?

In the WAN monitoring space, ThousandEyes and Netbeez now both offer container-based probes that you can run on a Cisco IOS-XE device. Their probes will run ping, DNS, traceroute, and/or HTTP tests against a set of centrally defined targets, and provide alerts and graphs based on response time (and maybe other factors).
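
To make the idea concrete, here is a minimal Python sketch of what a home-grown probe might do: time a DNS lookup and a TCP connect, then turn the response time into a status. This is not how ThousandEyes or Netbeez implement their probes. The function names and the 200 ms warning threshold are assumptions chosen for illustration.

```python
import socket
import time
from typing import Callable, Dict, List, Optional


def timed_ms(action: Callable[[], object]) -> Optional[float]:
    """Run action; return elapsed milliseconds, or None if it raised OSError."""
    start = time.perf_counter()
    try:
        action()
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return None
    return (time.perf_counter() - start) * 1000.0


def dns_probe(hostname: str) -> Optional[float]:
    """Time a name lookup using the system resolver."""
    return timed_ms(lambda: socket.getaddrinfo(hostname, None))


def tcp_probe(host: str, port: int = 443, timeout_s: float = 3.0) -> Optional[float]:
    """Time a TCP connect (includes name resolution if host is a name)."""
    def connect() -> None:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    return timed_ms(connect)


def classify(latency_ms: Optional[float], warn_ms: float = 200.0) -> str:
    """Map a measurement to a status; warn_ms is an illustrative threshold."""
    if latency_ms is None:
        return "DOWN"
    return "SLOW" if latency_ms > warn_ms else "OK"


def run_once(targets: List[str]) -> Dict[str, str]:
    """Probe each target once and return a status per target."""
    return {target: classify(tcp_probe(target)) for target in targets}
```

Calling run_once with a short list of hostnames on a schedule, and shipping the results to a central collector, would give a very rough version of the alerting and graphing the commercial products provide.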

ThousandEyes has offered its container probe since July 2016.

Netbeez announced theirs more recently. Early in 2019, Netbeez announced running its agent as a virtual service on ISR routers and, more recently, as a container on a Catalyst 9000 switch.

Both companies’ container-based probes can run on other container platforms as well.

Why You Care

Well, maybe you don’t care about this. Here’s why you might: with container-based probes, you no longer have to ship and install physical probe hardware at remote sites. That’s a major win!

Beyond that, there’s the basic rationale behind such probes.

Two words: User Experience (UX). How can you tell which sites are experiencing degraded user experience? That’s the key use of such probes.

Note that UX is not something your standard network management tools are going to tell you about. They’ll tell you about technical factors that contribute to good UX. Those can help you figure out why the UX is bad. But knowing that bad UX is taking place is arguably more critical. It encompasses many factors, some of which your traditional network management tool might not be monitoring or alerting on.

As evidence that some people consider UX data valuable, there is at least one major hotel chain that has deployed physical AppNeta probes at each site (1,000 of them and growing when I last heard). AppNeta is the third company in the probe / UX space. I can’t tell from their website whether their UX monitoring tool is available in container form. It is available in virtual form.

One value of such probes is seeing which sites are affected by WAN problems. Another is site history: being able to see which sites have ongoing performance problems, or to see and document when a problem began. That can be useful for “temporal correlation”, a fancy way of saying “checking the change logs, syslog, etc. for what else was going on at that time.”
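
As a rough illustration of using site history to pin down when a problem began, the Python sketch below scans time-ordered latency samples for the first run of consecutive readings well above an early baseline. The function name, the window sizes, and the factor of 2 are assumptions made for this example, not features of any particular product.

```python
from statistics import median
from typing import List, Optional, Tuple


def problem_onset(
    samples: List[Tuple[int, float]],
    baseline_n: int = 5,
    window: int = 3,
    factor: float = 2.0,
) -> Optional[int]:
    """Return the timestamp where sustained degradation starts, else None.

    samples is a time-ordered list of (timestamp, latency_ms) pairs.
    The first baseline_n samples define "normal". Degradation means
    `window` consecutive samples all above factor * baseline median,
    so a single spike is ignored.
    """
    if len(samples) < baseline_n + window:
        return None
    baseline = median(latency for _, latency in samples[:baseline_n])
    threshold = factor * baseline
    for i in range(baseline_n, len(samples) - window + 1):
        if all(latency > threshold for _, latency in samples[i:i + window]):
            return samples[i][0]
    return None
```

The returned timestamp is the starting point for the temporal correlation step: look in the change logs and syslog around that time.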

If path trace history details are also available, that may also be helpful.

Another way such tools might add value is with the cloud, either verifying UX from internal sites to cloud apps, or from cloud apps to internal databases or whatever. They mostly support only http: and https:, but if the application exposes a REST API, that may suffice.

New Uses

Another use comes to mind: deploying multiple container probes per site to get finer granularity on UX. I’ve previously blogged about that as “differential measurements”. As in, things are good from the Internet edge but not from the 3rd floor. Or bypassing zScaler via GRE tunnels is faster (context: the problem appeared to be an overloaded edge tunnel router, not zScaler).
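
A differential measurement can be sketched in a few lines of Python: take response times to the same target from two probes at one site and compare them. The function name, the labels, and the thresholds here are hypothetical, and real data would need more careful statistics than a median.

```python
from statistics import median
from typing import List


def localize(
    edge_ms: List[float],
    floor_ms: List[float],
    slow_ms: float = 150.0,
    tolerance_ms: float = 20.0,
) -> str:
    """Guess where a slowdown sits, given two vantage points at one site.

    edge_ms:  latencies measured by a probe at the Internet/WAN edge
    floor_ms: latencies to the same target from a probe deeper in the site
    Thresholds are illustrative assumptions, not recommended values.
    """
    edge = median(edge_ms)
    floor = median(floor_ms)
    if floor - edge > tolerance_ms:
        # The inner probe is noticeably slower than the edge probe.
        return "inside-site"
    if edge > slow_ms:
        # Both probes are about equally slow, so look beyond the edge.
        return "upstream"
    return "ok"
```

For example, good numbers from the edge probe and bad numbers from the 3rd floor probe would come back as "inside-site", pointing at the campus network rather than the WAN.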

Having a probe in container form would seem to allow you to deploy a virtual probe where needed fairly quickly. By uninstalling and re-installing elsewhere, it might be possible to hold down licensing costs.

I asked Netbeez (12/2019) about support for packet capture in their virtual probes. They may add tcpdump to their base container. If you think this is potentially useful, please let me know and I’ll pass it on to Netbeez. The use case I’m thinking of is to avoid doing packet capture on your switches.

Could a container probe be useful with Kubernetes? Maybe.

For micro-services running on cloud or container platforms, maybe not so much. Reporting and logging will likely be built into such applications, tracking interactions between component services.

However, for hybrid cloud scenarios, one might want probes running between clouds or locations, especially in a setting where the components might be running in multiple or changing locations. That would enable the network team (or whoever is responsible) to respond to slow cloud-to-cloud or site-to-cloud conditions, at a more macro level than the services level.

Comments

Comments are welcome, whether in agreement or constructive disagreement with the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!

—————-

Hashtags: #CiscoChampion #TechFieldDay #TheNetCraftsmenWay

Twitter: @pjwelcher

Disclosure Statement
Cisco Certified 20 Years

