Remote Communications During A Crisis Can Be Challenging – Some Things To Think About
The SolarWinds Networking Field Day 13 presentation focused on its new NetPath tool, a supplement to its Network Performance Monitor (NPM) product. If “improved multi-path aware TCP-based traceroute on steroids” grabs your attention, read on!
SolarWinds produces relatively inexpensive network management tools. If you haven’t figured it out from my prior blogs, I’m a picky consumer of network management products, and I am accustomed to customers often failing to put in the maintenance the products need (lower priority than other tasks).
I’ve recently been fairly hands-on with several network management tools, including SolarWinds Orion. I just confirmed what many people will tell you: “SolarWinds is better than many of their competitors’ far costlier tools.” Their target appears to be solid data meeting customer needs while keeping their and your costs down.
With any tool, I do recommend you buy a full set of licenses and monitor all active interfaces, but that’s a rant for a different blog. SolarWinds’ pricing model (basically, licensed node = interface not device) makes that a bit more challenging than with most other network management tools (or raises its effective price a bit, depending on your point of view).
Anyway, I wasn’t sure what to expect from the #NFD13 presentation. More refinements? Well, the entire session focused on the new NetPath tool and kept the NFD13 delegates’ interest!
It turns out, I was very impressed with the new NetPath tool!
The fascinating part (for me, anyway) was the learning process that the SolarWinds engineers went through in developing the tool. That includes how they had to overcome various hurdles, among them reducing false positives and identifying the most important problems along a path, filtering out some of the “noise.”
The first problem SolarWinds had to solve was finding the right path, taking TCP and TCP port behavior into account. If you’re thinking traceroute, well, the classic Unix version sends UDP probes by default (Windows tracert uses ICMP), and UDP is not what most internet applications use. Media (voice, video), yes, UDP. Anyway, probing with TCP may help avoid any issues with policy-based routing of TCP in the network, since the probes look more like the real application traffic.
Traceroute and SNMP MIB-II routing table data both can be misleading. That’s because neither handles ECMP routing well — the second challenge in the path space.
For technical reasons, the SNMP MIB-II routing table can only supply one entry per prefix. Vendors that rely on that for displaying paths are displaying only one of possibly many paths — and one that might differ from the path your traffic actually takes. Traceroute sort of shows when there are two paths, but piecing together the ECMP information into paths is not necessarily straightforward. Finding many/most alternative paths when each hop has multiple next hops is not simple. So SolarWinds had to solve the problem of stitching together the TTL-exceeded and other information to learn about paths. You have to know the paths before you can figure out where the problems are along them.
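To make the stitching idea concrete, here is a minimal sketch in Python. It is not how NetPath does it (the presentation is the place for that); it just assumes you already have several traces, one per probe flow, each listing the hop that answered at each TTL (`None` where nothing answered). The names `build_hop_graph`, `enumerate_paths`, and the sample hop labels are all made up for illustration.

```python
from collections import defaultdict


def build_hop_graph(traces):
    """Merge per-flow hop lists into a directed graph of observed next hops."""
    graph = defaultdict(set)
    for hops in traces:
        for here, nxt in zip(hops, hops[1:]):
            # Do not link across an unanswered TTL: that could invent
            # an adjacency that does not exist in the network.
            if here is None or nxt is None:
                continue
            graph[here].add(nxt)
    return graph


def enumerate_paths(graph, source, dest, path=None):
    """Depth-first enumeration of loop-free paths from source to dest."""
    path = (path or []) + [source]
    if source == dest:
        return [path]
    found = []
    for nxt in sorted(graph.get(source, ())):
        if nxt not in path:
            found.extend(enumerate_paths(graph, nxt, dest, path))
    return found


# Hypothetical traces: three probe flows, two of which hash onto different
# ECMP next hops after r1, and one with an unanswered probe.
traces = [
    ["agent", "r1", "r2a", "r3", "server"],
    ["agent", "r1", "r2b", "r3", "server"],
    ["agent", "r1", "r2b", None, "server"],
]
graph = build_hop_graph(traces)
paths = enumerate_paths(graph, "agent", "server")
for p in paths:
    print(" -> ".join(p))
```

One caveat with this kind of merging: when several hops in a row each have multiple next hops, the graph can contain combinations that no single flow actually followed, and the number of candidate paths grows quickly. That is part of why the problem is not simple.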
The third challenge (or group of challenges) was spotting performance problems and weeding out the noise.
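One common rule of thumb for reading traceroute-style data gives a feel for the noise problem: loss reported at an intermediate hop that does not carry through to the hops behind it is usually the router deprioritizing its replies, not real forwarding loss. The Python sketch below applies that heuristic. It is my illustration, not the NetPath algorithm, and the function name, the 5% threshold, and the sample numbers are assumptions.

```python
def suspicious_hops(loss_by_hop, threshold=5.0):
    """Return hops whose loss meets the threshold and persists downstream.

    loss_by_hop: list of (hop_name, loss_percent), ordered from the
    source toward the destination.
    """
    flagged = []
    for i, (hop, loss) in enumerate(loss_by_hop):
        # Loss percentages at this hop and at every hop after it.
        downstream = [pct for _, pct in loss_by_hop[i:]]
        if loss >= threshold and min(downstream) >= threshold:
            flagged.append(hop)
    return flagged


# Hypothetical measurements: r2 shows heavy loss, but r3 behind it is clean,
# so r2 is most likely just slow to answer probes.
samples = [("r1", 0.0), ("r2", 40.0), ("r3", 0.0), ("r4", 12.0), ("server", 11.0)]
flagged = suspicious_hops(samples)
print(flagged)  # ['r4', 'server']
```

The first flagged hop (`r4` here) is where the loss appears to start, which is the place to look first.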
Along the way, the development team also had to solve displaying the data in a user-friendly way.
If that interests you, watch the #NFD13 videos for the details.
I’ve become a big fan of path-oriented products. It comes down to troubleshooting: I know something between A and B is causing poor performance (a speed or quality “brownout”); how do I rapidly find the problem(s)? I’ve seen people writing down traceroute hops in each direction, then SSHing into the routers (please tell me you’re not still using telnet), doing show commands … it can take hours just to get a single snapshot! If your tool can’t easily provide the path and the performance data, maybe you need another tool, one that’s actually useful? Ditto, historical data along path(s), so you can spot intermittent problems.
That’s what NetPath does: It lets you see paths from virtual agents or other endpoints across your network to internal servers or to cloud-based SaaS or other servers.
You can also see if the path changes (and when), if there are problems (latency, packet loss), or where device configurations have recently changed. (That last item assumes you also have the NCM configuration management component installed.)
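The “when did the path change” part is simple to picture if you think of the tool as keeping timestamped snapshots of the path. A small Python sketch, with a hypothetical `path_changes` function and invented timestamps and hop names (nothing here reflects NetPath internals):

```python
def path_changes(snapshots):
    """Return (timestamp, old_path, new_path) for each change between
    consecutive snapshots.

    snapshots: list of (timestamp, path) in time order, where path is a
    tuple of hop names.
    """
    changes = []
    previous = None
    for timestamp, path in snapshots:
        if previous is not None and path != previous:
            changes.append((timestamp, previous, path))
        previous = path
    return changes


# Hypothetical history: the path moves from r2a to r2b at 10:10.
history = [
    ("10:00", ("agent", "r1", "r2a", "server")),
    ("10:05", ("agent", "r1", "r2a", "server")),
    ("10:10", ("agent", "r1", "r2b", "server")),
    ("10:15", ("agent", "r1", "r2b", "server")),
]
changes = path_changes(history)
for when, old, new in changes:
    print(when, " -> ".join(old), "=>", " -> ".join(new))
```

Lining those change times up against latency or loss history is what makes intermittent problems easier to spot.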
As soon as you include the internet/SaaS, any path performance data must be based on probe packets, since SNMP data is not going to be available from ISP or WAN provider routers.
It is (or should be) well-known that ping does not provide good packet loss or latency data, since processing ping is not a priority in Cisco devices or most other devices. (Think busy person, putting off responding to an email or phone survey or salesperson calling….)
My understanding is that the probing for each NetPath path consumes a one-node license, basically for the polling.
As far as I know, SolarWinds does not cluster servers for larger networks. You instead can throw multiple servers at a large network and assign portions of your network to each. Something to bear in mind.
I also suspect SolarWinds is more focused on basic functionality than on performance/scaling optimization. That makes sense given their customer base and their licensing model: per-node/per-interface/per-polling-item licensing tends to encourage focused polling. Other products with more efficient polling and database I/O don’t need to limit their users’ resource consumption as much, but then they don’t do useful things with the data gathered either.
I do get the impression that SolarWinds products attempt to provide good responsiveness by limiting the magnitude of what you can ask the system to do. Having said that, it does help eliminate clutter and information overload.
The NetPath focus seems to be at the “flow” level. To track performance for a multi-tier application, one might need various virtual agents polling the various flows. That suggests that NetPath is probably best used for front-end SaaS responsiveness monitoring. That said, virtual agents in the cloud might track back-end flows where your company controls the cloud-based application.
I wonder how well path tracking would work for, say, a company with 500 “critical” apps. (App owners often get their egos wrapped up in being “important” or “mission critical.” I’ve learned to use the term “fragile” when talking QoS; guys tend not to want inclusion in the fragile class unless their app needs it.) Setting up the polling for 500 apps might be rather — extremely — time consuming. Also, one might want grouping of polling results by application or service.
As a side note, monitoring cloud-based app performance has potential challenges if the DevOps team is much heavier on the dev than the ops side (i.e., testing/monitoring performance gets skimped on), or if stove-piping hinders cross-team visibility (for example, network team visibility into inter-server-instance or inter-container flows and their performance). That’s a topic for another blog, however.
Other vendors with some of the same or similar capabilities: Riverbed, NetBrain, ThousandEyes, AppNeta, and NetBeez come to mind. The latter two are somewhat more oriented toward tracking historical end-to-end performance with some pathing data, e.g. when the path changes. The first three provide more details about the path hops.
I’ll note that Riverbed allows you to use NetFlow data to stitch together flows into “service policies.” The point being, to (a) alert you when a key app is having performance problems, and (b) allow you to drill down to the component flows and paths to see what might be the cause. That might be a future evolutionary step for the SolarWinds NetPath capability.
Comments are welcome, whether in agreement or constructive disagreement with the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!