SolarWinds at Network Field Day 5

Author
Peter Welcher
Architect, Operations Technical Advisor

I’ve been mulling over the SolarWinds presentation at Network Field Day 5. I should probably start by noting that SolarWinds gave the attendees a nifty messenger bag with Network Field Day 5 embroidered on it (thanks). I have somewhat of a love/irritation relationship with the SolarWinds products, so I found myself mulling over the old principle of “if you can’t say something positive, best to say nothing at all” and how to apply it. I’ve decided to list my perceptions and invite comments as to whether readers agree or disagree. The hope is that anything negative provides useful feedback to SolarWinds that might result in an improved product. Or in me learning that I’m wrong (wait, that never happens).

My starting point here is that my consulting customers love the SolarWinds Orion and related products.

Confession: I rarely get to install the SolarWinds product(s), do discovery, help with reports, troubleshoot, etc. Consulting customers can do it for themselves! That’s pretty remarkable, when you think about it!

I regularly dealt with CiscoWorks over the years, up to around version 3.0, when my customer base pretty much gave up on it. The SolarWinds customer experience is quite a contrast! I do miss the occasional network management consulting… It’s nice to be needed, but as a sensitive consultant I have to recognize what’s good for the customer. I’ve done SolarWinds work a couple of times at sites with staffing anorexia (too few people), but other than that, self-service seems to be the name of the game. That’s fine with me.

Some thoughts about SolarWinds follow…

First, the SolarWinds products are easy to buy: a reasonably clear website, download, try, online pricing and quotes, etc. And no sticker shock. The product family can be seductive: you get SolarWinds in the door, you soon find you’ve bought several modules, and the collective price is accumulating. On the other hand, we’re not talking hundreds of thousands or millions of dollars here.

This aligns with a principle Terry Slattery and I agree on (I think): most network management products are vastly overpriced, and they end up not doing all that much more for you, at least not without a large and ongoing consulting expense. I keep wondering if network management pricing is like silicon chips: the product of projected sales volume times price is roughly constant. If you price it lower, your sales go up, and vice versa; if you price it high (and go to costly direct sales), your volume goes way down. I’ll also repeat what I’ve noted: consultants who make a living off network management consulting know and love the labor-intensive, high-priced products. That may just be what they consider a good solution and are familiar with. The cynical point of view is that the pricey software customization work represents multiple years of income and employment for them. If that provides value to the customer, great! I personally think most sites want solutions, not tool kits and lengthy projects.

Second, the SolarWinds products work. How many of us have spent hours futzing with products (CiscoWorks LMS, HP OpenView, others), sometimes up to half a day, trying to figure out why the product could not manage a particular device when ping and SNMP were working? Stubborn cases of the product exhibiting “I see it but I’m not totally happy with it” behavior. It’s amazing how often vendors write code that has silly dependencies on perfect information or doesn’t allow a manual override for unexpected conditions.

Third, the SolarWinds products are easy to maintain. People seem to be able to keep SolarWinds products working satisfactorily. When there’s a problem, it’s fairly easy to fix, and the support tools are helpful (or so I have found a couple of times, and have heard from other sites). It strikes me that a lot of network management vendors don’t understand reduced staffing levels. Who can afford a full-time network management person? It’s a side job, and at most sites, it’s lucky to be a 10%-time job. It’s also low priority compared to new deployment and break/fix work. We’re there to run networks, not the tools.

The same principle applies to other technologies. Sometimes I think staying on top of the 6500 or Nexus is a full-time or perhaps half-time job, yet in the real world, that’s maybe 10% of what many people have to deal with day to day. I really like the Cisco Nexus platforms. Having said that, it seems that some of my value as a consultant is keeping track of all the hardware and code dependencies and gotchas and helping people with them, sometimes beforehand, sometimes unfortunately after they’ve run into a snag. A question for Cisco: speed of new features and cost, versus consistency and user-side simplicity?

The Achilles heel of most network management (“NM”) products is the database. People treat high-end NM products with kid gloves. Products like CiscoWorks (or SolarWinds), by contrast, sometimes ended up on server(s) under a desk, not on UPS, etc. Corrupt the CiscoWorks LMS database and it would be slow or inoperable. Other products, SolarWinds included? I don’t have the hands-on track record there. Using Microsoft SQL Server presumably means the server folks can help you out with backups, etc. Lesson learned over the years: treat your NM servers gently. Shut them down cleanly, don’t just cut off the power, etc. Otherwise, don’t be surprised if the network management behaves weirdly.

If you do a backup immediately after install, and another after your first fairly complete device discovery, restoring to those checkpoints may save you a bunch of time!

I’d like to insert one other observation about costly NM products. What I keep running into is a site that has spent $1-2M on a product (often from CA, picked by management). Everyone hates the tool. It doesn’t work that well. It’s hard to learn. You’re not allowed to try to “fix” it. And there are anywhere from half a person to two full-time people fiddling with it onsite, often for years. The biggest problem is that the high cost means the tool only covers a fraction of the datacenter (and often not the campus / LAN / MAN). It also supports only the “most important” applications, often the big iron, which might not even begin to address the biggest headaches for staff.

A Network Management Manifesto

My personal feeling is that the priority MUST be managing every active port and device in the data center, along with server/VM CPU, NIC utilization, disk, and swap data.
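To make “managing every active port” a bit more concrete: NMS tools typically derive link utilization from two successive polls of each interface’s octet counters. Here is a minimal Python sketch of that arithmetic, assuming 64-bit IF-MIB counters (ifHCInOctets / ifHCOutOctets) and a link speed in bits per second. The function names and sample numbers are hypothetical, and this is not how SolarWinds itself necessarily implements it.

```python
# Utilization from two polls of a 64-bit interface octet counter.
# Assumptions (hypothetical, for illustration): the counters are IF-MIB
# ifHCInOctets/ifHCOutOctets values and the link speed is in bits per second.
# ifSpeed saturates around 4.29 Gbps, so for faster links a poller would
# typically use ifHighSpeed (reported in Mbps) multiplied by 1,000,000.

COUNTER64_MODULUS = 2 ** 64


def counter_delta(old: int, new: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Octets counted between two samples, tolerating a single counter wrap."""
    return (new - old) % modulus


def utilization_pct(old_octets: int, new_octets: int,
                    interval_s: float, link_bps: int) -> float:
    """Percent utilization of one direction of a link over the polling interval."""
    bits = counter_delta(old_octets, new_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


if __name__ == "__main__":
    # Hypothetical 10 Gbps port, polled five minutes apart.
    first_poll = 1_000_000_000
    second_poll = first_poll + 112_500_000_000
    print(f"{utilization_pct(first_poll, second_poll, 300, 10_000_000_000):.1f}%")  # 30.0%
```

Keep in mind that five-minute averages smooth out short bursts, so a port that looks fine at 30% may still be dropping traffic at peak; that’s one reason error and discard counters deserve polling too.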

I’ve seen too many “cluster events” where many senior / expensive people are tied up in meetings trying to solve an application performance or other problem. In several of them, comprehensive monitoring would have revealed the problem as a simple cabling, link capacity, server capacity, or error issue. Fancy root cause analysis etc. makes a lot of sense, at least if it can be done cost-effectively, which I have my doubts about (too much human input / coding needed?). But it misses the point.

Fancy analysis is moot if we can SEE and FIX the basic stuff. A lot of the overlay and SDN talk seems to assume a perfect (or well-managed?) infrastructure. I rarely see that kind of correct operation today.

News flash for SDN’ers: running the shiny new stuff over a crumbling infrastructure is not going to work well. I sometimes have the feeling server and programming people aren’t aware of all the things network people experience and why we do things in certain ways. I don’t mean the stuff where we’re set in our ways; it’s more the notion that virtualization means you can ignore the physical. Recently I heard of a situation where the question was whether an application might be having problems on a 10 Gbps port due to hypervisor dvSwitch oversubscription, internal HP blade chassis switch oversubscription, or maybe the network. NM tool visibility would help with things like that!

Note that the network now crosses over into the server or blade server chassis. Accelerated virtualization will increase that factor. The server folks may need our networking skills to understand this!

Heck, when we take e.g. a NetMRI into a new site for a “network assessment”, we often find several hundred to thousands of duplex and bad cable error problems. I don’t think I’ve ever seen a site with fewer than 200 duplex issues. Nobody’s been aware of them due to lack of tool visibility. And sometimes it’s interesting getting them fixed, as it’s not part of the operational / trouble ticket process. The usual process often seems to require or prioritize human complaints (reactive repairs) rather than proactive repairs.
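For readers who want to hunt for duplex and cabling problems themselves, here is a rough Python sketch of the kind of counter heuristic an assessment tool might apply. It’s an illustration under stated assumptions, not how NetMRI or SolarWinds actually does it: the IfStats record, its field names, and the 1e-5 threshold are all hypothetical, and the counter values are assumed to be EtherLike-MIB deltas over one polling window. The underlying symptoms are the classic ones: late collisions on the half-duplex side of a mismatch, and FCS/CRC errors on the full-duplex side (or on both ends of a bad cable).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IfStats:
    """Hypothetical per-interface snapshot; field names are illustrative only."""
    device: str
    name: str
    duplex: str            # "full" or "half", as the device reports it
    in_packets: int        # packets received during the sample window
    fcs_errors: int        # e.g. delta of EtherLike-MIB dot3StatsFCSErrors
    late_collisions: int   # e.g. delta of EtherLike-MIB dot3StatsLateCollisions


def flag_interface(s: IfStats, fcs_ratio: float = 1e-5) -> Optional[str]:
    """Return a reason string if the counters look suspicious, else None.

    The default FCS error ratio is an arbitrary example threshold.
    """
    if s.duplex == "half" and s.late_collisions > 0:
        # Late collisions on a half-duplex port are the classic mismatch symptom.
        return "late collisions on half-duplex port: likely duplex mismatch"
    if s.in_packets > 0 and s.fcs_errors / s.in_packets > fcs_ratio:
        # FCS errors can come from a half-duplex peer or from bad cabling.
        return "high FCS error ratio: duplex mismatch on the peer or bad cable"
    return None


if __name__ == "__main__":
    samples = [
        IfStats("sw1", "Gi1/0/12", "half", 50_000, 0, 37),
        IfStats("sw1", "Gi1/0/13", "full", 2_000_000, 420, 0),
        IfStats("sw1", "Gi1/0/14", "full", 2_000_000, 0, 0),
    ]
    for s in samples:
        reason = flag_interface(s)
        if reason:
            print(f"{s.device} {s.name}: {reason}")
```

The hard part, as noted above, usually isn’t the detection logic; it’s getting the findings into the ticketing process so someone actually fixes them.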

New Things SolarWinds Talked about at NFD5

You can view the SolarWinds NFD5 presentation here. A summary follows:

  • SolarWinds provided an update on their products. They talked about SWIS (SolarWinds Information Service), which is their API.
  • They then reviewed the product line. You can read about the SolarWinds product line here, which presents it better and more graphically than I can.
  • I’ll note in passing that product line breadth is useful. Managing the network, servers, and VMs in one suite is powerful and important for smaller shops.
  • The presentation then got onto FREE PRODUCTS. (The link takes you to free trial and free tool downloads, the long list of free tools starts about halfway down the page.)

Now that I have your attention …

Where I Have Reservations about SolarWinds Products

Here are some of the minor things I’ve noted over the years:

  • It feels like almost every time I’m onsite at a SolarWinds site, I run into the problem of needing to diagnose or look at an interface, and find that it isn’t known by SolarWinds. That used to be because people needed to run discovery manually. I’m told (I asked) discovery can now be run automatically. Can the results be rendered active automatically, or is human interaction required? (Don’t know.) Is there some reason people do NOT do that? Capacity / performance of server maxed out? DB full?
  • The GUI is cluttered: the same thing appears in many places. Some simplification would really help. It could be worse. I much prefer a menu hierarchy to some web-based GUIs I’ve seen that use all four edges of the screen for choices. I’ll settle for device or other filters on the left side. I really dislike more filters or tabs on the right or bottom; I tend not to even see them. (Attention: InfoBlox NetMRI coders?) I’ll grant that good GUI design is hard.
  • The addition of support / tools for fancy features like NetFlow and IP SLA over the last few years is great; I’m glad to see that in the product suite. I haven’t had the opportunity to work with those tools recently. The initial versions seemed a little limited. We all need to recognize that SolarWinds’ business model is frugal. Taking on too much R&D at once isn’t feasible; any company must prioritize.
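On the “interface isn’t known by SolarWinds” problem, one low-tech sanity check is to compare what the devices say is up against what the NMS is actually polling. The Python sketch below assumes you already have both lists, for instance from an SNMP walk of ifName (IF-MIB ifXTable) and ifOperStatus (ifTable), plus an export or query of monitored interfaces from the NMS. The function name and data shapes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def unmonitored_up_interfaces(
    device_ifaces: Dict[str, Dict[str, bool]],
    monitored: Set[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return sorted (device, ifName) pairs that are up but not monitored.

    device_ifaces maps device -> {ifName: oper_up}, e.g. built from a walk of
    ifName and ifOperStatus. monitored holds the (device, ifName) pairs the
    NMS already polls. Both inputs are assumed, not taken from any product API.
    """
    return sorted(
        (dev, ifname)
        for dev, ifaces in device_ifaces.items()
        for ifname, oper_up in ifaces.items()
        if oper_up and (dev, ifname) not in monitored
    )


if __name__ == "__main__":
    walked = {
        "core1": {"Te1/1": True, "Te1/2": True, "Te1/3": False},
        "acc7": {"Gi0/1": True},
    }
    nms = {("core1", "Te1/1")}
    print(unmonitored_up_interfaces(walked, nms))
    # [('acc7', 'Gi0/1'), ('core1', 'Te1/2')]
```

Run periodically, a report like this would at least tell you which active ports still need to be added, whether or not discovery results can be activated automatically.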

Please comment where you agree or disagree. And help SolarWinds and me out: what features do you like or hate, what would you like to see the product do, etc.?

Disclosure

The vendors for NFD 5 paid for my travel expenses and perhaps small items, so I wish to disclose that in my blogs now. The vendors in question are: Cisco, Brocade, Juniper, Plexxi, Ruckus, and SolarWinds. I’d like to think that my blogs aren’t influenced by that. Yes, the time spent in presentations and discussion gets me and the other attendees looking at and thinking about the various vendors’ products, marketing spin, and their points of view. I intend to try to remain as objective as possible in my blogs. I’ll concede that cool technology gets my attention!

Stay tuned!

Twitter: @pjwelcher

4 responses to “SolarWinds at Network Field Day 5”

  1. I concur that the SolarWinds interface is very tired. As a user experience, I find the design aesthetic to be cheesy and brash; I’d much rather have something pared down to the core experience (no dials). I’m also battling with performance problems and the ease of configuration of custom monitoring, but that’s mostly an SNMP problem.

    At the same time, SolarWinds is simple to deploy and run. It covers 80% of the needs with 20% of the effort and, compared to 6 or 7 years ago, at least we have SOMETHING that works well enough at a reasonable cost. That’s why I’m a SolarWinds believer too, with the same caveats.

  2. For a different take on SolarWinds at #NFD5, see the fresh blog by Tom Hollingsworth at http://networkingnerd.net/2013/03/28/solarwinds-the-right-tool-for-a-new-job/

  3. Pete,

    Thanks for the thought-provoking post on my favorite topic, network management. SW certainly has some truly best-in-class features, but unfortunately they are leavened with some that leave one wishing for better.

    On pricing, one can get into six-figure engagements when you start layering on all the separate products, especially the storage manager. I believe their pricing model is flawed, especially how they calculate NPM licenses based on "elements" (= the greatest among nodes, interfaces or volumes). It’s no fun explaining to a customer why what they thought was a 250-device license can’t manage anything more after they add their second core switch into the product. Yes, it’s documented, but no, it’s not universally understood, and no, it doesn’t make sense (to me or anyone outside of SW I’ve ever spoken to about it). Plus they want you to license NTA commensurate with NPM when you might only have a half dozen flow collectors but an unlimited NPM license.

    I second Jerold’s observation on the overly intrusive sales process. If I got 1/4 as many calls following up on a service request as I do sales calls and e-mail follow-ups, I’d count it a bonanza of customer service. Anyone who’s spent any amount of time trying to find answers on SW’s Thwack user forum will see recurring unresolved problems that go back years in some cases.

    A good number of the tools are still console-only despite the movement of a lot of things to the web UI. For instance, all the most useful bits of NCM and Advanced Alerts and Reporting require one to RDP into the server console. Also, you REALLY need to run any SW installation of significant size with MS SQL Server on a separate host, running the "real" SQL Server. The free SQL Express that comes with the product is useless for anything larger than a small proof of concept or trial installation.

    Still, if one takes the time to carefully deploy (and yes you can schedule discoveries and rediscoveries) and maintain a SW system (or Cisco Prime, or ManageEngine, or netMRI, or RANCID or Cacti or Nagios…) it can give significant dividends. Sadly I see again and again organizations not investing in the people and processes needed to make any tool succeed. They think: throw some Capex at the problem and the tool will magically fix everything that’s broken. Well that’s a guarantee for fail – especially when no one on the staff "owns" the solution – either personally or organizationally.

    More often than not I see customers (and have come into jobs at organizations) where there are a host of zombie servers shuffling about in the data center (or under desks, in the back of wiring closets, etc.) spinning disks and managing "something". Maybe someone even has the login for all of those servers. Maybe not. …or syslog hosts and trap servers set up on infrastructure pointing to servers that are long retired. All those are symptoms of a failure of the organization to truly achieve the potential a well-run management system can deliver.

  4. Thanks for sharing that with me and the readers, Marvin!

    I really like the point about putting in the time / people / process or the value gets lost. Couldn’t agree more! Throwing money (or tools) at problems doesn’t solve them.

    I like the term "zombie servers". Yes, under the desk communicates (to me) how much (little) value is associated with the function the server is there for. Not taking the time "to do it right" speaks loudly.
