I have some observations about network and security staffing, and especially about security products, including what I think is happening at some or many sites. That said, I make no claim to a large or comprehensive sample of what’s going on in the industry.
This blog started as a rant of sorts, then grew far too long. It has taken a couple of rewrites to cut it down and focus it better. I hope.
So here goes, and I’m sure those who disagree or see things differently will let me know.
TL;DR: Network and security staffing has gotten hard. New products, and more of them, make the staffing challenges worse. While automation may be part of the answer, reducing and/or shifting workload can be another part.
The Staffing Dilemma
Many companies are struggling with the following:
- There is not enough technical staff to get the job done.
- Properly skilled networking and especially security staff are hard to find and retain, not to mention costly.
- Their responsibilities keep getting broader, conspicuously so in the security space.
- Change windows are getting harder to come by, which adds stress for network and security engineers and eats into their evenings and weekends. Preparing change documentation, done to reduce failed or buggy changes and rollbacks, consumes increasing amounts of scarce time.
- Networking and security budgets are not getting bigger (but probably should be).
- In this COVID era, some people are quitting to improve their quality of life. This may be a real factor for network and security staff (long hours, work outside the 8-to-5 day, stress, outages, compensation, etc.).
Symptoms of the Problem
How can you tell if the above applies to your organization?
Here are some of the symptoms of being understaffed:
- Changes and new deployments are not getting documented and diagrammed
- The work backlog keeps getting longer
- Planning, design, and deployments or changes are done hastily, resulting in accumulated technical debt
- Outages are increasing due to the haste of deployment and/or technical debt.
- MTTR is higher due to lack of current documentation/diagrams
- The staff has no time to improve their skills
- There is a network lab, but nobody uses it
- Technical debt keeps increasing (because nobody has the time, planning, and stress budget to go fix things)
- The network is breaking more and more often (could also be due to complexity)
- The staff has insufficient skills or experience
- Staff is stressed and crabby, or staff retention is poor
Basically, there are two ways to address this:
- Add staff
- Reduce/shift tasks to free up staff time. (Automation, offloading to less-skilled staff, reducing busy-work or making it more efficient, etc.)
Staffing solutions may be part of the mix. How many sites really need a full-time CCIE? Or full-time product X expert (for some values of X)?
- Pay for part-time expertise (consultants staffing on a regular basis, e.g., two days a week). NetCraftsmen is doing this for more and more firms, especially for less “core” skills. This can also be done as “knowledge transfer”: an experienced NetCraftsmen consultant shows junior staff how to do things (on-the-job training, perhaps complementing formal training), with the consultant’s hours gradually reduced or shifted to harder tasks or other purposes.
- Use outsourced management (a provider manages what you built, or builds it and manages it).
Regarding “core” skills, what I think I’m seeing is that some network and security teams are happiest handing off firewall, security tools, ISE, StealthWatch, and similar work, be it ongoing deployment, management, replacement, cleanup, ruleset management, documentation, etc. Is that “blame outsourcing”? Or prioritization?
The other answer is to do time-consuming tasks more efficiently, which likely means vendor-provided automation. Vendors have started to step up to this:
- Automated deployment and compliance monitoring (e.g., config changes)
- Automated patching
- Automated upgrades
- Automated/assisted troubleshooting
Commercial tool versus home-grown automation is another decision. If you’re understaffed, buying a tool that comes with some built-in automation, or one that provides a framework minimizing the coding your staff must do, may be a good idea. The last thing you need is your scarce staff spending time coding automation, whether writing new code or fixing bugs.
The situation may gradually get better.
It appears that COVID and shifting plans as the situation evolves have paused or delayed (“frozen”) some networking and security initiatives.
That started with WFH (Work From Home) and adding remote access capacity, and has perhaps evolved into SD-WAN deployment and skills-building on the networking side. The security side had (and still has) the challenge of keeping up with that and with growing cloud footprints.
Once those are sufficiently dealt with, it may be time to do cleanup and documentation, and to catch your breath. If supply chain delays hold up new equipment, that’s an opportunity!
Having said that, I suspect some sites have been delaying (or too busy for) grappling with port security: 802.1x authentication and MAB (MAC authentication bypass) for devices.
That technology may be seen as low-risk or deferrable for some reason. (Really, “no intruder could jack into one of your ports”?) I suspect that may be about to change, in part due to new security emphases such as Zero Trust.
If that applies to you, you will likely find your staff getting stretched even thinner.
Oh, and if you’ve been doing “best of breed,” have fun integrating tools and hardware. Yes, I’m very impressed with what Cisco can do with its switching products plus ISE.
Other vendors have some or much of that capability. The documentation I’ve seen from them so far has been thin on details, which left a somewhat negative impression on me. But maybe I haven’t stumbled on the right links yet.
Those technologies (802.1x and MAB) are one entry point to building solid ISE, ClearPass, or other comparable tool skills. Public key certificate skills are another.
But with office staff working from home, physical office switch ports are suddenly less important.
Also, note that unless BYOD, MDM, and remote access all feed off the same controlling software, staff will end up supporting multiple tools for similar tasks.
In the Cisco space, DNA Center (“DNAC”) potentially provides automation (and is the apparent successor to Prime). Licensing or cost may be a barrier for some, perhaps tied to equipment refresh cycles. DNAC offers some value and automation for older equipment, which may not be sufficiently appreciated. Even if you are NOT doing SD-Access, DNAC can be quite useful. It also does not require “heavy ISE skills” for legacy designs and maybe at most “middling ISE skills” for SD-Access.
I’ve heard praise for the automation of wireless via Cisco Prime over the last couple of years. DNAC automates campus switch upgrades, apparently fairly reliably. Did that get your attention? (I’m thinking of all the labor we provided to a customer with over 160 campus switches that had to be upgraded for security reasons.) Would you like to be able to upgrade your switches annually or more often? In a small number of change windows?
There’s a hidden cost to not doing upgrades. Switches that have not been upgraded for a year or more may well have security vulnerabilities that are well-known to hackers.
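One way to make that hidden cost visible is to periodically compare each switch’s last upgrade date against a threshold. The following is a minimal Python sketch, assuming an inventory exported from whatever system of record you already use; the `Switch` record, device names, dates, and the one-year threshold are all hypothetical.

```python
# Hypothetical sketch: flag switches whose software has not been upgraded
# within a threshold. The inventory records, device names, dates, and the
# 365-day threshold are illustrative assumptions, not any product's format.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Switch:
    name: str
    last_upgraded: date  # date the running software was last installed


def stale_switches(inventory, today, max_age_days=365):
    """Return names of switches not upgraded within max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [sw for sw in inventory if sw.last_upgraded < cutoff]
    stale.sort(key=lambda sw: sw.last_upgraded)
    return [sw.name for sw in stale]


if __name__ == "__main__":
    inventory = [
        Switch("bldg1-access-01", date(2020, 6, 1)),
        Switch("bldg1-access-02", date(2021, 9, 15)),
        Switch("bldg2-access-01", date(2019, 11, 20)),
    ]
    # Prints ['bldg2-access-01', 'bldg1-access-01']
    print(stale_switches(inventory, today=date(2022, 1, 10)))
```

Sorting oldest first puts the switches most likely to carry well-known vulnerabilities at the top of the upgrade queue.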
A similar situation prevails with firewalls and security equipment, signatures, etc. Upgrade reasonably frequently or be more vulnerable. But does the staff have the time? If you manage network or security, are you aware of what is not getting done and how critical it might be? Do you have growing numbers of “permit any any” rules due to staff not having time to “do it right”?
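Finding those “permit any any” rules can itself be partly automated. Here is a minimal Python sketch, assuming the ruleset has already been exported and normalized into simple records; real firewall exports differ by vendor and would need their own parser, and the `Rule` fields and sample rules are hypothetical.

```python
# Hypothetical sketch: audit a firewall ruleset for "permit any any" style
# rules. Assumes rules were already exported and normalized into simple
# records; real exports differ by vendor and would need their own parser.
from dataclasses import dataclass

FIELDS = ("source", "destination", "service")


@dataclass
class Rule:
    rule_id: str
    action: str       # "permit" or "deny"
    source: str       # e.g. "10.1.0.0/16" or "any"
    destination: str
    service: str      # e.g. "tcp/443" or "any"


def any_count(rule):
    """Number of match fields in the rule that are wide open ("any")."""
    return sum(getattr(rule, f) == "any" for f in FIELDS)


def audit(rules):
    """Split permit rules into fully open and partially open rule IDs."""
    fully_open, partially_open = [], []
    for rule in rules:
        if rule.action != "permit":
            continue  # deny rules are not a permissiveness concern here
        n = any_count(rule)
        if n == len(FIELDS):
            fully_open.append(rule.rule_id)
        elif n > 0:
            partially_open.append(rule.rule_id)
    return fully_open, partially_open


if __name__ == "__main__":
    rules = [
        Rule("10", "permit", "10.1.0.0/16", "10.2.5.10", "tcp/443"),
        Rule("20", "permit", "any", "any", "any"),
        Rule("30", "permit", "10.1.0.0/16", "any", "any"),
        Rule("40", "deny", "any", "any", "any"),
    ]
    # Prints (['20'], ['30'])
    print(audit(rules))
```

Tracking the two counts over time is one rough way to tell whether the ruleset is drifting toward “permit any any” because nobody has time to do it right.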
From the vendor’s perspective, they can’t sell more products to customers that cannot assimilate the new equipment or software. Thus they focus on “compelling” add-on products.
This may also be why vendor marketing now focuses on ease of use and automation, currently especially for campus and data center. However, that adds another skill set and requires staff training. (Cynically, that’s the part the vendor might not talk much about when trying to close a deal?) Hopefully, it’s a net win after the learning curve.
BFOTO (Blinding Flash of the Obvious): Vendors exist to sell stuff.
Are ease of upgrade and the time an upgrade takes ever part of the evaluation criteria? I highly doubt it. If I’m right, vendors doubt it too. So we end up with devices that are hard to upgrade and/or take a long time to upgrade. (Cisco Nexus 7K switches?) Prime ground for **reliable** automation.
We probably shouldn’t expect automation for free. It took time and money to develop. The vendor has to recoup that somehow. “Customer loyalty” is probably insufficient grounds for the amount of funding required.
And by the way, what’s with some server-based (security) products that take hours to install or upgrade?
Good news: in some cases, vendors are starting to recognize that customers want complete integrated solutions that address customer needs rather than just pieces of the problem.
It may be a sign of aging, but it sure seems like products have been buggier lately. This has even been true of security products, especially firewalls. Yes, Cisco firewalls are in a bit of a transition (ASA to FTD), so some level of bugs might be expected there. Palo Alto has also had some issues lately; it’s not clear to me why.
Across vendors, support staff, even GOOD support staff, seem to have been thinning out over the last several months to a year.
Not quite a bug, but security products that depend on NetFlow/sFlow/IPFIX data should probably be able to automate deploying the flow export configuration across routers, switches, etc. Otherwise, doing so can get very labor-intensive, especially if flow export support requires upgrading, e.g., some “untouchable” data center Cisco Nexus 7700 switches with F2 modules. (Recent pain point.)
Yes, vendors are trying to add security and other features quickly, despite thin staffing, COVID, and other distractions. Still, the resulting bugs add customer pain.
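Until products do that for you, even a simple template can cut the repetitive part of the work. Below is a minimal Python sketch that renders one flow export configuration per device; the exporter, monitor, and record names, the collector address, UDP port, and device list are all hypothetical. The command syntax is modeled on Cisco IOS-XE Flexible NetFlow and must be checked against each platform’s documentation (NX-OS, for example, differs), and the flow record `SEC-RECORD` is assumed to already exist on each device.

```python
# Hypothetical sketch: render per-device NetFlow config from one template.
# Names, collector address, port, and devices are illustrative. Syntax is
# modeled on Cisco IOS-XE Flexible NetFlow; verify per platform. The flow
# record named below is assumed to be defined on the device already.

EXPORTER_TEMPLATE = """\
flow exporter {exporter}
 destination {collector}
 transport udp {port}
!
flow monitor {monitor}
 exporter {exporter}
 record {record}
!"""

INTERFACE_TEMPLATE = """\
interface {interface}
 ip flow monitor {monitor} input
!"""


def render_netflow_config(interfaces, collector, port=2055,
                          exporter="SEC-EXPORTER", monitor="SEC-MONITOR",
                          record="SEC-RECORD"):
    """Return config text that applies the flow monitor to each interface."""
    parts = [EXPORTER_TEMPLATE.format(exporter=exporter, collector=collector,
                                      port=port, monitor=monitor,
                                      record=record)]
    for intf in interfaces:
        parts.append(INTERFACE_TEMPLATE.format(interface=intf,
                                               monitor=monitor))
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical inventory: device name -> interfaces to monitor.
    devices = {
        "core-rtr-01": ["TenGigabitEthernet0/0/0", "TenGigabitEthernet0/0/1"],
        "dist-sw-01": ["TenGigabitEthernet1/1/1"],
    }
    for name, intfs in devices.items():
        print(f"! ---- {name} ----")
        print(render_netflow_config(intfs, collector="192.0.2.50"))
```

Generating the text is the easy part; pushing it to devices, and any upgrades needed first, still go through your normal change process.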
I’ve written (ranted?) enough previously about good modular design and simplicity. So have others.
Getting outside advice can help you find fresh and more efficient ways of doing things. An outsider may also spot good ways to simplify what you have. (Yes, NetCraftsmen does that.)
Get it right, and a good design will grow with you. A bad design will be hard to add functionality to and will become more complex and fragile over time. Technical debt keeps accruing interest; think of it like a loan you’re not making payments on.
More Products, YASP
For the security space, threats and products are multiplying rapidly. So much so that I’m wondering if we need a new acronym: YASP (Yet Another Security Product).
Vendors refer to them as “solutions.” I sometimes cynically think we should refer to them as “partial solutions,” aka “products.” Not really joking here.
Security products seem to be mostly additive at present. Sites keep adding more, never getting rid of old ones. Yet each product comes with an incremental installation and management time cost, which adds up.
When sites do this with network management products, some sit there unused, with no great harm other than wasted money.
When sites do this with security products, unused products could be a problem, e.g., posting alarms that nobody reads or acts on.
The answer here might be to look at risks, prioritize them, and identify the products used to address them. And get rid of the other products. Subject to management approval. I see this as a products/security risks covered versus budget priorities discussion. At some point, it may be useful to choose to do certain things fairly well, which may well mean consciously NOT doing other things. What are your priorities?
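That triage can be sketched as a simple coverage check: rank the risks, map each product to the risks it addresses, then list the products that cover nothing on the priority list (retirement candidates, subject to management approval) and the priority risks that nothing covers. This is a minimal Python sketch; the risks, priorities, product names, and coverage mapping are hypothetical placeholders for your own risk assessment.

```python
# Hypothetical sketch: compare prioritized risks against the security
# products that address them. Risk names, priorities, product names, and
# the coverage mapping are illustrative placeholders for your own risk
# assessment; nothing here reflects a real product's capabilities.


def top_risks(risk_priority, top_n):
    """Return the top_n risks, most important (priority 1) first."""
    return sorted(risk_priority, key=risk_priority.get)[:top_n]


def retirement_candidates(risk_priority, coverage, top_n):
    """Products that address none of the top_n risks (candidates to retire)."""
    priority_set = set(top_risks(risk_priority, top_n))
    return sorted(p for p, risks in coverage.items() if not risks & priority_set)


def uncovered_risks(risk_priority, coverage, top_n):
    """Top_n risks that no current product addresses, in priority order."""
    covered = set().union(*coverage.values())
    return [r for r in top_risks(risk_priority, top_n) if r not in covered]


if __name__ == "__main__":
    risks = {
        "ransomware": 1,
        "phishing": 2,
        "rogue devices on switch ports": 3,
        "data exfiltration": 4,
        "DDoS": 5,
    }
    coverage = {
        "Product A (EDR)": {"ransomware"},
        "Product B (email gateway)": {"phishing"},
        "Product C (NAC)": {"rogue devices on switch ports"},
        "Product D (legacy IDS)": {"DDoS"},
        "Product E (old web filter)": set(),
    }
    # Prints ['Product D (legacy IDS)', 'Product E (old web filter)']
    print(retirement_candidates(risks, coverage, top_n=3))
    # Prints ['data exfiltration']
    print(uncovered_risks(risks, coverage, top_n=4))
```

The `top_n` cutoff is where the budget discussion happens: raising it keeps more products in scope, while lowering it consciously chooses NOT to do some things.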
Sustainable Networking and Security
Cisco recently posted a good blog about the need for a follow-on to FISMA. It looks to me like a checklist of things your team might not currently be doing. Might be a good idea to take a look at it!
There is a growing need for strong ID and context-awareness (corporate device versus BYOD, including cell phone or tablet, plus location context) as part of security access control. Also, consider the need to auto-segment IoT. That suggests to me that skills and experience are soon going to be needed there, followed by deeper authentication, authorization, profiling, and posture configuration and enforcement.
As I hinted above, this is probably the most common skills weak spot in organizations right now. At best, there may be only one person with skills and experience on ISE, ClearPass, and/or working with PKI. Many sites are at the pre-NAC / pre-802.1x stage still. That’s not good!
Malware has contributed to the coming workload. As the old Pogo comic strip said, “we have met the enemy, and he is us.” (I hate to admit I’m old enough to have actually read some Pogo books when young. 1950s.) The point is you can guard the perimeter all you want with firewalls (etc.). You can segment, but the device spreading malware may be inside those boundaries.
There is also the scary statistic that current tools detect at best around 30% of malware attacks.
This leads us to behavior analysis tools, my recent Juniper blog, etc. Cisco and other vendors have their competing offerings.
All that is why Zero Trust Architecture (ZTA) is the up-and-coming security principle. It likely requires greater identity and device/context-awareness to better secure data. It also means shifting from a perimeter/border firewall focus to a broader user/application focus. That’s just the technical side. There are also extensive governance aspects.
Network and security managers are going to have to deal with staff headcount and skills gaps. Prioritizing will only go part-way and may have already been pushed as far as it can go (and a bit further, if you start looking at what isn’t getting done).
Automation and better tools are hopefully part of the solution. The usability of such tools will be a key factor; how quickly staff can acquire the needed skills is another. Outsourcing/partnering to bring in skills, experience, and additional staffing may also be part of the solution.
The changing security landscape due to ZTA may cause a tools transition and will change security staff assignments and deliverables.