Security Architecture Has Gotten Hard

The title of this blog article may well be nothing new in your life. Or it may be something you and your company have discovered the hard way, evolving the network and security with good intentions but ending up with a bit of a hot mess.

This blog discusses some of what I am hearing from my peers and what they’re seeing at some customer sites. The goal is not to shame anyone (could lose customers that way) but to try to identify some possibly common “lessons learned,” and to provide some third-party discussion that might just be helpful in your planning.

This blog complements a prior blog:

https://netcraftsmen.com/comparing-trustsec-nac-versus-agent-based-controls/

Problem Statement

Problem #1 is the security team. (Humor intended. Security team sense of humor may not be present.)

Specifically, some larger organizations have gotten a bit … stovepiped, with network and security teams “staking out turf.” The problem there is that if the teams aren’t communicating and having real architectural discussions, your company can spend a lot of money creating a complex, failure-prone, hard-to-manage mess. And I mean a conspicuously large amount of money!

That’s one way you end up in a mess: sunk cost syndrome. E.g., we just spent $2M on those 10 Gbps firewalls, and now we have to buy four more pairs, or rip out the four pairs we have and do something different?!!

Networking and security teams used to have a good division of labor. Or maybe it was “turf.” Security would monitor end systems, malware tools, etc., as well as firewall rules and perhaps the firewalls themselves. Compliance, processes, updates, security patch management, etc.

The “turf” approach means that we had network devices, and security perhaps owned the firewall. Over time, that expanded at some sites, usually data centers, to security having a bunch of devices in the outward-bound path, usually in proximity to firewalls. That was in the days when we trusted everyone on our network. (Were they the “good old days”? Maybe not?)

That created potential problems:

Firewalls and other devices with undocumented or optimistic, sales-oriented throughput numbers. In particular, vendors not documenting the impact of turning on all the wonderful security features someone bought the device for. As in, doing so clobbers throughput? And not providing design guidance so that customers buy a security appliance with enough capacity for all the features they intend to enable.
Or security staff buying appliances without anticipating requirements growth, then turning on additional features anyway.
Complexity: multiple devices in the path to the outside. E.g., a device that acts as an SSL/TLS man in the middle and copies selected traffic to monitoring devices, and perhaps also sends NetFlow or comparable flow data to other monitoring devices. The cabling alone can be complex. And as above, throughput needs to be properly engineered.
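To make the throughput concern concrete, here is a minimal Python sketch of the sizing arithmetic. The derating factor (the fraction of datasheet throughput left once the features you intend to use are turned on) and the growth headroom are hypothetical inputs; you would need to get real values from vendor test data or your own lab, not from the datasheet headline number.

```python
import math


def appliances_needed(peak_gbps: float,
                      rated_gbps: float,
                      feature_derate: float,
                      growth_headroom: float = 0.3) -> int:
    """Estimate how many appliances (or HA pairs) a traffic load requires.

    peak_gbps:       measured peak traffic through the chokepoint
    rated_gbps:      vendor datasheet throughput (often measured with features off)
    feature_derate:  hypothetical fraction of rated throughput remaining once the
                     intended features (IPS, TLS inspection, etc.) are enabled
    growth_headroom: extra capacity reserved for traffic growth (0.3 = 30%)
    """
    if not 0 < feature_derate <= 1:
        raise ValueError("feature_derate must be in (0, 1]")
    effective_gbps = rated_gbps * feature_derate       # what one box really delivers
    required_gbps = peak_gbps * (1 + growth_headroom)  # what you need to carry
    return math.ceil(required_gbps / effective_gbps)


# Example: 8 Gbps peak on appliances rated at 10 Gbps.
print(appliances_needed(8, 10, feature_derate=0.25))                    # 5
print(appliances_needed(8, 10, feature_derate=1.0, growth_headroom=0))  # 1
```

The gap between those two answers (one box on paper, five once features and growth are accounted for) is the sort of surprise that turns into a budget fight later.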

Real world stories:

One organization where the security team neither coordinated planning with the network team nor tracked and updated its budget annually. Result: stale budgeting, or something similar, left 1 Gbps monitoring tool(s) in the path of a 10 Gbps network. The consequence was ongoing pain for the security team, trying to keep what the tools monitored from crushing the security devices and slowing the network down. My take: not really viable in the long run.
One organization that deployed a large SD-WAN/SASE solution to over 1000 sites, where the security team now has a differently branded SASE/security box already being deployed. (I didn’t hear whether the existing SD-WAN was being removed, or what.)

The second of those suggests to me “security team stuck in security box insertion mode.” No offense intended, just trying for a memorable description. Security box insertion may be the right answer. Or not.

Deploying 1000 or more inline security boxes is likely very costly and takes quite some time to do. And is it manageable in the long run?

Additional sources of complexity:

CoLos
Cloud
Zero Trust
SD-WAN or SASE

Impact:

You can no longer really force traffic through a single chokepoint security device or pair of devices; the cost and complexity are too high. Backhauling traffic to some central security portal adds latency and is undesirable. I think (well, hope?) most networking/security people are now aware of this.
With CoLo presences, some organizations were able to shift security devices into the path to the CoLo. Regional SD-WAN architectures played well with that, although routing failover to another region and security state (symmetric flows) require careful and complex design.
With Cloud, virtual appliances or other forms of traffic enforcement started becoming a factor. With multi-Cloud, each cloud vendor’s networking and security approaches (and possibly DNS/IPAM, ACLs, etc.) differ. And anti-malware, anti-phishing, etc. are more focused on the end-user side of things. So do you insert some form of virtual security appliance to support a single-vendor solution?
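The latency cost of backhauling is easy to estimate roughly. Light in optical fiber travels at about 200,000 km/s, or roughly 5 microseconds per kilometer one way, so every extra 100 km of fiber path adds about 1 ms of round-trip time before any queuing or inspection delay. The sketch below assumes you know (or can estimate) the fiber path lengths; the per-pass inspection delay is a hypothetical input, not a measured figure.

```python
FIBER_US_PER_KM = 5.0  # approx. one-way propagation delay in fiber (~200,000 km/s)


def added_rtt_ms(direct_km: float, via_hub_km: float,
                 inspection_ms_per_pass: float = 0.0) -> float:
    """Extra round-trip time from hairpinning traffic through a central hub.

    direct_km:              fiber path length, branch -> destination
    via_hub_km:             fiber path length, branch -> hub -> destination
    inspection_ms_per_pass: hypothetical processing delay each time a packet
                            crosses the central security stack
    """
    extra_km = via_hub_km - direct_km
    # The extra distance is traversed twice: once out, once back.
    propagation_ms = 2 * extra_km * FIBER_US_PER_KM / 1000
    # Request and response each cross the security stack once.
    return propagation_ms + 2 * inspection_ms_per_pass


# Example: a 300 km direct path versus a 2300 km detour through a hub.
print(added_rtt_ms(300, 2300))       # 20.0
print(added_rtt_ms(300, 2300, 0.5))  # 21.0
```

Twenty-odd milliseconds per round trip may not sound like much, but chatty applications make many round trips per user action, so it adds up quickly.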

I don’t have a great answer. Part of the problem seems to be the design concept that you have to force traffic through something that does security. If that’s an ownership thing, maybe not so good. If it has a security origin such as ensuring the security chokepoint is itself secure, well maybe.

Segmentation

For quite a while, my brain has been stuck on segmentation = VLANs and VRFs. That’s the classic networking approach. The challenge is building it. Automated or central management tools help with that. And VRFs scale pretty well, basically just compartmentalized routing tied to “captive” VLANs on switched networks.

On the other hand, all that does add complexity. And it means designing with only devices that support segmentation, including security devices.

Alternatives

This is where some security vendors have been looking at the problem and designing for it.

Cisco can provide split functionality, putting some security functions in routers or switches, and others in the cloud for scaling. This bothers some people, not to mention the ABC (Anything But Cisco) security folks.

Another approach: Zscaler has gone to an agent-centric approach, leveraging the Cloud for analysis, enforcement, and reporting. Distributing security functionality can alleviate performance bottlenecks and remove the need for costly high-throughput inline security devices. Distributed Cloud functionality supports low latency.

Illumio and other companies also have offerings providing functionality in that space. At least what I’ll call ACL enforcement if not more. Per-user authentication and group-based controls, and even reputation/behavioral trust tools are coming if not already there. Complexity of the single- or multi-vendor ecosystem doing this sort of thing could also become an issue.
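Whether enforced by a switch, a firewall, or an endpoint agent, group-based control boils down to a lookup: may this source group talk to that destination group on this port, with everything else denied. The following is a minimal sketch of that idea only. The group names, hosts, ports, and the `Rule`/`is_allowed` names are hypothetical and are not any vendor’s API; in practice group membership would come from an identity source or agent inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src_group: str
    dst_group: str
    protocol: str  # "tcp" or "udp"
    port: int


# Hypothetical policy: anything not listed here is denied (default deny).
POLICY = frozenset({
    Rule("finance-users", "finance-app", "tcp", 443),
    Rule("finance-app", "finance-db", "tcp", 5432),
    Rule("all-users", "dns", "udp", 53),
})

# Hypothetical mapping of endpoints to groups.
MEMBERSHIP = {
    "alice-laptop": {"finance-users", "all-users"},
    "bob-laptop": {"all-users"},
    "app01": {"finance-app"},
    "db01": {"finance-db"},
    "resolver1": {"dns"},
}


def is_allowed(src: str, dst: str, protocol: str, port: int) -> bool:
    """Return True if any (source group, destination group) pair permits the flow."""
    for src_group in MEMBERSHIP.get(src, set()):
        for dst_group in MEMBERSHIP.get(dst, set()):
            if Rule(src_group, dst_group, protocol, port) in POLICY:
                return True
    return False  # default deny, including endpoints with no known group


print(is_allowed("alice-laptop", "app01", "tcp", 443))  # True
print(is_allowed("bob-laptop", "app01", "tcp", 443))    # False
```

Note that the policy is written in terms of groups, not IP addresses or VLANs, which is what lets the same rules follow a workload from campus to CoLo to Cloud.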

Elisity also seems to fit in. And I’m sure there are other startups doing ZT or ZTNA. See also some of the discussion in the prior blog referenced at the top of this article.

Did I just say “endpoint-based Zero Trust”? Or maybe “agent based”? Perhaps. That space is new, evolving quickly, and not something I’ve been closely tracking.

As always, with alternative approaches to something, there are trade-offs:

Managing a growing population of physical or virtual security appliances that may also be doing SD-WAN or other networking, VERSUS managing agents on user devices, servers, VMs, containers, Kubernetes clusters, etc.
Being able to monitor anything that puts traffic on the network, versus having IoT or proprietary servers (etc.) that you cannot put a Zero Trust agent on.
Security controls only at security chokepoints, versus anything with an agent. Which may come down to controlling and segmenting local traffic versus just traffic headed for data center, cloud, etc.
Having to make sure your security “chokepoint” devices see and enforce all appropriate security measures, versus having to enforce that endpoints must have the security agent on them (and the headaches around things like personal devices and IoT devices).
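One practical way to keep the agent-based side of that trade-off honest is to compare what the network sees with what the agent console reports. A minimal sketch, assuming you can export both lists as sets of device identifiers (for example, from DHCP leases or switch MAC tables on one side and an agent management export on the other; both sources are assumptions here), plus a hypothetical list of known agentless devices such as IoT that need network-side controls instead:

```python
from __future__ import annotations


def agent_coverage_gaps(seen_on_network: set[str],
                        agent_hosts: set[str],
                        known_agentless: set[str]) -> tuple[float, set[str]]:
    """Compare network-visible devices with agent-reporting devices.

    Returns (coverage, unexplained):
      coverage:    fraction of network-visible devices running the agent
      unexplained: devices seen on the network with no agent that are not
                   on the known-agentless list, i.e. ones to investigate
    """
    if not seen_on_network:
        return 1.0, set()
    covered = seen_on_network & agent_hosts
    coverage = len(covered) / len(seen_on_network)
    unexplained = seen_on_network - agent_hosts - known_agentless
    return coverage, unexplained


seen = {"laptop-01", "laptop-02", "badge-reader-1", "mystery-box"}
agents = {"laptop-01", "laptop-02"}
agentless = {"badge-reader-1"}
print(agent_coverage_gaps(seen, agents, agentless))  # (0.5, {'mystery-box'})
```

The “unexplained” set is the interesting output: those are devices that neither the agent nor a deliberate network-side control is accounting for.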

In short, there are always pros and cons. What you get to do is choose which ones, and the magnitude of their impact on you.

Some Solutions

Here are some thoughts, attempting to finish with some constructive ideas:

Silos are not good, teams MUST work together on overall plumbing (connectivity), routing, monitoring, alerting, and reporting architectures. (Am I belaboring the obvious here? But it is well worth repeating!)
Complexity is the enemy unless you like downtime and complex, lengthy troubleshooting sessions. As in days to weeks trying to pinpoint a performance problem. If you aren’t just “stumbling” into a network + security architecture, then this should be a prime metric in your evaluation of proposed solutions!!!
Shared architecture and simplicity are key. Separate network and security is no longer viable.
Joint longer-term planning, budgeting, architecture, and product evaluation between network and security teams is also key.
Monitoring, smart alerting, and the right tools are key. “We’ve got SNMP and link down alerts” is nowhere near enough. Detecting that some box is dropping say 5% of the packets going through it can really matter. And can be hard if the device doesn’t support monitoring that!
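Catching a box that silently drops a few percent of traffic usually comes down to comparing counters: packets arriving on the ingress interface versus packets leaving the egress interface, in one direction, over the same polling interval. A minimal sketch, assuming you already collect two samples of each counter (for example, the 64-bit ifHCInUcastPkts and ifHCOutUcastPkts counters from IF-MIB via SNMP, or streaming telemetry) and that the device simply passes traffic through. The 1% alert threshold is a hypothetical value you would tune.

```python
COUNTER64_MODULUS = 2 ** 64  # SNMP Counter64 values wrap at 2^64
LOSS_ALERT_THRESHOLD = 0.01  # hypothetical: flag loss above 1%


def counter_delta(old: int, new: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Packets counted between two samples, tolerating a single counter wrap."""
    return (new - old) % modulus


def transit_loss_ratio(in_old: int, in_new: int,
                       out_old: int, out_new: int) -> float:
    """Fraction of packets that entered the device but did not leave it.

    Assumes a pass-through device: every packet arriving on the ingress
    interface should leave on the egress interface. Sampling skew between
    the two counters makes short intervals noisy, so alert on sustained
    loss rather than a single interval.
    """
    received = counter_delta(in_old, in_new)
    forwarded = counter_delta(out_old, out_new)
    if received == 0:
        return 0.0
    return max(0.0, (received - forwarded) / received)


def should_alert(loss_ratio: float, threshold: float = LOSS_ALERT_THRESHOLD) -> bool:
    return loss_ratio > threshold


# Example: 1,000,000 packets in, 950,000 packets out during one interval.
loss = transit_loss_ratio(1_000_000, 2_000_000, 500_000, 1_450_000)
print(loss, should_alert(loss))  # 0.05 True
```

A “link down” alert will never fire for this kind of problem, which is exactly why it can take days to find.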

