Where to Stick the Firewall – Part 1

Author
Peter Welcher
Architect, Operations Technical Advisor

Given the title, I hasten to state that nothing crude is meant. If you hate your firewall, let’s not go there. Just trying to get your attention. Now that I have it …

This blog is about firewall placement. More precisely, where / how do you do policy enforcement for various traffic flows, and where do you put physical or virtual firewalls? Or substitute something else?

This has come up recently for me in some design discussions. And it struck me that we all may need to think a bit differently in this topical area as technology evolves. There are a growing number of design options.

The previous blog covered how automation and easy-to-use Ops tools are driving single-vendor modules in designs, with WLAN merged with campus switching. As vendor tools start making security easier / more automated and segmented, this lock-in, ahem, this ease-of-use trend may continue.

This blog got long (even for me), so I’ve split it in half. This is the first half.

TL;DR: This blog covers some security generalities then digs into group-based and agent-based security enforcement mechanisms.

Integrating Security

Security design is shifting.

The old approach: insert firewall or other devices between A and B, deploy the policy.

The more recent approach integrates security more tightly with the campus or data center or even SD-WAN design (and automation). One reason for this is macro and/or micro-segmentation, which I’ll try not to get into until Part 2.

This security integration makes network design a bit more challenging:

  • Which security architecture do you buy into? To what degree?
  • How well does it work with your network architecture?
  • What’s the best combination of network design and security architecture?

Note that Cisco has a degree of integration between SD-Access, ACI, and SD-WAN now. So SD-Access and ACI can inter-operate as far as security groupings used for policy (“micro-segmentation”). And Cisco SD-Access and SD-WAN can inter-operate as far as VNs / VRFs / macro-segmentation.

Abstracting, the key design impact is having a consistent mechanism that ties users and devices to IP-to-group mappings, which something (Microsoft AD, Cisco pxGrid, or some other tool) then communicates to security devices. By doing so, you get away from IP address-based security policies and into group-based security.

The key point here is to decouple group membership from IP addressing, so you don’t manually maintain group objects as a list of subnets or IPs. That’s what I would like to talk about here.

The key to “group-based security” is automatic assignment of group ID, based on user identity and other context information, such as time-of-day, compliance of their workstation, which type of device is attempting access to the network, etc. This might be done via Cisco ISE or Aruba ClearPass, for example. At a minimum, mapping user or machine ID to a group in Microsoft AD may qualify. How well-informed this process is about user/device context might be a good topic for another blog about the various products in this space. Call it the “Product Eval Blog” (which may never happen: detailed examination of products I don’t use is hard and time-consuming!).

ISE, probably ClearPass, and Juniper’s MIST WxLAN can use different pre-shared keys or other attributes (device vendor, IP range, etc.) in rules to assign users and devices to the groups. There’s a trade-off lurking there: simplicity versus having lots of options (and perhaps better automation of intent). More fodder for the mythical “Product Eval Blog”.
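To make the idea of rule-based group assignment concrete, here is a minimal sketch in Python. It is not how ISE, ClearPass, or MIST implement this; the attribute names, group names, and first-match-wins ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """Context an identity service might know about a connecting endpoint."""
    user_ad_group: Optional[str]  # AD group if a user authenticated, else None
    device_type: str              # e.g. "laptop", "camera" (hypothetical values)
    compliant: bool               # result of a posture / compliance check


# Ordered rules: the first matching rule wins. All group names are hypothetical.
RULES = [
    (lambda e: not e.compliant, "QUARANTINE"),
    (lambda e: e.device_type == "camera", "IOT_CAMERA"),
    (lambda e: e.user_ad_group == "Finance", "FINANCE_USERS"),
    (lambda e: e.user_ad_group is not None, "EMPLOYEES"),
]


def assign_group(endpoint: Endpoint, default: str = "GUEST") -> str:
    """Return the group for an endpoint, or the default if no rule matches."""
    for matches, group in RULES:
        if matches(endpoint):
            return group
    return default
```

The trade-off mentioned above shows up even in a toy like this: every extra attribute makes the rules more expressive, and the rule order harder to reason about.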

Doesn’t IOT mean the world is expanding beyond MS AD, though? That’s where you might want to map MS groups to other tags on the network vendor side (and fewer tags, probably). And with IOT, automatic recognition of zillions of IOT sensors and vendors, etc., will likely be highly desirable. (I use “zillions” to stand for any big number, at the minimum, greater than 1.)

As traffic passes through the network, an enforcement point then obtains the group either from the packet or from a cached IP-to-group mapping. (See also Cisco SGT, Security Group Tag, and Juniper MIST labels or Group Based Policy, GBP, tags.)
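The two lookup paths can be sketched as follows. This is a simplified illustration, not any vendor's data plane: the cache contents, the "UNKNOWN" fallback, and the choice to prefer the in-packet tag over the cache are assumptions.

```python
from typing import Optional

# Cached IP-to-group bindings, as might be learned from an identity service.
# Addresses and group names are hypothetical.
IP_TO_GROUP = {
    "10.1.20.15": "FINANCE_USERS",
    "10.9.0.7": "IOT_CAMERA",
}


def resolve_group(src_ip: str, inline_tag: Optional[str] = None) -> str:
    """Prefer the tag carried in the packet; otherwise use the cached binding."""
    if inline_tag is not None:
        return inline_tag
    return IP_TO_GROUP.get(src_ip, "UNKNOWN")
```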

Why would you want to do this?

From a maintenance and operations point of view, group-based policies can be a major win. You set up the policy as to which IPs, device types, user IDs, user groups, etc., belong to which tag or label. The access policy is written in terms of groups or tags/labels, so you do not have to run around updating policy across lots of devices. Well, until you add a new label. But this approach also means that adding policy around a new label is also relatively easy.

To make that more clear, suppose you are standing up a new building on the network. You assign users and devices to group tags. There’s already policy in the rest of the network, and you do not have to go to all the firewalls or other enforcement points adding new subnets to local group definitions.

Put differently, instead of defining membership of group objects in the firewall or Policy Enforcement Point (PEP), you do that centrally, and you can do it taking into account more than just the IP address or subnet. Depending on what the vendor supports.
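Here is a small sketch of why that helps, using the new-building example. The policy table refers only to groups, so bringing a new subnet online changes the central bindings and nothing else. All addresses, group names, and the default-deny behavior are assumptions for illustration.

```python
# Policy is keyed by (source group, destination group); no IPs appear here.
POLICY = {
    ("FINANCE_USERS", "FINANCE_APPS"): "permit",
    ("EMPLOYEES", "INTRANET"): "permit",
}

# Central IP-to-group bindings (hypothetical values).
bindings = {
    "10.1.20.15": "FINANCE_USERS",
    "10.50.1.10": "FINANCE_APPS",
}


def decide(src_ip: str, dst_ip: str) -> str:
    """Map both ends to groups, then look up the group pair; default is deny."""
    src = bindings.get(src_ip, "UNKNOWN")
    dst = bindings.get(dst_ip, "UNKNOWN")
    return POLICY.get((src, dst), "deny")


# A user in the new building, before and after being bound to a group.
before = decide("10.77.3.8", "10.50.1.10")
bindings["10.77.3.8"] = "FINANCE_USERS"  # only the central binding changes
after = decide("10.77.3.8", "10.50.1.10")
```

Note that POLICY was never touched: the same two rules now cover the new building.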

One wrinkle to this is what I’ve been calling the “Venn diagram problem”. In MS AD, a user or device can belong to multiple groups. For tags that get put into packet headers, a given user or device can only belong to one group. The two approaches do not play well together.
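One possible workaround, sketched below, is a precedence order that collapses a set of AD groups into a single tag. This is an assumption on my part, not something any of the products above is claimed to do, and the group and tag names are hypothetical. The alternative of defining one tag per combination of groups gets unwieldy fast, since n overlapping groups allow up to 2^n combinations.

```python
# Hypothetical precedence: earlier entries win when a user is in several groups.
TAG_PRECEDENCE = [
    ("Domain Admins", "ADMIN"),
    ("Finance", "FINANCE_USERS"),
    ("Employees", "EMPLOYEES"),
]


def single_tag(ad_groups: set) -> str:
    """Collapse multiple AD group memberships into the one tag a packet can carry."""
    for ad_group, tag in TAG_PRECEDENCE:
        if ad_group in ad_groups:
            return tag
    return "GUEST"
```

The cost is visible in the code: a user in both Finance and Employees gets only the Finance tag, so any policy written for EMPLOYEES no longer applies to them unless it is duplicated.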

Exercise for the reader: where do Venn diagrams come into this?

What’s your vision for security across architectural components (campus, data center, etc.)?

I’m going to now quickly change topics to avoid this topic taking over more of this blog!

Enforcement Styles

There are a couple of ways in which enforcement can happen. That is, you can use different techniques to secure a network.

One is the physical or virtual firewall, somehow in the normal data traffic path.

Or not in the path: some vendors support “service chaining,” where the firewall is not physically in the data path but is virtually inserted into the path. It’s all magic (tunnels or fabric forwarding entries).

Cisco ACI can do this. Quite possibly other vendors’ equipment as well. I believe Arista switches can do this via an automated tunnel from the user/server edge switch to the switch port where the firewall’s inside or outside port is attached.

I don’t know what other switch vendors can do. (Hey, it takes time to dig this out, particularly with most web pages lately being light on details and some vendors’ detailed documentation requiring a login to access.)

Pro: Service chaining does not require inline security devices: it provides flexibility.

Con: This may be confusing to troubleshoot if not documented well. It may create sub-optimal flows (centralized back-haul of most traffic, or ping-ponging between devices in the service chains).

If all you’re doing is “deflecting” (tunneling or switching) some traffic to the firewall, at which point firewall routing kicks in, ok, that’s not terrible. We pretty much do that with the default route now anyway.

I worry about “service-chained spaghetti” (for lack of a better term). That is, when you service-chain to one device, then service-chain some of the output to another, etc. How do you document the set of service chains for each flow? My gut feeling is that this could be like troubleshooting spanning-tree, routing, and performance problems. As in slow. With a good tool showing the path, maybe not so bad. I’m going to change the subject now since I lack data on this.
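As a thought experiment on the documentation question, here is a minimal sketch of recording each service chain as an ordered list of hops and flagging devices that a flow visits more than once (a rough proxy for ping-ponging). The flow names, device names, and the record format are all hypothetical; I am not aware of a standard format for this.

```python
from collections import Counter

# Hypothetical documentation of service chains: flow class -> ordered hops.
SERVICE_CHAINS = {
    "user->web": ["leaf1", "fw-a", "leaf2"],
    "user->db": ["leaf1", "fw-a", "leaf1", "ips-b", "leaf1", "leaf3"],
}


def revisited_devices(chain: list) -> list:
    """Return devices a flow traverses more than once (possible ping-ponging)."""
    counts = Counter(chain)
    return sorted(dev for dev, n in counts.items() if n > 1)


report = {flow: revisited_devices(hops) for flow, hops in SERVICE_CHAINS.items()}
```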

Although there’s one metric I recently had cause to remember (worse: in a CST / R-PVST+ situation): when we used to teach the CCIE prep class, on the last day, we challenged the class with collectively or individually troubleshooting a relatively simple spanning tree problem involving 6 switches. My recollection is that across multiple classes, nobody ever solved it. Everyone gave up after about 1 hour. A week of training fatigue and Friday might have been a factor, but I’ve also seen that in field troubleshooting. That’s why I strongly prefer mostly routed networks, with L2 at most between the access and distribution layer. Too many cross-device lookups lead to troubleshooting fatigue or just plain taking too long to solve. It’s even worse if the collective state is changing.

Agent-Based Security

Another approach is exhibited by Cisco Tetration (now Cisco Secure Workload), or the company Illumio’s technology. For lack of a better term, let’s call this agent-based security.

Pro: Distributes the enforcement load. Switch fabrics that enforce policy at ingress or egress also do so.

Con: Agents must be deployed to, and maintained on, user workstations and servers. And how does someone secure devices lacking the agent, or where the agent cannot be added – proprietary solution, voids the support contract or warranty, etc.? Putting such devices behind a traditional firewall is probably the answer to that. Containing them in a VRF (“macro-segment”) is another.

Smart NICs are a variation on this theme. Similar pros and cons (instead of deploying an agent, you have to deploy the NIC and driver software or buy server hardware with them pre-installed). And do code upgrades as bugs turn up, etc. My gut says “more stuff to manage, upgrade, deal with bugs with … good idea? What’s that buy you (trade-off for labor/hassle cost)?”

A question for any agent-based approach is how well it can provide or emulate macro- and micro-segmentation. I.e., how complex can the access lists get? Can they be group-based? If so, how are devices assigned to groups automatically?

Granted, servers/apps may be more suited to manual group assignment, perhaps with flow analysis assistance. This topic goes onto my “research further when time permits” list. I haven’t come across an answer to these questions so far.

I see the agent-based approach as having great potential. This is because central firewalls are becoming a major expense item and a bottleneck. Distributing the security enforcement workload potentially solves that. Enforcement of group policy by the access switch fabric also does that.

The big question for agent-based: what about IOT, where you probably cannot install an agent? Right now, one way to do that might be to use macro-segmentation (one or more VRFs) to force all IOT device traffic through a firewall. I suspect the ideal solution would use a VRF to isolate all IOT/OT/sensor devices, and ideally, tag-based rules might isolate each type of IOT device. That policy potentially becomes simpler if the devices only need to talk to their IOT gateway, and it is what talks to the outside world.
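A sketch of that “ideal solution” might look like the following: one VRF for isolation, plus tag-based rules so each device type can talk only to its own gateway. The VRF name, tags, and the symmetric device/gateway rule are assumptions for illustration, and traffic leaving the VRF (which would go through the firewall) is deliberately not modeled.

```python
IOT_VRF = "IOT"

# Device tag -> the only gateway tag it may talk to (hypothetical names).
ALLOWED_GATEWAY = {
    "IOT_CAMERA": "CAMERA_GW",
    "IOT_HVAC": "HVAC_GW",
}


def permit(src_vrf: str, src_tag: str, dst_vrf: str, dst_tag: str) -> bool:
    """Allow only device <-> own-gateway traffic inside the IOT VRF."""
    if src_vrf != IOT_VRF or dst_vrf != IOT_VRF:
        return False  # crossing the VRF boundary is the firewall's job
    return (ALLOWED_GATEWAY.get(src_tag) == dst_tag
            or ALLOWED_GATEWAY.get(dst_tag) == src_tag)
```

With the gateway as the only permitted peer, the rule set grows with the number of device types rather than with the number of devices.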

I should note that a tandem approach might work well, i.e., a combination of firewalls and agent-based. Perhaps, maybe. Nah, I’m dubious about that.

Generally, it seems simpler if all or most of your policy and enforcement is in one clear place. The next section looks at this.

Policy Enforcement

My SD-Access series of blogs hinted at this “where do I firewall” issue somewhat if I recall correctly. (I try not to re-read my blogs, it makes me want to edit them and cringe at typos.)

Thinking big picture, the flows you might want to secure fall into the following categories:

  • User to User:
    • User to user, same VN / macro-segment / VRF – micro-segments
    • User to user, different VN, or whatever
  • Server-involved
    • User to a physical server
    • User to VM
    • Phy server to phy server
    • VM to VM
    • Phy server to VM
  • And any of these to/from the Internet, of course.

Yes, that’s quite a list. Have you thought about these types of flows in your network? Do you have gaps? For each such flow, is there ONE clear place where enforcement gets done?
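One way to answer those questions for your own network is a simple inventory that maps each flow category to its enforcement point(s), then flags gaps (none) and overlaps (more than one). The sketch below uses the categories from the list above; the enforcement point names are placeholders, not a recommendation.

```python
# Flow categories from the list above, mapped to where enforcement happens.
# The enforcement point names are hypothetical placeholders.
ENFORCEMENT = {
    "user-to-user, same VN": ["access fabric"],
    "user-to-user, different VN": ["fusion firewall"],
    "user-to-physical-server": ["dc firewall"],
    "user-to-VM": ["dc firewall", "virtual firewall"],
    "physical-to-physical": [],
    "VM-to-VM": ["virtual firewall"],
    "physical-to-VM": ["dc firewall"],
    "any-to-Internet": ["edge firewall"],
}

# Flows with no enforcement point at all.
gaps = sorted(f for f, points in ENFORCEMENT.items() if not points)

# Flows enforced in more than one place (not ONE clear place).
overlaps = sorted(f for f, points in ENFORCEMENT.items() if len(points) > 1)
```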

The Internet part is easy: that’s the edge firewall, traditionally. See the Internet Edge blog series as recently posted. User to Internet might or might not be handled via different firewalls than for servers and VMs.

Users used to be fairly easy: put a firewall somewhere between them and the servers. Usually, any such firewall would be in the data center as a good place to intercept all server-bound traffic. But along came user segmentation and NOT SO FAST BUDDY that solution is not as good a fit. We’ll pursue that topic in the Part 2 blog.

One solid question is scaling. SD-Access with a fusion firewall, or running all campus to data center and Internet traffic through one firewall pair (or two pairs for site HA?) is elegant, but a firewall that can handle all the user traffic aggregated across multiple 40 or 100 Gbps links is very costly.

Server to server used to be controlled by one or many pairs of firewalls. Some sites I’ve worked with used firewall pairs to create 20 or more “secure enclaves” for various purposes. They were probably represented by VRFs or sets of VLANs on the data center switches. They would then use another pair to control user access to the common firewall interconnect. That gets very painful to maintain, as any new rules usually must be configured on two or more firewall pairs.

More recently, with heavy VMware use, VMware and/or 3rd party virtual firewalls are handling flows involving VMs. Or that’s my impression. The result is that in some or most data centers, there’s been a shift in WHAT does the firewalling between servers in the form of VMs, but it is still contained within that data center module.

The data center physical firewall pair may still be present, possibly being used now for user to server/VM flows or for physical server to physical server or to VM flows. I suspect most will opt to handle all but physical to physical server flows in VMware.

At this point, we’ve pretty much covered everything but user-to-user flows and campus segmentation.

And yes, maybe Tetration (Cisco Secure Workload) and/or Illumio can do it all. The key questions, which I can’t at present answer, are whether you have to create group objects that are lists of IP addresses or subnets, and perhaps how well it scales. I’m told that some very big companies are successfully using Illumio. In September 2020, Forrester recognized them as a leader in Zero Trust.

Conclusions

The design comes down to knowing your requirements, your constraints (generally $$ but also tech skills and other factors), and what the options are.

The material above has attempted to describe the options I’m aware of. If there are some I’ve missed, or if you think about some of this in a really different way, please let me know or blog about it. I’d like to think I’m still learning after all these years!

 

 
