Who Owns Security ACLs?

Author
Peter Welcher
Architect, Operations Technical Advisor

A previous blog, Do You Know Your Flows?, discussed figuring out what flows are in your network (at the network edge, etc.) and the time challenges that can present. It also lightly touched on what I think should be access list (“ACL”) best practices (but usually are not), and on some of the issues around maintaining ACLs, security policies, ACI policies, and the like.

This blog looks at the same topic from the perspective of application security, and comes at flows more from the angle of “how can I figure out what my ACL rules need to be?”

TL;DR: Present some ideas on how to tackle the problems around securing apps, including identifying flows. Identify some of the snags, in the hope that doing so will help address them.

Detailed application security can be a hot potato that nobody wants to own. And nobody has the time to do well. Or the “stuck-ee” lacks the skills to do it well. Etc. And who knew: I’m big on process and detailed documentation for this!

This blog explores that theme, looking at the symptoms that indicate various forms of the problem, and provides some thoughts around better app security. (I lean towards believing that too much process, and especially too much ITIL, kills all productivity. The trick is doing “just the right amount.”)

There are serious implications to this. It takes a lot of work to maintain a strong security posture. Processes and audit trails are part of that, and they are the part that is still lacking in many shops.

Case Study

If you have apps/servers/VMs being retired and the IP-based ACLs don’t get updated, new apps can have strange failure modes or unprotected ports with possible external exposure, i.e., real security risks.

One of the NetCraftsmen staff experienced this at a very large financial organization a while ago. When an app was retired, nobody removed the relevant ACL rules. At some point, a new app would be stood up using the freed-up IP address. But the old ACL rules remained in place. That could block key ports, in which case the new app would not work. That’s good in that it indicates that rule fixing is needed. But the other possibility is allowing traffic to a port that the outside world should not be allowed to connect to. Leads to SURPRISE!

It took substantial effort and some outage pain (and consequent management buy-in) to get the situation cleaned up.

But managing the process better from then on greatly reduced the surprise troubleshooting caused by new apps re-using old IP addresses and inheriting old ACL problems. The number of trouble tickets went from perhaps 45 per week down to 1 or 2 per week.

The technical investment was the cleanup labor, and the ROI was far fewer problems and time spent troubleshooting them.

Symptoms

This sort of thing recently came to a head with one customer. The server team has deployed tens of new apps and wants the network team to go ahead and open up access to the servers or VMs in question. They and management don’t understand the delay.

From the network team’s point of view, the app team hasn’t told them anything about what the expected and permitted flows are (TCP connections in either direction, UDP, or other). The network team keeps asking as part of taking the new app into production status and getting back what sounds like “just permit TCP any any, we need to get this app running.” The network staff has lots of other “priority 1” work to get done.

The disconnect seems to be that it can take a fair amount of time to learn what flows an app uses. Some flows may not occur until someone initiates a query or activity that causes the app to reach out to yet another source of data. So there needs to be teamwork and time: the network team or “flows owner” monitors traffic while someone puts the app through its paces.

So who should own the missing step(s)?

Apparently, that hasn’t really been discussed. The network team has usually explored and fixed ACL problems, so it has apparently come to be perceived as owning all such things.

If my opinion were requested (it has not been), perhaps BOTH teams should own app security. More eyeballs, less chance of missing something?

How DO you figure out what ports and flows need to be allowed? Ok, that was more or less the topic of the prior blog.

However, even once you have a list of ports to allow, someone still has to do a strong sanity check of the results.

Who? The app folks apparently don’t have the knowledge to do so. They perhaps have some idea of how the app is supposed to work and what talks to what. Sometimes the app just got installed, and the vendor or developer did not provide much or any documentation of ports. (My experience is that databases and tables get heavily documented, but network flows never do.)

The network team certainly does not somehow magically know how the app operates, or maybe even what it does. They probably can’t really put the app through its paces.

So who really owns the ACL? Who should?

NetCraftsmen saw a different variant of this at a large customer a couple of years ago. The setting was a migration between brands of firewalls. Part of the task entailed validating that the server(s) for each app were still in use, still needed all the permitted ports, etc. Huge task! And it took quite a while to get it done. We were able to automate and simplify the access-list configuration part.

It was more the validation steps, like “is this server still there, running app X?”, that were hard. Finding the owner of an app was really hard (it took something like a year or more). Without knowing the owner and having well-documented flow info, it is hard to formulate a post-change test plan to verify the app still works correctly.

How Do You Figure Out Ports/Flows?

So how do you determine what ports an app needs?

I’d consider documentation of what protocols and ports an app needs to be part of the required documentation handoff from the vendor. As in “you don’t get paid until you provide this,” required. Unfortunately, that probably doesn’t fly in the real world.

I’ve had the distinct non-pleasure of trying to track down which ports certain applications use for QoS purposes, mostly in the VoIP and meeting space. The quality of the documentation varies wildly. Putting it in professional terms: it ranges from “sucks” to “sucks worse.”

For some VoIP/Video/Meeting or other applications, the quality of what was documented has appeared suspect and sloppy. And no thanks at all to vendors who have a list of 150 random IP addresses of cloud servers that your client systems might need to talk to. A list that is subject to change or growth. Yes, probably not a big deal for security; it is outbound traffic. For QoS, yes, it will hit the Internet, but I can at least give the traffic QoS to the network edge.

This particular choice (150 random addresses, rather than a small number of future-proof small address ranges) makes me start thinking, “where’s my clue bat?” (and “how do I apply it to Zoom?”)

So for vendors documenting ports, well, my expectations are low. Probably true for all of us. But maybe that needs to change?

The network team could deploy a permit any-to-any rule for each new app (server, VM, or VMs), observe what traffic actually is present (via logging), then tighten down security. You could probably also do that on the app server side and/or using Wireshark if you had to.

Doing so does, however, give a malware-based lateral attack a chance to attempt access or establish control, and to subsequently be permitted access since its traffic was “pre-existing hence trusted.”

A variation for firewalls, Cisco ACI, etc., might be to put in a permit any rule and then log matches. Do so carefully, as you probably don’t want to do that for all flows, just for a new server/app. That assumes permit logging is possible. With ACI, apparently, that historically required an EX or FX model switch. (For newer ones, I haven’t checked this – exercise for the interested reader.)

So one starting point is writing “white list” rules (permits, then deny all) and logging.

Perhaps do this in small batch form so as to avoid breaking the firewall or whatever with too much logging (CPU, storage).
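To make the “permit, log, then tighten” idea concrete, here is a minimal sketch in Python. It assumes a made-up log line format and file name (real ASA/IOS/ACI permit-log output will differ), and simply tallies observed flows into candidate rules for a human to review:

```python
# Minimal sketch: summarize "permit ... log" hits into candidate ACL entries.
# The log format here is hypothetical; adjust the regex to your platform's syslog.
import re
from collections import Counter

# Example line (hypothetical): "... permitted tcp 10.1.1.5(49152) -> 10.2.2.10(443) ..."
FLOW_RE = re.compile(
    r"permitted (?P<proto>tcp|udp) "
    r"(?P<src>[\d.]+)\(\d+\) -> (?P<dst>[\d.]+)\((?P<dport>\d+)\)"
)

def summarize(log_path: str) -> Counter:
    flows = Counter()
    with open(log_path) as f:
        for line in f:
            m = FLOW_RE.search(line)
            if m:
                # Ignore the ephemeral source port; key on proto/src/dst/dst-port.
                flows[(m["proto"], m["src"], m["dst"], m["dport"])] += 1
    return flows

if __name__ == "__main__":
    for (proto, src, dst, dport), hits in summarize("permit-any.log").most_common():
        # Candidate rule, to be sanity-checked by someone who knows the app.
        print(f"permit {proto} host {src} host {dst} eq {dport}   ! {hits} hits observed")
```

The point is not this particular script; it is that the observed flow list is only raw material and still needs a sanity check by someone who knows the app.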

Documenting the ports an app uses is also an enabler. Knowing “normal” can help in troubleshooting or investigating a possible security problem.

Process thought: Shouldn’t there be an “onboarding” process for new apps, along with documented findings, signed off on by the person responsible, with a date? In short, an audit trail, if for no other reason than to record who might know something about the app’s flows and how fresh the info is.

Related thought: All this takes time. As I’ve previously noted, dedicated time to do this right is rarely available. The “figure out the flows” task is somehow supposed to happen alongside resolving network problems and all the other ongoing tasks. It doesn’t happen or is done hastily. I’ll channel Robert Heinlein here: “TANSTAAFL”.

Flow Tools

There are various tools that might help with this. Almost anything that does NetFlow (or variants). The prior flows blog mentioned a few such products.

Yet another approach is tools that work using (or can optionally use) agents on servers and/or endpoints.

Two such:

  • Cisco Tetration (Cisco Secure Workload – a name my brain refuses to assimilate, too much like some other Cisco Secure W*** names).
  • Illumio.

Both products can be deployed with agent-based enforcement on servers/VMs. The agent can provide flow data. All that’s left is for staff to analyze the flows, name groups of servers, etc., to make it all more humanly manageable.

That word “All” is dangerous there. Sounds simple. But that last sentence could entail 1-2 or more person-years of labor site-wide. So if your organization bought the product but did not allocate sufficient labor time, you’ve got the same problem as in the last section.
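To give a feel for what “analyze the flows, name groups of servers” means in practice, here is a small Python sketch. The CSV file name and column names are assumptions (agent and NetFlow exports vary); it just groups servers that consume the same set of services, which is roughly the raw material for naming policy groups:

```python
# Sketch: cluster servers by the services (dst, proto, port) they consume,
# as a starting point for naming policy groups.
# Assumed CSV columns: src_ip,dst_ip,proto,dst_port
import csv
from collections import defaultdict

def group_servers(flow_csv: str) -> dict:
    talks_to = defaultdict(set)
    with open(flow_csv) as f:
        for row in csv.DictReader(f):
            talks_to[row["src_ip"]].add((row["dst_ip"], row["proto"], row["dst_port"]))

    # Servers with identical service sets are candidates for the same group.
    groups = defaultdict(list)
    for server, services in talks_to.items():
        groups[frozenset(services)].append(server)
    return groups

if __name__ == "__main__":
    for i, (services, servers) in enumerate(group_servers("flows.csv").items(), 1):
        print(f"group-{i}: {sorted(servers)} -> {sorted(services)}")
```

Naming the resulting groups meaningfully, and deciding which flows between them are legitimate, is the part that still takes human time.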

Documentation

Part of what appears to be missing here is good per-application security documentation. Or an audit trail or ownership trail. As in, who are the local “owners and experts” on this app?

For starters, it would be nice to capture who came up with the ACL (flow) rules for a given application and when they did so, for what version of the app(s). Accountability! If their name is going to be recorded, then they are more likely to do a thorough job of identifying ports. And now you have someone you can go talk to when troubleshooting or updating ACLs.

As mentioned above, consider holding the vendor accountable for good documentation of ports? Perhaps document what they provided (and when), then check observed port usage and document THAT?

I like the idea of using comments or something to document the purpose of each ACL rule, with a date on it.

Why a date? This enables a periodic review of the rules for a given app or server. Which nobody does, but should arguably be part of ongoing tight security.

This also enables tracking of rules added when troubleshooting, as in “we just upgraded the app and needed to open up these ports to get it to work.”
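If each rule carries a dated remark, the periodic review can at least be nudged by a script. A minimal sketch follows; the remark format (“remark YYYY-MM-DD owner: purpose”) is my own invention, not any standard, so adjust it to whatever convention you adopt:

```python
# Sketch: flag ACL remarks older than a review interval.
# Assumes remarks of the (invented) form: "remark 2022-05-01 jdoe: app X web tier"
import re
from datetime import date, timedelta

REMARK_RE = re.compile(r"remark (\d{4})-(\d{2})-(\d{2}) (\S+): (.+)")
REVIEW_INTERVAL = timedelta(days=365)

def stale_remarks(config_path: str):
    with open(config_path) as f:
        for line in f:
            m = REMARK_RE.search(line)
            if not m:
                continue
            y, mo, d, owner, purpose = m.groups()
            age = date.today() - date(int(y), int(mo), int(d))
            if age > REVIEW_INTERVAL:
                yield owner, purpose, age.days

if __name__ == "__main__":
    for owner, purpose, days in stale_remarks("acl-config.txt"):
        print(f"REVIEW: '{purpose}' (owner {owner}) is {days} days old")
```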

As noted in the “Case Study” above, the other potential security problem is ports that used to be permitted but no longer are needed. They may do little harm as long as the application or server is not listening on such a port.

Testing might verify that. Is there a good test tool? How would you test? Who actually goes back periodically and “tightens up unused ACL rules”? If you do that, yes, a dated log entry should go into the “app (or server) security history log.”
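One crude way to test is to probe the permitted ports and see which ones still have something listening. A hedged sketch follows; the host and port list are placeholders, and you should only probe systems you are authorized to scan:

```python
# Sketch: check which permitted TCP ports actually have a listener.
# Host and port list are illustrative placeholders; only probe systems
# you are authorized to test.
import socket

def listening(host: str, port: int, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    permitted_ports = [22, 443, 8080, 8443]   # taken from the ACL, per app/server
    for port in permitted_ports:
        state = "LISTENING" if listening("10.2.2.10", port) else "no listener"
        print(f"tcp/{port}: {state}")
```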

Concerning removing an apparently unused ACL rule:

  • it may not be a (big) security risk
  • nobody notices it
  • removing the rule risks breaking the app in an obscure way.

For such reasons, ACL cleanup goes to the “never get around to it” part of the low-priority tasks list. There is a risk of engineers creating problems with no obvious win (other than better security).

Having documentation can help facilitate annual (or whatever) ACL reviews. If they are done on a rolling basis (so many per week rather than an annual crisis), the effort might be sustainable or easier to staff. Oh, but that does require dedicated staff/hours.

The re-use of an IP address from the retired-app case study above suggests that app lifecycle management is also needed. Document install/stand-up, test, production, and retirement tasks with checklists and sign-offs?

Yes, all this process and documentation sounds like a lot of work. And nobody I’ve encountered does it. But maybe it really is needed?

SBOM

Software Bill of Materials (SBOM) has become a hot topic. And that is another form of app documentation that has been missing and which probably should be supplied by the app vendor.

SBOM is an inventory of all the open-source and other software packages used in coding an application. It needs to be recursive, i.e., cover all the software built into the software that gets added to the list.

Why? Software vulnerability tracking, e.g., log4j recently. Do you know which of your apps use log4j? If you do, was it the result of a scramble? Would it not be better to have dependencies all documented, along with source code versions, etc.?

And yes, that’s a lot of work, which is best done by the vendor and needs to be done accurately.
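Once SBOMs exist, though, the consumer side is small. As a rough sketch, assuming CycloneDX-style JSON SBOMs with a top-level "components" list (field names vary by SBOM format and tool), checking all your apps for log4j becomes a short script rather than a scramble:

```python
# Sketch: search SBOM files for a vulnerable component (e.g., log4j).
# Assumes CycloneDX-style JSON with a top-level "components" list of
# {"name": ..., "version": ...} entries; adjust for your SBOM format.
import json
import sys
from pathlib import Path

def find_component(sbom_dir: str, needle: str):
    for sbom_file in Path(sbom_dir).glob("*.json"):
        data = json.loads(sbom_file.read_text())
        for comp in data.get("components", []):
            if needle.lower() in comp.get("name", "").lower():
                yield sbom_file.name, comp.get("name"), comp.get("version")

if __name__ == "__main__":
    sbom_dir = sys.argv[1] if len(sys.argv) > 1 else "sboms"
    for app, name, version in find_component(sbom_dir, "log4j"):
        print(f"{app}: uses {name} {version}")
```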

Patches

Which brings up patches, firmware updates, etc. Who documents what application patches were applied and when?

Many do not. But you probably should. Having that documentation might matter for proof of due diligence, legal liability, or getting insurance to honor a claim.

And that needs to tie into the process: when the app got upgraded, did someone check its flows against the existing ACL rules?

Other Ideas

Some other good things to get documented:

Application critical dependencies. What apps or services are consumed? For example, does the app use or verify a certificate or license that requires Internet back to the vendor? When does the certificate expire?

(Mobile credit card processing devices apparently broke in Germany recently: a built-in certificate expired. The manufacturer had just extended the end-of-life date and apparently thought all certificates were OK until then. A quick way to check certificate expiry is sketched below.)

App backup plan. What data is used? How are the app and the data being backed up?

Criticality. How critical is the app? What critical dependencies does it have (e.g., a CloudFlare or other CDN load balancer front end)? User authentication?

DR plan. What is the Disaster Recovery plan? What sites are involved? I’m already seeing sites where “A = main data center, B = DR site” is no longer valid. Instead, per app, the main service might be a cloud provider, and the backup might be an on-prem data center, with different mixes of those depending on the app. If that’s not documented, the first time you need DR could be a CLM (career-limiting move, aka resume-revision event). Or a CF (cluster event).
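Certificate expiry, at least, is one item from the list above that is easy to check and record. A minimal sketch using Python's standard library (the host name is a placeholder; point it at the app's front end or license server):

```python
# Sketch: report the expiration date of a server's TLS certificate.
# Host is a placeholder; only check systems you are responsible for.
import ssl
import socket
from datetime import datetime, timezone

def cert_expiry(host: str, port: int = 443) -> datetime:
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    # 'notAfter' is a string like "Jun  1 12:00:00 2025 GMT"
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc)

if __name__ == "__main__":
    expires = cert_expiry("app.example.com")
    days_left = (expires - datetime.now(timezone.utc)).days
    print(f"certificate expires {expires:%Y-%m-%d} ({days_left} days from now)")
```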

Reality Check

I’m not holding my breath regarding documentation. I do think a documentation trail would be highly desirable in terms of audit trail around app security, network ACLs, etc. I’ve provided some ideas above about what should/could be documented.

What level of documentation is right for your organization?

Note that if you are THAT overly-busy network, app, or security person, it is probably a good idea to get management to clearly document who is responsible for what aspects of all this. And document in writing if you don’t have enough hours in the day to do what’s been assigned to you. Yeah, blame management or CYA?

I recognize the current reality in many organizations is perhaps good effort but little documentation. Is that perhaps because nobody owns (is responsible for) maintaining such records? I suspect it comes down to time.

Time

Yes, time.

All the above comes down to two things. For each of the tasks mentioned above, someone (or some group) needs to own it, and they need to have time to do it. In short, process and allocated time.

This is a pay me now or pay me later type of thing. I see networks going undocumented. The lack of documentation may not be felt much on a day-to-day basis but can be costly (dollars, time) if you have to go do it when troubleshooting (time = delay in problem resolution).

If you don’t know the desired state of a system, then it’s hard to check that the system is in that state!

NetCraftsmen does network assessments. It’s always interesting to see what shows up in the discovery phase when we ask for network diagrams and documentation. What we get is usually, at best, years old and just starting to be marked up with changes. Sometimes it is nothing but working hand sketches. Not good. A sign of being tight on time and/or priorities.

This applies to security as well as networking.

Various tools can detect changes in state (Suzieq, IP Fabric, others), based on configurations and show output. That’s not quite the same as the desired state, but it does detect differences from the “seems to be working ok” state. It takes a human to determine whether that state is, in fact, the desired state.
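As a trivial illustration of checking observed state against desired state, independent of any particular tool, here is a sketch that diffs a documented ACL against what is actually on a device. The file names and rule format are placeholders; the "observed" file might come from "show access-lists" output or a tool export:

```python
# Sketch: compare the documented (desired) ACL against what is on the device.
# File names and content format are placeholders.
def load_rules(path: str) -> set:
    with open(path) as f:
        # Ignore blank lines and comment/remark lines.
        return {line.strip() for line in f
                if line.strip() and not line.strip().startswith(("!", "remark"))}

if __name__ == "__main__":
    desired = load_rules("desired-acl.txt")
    observed = load_rules("observed-acl.txt")
    for rule in sorted(desired - observed):
        print(f"MISSING (documented but not deployed): {rule}")
    for rule in sorted(observed - desired):
        print(f"EXTRA   (deployed but not documented): {rule}")
```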

One solution is to use consultants to address security processes and documentation. NetCraftsmen worked with a large organization to use an ad hoc web form to collect app ownership names and contact info and likely other info across something like 1000 apps. (I wasn’t part of that.) I gather that helped some but required a lot of follow-ups.

NetCraftsmen could certainly provide staff to work through (and automate!) converting large-scale flow information to ACL rules (ACI contracts, etc.).

Conclusion

It’s not near Halloween, but I’ll close with a scary thought: securing micro-services. All the above applies, but with 10x, 100x, or some other large multiple of the entities to secure.

Ignoring or treating security lightly or without documentation and processes does not make problems go away. Security needs to be handled a lot more carefully than it presently is being handled in many organizations. And staffing/available staff hours is part of that problem.

The good news re micro-services is that using containers and Kubernetes forces developers to think about and, to some degree, intrinsically document exposed ports; the way containers are built, to some extent, provides that documentation. And googling shows that there are SBOM tools for containers.
