This blog discusses simplicity of network design, Segmentation and Security. It continues the series started in Simplicity In Network Design. I had originally been thinking about going through the OSI layers as an organizational principle, but decided to hit one of the problem areas first.
Segmentation is what I’m calling the process of separating some users and servers from others. This is often done due to an isolation or security requirement. Where there are different business or government entities sharing common network equipment and links, then Layer 2 or Layer 3 segmentation may be appropriate for routing or security reasons. Sometimes the challenge is figuring out which problem you’re trying to solve, since separate routing can help solve security complexity, and sometimes security (ACLs) can solve what first appears to be a separate routing situation.
Segmentation for security may be appropriate, for example, when there is a sub-unit such as an internal fraud unit. The technique used might vary if they have scattered locations. Segmentation may also be appropriate when a sub-unit has outside partners that need to be prevented from accessing the rest of the network. Or when the sub-unit needs different routing, perhaps to a different web proxy or Internet connection. Segmentation can be highly appropriate for critical care medical applications which (due to FDA restrictions on patching or altering the base system) need extra security protection.
Segmentation can be overdone, e.g. to align with managerial hierarchy. Usually this is an attempt to take a shared network resource and somehow partition it along departmental or sub-agency lines. My question is: does segmenting the sub-unit serve some useful or necessary technical, business, or security purpose? Or is it a Layer 9 (Political Layer) or empire building / control issue? Either senior management conveys the message “we all work for one company (or agency)” or the network team has to engineer and support the management hierarchy, at extra cost.
Some key questions to ask are:
- Is there a legitimate security need?
- Is the problem more security or routing?
- How wide-spread are the “pockets” needing protection?
- What is it that makes the users or servers need more protection?
- Can we piggyback on normal routing, or is a lot of specialized routing needed?
- What is the proposed segmentation going to do to packet flows into and out of the segment components?
As far as I’m concerned, networking is about providing optimal connectivity with necessary security, not about internal managerial or fiscal boundaries. Or egos. It’s too expensive to do any of those things (if not for gear, for operations and support costs). Ultimately, decisions about when to segment a group may have to be made by senior managers who have been well briefed on alternatives, pros, cons and costs.
One common reason for segmentation is misplaced security, or a misguided security audit finding, often as a symptom of a security person with weak networking skills. I sometimes refer to this as “escorting packets across the network”. This is usually done where there is a misconception that access lists cannot prevent packets from reaching non-permissible destinations in the network. Packets do not change course or “stray”. Yes, with a web proxy that does NAT they can “hairpin” back into the network. As a concrete example, an access list can easily prevent guest/visitor traffic from reaching any internal address. So why is it that guests need GRE tunnels, MPLS, separate cabling?
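To make the guest example concrete, here is a minimal Python sketch of first-match ACL evaluation. It is a model of the logic, not router configuration. The guest subnet and the choice of the RFC 1918 ranges as "internal" are assumptions for illustration, as are the names `GUEST_ACL` and `evaluate`.

```python
import ipaddress

# Hypothetical addressing, chosen only for illustration.
GUEST_NET = ipaddress.ip_network("192.168.100.0/24")
INTERNAL_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Ordered rules: (action, source network, destination network).
# Deny guest-to-internal first, then permit guests to everything else.
GUEST_ACL = [("deny", GUEST_NET, net) for net in INTERNAL_NETS]
GUEST_ACL.append(("permit", GUEST_NET, ipaddress.ip_network("0.0.0.0/0")))


def evaluate(acl, src, dst):
    """Return the action of the first matching rule; implicit deny otherwise."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for action, src_net, dst_net in acl:
        if src_ip in src_net and dst_ip in dst_net:
            return action
    return "deny"
```

Calling `evaluate(GUEST_ACL, "192.168.100.25", "10.1.2.3")` hits a deny rule, while the same guest going to an outside address falls through to the permit. The packet never gets a chance to "stray": the decision is made where the ACL is applied.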
A good reason for segmentation is when there are many occurrences of logically distinct entities requiring firewalling from each other. For example, a large city network interconnects the city government agencies and public schools using MPLS VPN. MPLS VPN is a good fit since one government building may contain multiple city agencies. If each city agency occupied a different building, there would be little point to MPLS VPN, because a firewall or ACL and a WAN link would suffice. Similarly if each agency were in, say, at most two or three buildings, although MPLS VPN might make it easier to maintain centralized firewalling rather than per-building firewalls. At any scale, centralized firewalling might be much more attractive, in terms of management complexity.
Enough with examples! Here are the segmentation technical alternatives that come to mind:
- VRF Lite (Multi-VRF)
- MPLS VPN
- GRE, mGRE, or IPsec “routed tunnels” to a common endpoint
- VLANs (Layer 2 segmentation)
- QinQ tunnels (Layer 2 segmentation)
- EoMPLS tunnels
- ACLs
- Firewalls
I generally try not to use VLANs and QinQ since they extend Layer 2 domains. I’ve seen enough bridging loops for a lifetime. I can see using L2 VLANs for Service Provider access networks, to keep costs down and provide L2/L3 VPN flexibility.
QinQ tends to be used in a network core to carry different “customer” networks (thinking of the core network as a service provider). What I don’t like about it is the tendency of your customer problems to become your problems. Feedback on the city/rural fiber rings built with a L3 core is along the lines of “thank goodness, having Layer 3 saved us when various customers had bad spanning tree days, it isolated the problem.”
Similarly, I frown on EoMPLS. It is attractive in some cases for Layer 2 transport over a Layer 3 core between datacenters. But if you want redundancy, and just use a pair of EoMPLS links, you start counting on Spanning Tree Protocol (STP) on the WAN to prevent loops, which is not a good idea. I somewhat prefer EoMPLS to directly extended VLANs (e.g. over dark fiber) between datacenters, partly due to the (untested) theory that in the event of a bridging loop / broadcast storm, the encapsulation loading on the local CPU will limit the impact on the rest of the network. I do hear that with proper defensive measures (traffic storm control, CoPP, BPDU Guard, Root Guard, Bridge Assurance, etc.) EoMPLS can be rather safe. See also Augustine Traore’s blog where he measured the benefit of various STP defensive measures, Protecting Switches Against Layer 2 Loops.
I currently think it is poor design to use EoMPLS for any other purpose than interconnecting datacenters, or occasional quick fix situations within a datacenter (e.g. P2V virtualization of a server coupled with VMotion into a VMware chassis elsewhere in the datacenter). One danger is that EoMPLS is almost too easy, once you’ve turned on jumbos and basic MPLS labelling. You can easily create EoMPLS “spaghetti” overlaid on a single datacenter. Thanks, I prefer extended VLANs within that datacenter. Adding spaghetti overlays just adds to the confusion and complexity without providing any solid benefit. Even when interconnecting datacenters, we generally feel other techniques should be considered as well as EoMPLS.
I generally try to avoid GRE and IPsec if possible, because they have substantially lower performance (sometimes abysmal performance), compared to ACLs, VRF Lite, and MPLS VPN. I also did not even list Cisco AToM (Anything over MPLS) or MPLS over X over Y (e.g. MPLSoGREoLayer 3TP) technologies, since one has to design around performance. The AToM technologies have wildly varying performance and gotchas depending on hardware platform. They may also require specific line cards. And I have yet to see a public scorecard from Cisco telling us how well (or poorly) they perform in terms of throughput.
While we are citing caveats, VRF Lite and MPLS VPN will not work across most WAN media, in particular not across an MPLS VPN WAN. So if you need to separate out traffic or sub-networks with either technique, some form of Layer 3 tunnel (GRE, mGRE, DMVPN, or GETVPN) is needed to preserve segmentation on the MPLS VPN WAN. With Frame Relay and ATM, you can use different circuits / subinterfaces for each private routing instance (VRF) — but those technologies are rapidly going away, except in rural locations.
Access list rules can often be used if the segmentation need is really more of a security boundary protection situation. ACL rules scale fairly well for this purpose, particularly if the exact same ACL is used at all endpoints involved. You can even build the ACL by putting in each rule twice, swapping source and destination and ports around, so as to be able to apply it at any related location. While this is a mild pain (and can make the ACL too long), it reduces maintenance. Any time you have to write per-site ACLs is a chance to make a mistake, and causes extra maintenance effort in the long run. Within reason. Sometimes you have to do per-site because the “all-in-one” approach just gets too ugly.
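The "each rule twice" trick can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a simplified rule format; the `Rule` fields and the `mirror` and `symmetric_acl` helpers are hypothetical names, not any vendor's syntax.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Simplified ACL entry; field names are illustrative only."""
    action: str
    proto: str
    src: str
    src_port: str
    dst: str
    dst_port: str


def mirror(rule):
    """Swap source and destination (addresses and ports) of a rule."""
    return Rule(rule.action, rule.proto,
                rule.dst, rule.dst_port,
                rule.src, rule.src_port)


def symmetric_acl(rules):
    """Emit each rule followed by its mirror, skipping exact duplicates."""
    out = []
    for rule in rules:
        for candidate in (rule, mirror(rule)):
            if candidate not in out:
                out.append(candidate)
    return out
```

Feeding in one rule such as `Rule("permit", "tcp", "10.1.0.0/16", "any", "10.2.0.10/32", "443")` yields two entries, so the same list can be applied at either end of the conversation. The price is the doubled length mentioned above.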
ACLs usually solidly outperform GRE tunnels or encryption. In the case of an MPLS VPN WAN, running tunnels across the MPLS VPN would result in double encryption and header overhead, unless separate parallel DMVPN instances are run.
Security experts might quibble that router/switch ACLs are not stateful. Actually, the Cisco routers can do stateful firewalling. Stateful is not something I want to do on a Cisco switch.
For inbound access to servers, stateful services are pretty much irrelevant if you consider what the firewall is actually doing. If you want to filter traffic coming back out of the server, you can make a case that stateful is better — but how necessary is it? For controlling internal users going to run-of-the-mill servers? (Yes, there’s a debate lurking there, it’s hard to get security people to agree that something might be better than a firewall in some ways.)
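For readers who want to see what "stateful" actually adds, here is a rough Python sketch under simplifying assumptions: one permitted server port, no timeouts, no TCP flag tracking. The class and function names are hypothetical. The inbound decision is identical in both models; the only difference is how traffic coming back out of the server is judged.

```python
class StatefulFilter:
    """Toy connection-tracking filter; not a model of any real firewall."""

    def __init__(self, allowed_inbound):
        # Set of (server_ip, server_port) pairs clients may reach.
        self.allowed_inbound = set(allowed_inbound)
        self.connections = set()

    def inbound(self, client, client_port, server, server_port):
        """Permit by destination rule and remember the flow."""
        if (server, server_port) in self.allowed_inbound:
            self.connections.add((client, client_port, server, server_port))
            return True
        return False

    def outbound(self, server, server_port, client, client_port):
        """Permit only replies that belong to a remembered flow."""
        return (client, client_port, server, server_port) in self.connections


def stateless_outbound(allowed_inbound, server, server_port):
    """Stateless equivalent: permit anything sourced from an allowed port."""
    return (server, server_port) in allowed_inbound
```

In this sketch the stateless check lets the server send from port 443 to any client, whether or not that client ever connected, while the stateful one does not. Whether that gap matters for internal users and run-of-the-mill servers is exactly the debate above.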
The advantage of an ACL (non-stateful) on e.g. a Cisco 6500 or Nexus switch is wire-speed performance (unless the ACL is extreme) at a cost much lower than that of a comparable firewall. The rules can be written / maintained in object-oriented form. You can even use the Cisco Security Manager (CSM) GUI, if you prefer.
With either VRF Lite or MPLS VPN, you are overlaying a separate routing instance per VPN or VRF on top of the global routing already in place. In other words, instead of managing one copy of routing, you end up managing several. Think of it as being like taking your topology diagram, and overlaying it with several different and separate layers of routing.
Having different routing depending on what VRF a device connects to is a major consideration. If it is not done cleanly and well, it can be a mess. It can even lead to routing loops unless interconnections are done in small numbers and with adequate controls on routing between disparate VRFs or VPNs. You need a clean design: doing VRF Lite or MPLS VPN in ad hoc fashion can lead to a real mess (we’ve seen them!). And you really need to ask yourself: is having separate routing buying me enough that it’s worth the complexity?
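One way to picture the "several copies of routing" point is a lookup keyed by VRF. The Python sketch below is a toy longest-prefix-match model; the VRF names, prefixes, and next-hop labels are all made up for illustration.

```python
import ipaddress

# Hypothetical per-VRF routing tables: the same destination can resolve
# differently depending on which VRF the packet arrived in.
ROUTES = {
    "global": [("0.0.0.0/0", "internet-edge"), ("10.0.0.0/8", "core")],
    "DEV": [("0.0.0.0/0", "dev-internet-edge"), ("10.50.0.0/16", "dev-core")],
}


def lookup(vrf, dst):
    """Longest-prefix match within a single VRF's table only."""
    dst_ip = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in ROUTES[vrf]:
        net = ipaddress.ip_network(prefix)
        if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None
```

Every entry in `ROUTES` is another table somebody has to design, populate, and troubleshoot. With two VRFs that is manageable; the question is what it looks like with ten.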
Example 1: Guests. Most federal shops like to run guest traffic through a web proxy. It has to be different than the staff web proxy to prevent “hairpinning” back into the network. So the challenge is routing guest traffic to a different web proxy device. My answer is Policy-Based Routing (PBR) — follow default, but “deflect” guest packets at the last minute to the guest web proxy operating in parallel to the staff web proxy. Yes, PBR is its own special form of complexity (if static routes are ugly, PBR as conditional static routes must be super ugly). But the routing difference with PBR is very localized. Compare that to having a whole separate overlay just for Guests. Which is easier to maintain?
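The PBR idea in Example 1 amounts to one conditional in front of the normal routing decision. A minimal Python sketch, assuming a hypothetical guest subnet, web ports 80 and 443, and placeholder next-hop labels:

```python
import ipaddress

# Assumed values for illustration; real policies would match on whatever
# identifies guest traffic at that point in the network.
GUEST_NET = ipaddress.ip_network("192.168.100.0/24")
WEB_PORTS = {80, 443}


def next_hop(src, dst_port, routed_next_hop):
    """Deflect guest web traffic to the guest proxy; otherwise follow routing."""
    if ipaddress.ip_address(src) in GUEST_NET and dst_port in WEB_PORTS:
        return "guest-proxy"
    return routed_next_hop
```

The exception lives in one place, on the device next to the proxies. Everything else, including non-web guest traffic, still follows the ordinary routing table.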
Example 2: Distributed App Dev environment. To me the main point concerning Dev is to isolate it from Production, especially as far as routing. In terms of security risk, ACLs seem fine. We want to allow developers selected access to Dev, and we don’t want Dev servers initiating outbound connections by accident, especially to production databases or DB front ends. Where this example gets somewhat more interesting is if a separate Internet connection is used for Dev testing. Then you need a separate default route. Luckily, most Dev environments I’ve seen to date are not very distributed. And I’d opt for trying to keep Dev to one or two locations in preference to having to add a VRF Lite or MPLS VPN overlay to get traffic to a separate Internet gateway. Or consider Internet connections at each Dev location, although quality of distributed security is always a question there. The other question that comes to mind: why exactly does Dev need a separate Internet gateway? I can think of possible answers, including pure routing isolation — but this one probably needs more specific context before it can be debated.
Distributed access lists can be messy and a nuisance to maintain, particularly when e.g. configured on every voice VLAN on every closet switch. But unless the segmented network component really requires its own separate routing, it may be simpler to stick with one routing instance.
Example 3: Internal investigations group needs to be firewalled as a matter of security policy and risk management. The group is distributed across several locations. The trade-off there is firewalling each location (and sharing routing) versus VRF Lite / MPLS VPN with shared firewalls. Bear in mind that traffic might have to go from a location across the net to the firewalls then back to the very same building it originated in. If 3 sites are involved, I’d lean towards just buying firewalls. If 20 sites, then I’d look at the complexity of the VRF Lite or MPLS VPN solution and compare that to buying and maintaining lots of firewalls (or pairs of firewalls). I’d probably end up wanting to verify that the policy can’t be changed to allow ACLs, and I would not be surprised if the policy cannot be changed.
Summary
If you’ve got a security protection problem, consider ACLs and firewalls. If you’ve got a routing problem, consider VRF Lite and MPLS VPN. If you’ve got a traffic confidentiality / integrity problem, consider encryption. In all cases, consider whether using one of the other approaches might be simpler, and what the trade-offs might be, e.g. many firewalls versus fewer firewalls but MPLS VPN and sub-optimal traffic patterns.
Along the way, think about simplicity — what keeps the network the simplest, and the easiest to manage.
Relevant Prior Blogs
Here are some on-topic blogs by myself and other Chesapeake NetCraftsmen people:
- Design and Operation of a Medical Grade Network
- Designing a Medical Grade Network
- Using BGP with VRF-Lite for Shared Service Support
- CE Design Options When Using VRF-Lite End-to-End (Part 2)
- CE Design Options When Using VRF-Lite End-to-End
- Using VRF-Lite, EIGRP, and Static Routes
- IP Multicast in a VRF
- OTV Optimal Routing
- Working with EoMPLS Part 2
and many more … see also https://netcraftsmen.com/resources/blog/