Internet Edge: Double Don’t Do This

This is the sixth blog in an Internet Edge series.

Links to prior blogs in the series:

Internet Edge: Simple Sites
Internet Edge: Fitting in SD-WAN
Internet Edge:Things to Not Do (Part 1)
Internet Edge: Things to Not Do (Part 2)
Internet Edge: Double Data Centers

There is one more situation I’ve run across in the real world that I would recommend not doing. That’s what this short blog is about!

And Don’t Do This

Let’s take a look at Active/Passive (“A/P”) firewall failover.

I’ve seen this sort of design done with several different brands of firewall, but my brain associates this with CheckPoint firewalls. Probably because that’s where I’ve seen it most. Based on what I heard from someone recently, their support group seems to still think this is best practice.

I might guess that’s because they don’t have an alternative, or don’t understand how L2 between sites is Bad. We used to tolerate sporadic downtime. Now it is not acceptable. But that’s a rant for another blog!

FYI, there used to be a gotcha with doing two A/P firewall pairs on a common VLAN. Both pairs would use the same default MAC address, so one of them needed a manually specified MAC. I have no idea whether that problem still exists. (This is also true for router First Hop Redundancy Protocols unless you do something that causes a different MAC address to be used, e.g., a different HSRP group.)
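To make the HSRP side of that concrete, here is a minimal Python sketch. It relies on the well-known HSRP virtual MAC formats for IPv4 (0000.0c07.acXX for version 1, 0000.0c9f.fXXX for version 2, where the X digits are the group number in hex). The function name and the two router pairs are made up for illustration, and firewall vendors derive their shared MACs in their own ways, so treat this as an illustration of the idea rather than a description of any particular firewall.

```python
def hsrp_virtual_mac(group: int, version: int = 1) -> str:
    """Return the default HSRP virtual MAC for an IPv4 group (hypothetical helper)."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 groups are 0-255")
        return f"0000.0c07.ac{group:02x}"
    if version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 groups are 0-4095")
        return f"0000.0c9f.f{group:03x}"
    raise ValueError("version must be 1 or 2")


# Two hypothetical router pairs sharing one VLAN.
pair_a = hsrp_virtual_mac(group=1)
pair_b_same_group = hsrp_virtual_mac(group=1)
pair_b_other_group = hsrp_virtual_mac(group=2)

print(pair_a)                        # prints: 0000.0c07.ac01
print(pair_a == pair_b_same_group)   # prints: True  (same group, MAC collision)
print(pair_a == pair_b_other_group)  # prints: False (different group, distinct MACs)
```

The point is simply that the default MAC is a function of the group number, so two pairs left at the same default group on one VLAN end up claiming the same address.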

The A/P design is arguably acceptable and works reasonably well within a single data center. The problem comes from trying to do it across two data centers, usually to save money. I’ll show the two-data-center version. (Picture worth 1,000 words!)

The first problem is that only one firewall is active, so that’s a bit of a bottleneck. You might be able to get your routing to load balance to a degree (inbound and outbound) at the price of some latency. On the other hand, trying to get routing to align with the active firewall could be done, but … complexity?

The bigger problem is that doing this entails VLANs spanning between your data centers. That makes them a shared failure domain. You’ll probably have set up the inner link as a trunk, which will make it all too easy to trunk other VLANs between the data centers.

If there is a migration to a new data center, you may well end up trunking to that third data center. From experience, it is probably best not to go there! If the L2 links between any of the site cores aren’t stable, you’ll really regret doing it!

For similar reasons, I really am skeptical about the wisdom of clustering firewalls across different sites. Even if the clustering uses Layer 3 technology, doing cross-site clustering probably has interesting failure modes, as in “painful.”

IGP Routes

There’s a problem I used to see in dual-MPLS WAN networks that could become a problem here, both for Internet and for other external BGP connections (cloud, business partner, etc.).

The problem arises if you redistribute your IGP into BGP on the border routers.

What happens is this: if one outside link goes down, BGP on that border router withdraws the affected prefixes. The router then learns them through the IGP, because the other border router is redistributing them into the IGP. Local redistribution then puts them back into BGP on the first border router, the one with the failed link or withdrawn remote prefix.

When the missing link or prefix is restored, that router still has a locally originated BGP prefix with weight 32768, which is preferred over the same prefix learned via EBGP (default weight 0). So, the routes that were learned from the WAN or Internet are in the BGP table but not the best route. And your traffic continues to route via the other router.
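Here is a rough Python sketch of why the restored EBGP path loses. It models only the first two steps of Cisco-style best-path selection (highest weight, then highest local preference) and ignores everything after that. The class, function, and prefix are hypothetical; the weights are the Cisco defaults of 32768 for locally originated routes and 0 for routes learned from a neighbor.

```python
from dataclasses import dataclass

LOCAL_WEIGHT = 32768  # Cisco default weight for locally originated routes
LEARNED_WEIGHT = 0    # Cisco default weight for routes learned from a neighbor


@dataclass(frozen=True)
class BgpPath:
    prefix: str
    source: str          # "ebgp" or "redistributed"
    weight: int
    local_pref: int = 100


def best_path(paths):
    """Pick a best path using only weight, then local preference.

    Deliberately simplified: real BGP has many more tie-breakers.
    """
    return max(paths, key=lambda p: (p.weight, p.local_pref))


# Hypothetical outside prefix (from the documentation address range).
prefix = "203.0.113.0/24"
redistributed = BgpPath(prefix, "redistributed", LOCAL_WEIGHT)
from_ebgp = BgpPath(prefix, "ebgp", LEARNED_WEIGHT)

# While the outside link is down, only the redistributed copy exists.
print(best_path([redistributed]).source)             # prints: redistributed

# After the link comes back, the EBGP path is present but loses on weight.
print(best_path([redistributed, from_ebgp]).source)  # prints: redistributed
```

Since weight is compared first, nothing about the EBGP path (AS path length, MED, and so on) gets a chance to matter until the redistributed copy goes away.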

I like to filter EBGP to prevent surprises. You can use the same prefix-list / route-map on both border routers to control what prefixes they learn from the Internet (or WAN / other external entity, if you’re in a cloud or business partner EBGP situation).

The key to the problem I just mentioned is that you can also use the same prefix-list or route-map to filter redistribution from the IGP into BGP, so that outside prefixes are never re-originated locally.
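As a sketch of that idea (in Python rather than router config), the same list of outside prefixes can drive both decisions: permit those prefixes inbound from EBGP, and refuse to redistribute them from the IGP into BGP. The prefixes and function names below are invented for illustration, and matching is exact-match only, like a prefix-list entry without ge/le.

```python
import ipaddress

# Hypothetical list of prefixes we expect to learn from the outside.
OUTSIDE_PREFIXES = {
    ipaddress.ip_network("0.0.0.0/0"),
    ipaddress.ip_network("203.0.113.0/24"),
}


def is_outside(prefix: str) -> bool:
    """Exact-match lookup against the shared outside prefix list."""
    return ipaddress.ip_network(prefix) in OUTSIDE_PREFIXES


def accept_from_ebgp(prefix: str) -> bool:
    """Inbound EBGP filter: only learn prefixes on the list."""
    return is_outside(prefix)


def redistribute_igp_into_bgp(prefix: str) -> bool:
    """Redistribution filter: never re-originate an outside prefix."""
    return not is_outside(prefix)


# Routes the IGP might hold on a border router whose outside link is down:
# one internal prefix, plus an outside prefix learned via the other router.
igp_routes = ["10.1.0.0/16", "203.0.113.0/24"]

redistributed = [p for p in igp_routes if redistribute_igp_into_bgp(p)]
print(redistributed)                       # prints: ['10.1.0.0/16']
print(accept_from_ebgp("203.0.113.0/24"))  # prints: True
print(accept_from_ebgp("10.1.0.0/16"))     # prints: False
```

With the outside prefix kept out of redistribution, there is no weight-32768 copy waiting to beat the EBGP path when the link returns.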

The way to think about this is that you want to learn outside prefixes only on the outside. If one WAN link is down, the IGP or IBGP can steer traffic to the other border router, assuming it is still receiving a given outside prefix.