This blog shares some thoughts about BGP Traffic Engineering (TE).
The good news: there is some good content to be found on the Internet on the subject of BGP Traffic Engineering. See the links inline or in the References section below.
The dark secret: I didn’t find much on internal WAN Traffic Engineering, possibly because there is no common use case.
Internal vs. External TE
Traffic Engineering comes up in one of two contexts:
- An ISP trying to manage costs, traffic volume, and user experience quality across its various upstream or connected providers, or a company managing traffic to / from its ISPs.
- A company network trying to manage or load balance its internal WAN connections.
There is a NANOG tutorial from 2013 that contains some good information (should I also say “timeless” or “still relevant”?). The tutorial emphasizes the importance of measurement. That triggers the Kentik neuron in my head: Kentik can factor in BGP information and provide reports such as traffic volume by egress AS number.
The conclusions I reach from that and other readings on the subject are that there are a few things we can do to control egress traffic, that the controls over inbound traffic are fewer and less effective, and that cooperation and win-win with your upstream providers is a good thing. Also, things change, and your upstream providers may have different incentives than you.
My reaction: yes, agreed. Concerning daily monitoring, do you really have the time to be playing daily whack-a-mole with traffic flows? What level of doing so is cost- and time-effective?
On the internal side, TE comes up in a few contexts:
- You have dual WAN providers or transports, perhaps MPLS, perhaps not MPLS, and want to load balance or steer selected traffic.
- You have dual paths in a WAN that you built and manage.
- You have SD-WAN gear and are hoping to save money by putting some traffic over the cheaper (likely Internet) transport.
I am somewhat assuming the driver in all these cases is mostly traffic volume, although that may indirectly affect link sizing and cost. Latency might also be a driving factor.
By the way, AS Path prepending is not very effective, and the effectiveness diminishes the further (in BGP AS hops) you get from your network. I’m not wild about AS Path prepending within your own network. It’s all too easy to pick up hops or do something that messes things up. Worse, AS Path prepending is somewhat indirect, with the consequence that troubleshooting it is somewhat global: you may well have to look at multiple routers when there is a problem.
If you’re thinking prepending and the Internet, how many prepends is enough? Three prepends might be on the low side.
Inbound and Outbound TE
Outbound TE is fairly easy. Formulate a policy, probably based on prefix lists, for what traffic you want to send out each exit point. BGP weight (single router) and local-preference (two routers) might well be how you leverage those prefix lists.
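To make that a little more concrete, here is a minimal Python sketch of the idea, not router configuration. It models only three steps of best-path selection in the order Cisco-style implementations apply them (higher weight, then higher local preference, then shorter AS path) and skips everything else. The prefixes, the exit names "A" and "B", and the local preference value of 200 are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Path:
    exit_name: str         # hypothetical label for an exit point
    weight: int = 0        # local to one router, highest wins
    local_pref: int = 100  # shared within the AS, highest wins
    as_path_len: int = 1   # shortest wins, checked after the two above


# Hypothetical "prefix list": destinations we want to send out exit B.
PREFER_EXIT_B = [ip_network("203.0.113.0/24")]


def apply_policy(prefix: str, path: Path) -> Path:
    """Raise local preference on exit B for prefixes matched by the list."""
    net = ip_network(prefix)
    if path.exit_name == "B" and any(net.subnet_of(p) for p in PREFER_EXIT_B):
        return Path(path.exit_name, path.weight, 200, path.as_path_len)
    return path


def best_path(prefix: str, paths: list) -> Path:
    """Pick the winner using weight, then local preference, then AS path length."""
    candidates = [apply_policy(prefix, p) for p in paths]
    return max(candidates, key=lambda p: (p.weight, p.local_pref, -p.as_path_len))


paths = [Path("A", as_path_len=2), Path("B", as_path_len=3)]
print(best_path("203.0.113.0/24", paths).exit_name)   # B: policy raised local preference
print(best_path("198.51.100.0/24", paths).exit_name)  # A: no policy match, shorter AS path
```

The point of the sketch is that the prefix list does the classification and a single attribute does the steering, so the policy stays readable.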
If you want to do application-based TE on traditional routers, then you might need Policy-Based Routing (PBR). I like to describe PBR as TCP / UDP port-specific static routes, which are basically static routes (ugly!) but more of them (uglier!). I’m not a fan: static routes can create a twisty-little maze of unmaintainable routing tables (every routing issue means checking static routes in many places). PBR just bumps up the complexity. I’d suggest trying to find a more sustainable approach.
You can do the same sort of thing in some of the real SD-WAN devices. That doesn’t mean it’s a good idea.
By the way, if you can’t do application transport preference, aka “application aware routing”, then I don’t consider it to be an SD-WAN device; it is an automated VPN solution with “SD-WAN washing”.
If you do TE just on a per-application basis, e.g. VoIP over MPLS, batch / bulk traffic over the Internet-side tunnel, then that’s fairly manageable. If you start mixing destinations and applications, that might not be such a great idea.
Inbound TE is also easy because it’s your network (or your provider connections, and your internal traffic). What might not be so easy is preserving symmetry of traffic flows. Although for internal traffic, without firewalls and stateful devices: does symmetry matter?
Note that we’re still talking routing at every hop. The table stakes here are consistent routing. That means that if your WAN router(s) send a packet out either WAN interface, the intervening routers will all forward it towards the destination and not loop it back towards the source.
Unless you work for a big organization and like working on the “bleeding edge”, segment routing (SR) is probably more than you want to take on just to achieve TE. Cost and user-friendliness of an SR controller might also be concerns.
As I see it, when you’re planning your BGP TE approach, there’s a choice to be made:
- Do you do prefix-based path selection / egress selection?
- Or do you do application-based path selection?
My rationale: if you start mixing the two, you’ll have M prefix groups and N application groups, giving you M x N choices. M or N is likely smaller than M x N, hence easier to manage (as in “preserve your remaining sanity”).
Another factor to consider is the intervening hops (routers). This is a non-issue if we’re talking MPLS or transport providers, or SD-WAN: you pick provider A or transport B and ride it all the way to the destination. If you are your own WAN provider (somewhat rare these days, but consider links between CoLo sites), suppose you have P intermediate hops. Do you want to be dealing with policy on intervening routers? That has an M x P, or N x P, or even M x N x P effect. Probably better not to go there.
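The counting argument is easy to check with a few lines of Python. This is just arithmetic under made-up assumptions: the group names and the values M = 4, N = 3, and P = 5 are hypothetical, chosen only to show how quickly the number of policy entries grows.

```python
from itertools import product

# Hypothetical groupings, purely for illustration.
prefix_groups = ["dc-east", "dc-west", "branches", "internet"]  # M = 4
app_groups = ["voip", "video", "bulk"]                          # N = 3
intermediate_hops = 5                                           # P = 5

per_prefix = len(prefix_groups)                         # prefix-based only: M
per_app = len(app_groups)                               # application-based only: N
mixed = len(list(product(prefix_groups, app_groups)))   # both: M x N
mixed_per_hop = mixed * intermediate_hops               # both, on every hop: M x N x P

print(per_prefix, per_app, mixed, mixed_per_hop)  # 4 3 12 60
```

Even with small groups, mixing the two approaches and pushing policy onto intermediate routers turns a handful of decisions into dozens.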
One answer is to design a dual WAN network with “rails”. The following diagram illustrates a way to do this with BGP. The heavier lines indicate the “rails”.
If your network is global, you can “close the loop” and have two rings around the world.
You can do something like this with an IGP by putting a higher metric on the cross-links.
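Here is a small Python sketch of the IGP variant, assuming a made-up topology of three sites, each with one router on rail A and one on rail B. The metric values (10 on rail links, 100 on cross-links) and the router names are hypothetical. It uses plain Dijkstra as a stand-in for an IGP's shortest-path calculation.

```python
import heapq


def build(links):
    """Turn (a, b, metric) tuples into a symmetric adjacency map."""
    graph = {}
    for a, b, metric in links:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns (cost, [nodes]) or (inf, []) if unreachable."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, metric in graph.get(node, {}).items():
            if nbr not in seen:
                heapq.heappush(queue, (cost + metric, nbr, path + [nbr]))
    return float("inf"), []


RAIL, CROSS = 10, 100  # hypothetical metrics: cross-links are deliberately expensive
links = [
    ("A1", "A2", RAIL), ("A2", "A3", RAIL),   # rail A
    ("B1", "B2", RAIL), ("B2", "B3", RAIL),   # rail B
    ("A1", "B1", CROSS), ("A2", "B2", CROSS), ("A3", "B3", CROSS),
]

intact = shortest_path(build(links), "A1", "A3")
print(intact)  # (20, ['A1', 'A2', 'A3']): traffic stays on rail A

# Remove one rail A link: cross-links are used only as a fallback.
degraded_links = [l for l in links if l[:2] != ("A2", "A3")]
degraded = shortest_path(build(degraded_links), "A1", "A3")
print(degraded[0])  # 220
```

With the rail intact, the high cross-link metric keeps traffic on its rail. The cross-links only come into play when a rail link is lost.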
BGP and Policy
The “rails” approach lets you start out simply. You can have some sites using “rail A” and others using “rail B” to reach each of two sets of destinations.
I like applying a community identifying the site to all prefixes advertised out of a site. That enables some powerful TE for BGP, because at each source site you can use the community (and weight or local preference) to send traffic from that source to that destination out a selected interface (rail).
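As a rough sketch of what that looks like at one source site, the Python below maps a destination site's community to a preferred rail by raising local preference on routes learned over that rail. The community values (65000:101, 65000:102), the rail names, and the local preference numbers are assumptions for illustration, and the tie-break for untagged routes is a simplification of what real BGP would do.

```python
# Hypothetical policy at one source site: destination-site community -> preferred rail.
RAIL_FOR_COMMUNITY = {
    "65000:101": "A",  # hypothetical community for site 1
    "65000:102": "B",  # hypothetical community for site 2
}
DEFAULT_LOCAL_PREF = 100
PREFERRED_LOCAL_PREF = 200


def local_pref(route_communities, learned_on_rail):
    """Local preference assigned to a route learned over the given rail."""
    for community in route_communities:
        if RAIL_FOR_COMMUNITY.get(community) == learned_on_rail:
            return PREFERRED_LOCAL_PREF
    return DEFAULT_LOCAL_PREF


def choose_rail(route_communities):
    """Pick the rail with the higher local preference.

    On a tie this sketch simply returns "A"; a real router would fall
    through to the later best-path steps instead.
    """
    prefs = {rail: local_pref(route_communities, rail) for rail in ("A", "B")}
    return max(sorted(prefs), key=prefs.get)


print(choose_rail({"65000:101"}))  # A
print(choose_rail({"65000:102"}))  # B
print(choose_rail(set()))          # A (tie, simplified tie-break)
```

Everything that decides the exit lives in one small table at the source site, which is where the local troubleshooting benefit comes from.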
I’d recommend building out such a network with site communities, even if you don’t intend to use them initially. Communities can be handy to have for route filtering if you need it.
I’d personally want to try to do it the same way across sites, i.e. use “rail A” to reach some subset of the sites, and “rail B” to reach the rest. That ought to be fairly sustainable. If finer granularity is needed, then sites with different needs can deviate from that policy.
What I really like about policy based on destination communities and “rails” is that troubleshooting should be purely local, at the source site.
I’ve recently worked with or heard of some sites doing “EBGP as an IGP”. Every module in the data center or WAN might have two routers, but they run EBGP to each other. This is great for passing prefixes along, although it tends to make BGP RIP-like (length of AS Path is a hop count).
Not so great: it creates ASN proliferation and defeats any “TE by rails”.
Bandwidth is getting rather cheap. At higher speeds (40-100 Gbps now) it is not so cheap, but cost increases are still sub-linear. That is, 40 Gbps is likely not 4 times the cost of 10 Gbps, etc.
Traffic Engineering to some degree may help you get more out of the dual WAN bandwidth you have. But you do have to weigh the value of your / staff’s time against the cost of more bandwidth. That suggests keeping the policy behind your Traffic Engineering relatively simple.
Those are the better items from Google, after skimming the top hits.
By the way, the above diagram was done in DRAW.IO, using their Mac app. It is free and pretty good, with some occasional minor user interface issues. I seem to get good results faster in draw.io than in Visio, with a whole lot less fiddling.
Comments are welcome, whether in agreement or in constructive disagreement with the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!