Network Stability Through Resilience Engineering
This blog shares some thoughts about BGP Traffic Engineering (TE).
The good news: there is some good content to be found on the Internet on the subject of BGP Traffic Engineering. See the links inline or in the References section below.
The dark secret: I didn’t find much on internal WAN Traffic Engineering, possibly because there is no common use case.
Traffic Engineering comes up in one of two contexts: the Internet edge (traffic to and from your upstream providers) or the internal WAN.
There is a NANOG tutorial from 2013 that contains some good information (should I also say “timeless” or “still relevant”?). The tutorial emphasizes the importance of measurement. That triggers the Kentik neuron in my head: Kentik can factor in BGP information and provide reports such as traffic volume by egress AS number.
The conclusions I reach from that and other readings on the subject: there are a few things we can do to control egress traffic; the controls over inbound traffic are fewer and less effective; and cooperating with your upstream providers for a win-win is a good thing. Also, things change, and your upstream providers may have different incentives than you.
My reaction: yes, agreed. Concerning daily monitoring, do you really have the time to be playing daily whack-a-mole with traffic flows? What level of doing so is cost- and time-effective?
On the internal side, TE comes up in a couple of contexts:
I am somewhat assuming the driver in all these cases is mostly traffic volume, although that may indirectly affect link sizing and cost. Latency might also be a driving factor.
By the way, AS Path prepending is not very effective, and the effectiveness diminishes the further (in BGP AS hops) you get from your network. I’m not wild about AS Path prepending within your own network. It’s all too easy to pick up hops or do something that messes things up. Worse, AS Path prepending is somewhat indirect, with the consequence that troubleshooting it is somewhat global: you may well have to look at multiple routers when there is a problem.
If you’re thinking prepending and the Internet, how many prepends is enough? Three prepends might be on the low side.
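To see why prepending is a weak lever, here is a minimal Python sketch of how a remote network might choose between two paths back to you. It assumes a heavily simplified best-path rule (highest local-preference, then shortest AS path; real BGP has more tie-breakers), and the AS numbers and upstream labels are made up for illustration.

```python
from dataclasses import dataclass

OUR_AS = 64512  # hypothetical private AS number, for illustration only


@dataclass(frozen=True)
class Route:
    via: str                # label for the upstream the route came through
    as_path: tuple          # AS numbers as seen by the remote network
    local_pref: int = 100   # set by the receiving network, not by us


def as_path_via(upstream_as, prepends=0):
    """AS path seen beyond the upstream: the upstream, then us (plus prepends)."""
    return (upstream_as,) + (OUR_AS,) * (1 + prepends)


def best(routes):
    """Simplified best path: highest local-pref wins, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# We prepend three times toward upstream B to push inbound traffic to upstream A.
via_a = Route("A", as_path_via(64601))
via_b = Route("B", as_path_via(64602, prepends=3))

# A remote network with default policy compares AS path lengths, so A wins.
neutral_choice = best([via_a, via_b]).via

# A remote network that raises local-pref on routes via B ignores our prepends.
via_b_preferred = Route("B", via_b.as_path, local_pref=200)
biased_choice = best([via_a, via_b_preferred]).via

print(neutral_choice, biased_choice)
```

The point of the sketch: prepends only matter where everything evaluated ahead of AS path length is equal, and you do not control the local-preference other networks assign.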
Outbound TE is fairly easy. Formulate a policy, probably based on prefix lists, for what traffic you want to send out each exit point. BGP weight (single router) and local-preference (two routers) might well be how you leverage those prefix lists.
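As a rough illustration of that kind of policy, the Python sketch below models prefix lists per exit point and raises local-preference on matching routes. The exit names, prefixes (documentation ranges), preference values, and the three-step best-path rule are all assumptions for the example, not a vendor implementation.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    exit_point: str
    as_path_len: int
    weight: int = 0         # local to one router
    local_pref: int = 100   # shared among the routers in our AS


# Hypothetical policy: which destination prefixes should prefer which exit.
PREFER_EXIT = {
    "exit-A": ["203.0.113.0/24"],
    "exit-B": ["198.51.100.0/24"],
}


def apply_policy(route, preferred_local_pref=200):
    """Raise local-pref when the route arrives via the exit we prefer for it."""
    net = ipaddress.ip_network(route.prefix)
    for entry in PREFER_EXIT.get(route.exit_point, []):
        if net.subnet_of(ipaddress.ip_network(entry)):
            return replace(route, local_pref=preferred_local_pref)
    return route


def best_path(routes):
    """Simplified order: weight, then local-pref, then shortest AS path."""
    return max(routes, key=lambda r: (r.weight, r.local_pref, -r.as_path_len))


learned = [
    Route("203.0.113.0/24", "exit-A", as_path_len=4),
    Route("203.0.113.0/24", "exit-B", as_path_len=2),
]

without_policy = best_path(learned).exit_point
with_policy = best_path([apply_policy(r) for r in learned]).exit_point
print(without_policy, with_policy)
```

Without the policy the shorter AS path via exit-B wins; with it, the prefix list steers the traffic to exit-A regardless of path length.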
If you want to do application-based TE on traditional routers, then you might need Policy-Based Routing (PBR). I like to describe PBR as TCP / UDP port-specific static routes, which are basically static routes (ugly!) but more of them (uglier!). I’m not a fan: static routes can create a twisty little maze of unmaintainable routing tables (every routing issue means checking static routes in many places). PBR just bumps up the complexity. I’d suggest trying to find a more sustainable approach.
You can do the same sort of thing in some of the real SD-WAN devices. That doesn’t mean it’s a good idea.
By the way, if a device can’t do application transport preference, aka “application aware routing”, then I don’t consider it to be an SD-WAN device; it is an automated VPN solution with “SD-WAN washing”.
If you do TE just on a per-application basis, e.g. VoIP over MPLS, batch / bulk traffic over the Internet-side tunnel, then that’s fairly manageable. If you start mixing destinations and applications, that might not be such a great idea.
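A per-application policy of that kind can stay very small. The Python sketch below is one hypothetical way to express it: the port-to-application mapping, the transport names, and the fallback behavior are assumptions for illustration, not how any particular SD-WAN product works.

```python
# Hypothetical application classes keyed by destination port.
APP_BY_PORT = {5060: "voip", 5061: "voip", 873: "bulk"}  # SIP, SIP over TLS, rsync

# One transport preference per application class, and nothing per destination.
TRANSPORT_BY_APP = {"voip": "mpls", "bulk": "internet-tunnel"}
DEFAULT_TRANSPORT = "mpls"


def pick_transport(dst_port, available=("mpls", "internet-tunnel")):
    """Return the preferred transport for the application, falling back if it is down."""
    app = APP_BY_PORT.get(dst_port, "default")
    preferred = TRANSPORT_BY_APP.get(app, DEFAULT_TRANSPORT)
    if preferred in available:
        return preferred
    return available[0] if available else None


print(pick_transport(5060))                      # VoIP
print(pick_transport(873))                       # bulk transfer
print(pick_transport(873, available=("mpls",)))  # Internet tunnel down
```

Note that the policy table has one row per application class. Adding destinations to the key is where the table starts to multiply.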
Inbound TE is also easy because it’s your network (or your provider connections, and your internal traffic). What might not be so easy is preserving symmetry of traffic flows. Although for internal traffic, without firewalls and stateful devices: does symmetry matter?
Note that we’re still talking routing at every hop. The table stake here is consistent routing. That means that if your WAN router(s) send a packet out either WAN interface, the intervening routers will all forward it towards the destination and not loop it back towards the source.
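One way to sanity-check that property is to walk the next-hop decisions for a destination and confirm the packet arrives rather than looping. The Python sketch below does that over a hand-written next-hop table; the router names and tables are hypothetical.

```python
def trace(next_hop, source, destination):
    """Follow per-router next hops toward one destination.

    next_hop maps each router to the router it forwards to for that destination.
    Returns (path, status), where status is "delivered", "loop", or "blackhole".
    """
    path = [source]
    seen = {source}
    node = source
    while node != destination:
        node = next_hop.get(node)
        if node is None:
            return path, "blackhole"
        path.append(node)
        if node in seen:
            return path, "loop"
        seen.add(node)
    return path, "delivered"


# Consistent routing: every hop moves the packet closer to the destination.
consistent = {"site-rtr": "wan-a", "wan-a": "core", "core": "dst-rtr"}

# Inconsistent routing: the WAN router sends the packet back to the source site.
inconsistent = {"site-rtr": "wan-b", "wan-b": "site-rtr"}

good_path, good_status = trace(consistent, "site-rtr", "dst-rtr")
bad_path, bad_status = trace(inconsistent, "site-rtr", "dst-rtr")
print(good_status, good_path)
print(bad_status, bad_path)
```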
Unless you work for a big organization and like working on the “bleeding edge”, segment routing (SR) is probably more than you want to take on just to achieve TE. Cost and user-friendliness of an SR controller might also be concerns.
As I see it, when you’re planning your BGP TE approach, there’s a choice to be made: do TE by destination prefix, or do TE by application.
My rationale: if you start mixing the two, you’ll have M prefix groups and N application groups, giving you M x N choices. M or N is likely smaller than M x N, hence easier to manage (as in “preserve your remaining sanity”).
Another factor to consider is the intervening hops (routers). This is a non-issue if we’re talking MPLS or transport providers, or SD-WAN: you pick provider A or transport B and ride it all the way to the destination. If you are your own WAN provider (somewhat rare these days, but consider links between CoLo sites), suppose you have P intermediate hops. Do you want to be dealing with policy on intervening routers? That has an M x P, or N x P, or even M x N x P effect. Probably better not to go there.
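The arithmetic is worth making concrete. In the Python sketch below, the values chosen for M, N, and P are arbitrary examples, and "policy entries" is just a rough proxy for how many things you might have to configure and check.

```python
def policy_entries(prefix_groups=1, app_groups=1, routers=1):
    """Rough count of policy entries to maintain: M x N x P."""
    return prefix_groups * app_groups * routers


M, N, P = 6, 4, 5  # example values: prefix groups, application groups, hops

by_prefix_only = policy_entries(prefix_groups=M)            # M
by_app_only = policy_entries(app_groups=N)                  # N
mixed_at_edge = policy_entries(M, N)                        # M x N
mixed_on_every_hop = policy_entries(M, N, P)                # M x N x P

print(by_prefix_only, by_app_only, mixed_at_edge, mixed_on_every_hop)
```

With these example numbers, picking one dimension means 6 or 4 entries, mixing them means 24, and pushing the mixed policy onto every intervening router means 120.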
One answer is to design a dual WAN network with “rails”. The following diagram illustrates a way to do this with BGP. The heavier lines indicate the “rails”.
If your network is global, you can “close the loop” and have two rings around the world.
You can do something like this with an IGP by putting a higher metric on the cross-links.
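Here is a small Python sketch of the IGP variant, assuming a made-up three-site topology with two rails (A and B) and a cross-link at each site. The metrics of 10 on rail links and 100 on cross-links are arbitrary; the shortest-path search is plain Dijkstra, standing in for what a link-state IGP would compute.

```python
import heapq


def shortest_path(links, src, dst):
    """Dijkstra over undirected links; returns (cost, path) or None."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


RAIL, CROSS = 10, 100  # hypothetical metrics: cross-links are deliberately expensive
links = {
    ("A1", "A2"): RAIL, ("A2", "A3"): RAIL,     # rail A
    ("B1", "B2"): RAIL, ("B2", "B3"): RAIL,     # rail B
    ("A1", "B1"): CROSS, ("A2", "B2"): CROSS, ("A3", "B3"): CROSS,
}

normal_cost, normal_path = shortest_path(links, "A1", "A3")

# Fail the first rail A link: traffic has to cross to rail B and back.
degraded = {k: v for k, v in links.items() if k != ("A1", "A2")}
failover_cost, failover_path = shortest_path(degraded, "A1", "A3")

print(normal_cost, normal_path)
print(failover_cost, failover_path)
```

In normal operation traffic entering rail A stays on rail A, and the cross-links only carry traffic when a rail link fails.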
The “rails” approach lets you start out simply. You can have some sites using “rail A” and others using “rail B” to reach each of two sets of destinations.
I like applying a community identifying the site to all prefixes advertised out of a site. That enables some powerful TE for BGP, because at each source site you can use the community (and weight or local preference) to send traffic from that source to that destination out a selected interface (rail).
I’d recommend building out such a network with site communities, even if you don’t intend to use them initially. Communities can be handy to have for route filtering if you need it.
I’d personally want to try to do it the same way across sites, i.e. use “rail A” to reach some subset of the sites, and “rail B” to reach the rest. That ought to be fairly sustainable. If finer granularity is needed, then sites with different needs can deviate from that policy.
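A minimal Python sketch of community-driven rail selection at a source site might look like the following. The community values, site names, rail labels, and local-preference numbers are all hypothetical, and best-path selection is reduced to local-preference alone to keep the idea visible.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    communities: frozenset   # includes the originating site's community
    received_on: str         # "rail-A" or "rail-B"
    local_pref: int = 100


# Hypothetical site communities, applied to everything a site advertises.
SITE_COMMUNITY = {"site-1": "64512:101", "site-2": "64512:102"}

# Policy at this source site: which rail to use toward each destination site.
RAIL_FOR_SITE = {
    SITE_COMMUNITY["site-1"]: "rail-A",
    SITE_COMMUNITY["site-2"]: "rail-B",
}


def apply_rail_policy(route, preferred_local_pref=200):
    """Prefer the copy of a route that arrived on the rail chosen for its site."""
    for community in route.communities:
        if RAIL_FOR_SITE.get(community) == route.received_on:
            return replace(route, local_pref=preferred_local_pref)
    return route


def best(routes):
    """Simplified selection: highest local-pref wins."""
    return max(routes, key=lambda r: r.local_pref)


site2 = frozenset({SITE_COMMUNITY["site-2"]})
candidates = [
    Route("198.51.100.0/24", site2, "rail-A"),
    Route("198.51.100.0/24", site2, "rail-B"),
]
chosen = best([apply_rail_policy(r) for r in candidates])
print(chosen.received_on)
```

Because the match is on the site community rather than on individual prefixes, a new prefix advertised by site-2 picks up the same treatment with no policy change at the source site.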
What I really like about policy based on destination communities and “rails” is that troubleshooting should be purely local, at the source site.
I’ve recently worked with or heard of some sites doing “EBGP as an IGP”. Every module in the data center or WAN might have two routers, but they run EBGP to each other. This is great for passing prefixes along, although it tends to make BGP RIP-like (length of AS Path is a hop count).
Not so great: it creates ASN proliferation and defeats any “TE by rails”.
Bandwidth is getting rather cheap. At higher speeds (40-100 Gbps now), not so cheap, but still, sub-linear cost increases. That is, 40 Gbps is likely not 4 times the cost of 10 Gbps, etc.
Traffic Engineering to some degree may help you get more out of the dual WAN bandwidth you have. But you do have to weigh the value of your / staff’s time against the cost of more bandwidth. That suggests keeping the policy behind your Traffic Engineering relatively simple.
Those are the better items from Google, after skimming the top hits.
By the way, the above diagram was done in DRAW.IO, using their Mac app. Free, and pretty good, some occasional minor user interface issues. I seem to get good results faster in draw.io than in Visio, with a whole lot less fiddling.
Comments are welcome, whether in agreement or constructive disagreement about the above. I enjoy hearing from readers and carrying on deeper discussion via comments. Thanks in advance!
Hashtags: #CiscoChampion #TechFieldDay #TheNetCraftsmenWay #BGP
Did you know that NetCraftsmen does network / datacenter / security / collaboration design / design review? Or that we have deep UC&C experts on staff, including @ucguerilla? For more information, contact us at email@example.com.