Doing IP Routing Right

A recent blog was titled Doing IP Addressing Right.

This blog does something similar for IP Routing.

TL;DR: What are some goals for IP routing, common approaches, and how do you “clean up” a routing design?

Note that much of what I have written below uses “IP” generically, i.e., it applies equally to IPv4 and IPv6.

What Are Some Signs Your Routing is Wrong

The first thing that comes to mind is too many routing protocols.

Unless your organization is large or there are other <reasons>, you should only be running BGP to your ISPs or your cloud providers – external entities. Maybe CoLos.

Sites and internal routing should ideally use only one protocol. If you have a mix of, say RIP, OSPF, and/or EIGRP, then you likely have poor design or vendor choices.

What’s wrong with 2+ routing protocols? You end up redistributing routes (Cisco’s term). Which is a recipe for surprises and headaches.

When I first got my CCIE, I felt empowered to redistribute. Over the years, I kept encountering odd behaviors when I wasn’t very careful with redistribution. Most recently, I am only willing to redistribute with route filters in each direction and taking great care about failure modes. I much prefer one-way redistribution, using default for outbound traffic. In short, I try to avoid redistribution wherever possible, and when necessary, keep it under tight explicit control. Not that redistribution is bad per se – it is complex and locking things down reduces human error and the ways unanticipated future changes might cause problems.

Experience indicates this is a common problem, perhaps coupled with CCIE ego too. Redistribution is complex. Simple design is better whenever possible.

What can be even worse is running two protocols on the same links with redistribution between them, which I’ve seen a couple of times.

Routing within a region or site should use one routing protocol, with, say, two connections to any region running a different protocol.

FWIW, I consider RIP to be network malpractice at this point. Just say no. I expect more from even a simple network. I feel similarly about static routing – see below.

I like EIGRP personally, but it is pretty much Cisco-only. Which leaves OSPF for multi-vendor environments, or those avoiding Cisco lock-in.

What, I don’t like OSPF? Well, the problem I see there is filtering routes. As in, you can’t without adding complexity. And redistribution between different OSPF instances can have ugly failure behaviors. Most sites that do OSPF use BGP between pockets of OSPF. OSPF does also have the internal vs. external route complexity, just one more thing to keep in mind.

With both OSPF and EIGRP, route summarization is useful for optimal performance. Use it!
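As a rough illustration of what summarization buys you, here is a minimal Python sketch using the standard ipaddress module. The 10.20.x.0/24 subnets are made-up example addressing, not from any real design. It only shows the arithmetic of collapsing contiguous prefixes into one advertisement. The actual summarization is configured on your area or distribution boundary routers.

```python
import ipaddress

# Hypothetical site subnets: four contiguous /24s (example addressing only).
site_subnets = [
    ipaddress.ip_network("10.20.0.0/24"),
    ipaddress.ip_network("10.20.1.0/24"),
    ipaddress.ip_network("10.20.2.0/24"),
    ipaddress.ip_network("10.20.3.0/24"),
]

# collapse_addresses merges adjacent or overlapping networks into the
# smallest set of covering prefixes.
summaries = list(ipaddress.collapse_addresses(site_subnets))

print(summaries)  # [IPv4Network('10.20.0.0/22')]
```

Four routes become one, which is the point. It also shows why a good addressing plan matters: the collapse only works when the subnets are contiguous and aligned.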

Another debatable issue is with OSPF and firewalls. Firewall routing implementations have long been suspect but seem to be getting better. One alternative is to use connected routes on the firewall to peer the routers on either side of the firewall – in effect treating the firewall almost like a link.

Scaling is important but remember that just because you CAN scale a protocol doesn’t mean you SHOULD. Cisco used to have slides with graphs about routing convergence with different numbers of prefixes. Just because you can do 40,000 prefixes in BGP doesn’t make it a good idea. Or 1,000,000 prefixes, which is what a full Internet feed is closing in on.

That many prefixes may be slow to converge, never converge, or cause other problems. So, route summarization is important. And in a large network, regionalization and larger degrees of summarization can help. Another tip is to maybe accept prefixes originated by your upstream ISPs, but filter out prefixes that are 2, 3, or more (pick a number) hops “out,” and use default for those. The point being that at some point, it doesn’t matter which exit to the Internet your traffic uses, so why bog down in huge numbers of prefixes?
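The "filter by how far out" idea can be sketched in a few lines of Python. This is only a model of the policy, not router configuration: the BgpRoute type, the keep_nearby function, and the sample routes are all hypothetical, the AS numbers and prefixes come from the ranges reserved for documentation, and AS-path length is used as a stand-in for "hops out" (which AS-path prepending can distort).

```python
import ipaddress
from typing import NamedTuple, Tuple


class BgpRoute(NamedTuple):
    """Hypothetical, simplified view of a received BGP route."""
    prefix: ipaddress.IPv4Network
    as_path: Tuple[int, ...]  # nearest AS first


def keep_nearby(routes, max_as_hops):
    """Keep routes whose AS path is at most max_as_hops long.

    Anything farther out is dropped and would follow the default route.
    """
    return [r for r in routes if len(r.as_path) <= max_as_hops]


received = [
    BgpRoute(ipaddress.ip_network("192.0.2.0/24"), (64496,)),
    BgpRoute(ipaddress.ip_network("198.51.100.0/24"), (64496, 64497)),
    BgpRoute(ipaddress.ip_network("203.0.113.0/24"), (64496, 64497, 64498, 64499)),
]

kept = keep_nearby(received, max_as_hops=2)
print([str(r.prefix) for r in kept])  # ['192.0.2.0/24', '198.51.100.0/24']
```

The cutoff of 2 is just the "pick a number" from above. The routes that are filtered out are still reachable, only via default rather than a specific prefix.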

Simple Is Good

I do like simple.

For example, if you have a public /23, and two Internet peering points, advertising the /23 and one /24 from one, and the /23 and the other /24 from the other, can provide simple failover. (Modulo upstream ISP convergence time, which can be significant.)
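To see why that works, here is a small Python sketch of the longest-prefix-match logic the rest of the Internet applies to those advertisements. The 198.18.0.0/23 block is a placeholder borrowed from the benchmarking range, and the exit names and the best_exit function are invented for illustration.

```python
import ipaddress

# Placeholder public block (from the benchmarking range), split into two /24s.
block = ipaddress.ip_network("198.18.0.0/23")
low, high = block.subnets(new_prefix=24)

# Each peering point advertises the /23 plus "its" /24.
adverts = {
    "exit_a": [block, low],
    "exit_b": [block, high],
}


def best_exit(dest, adverts, down=()):
    """Return the exit advertising the longest matching prefix for dest.

    Exits listed in `down` are treated as failed (their routes withdrawn).
    """
    addr = ipaddress.ip_address(dest)
    candidates = [
        (net.prefixlen, name)
        for name, nets in adverts.items()
        if name not in down
        for net in nets
        if addr in net
    ]
    # Longest prefix wins; returns None if nothing matches.
    return max(candidates)[1] if candidates else None


print(best_exit("198.18.0.10", adverts))                    # exit_a (via its /24)
print(best_exit("198.18.1.10", adverts))                    # exit_b (via its /24)
print(best_exit("198.18.0.10", adverts, down=("exit_a",)))  # exit_b (via the /23)
```

In normal operation each /24 pulls inbound traffic to its own exit. When one exit fails, its /24 disappears and the surviving /23 catches the traffic, with no tuning beyond the advertisements themselves.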

It is always worth putting in design time considering whether there is a simpler way to achieve your goals. You may well save yourself some painful troubleshooting time and possibly some night-time sleep hours by doing so.

Static Routes

I consider static routes to be a Worst Practice. Occasionally they are useful to simplify a design or cut costs. Using them for small frugal sites to avoid licensing dynamic routing does save money. In bigger networks, they can just create problems (like redistribute static).

Dynamic routing lets you check peers, as a quick way of seeing if traffic is getting to the other end. That’s a plus.

Administrative distance can be helpful but adds a bit of complexity. Which can add up. I mostly haven’t touched admin distance in years. I recently tried to use it for failover in a dual firewall stack scenario (data center to local users and remote Internet links). My head still hurts, although part of the problem was preserving firewall state. It ended up being sort of static routing on steroids. And took up a lot of time considering failover modes and trying to adjust things to work. The end conclusion was that even if we got it to work, troubleshooting would be a nightmare.

Which, come to think of it, is a good criterion for evaluating a routing design: how hard will it be to troubleshoot?

Taking Out Routing Insurance

If you have a good addressing scheme, then set up route filters making sure the only routes advertised OUT of a site are those from that site, and that the site prefix(es) are NOT learned from external peers. That is a mild bit of work to set up but does mean you won’t have traffic taking strange detours, such as WAN site A’s traffic to B detouring through C.
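The logic of those two filters is simple enough to sketch in Python. This is a model of the policy, not device configuration: the site prefix 10.20.0.0/16 and the function names allow_outbound and allow_inbound are assumptions for illustration, and the sketch handles IPv4 only (mixing address families in subnet_of raises an error).

```python
import ipaddress

# Hypothetical site block; a clean addressing plan gives you one (or a few).
SITE_PREFIXES = [ipaddress.ip_network("10.20.0.0/16")]


def allow_outbound(route):
    """Advertise a route OUT only if it falls inside the site's own block."""
    net = ipaddress.ip_network(route)
    return any(net.subnet_of(site) for site in SITE_PREFIXES)


def allow_inbound(route):
    """Reject routes from external peers that claim the site's own space.

    Less specific routes, such as a default, are still accepted.
    """
    net = ipaddress.ip_network(route)
    return not any(net.subnet_of(site) for site in SITE_PREFIXES)


print(allow_outbound("10.20.5.0/24"))  # True: our own subnet
print(allow_outbound("10.30.0.0/24"))  # False: someone else's, do not transit
print(allow_inbound("10.20.5.0/24"))   # False: our space, learned from outside
print(allow_inbound("0.0.0.0/0"))      # True: default route is fine
```

On real gear this becomes a prefix list or route map applied in each direction, and it only stays this short if the site's addressing really does fit in one or two blocks.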

From somewhat of a security perspective, it can be a good idea to disable dynamic routing on links with no intended neighbor. That prevents some device inadvertently or maliciously subverting routing.

Routing peer authentication is another way to do that. Using both might help preserve CPU and make intent explicit in the configuration.

Links re Routing Best Practices

This is my LMGTFY (Let Me Google That For You) section.

Found:

https://climbtheladder.com/10-routing-best-practices/

I didn’t quickly find much else. What I do recall is that the various Cisco Press routing books were quite informative, with the caveat that they told you how to redistribute but didn’t go into any depth on all the things that can go wrong with that.

It looks like the Cisco ENARSI course covers routing in a lot of depth. That gets you the “how do I configure it” part, and maybe some good practices. (I haven’t sat this newer course; I did the old ACRC and CCIE courses and read a lot of Cisco Press books.)

The Cisco Press books on OSPF, EIGRP, and BGP can be helpful.

Searching for “advanced routing best practices” did better. The following appears to be a third-party version of the Cisco ENARSI course.

https://www.howtonetwork.com/technical/protocols/advanced-ip-routing/

Beyond that, it probably comes down to hands-on and lab time.

Conclusion

Routing is complicated under the hood. Running multiple routing protocols increases complexity. I’d say 2 protocols can be 2-4 times more complex, running 3 maybe 9 times more complex.

The key is not just knowing how to redistribute and filter but also knowing what NOT to do. I’ve tried to provide hints above.
