Network Configuration Management – Know when it is wrong

Author
Terry Slattery
Principal Architect

I recently read a blog post by Joel Spolsky titled Making Wrong Code Look Wrong, in which he describes ways that software developers can make errors easier to spot and reduce the potential for bugs. Since I do some development, it was an interesting read. One of the major points was that when you have to modify code, you shouldn’t have to examine code in a lot of other places to make sure that your modification doesn’t create a bug.

But then an interesting thing happened that tied Joel’s article into network design. I learned of a network that had a major outage that lasted 30 minutes. Most of the network appeared to be down. The post-mortem analysis showed that it wasn’t really down: an injected default route had created a routing black hole. Let’s see, 30 minutes of downtime makes the network availability roughly 99.994% for the year.
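For anyone who wants to check the availability arithmetic, here is a minimal sketch. It assumes a 365-day year and that the 30-minute outage was the only downtime in that year; the function name is mine, not from any tool.

```python
# Availability over a period, expressed as a percentage.
# Assumption: a 365-day year with no other downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_percent(downtime_minutes: float,
                         period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the percentage of the period during which the network was up."""
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


if __name__ == "__main__":
    # A single 30-minute outage in one year
    print(f"{availability_percent(30):.3f}%")
```

A single half-hour outage is enough to put "five nines" (about 5.3 minutes of downtime per year) out of reach for the year.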

A simple configuration change was the culprit.  There were no routes in the configuration change, so what happened?  How could the configuration change cause such a massive network failure?

The configuration change extended an MPLS VPN into a new part of the network.  The proposed change looked benign.  It created a new VLAN and tied the SVI for that VLAN to the MPLS VPN via BGP, extending it to the new location.  The problem was that the MPLS VPN contained a default route that wasn’t apparent by inspection of the proposed change.  When the VPN was extended, the newly injected default route was preferred over the default route that was in the core of the network.  Instant black hole.

Having just read Joel’s article, I noted that the proposed configuration change would have required the network engineer and any reviewer to carefully examine the routes carried in the extended VPN to make sure that there would be no problems. Similarly, the routes in the core network would have to be examined to make sure that they wouldn’t cause a problem in the VPN that was being extended. Both of these actions violate Joel’s premise that you shouldn’t have to look very far from the source of the change to determine if the change is safe to make. Having to do a lot of work to validate a change will guarantee that it won’t be done very often.
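The kind of examination described above can be partly automated. The following is only a sketch under stated assumptions: the prefix lists have already been collected from the VPN and from the core by some other means, they are IPv4 only, and the function name and sample prefixes are hypothetical. It flags prefixes that appear in both places (where route preference will decide the winner) and reports whether the VPN carries a default route.

```python
import ipaddress

DEFAULT_V4 = ipaddress.ip_network("0.0.0.0/0")


def review_vpn_extension(vpn_prefixes, core_prefixes):
    """Compare the routes carried in a VPN with the routes in the core.

    Returns a tuple (duplicates, carries_default):
      duplicates      - sorted list of prefixes present in both tables
      carries_default - True if the VPN carries 0.0.0.0/0
    Only exact-match prefixes are reported; overlapping more-specific
    routes would need a further check.
    """
    vpn = {ipaddress.ip_network(p) for p in vpn_prefixes}
    core = {ipaddress.ip_network(p) for p in core_prefixes}
    duplicates = [str(net) for net in sorted(vpn & core)]
    return duplicates, DEFAULT_V4 in vpn


# Hypothetical sample data, for illustration only
SAMPLE_VPN = ["10.20.0.0/16", "0.0.0.0/0"]
SAMPLE_CORE = ["10.0.0.0/8", "192.168.1.0/24", "0.0.0.0/0"]

if __name__ == "__main__":
    dupes, has_default = review_vpn_extension(SAMPLE_VPN, SAMPLE_CORE)
    print("Prefixes in both VPN and core:", dupes)
    print("VPN carries a default route:", has_default)
```

A check like this moves the answer closer to the change itself, which is the point of Joel’s premise: the reviewer sees the conflicting default without having to go digging through routing tables.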

Back to the network configuration.

I’ve always been a proponent of very limited use of static default routes, and static routes in general. Default routes should be originated at the Internet borders. The only exception might be where your network is large enough to be segmented into several major routing domains, in which case originating default into each domain from the junction with other domains would be appropriate. The key is that only a few routers should originate default routes. And those defaults should be tied to outgoing interfaces, so that if the interface dies, the static default route is withdrawn.

A well-designed routing system will propagate the default routes to the rest of the network. It makes troubleshooting much simpler. If Internet connectivity is lost, you don’t have to wonder where the traffic will flow. It will die as soon as it reaches a router that doesn’t have the default. Go track down where the dynamic routing is failing and you’ve fixed the problem. It’s nicely deterministic.

But how do you determine which routers are originating default routes?

NetMRI retrieves the routing tables of routers, as long as the tables hold fewer than 3000 routes. (If you have more routes than that, you should consider breaking the network into multiple private autonomous systems.) The Network Explorer/Summaries tab (see image below) lists the routes in the network and the routers that are originating each route. It excludes the routers that are simply forwarding the routes. Because NetMRI is obtaining the routes directly from the routers, it is able to report on default routes within summarized parts of the network.

The example network shown below has hundreds of routers originating the default route. This is because each edge router is configured with a static default that points to its upstream neighbor. Instead of a static configuration, the default route should be learned from the upstream router. Propagating the default via the routing protocol also makes the device configurations more consistent in that they don’t need a device-specific default route in the configuration. The same dynamic routing template would then apply across hundreds of devices, which simplifies configuration management and allows the template to be used to verify proper routing configuration on most of the routers.

[Image: DefaultRoutes]
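If you don’t have a tool that reports route originators, the idea can be approximated from collected routing table data. This sketch is not NetMRI’s method or API. It assumes a hypothetical list of route entries, each tagged with the source of the route, and it treats a router as an originator when its route to the prefix is locally configured (static or connected) rather than learned from a routing protocol.

```python
from collections import defaultdict
from typing import NamedTuple


class RouteEntry(NamedTuple):
    """One routing table entry (hypothetical data model)."""
    router: str
    prefix: str
    source: str  # e.g. "static", "connected", "ospf", "bgp"


# Assumption: locally configured routes mark the originating router
LOCAL_SOURCES = {"static", "connected"}


def originators_by_prefix(entries):
    """Map each prefix to the sorted list of routers that originate it."""
    result = defaultdict(set)
    for entry in entries:
        if entry.source in LOCAL_SOURCES:
            result[entry.prefix].add(entry.router)
    return {prefix: sorted(routers) for prefix, routers in result.items()}


def unexpected_default_originators(entries, expected):
    """Return routers originating 0.0.0.0/0 that are not in 'expected'."""
    found = originators_by_prefix(entries).get("0.0.0.0/0", [])
    return [router for router in found if router not in expected]


# Hypothetical sample data, for illustration only
SAMPLE_ENTRIES = [
    RouteEntry("border1", "0.0.0.0/0", "static"),  # intended originator
    RouteEntry("edge1", "0.0.0.0/0", "static"),    # static default at the edge
    RouteEntry("edge2", "0.0.0.0/0", "ospf"),      # learned dynamically
    RouteEntry("edge2", "10.1.2.0/24", "connected"),
]

if __name__ == "__main__":
    print(unexpected_default_originators(SAMPLE_ENTRIES, {"border1"}))
```

With the sample data, only edge1 is reported: border1 is an expected originator and edge2 learned its default from the routing protocol.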

Of course, when you use a default route, you’ll need to configure classless routing so that the default will get used.  And make sure that you have a summary route to Null0 for the address space you use in your network so that when an internal destination isn’t reachable, the packet gets dropped instead of potentially looping within the network or being forwarded to your ISP, who will (or should) drop it.
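To see why the summary route to Null0 matters, here is a small longest-prefix-match sketch. The address space (10.0.0.0/8), the next-hop names, and the function are all assumptions chosen for illustration. It models only the lookup, not a real router.

```python
import ipaddress


def lookup(routing_table, destination):
    """Longest-prefix match: return the next hop for a destination, or None."""
    dest = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, next_hop in routing_table.items():
        net = ipaddress.ip_network(prefix)
        if dest in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


# Hypothetical tables: internal space is assumed to be 10.0.0.0/8
WITH_NULL_SUMMARY = {
    "0.0.0.0/0": "isp",       # default toward the Internet
    "10.0.0.0/8": "Null0",    # summary for the internal address space
    "10.1.1.0/24": "core1",   # a reachable internal subnet
}
WITHOUT_NULL_SUMMARY = {
    "0.0.0.0/0": "isp",
    "10.1.1.0/24": "core1",
}

if __name__ == "__main__":
    for name, table in (("with Null0", WITH_NULL_SUMMARY),
                        ("without Null0", WITHOUT_NULL_SUMMARY)):
        # 10.9.9.9 stands in for an internal destination that is not reachable
        print(name, "->", lookup(table, "10.9.9.9"))
```

With the summary in place, a packet for the missing internal subnet matches 10.0.0.0/8 and is dropped locally. Without it, the only match is the default, so the packet heads for the ISP.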

-Terry

_____________________________________________________________________________________________

Re-posted with Permission 

NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html

