I learned something new at CiscoLive: HSRP preempt delay and why it should be used. In the Resilient Campus Design session, the presenter discussed factors and configurations that make a network more resilient. HSRP is often used in campus networks as a first-hop redundancy protocol. If one router or interface of a pair that is servicing a subnet dies, the backup device or interface takes over.
A problem can arise when an HSRP master (the active router, in HSRP terminology) is rebooted, or when its interface is restored after a failure. Let’s say that you’re using low timer values to achieve fast failover times and that you’ve configured HSRP to preempt so that you always know which router is the master. Both of these are good practice in resilient networks.
The Layer 2 connection between the HSRP pair comes up quickly. That’s normally a good thing. But what if the HSRP preempt takes effect before the routing protocol has converged? You’ve now created a black hole. Traffic will go to the new HSRP master, but it doesn’t yet have full routing information, so it drops packets for any destination it hasn’t learned yet. It may take several seconds for OSPF or EIGRP to exchange all the routes. If you have a significant amount of summarization configured, such as in a totally stubby OSPF area, then you may get lucky and have only a short-lived black hole. But if there are a lot of routes, the OSPF database exchange will take much longer than the HSRP master transition.
So the recommended practice is to configure HSRP to delay the preempt action until the routing protocol has had a chance to stabilize. This might be as little as a few seconds. The presenter suggested pretty large values, like 30 seconds or more. How long you need depends on your network and on the volume of routes that have to be exchanged. Add the delay to your HSRP configuration with the command
standby preempt delay minimum 30
Delaying even longer may not be such a bad idea if there is a chance that the interface or router is unstable and may go back down within a few minutes. The CiscoLive presentation suggested that values of 300 seconds are not unreasonable.
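To put the pieces together, here is a minimal sketch of what the interface configuration might look like with fast timers, preemption, and a preempt delay. The VLAN interface, IP addresses, group number, priority, and timer values are all assumptions I made up for illustration, so substitute your own. Note that the preempt delay is part of the standby preempt command; standby delay minimum without the preempt keyword is a different interface-level command that delays HSRP initialization after the interface comes up.

```
! Hypothetical SVI, addresses, and HSRP group number
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 ! Virtual gateway address shared by the HSRP pair
 standby 10 ip 10.1.10.1
 ! Higher priority than the peer so this router is the preferred master
 standby 10 priority 110
 ! Low hello/hold timers for fast failover (example values only)
 standby 10 timers msec 250 msec 750
 ! Preempt, but wait at least 30 seconds before taking over
 standby 10 preempt delay minimum 30
```

After a reload or interface recovery, show standby should show the router holding off on preemption until the delay expires. Pick the delay by measuring how long your routing protocol actually takes to converge, then add some margin.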
For further reading, check out my blog post on the CiscoLive 2009 presentation on High Availability Networks (>5-nines). I didn’t attend the HA Networks presentation this year since I attended it last year. Check out the other presentations at CiscoLive Virtual for more tips on resilient and high availability networks.
-Terry
_____________________________________________________________________________________________
Re-posted with Permission
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html