What is the difference between “plain” BFD and BFD echo, and what is the “BFD slow timer” for? That’s BFD as in Bidirectional Forwarding Detection (what did you think I meant?). I have been reading the Cisco documentation and Googling on and off for a while to figure this out, and for most of that time the Cisco documentation I found was not very helpful on this topic: it told me how to configure BFD echo and the slow timer, but not why. Wikipedia was winning my “most lucid documentation” contest for Cisco BFD. However, the Cisco documentation has apparently since been updated (and/or I have found better Cisco documents on BFD). In particular, the tech writer / engineer who improved the prose on this in the Nexus documentation deserves a bonus (or at least a mention in print).
There is also RFC 5880, Bidirectional Forwarding Detection (BFD), which has been available since June 2010, and like most RFCs, it is fairly readable. I guess I should have RTFRFC (Read The Fine RFC), instead of looking for the Cliff’s Notes (so to speak).
Let’s back up and look at why we might care about BFD, then look at how best to use it.
BFD originated with Juniper Networks. It provides a fast way for routing neighbors to detect that their peer is down. BFD can use millisecond timers for communicating with routing neighbors. It is advisable to use interface dampening with that, to minimize the impact of a flapping interface on routing and CPU.
Compared with e.g. sub-second OSPF hellos, BFD is at a lower level in the protocol stack and is lighter weight for the CPU. That allows BFD to be used on more interfaces (physical or logical).
BFD also provides a common mechanism for detecting interface-down events that can be shared across routing protocols (including static routes and FHRPs like HSRP). BFD can be useful with EIGRP, because various Cisco documents recommend not setting the EIGRP Hello timer below 2-4 seconds; EIGRP can reportedly become unstable if you do. EIGRP should work well with, e.g., 50 msec BFD timers, because that decouples interface-down detection from the protocol hello / adjacency maintenance mechanisms.
Where BFD is especially useful is when there is a Layer 1 or Layer 2 device between your edge router and your carrier, especially if the device does not reliably pass along link status to your router. The device might act like a media converter, copper to optical. Or it might be some form of Ethernet to SONET or other carrier-grade edge device. Or it might act like a L2 hub.
In such cases, you might have to wait for your routing protocol hellos (EIGRP, OSPF, or BGP probably) to time out. Since that takes tens of seconds, some time will go by before your routing can reconverge and use an alternative link or carrier that is still up. BFD allows your edge router to very quickly learn about the loss of its neighbor and react.
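To put rough numbers on that, here is a small sketch of the detection-time arithmetic. The formula follows the asynchronous-mode description in RFC 5880 (the neighbor's detect multiplier times the greater of the local required minimum receive interval and the neighbor's desired minimum transmit interval). The function name, the multiplier of 3, and the 40-second OSPF dead interval are my own illustrative assumptions, not values from any particular configuration.

```python
def bfd_detection_time_ms(local_required_min_rx_ms, remote_desired_min_tx_ms,
                          remote_detect_mult):
    """Asynchronous-mode detection time, per the description in RFC 5880.

    The neighbor transmits at the slower (larger) of what it wants to send
    and what we are willing to receive; we give up after `remote_detect_mult`
    of those intervals pass with no packet received.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Illustrative values only: 50 msec timers and a multiplier of 3 (assumed).
bfd_ms = bfd_detection_time_ms(50, 50, 3)

# A commonly seen OSPF dead interval of 40 seconds, for comparison (assumed).
ospf_dead_ms = 40 * 1000

print(f"BFD detection time: {bfd_ms} ms")
print(f"OSPF dead interval: {ospf_dead_ms} ms")
print(f"Ratio: roughly {ospf_dead_ms // bfd_ms}x faster with BFD")
```

Note that the slower side wins: if one end asks for 50 msec but the other can only do 100 msec, detection is based on 100 msec.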
Note that such rapid reaction is a bit at odds with SSO/NSF graceful restart behavior, which is more about a calm, measured (delayed) approach to allow a second supervisor to take over from the first. With SSO/NSF, the idea is to “ride out” the transition. If you like analogies, SSO/NSF is like a router tranquilizer, calming it down, whereas BFD is like a double-shot espresso, making the router more edgy and hyper. The two pull in opposite directions. See also one of my prior blogs, titled Non-Stop Forwarding and Fast Re-Routing, at https://netcraftsmen.com/blogs/entry/non-stop-forwarding-and-fast-re-routing.html
By the way, if there is no intermediate device, and the media is Ethernet, you can set the carrier delay. See for example http://www.cisco.com/en/US/docs/ios-xml/ios/interface/command/ir-c1.html#GUID-7ED1B93D-93F7-425A-8628-D48EC51679EC.
Carrier delay is the delay before considering an Ethernet interface to be up or down, sort of simple dampening. It is useful with direct point-to-point Ethernet links – it can be set to a low value to speed failover. It is not useful when there is a L1 or L2 device in between the router peers.
BFD apparently started out with an “asynchronous” approach using control packets: each router periodically sends control packets to its neighbor and expects to keep receiving the neighbor’s packets in turn. The challenge with this is the delay in waking up the BFD process on the CPU to send each packet, which causes variable jitter in when the packets arrive. If the other end is slow and BFD triggers a link down as a result, that’s not good. Backing off on aggressive timers to prevent that from being a problem somewhat defeats the intent of BFD.
BFD echo solves that, and provides a clever way to take some delay out of the above process. The newer Cisco code defaults to using BFD Echo mode to verify bidirectional connectivity, to take advantage of this.
The Nexus documentation now says:
“The BFD echo function sends echo packets from the forwarding engine to the remote BFD neighbor. The BFD neighbor forwards the echo packet back along the same path in order to perform detection; the BFD neighbor does not participate in the actual forwarding of the echo packets.”
The RFC says something similar. The key point (as I understand it) is that BFD echo leverages the fast / hardware forwarding path on the neighbor to get the echo packet returned to the sender, without waiting for an interrupt and special handling by the neighbor’s CPU.
The documentation goes on with “Also, the forwarding engine tests the forwarding path on the remote (neighbor) system without involving the remote system, so there is less interpacket delay variability and faster failure detection times.”
Yup, fast / hardware forwarding path. And in other words, you can have tighter timers because you don’t have to wait as much for the neighbor to respond.
Finally, “BFD can use the slow timer to slow down the asynchronous session when the echo function is enabled and reduce the number of BFD control packets that are sent between two BFD neighbors.”
That is, BFD echo can go fast without interrupting the CPU, and since that will detect an outage, you don’t need BFD control packets running as often, since the control packets aren’t being used for the rapid detection function. That in turn lightens the CPU load and allows more use of BFD. Clever!
The Cisco implementation of BFD echo negotiates the appropriate timers, making it more administrator-proof (and lower maintenance). Details can be found at http://www.cisco.com/en/US/technologies/tk648/tk365/tk480/technologies_white_paper0900aecd80244005.html
Notice that the above also explains the role of the slow timer. It is the timer driving the BFD control interaction, not the pacing of the echo packets.
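A quick back-of-the-envelope sketch shows the effect of the slow timer on how many packets the CPU has to deal with. The 50 msec echo interval and 2000 msec slow timer are illustrative values I picked for the arithmetic, the function names are mine, and I am ignoring the transmit jitter that real implementations apply.

```python
ECHO_INTERVAL_MS = 50    # assumed fast interval (echo, or control if no echo)
SLOW_TIMER_MS = 2000     # assumed slow timer for control packets with echo on


def packets_per_second(interval_ms):
    """Packets sent per second for a given transmit interval (no jitter)."""
    return 1000.0 / interval_ms


def session_load(echo_enabled):
    """Per-direction packet rates for one BFD session.

    Without echo, control packets must run at the fast interval because they
    do the failure detection. With echo, the echo packets do the fast
    detection in the forwarding path, so control packets can drop back to
    the slow timer.
    """
    if echo_enabled:
        return {
            "control_pps": packets_per_second(SLOW_TIMER_MS),
            "echo_pps": packets_per_second(ECHO_INTERVAL_MS),
        }
    return {
        "control_pps": packets_per_second(ECHO_INTERVAL_MS),
        "echo_pps": 0.0,
    }


for echo in (False, True):
    load = session_load(echo)
    print(f"echo={echo}: control {load['control_pps']} pps, "
          f"echo {load['echo_pps']} pps")
```

With these numbers, the CPU-handled control packets drop from 20 per second to one every two seconds per session, while the echo packets carry the fast detection.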
By the way, both ends can send BFD echo, or you can have only one end sending the BFD echo. The latter approach is referred to as BFD asymmetry.
If you look closely at RFC 5880, it does not specify the actual encapsulation for BFD. For single-hop situations, RFC 5881 applies:
“BFD Control packets MUST be transmitted in UDP packets with destination port 3784, within an IPv4 or IPv6 packet. The source port MUST be in the range 49152 through 65535.”
Cisco BFD follows that specification, per various Cisco documents.
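For the curious, here is a minimal sketch of what goes inside that UDP payload: the 24-byte mandatory section of a BFD control packet as laid out in RFC 5880, with intervals expressed in microseconds. The function name, the discriminator values, and the timer values are made up for illustration. This only builds the bytes; it does not implement the BFD state machine or authentication.

```python
import struct

BFD_CONTROL_PORT = 3784   # single-hop control packets, per RFC 5881
BFD_VERSION = 1           # protocol version defined by RFC 5880

# Session state codes from RFC 5880.
STATE_ADMIN_DOWN, STATE_DOWN, STATE_INIT, STATE_UP = 0, 1, 2, 3


def build_bfd_control(state, detect_mult, my_disc, your_disc,
                      desired_min_tx_us, required_min_rx_us,
                      required_min_echo_rx_us, diag=0):
    """Pack the mandatory section of a BFD control packet (no auth section)."""
    vers_diag = (BFD_VERSION << 5) | (diag & 0x1F)   # 3-bit version, 5-bit diag
    state_flags = (state & 0x03) << 6                # P/F/C/A/D/M bits left clear
    length = 24                                      # mandatory section only
    return struct.pack(
        "!BBBBIIIII",
        vers_diag, state_flags, detect_mult, length,
        my_disc, your_disc,
        desired_min_tx_us, required_min_rx_us, required_min_echo_rx_us,
    )


# Hypothetical session: Up, multiplier 3, 50 msec timers (50000 microseconds).
packet = build_bfd_control(
    state=STATE_UP, detect_mult=3,
    my_disc=0x00000001, your_disc=0x00000002,
    desired_min_tx_us=50_000, required_min_rx_us=50_000,
    required_min_echo_rx_us=50_000,
)
print(len(packet), packet.hex())
```

A Required Min Echo RX Interval of zero is how a system says it does not support receiving echo packets, which is one reason echo mode has to be negotiated rather than assumed.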
BFD Best Practices
I haven’t found any Cisco document on this yet, so this section will be short! Here are my thoughts about BFD best practices:
Do use BFD echo if you can.
Do back off asynchronous polling with the slow timer command.
Do use interface event dampening. The default timers look pretty good. The idea is it is best to defer having routing consider an interface to be up if the interface has bounced down/up/down in a rather short period of time. If you don’t do that, routing, particularly OSPF, may be doing a lot of reconverging and flooding, and you may be forwarding packets 50% of the time and black-holing them the rest of the time while the routing is churning.
It is a good idea when attempting fast convergence to also be doing significant amounts of route summarization. The fewer routes, the faster all routing related scans and calculations can be performed.
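To make the dampening idea above concrete, here is a generic sketch of penalty-based dampening with exponential decay. It is not Cisco's implementation: the class name is mine, the half-life, thresholds, and per-flap penalty are illustrative assumptions (check your platform's defaults), and I have left out any maximum suppress time.

```python
class FlapDampener:
    """Generic penalty/half-life dampening sketch (all values illustrative)."""

    def __init__(self, half_life_s=5.0, reuse=1000.0, suppress=2000.0,
                 penalty_per_flap=1000.0):
        self.half_life_s = half_life_s
        self.reuse = reuse
        self.suppress = suppress
        self.penalty_per_flap = penalty_per_flap
        self.penalty = 0.0
        self.last_update_s = 0.0
        self.suppressed = False

    def _decay(self, now_s):
        # The penalty halves every half_life_s seconds.
        elapsed = now_s - self.last_update_s
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update_s = now_s

    def flap(self, now_s):
        """Record one flap; return True if the interface is now suppressed."""
        self._decay(now_s)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress:
            self.suppressed = True
        return self.suppressed

    def usable(self, now_s):
        """Return True if routing may use the interface at time now_s."""
        self._decay(now_s)
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False
        return not self.suppressed


d = FlapDampener()
for t in (0.0, 1.0, 2.0):                 # three flaps, one second apart
    print(f"t={t}: suppressed={d.flap(t)}")
print(f"t=3.0: usable={d.usable(3.0)}")    # penalty still above reuse
print(f"t=10.0: usable={d.usable(10.0)}")  # penalty has decayed below reuse
```

With these numbers, a single flap is tolerated, but three flaps in quick succession keep the interface out of routing until the penalty decays below the reuse threshold.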
Two somewhat useful Cisco documents about BFD:
Denise Fishburne’s blog on BFD at http://www.networkworld.com/community/blog/bidirectional-forwarding-detection-bfd-–-little-about-timers-0
Cisco interface dampening: