Understanding Performance Routing (Pt. 1)

In a nutshell, PfR is an alternative way of routing packets. Ordinarily, we use routing protocols to determine the shortest loop-free path from source to destination. That is, in fact, the primary goal of all routing protocols, where “shortest” means the lowest cost by the protocol’s own metric (fewest router hops in RIP, lowest bandwidth-derived cost in OSPF). Since many protocols use link bandwidth to determine cost, most of the time the shortest path is also the highest bandwidth path to the destination.

Sometimes, the highest bandwidth path is not the best one. The highest bandwidth path can be experiencing congestion – perhaps it is overloaded. Or, the highest bandwidth path can be experiencing a fault that limits throughput or drops packets.

Instead of selecting the shortest path, PfR selects a path based on the performance of the path. PfR can measure parameters such as delay, throughput, loss and reachability, among others, to select the best performing path, and route packets accordingly. PfR can respond to transient events or ‘soft errors’, such as temporary congestion, and route traffic through an alternate path.

We’ll start with a simple example.

Let’s suppose you are a well-connected home office with two different Internet providers. Perhaps you have DSL and a cable modem. You have two Internet providers because, well, occasionally there are service interruptions, but you want to stay online all the time. You also have an IP phone which is connected to your corporate office.

What happens if one of the paths has a temporary performance problem? Let’s say one provider’s network is experiencing congestion, so that the delay increases or packets are dropped. As long as the problem is not too severe, normal routing protocols will still see the link as a viable path. And since your router is plugged into the cable or DSL modem, the router interfaces will stay up even if you lose your connectivity through one of your providers.

With PfR, the router will route packets on the best performing path. So if the path via ISP A is experiencing temporary congestion, packets will be routed through ISP B.

How it works:

PfR is a Cisco IOS feature, so it runs on nearly all routers (6500s and 7600s have limitations – we’ll discuss them later). There are two components to PfR: a Master Controller (MC) and a Border Router (BR). In a typical installation, there is one MC and one or more BRs. In many smaller designs, you can have the MC and BR on the same physical router.

The MC is the brains of PfR. The MC receives performance data from the BRs, compares it against the configured policy, selects the best route for the data and sends commands to the BR to forward traffic to that path.

The BR is the foot soldier of PfR. It collects the performance data and sends it to the MC. When the MC finds that the data is out of policy, the BR carries out the MC’s commands to change the data path.

Operationally speaking, you create a “policy”: a set of performance parameters that defines the minimum performance requirements for your data (e.g., delay must be less than 50 ms and packet loss must be less than 0.01%). The policy can apply to all traffic, or only to certain application traffic (such as voice). The MC compares the performance data it receives from the BR against the defined policy. If the measurements violate the thresholds, the MC sends commands to the BR to switch the data path.
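To make the policy comparison concrete, here is a minimal Python sketch of the check the MC performs. It is only an illustration of the idea, not PfR’s actual implementation: the class and field names (Policy, Measurement, max_delay_ms, max_loss_pct) are made up for this example, and the thresholds are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Performance thresholds. Field names are hypothetical, not PfR syntax."""
    max_delay_ms: float
    max_loss_pct: float


@dataclass(frozen=True)
class Measurement:
    """Performance data a BR might report for one exit path."""
    delay_ms: float
    loss_pct: float


def violations(policy: Policy, m: Measurement) -> list:
    """Return the names of the thresholds this measurement breaks."""
    broken = []
    # "Delay must be less than 50 ms" means 50 ms or more is a violation.
    if m.delay_ms >= policy.max_delay_ms:
        broken.append("delay")
    if m.loss_pct >= policy.max_loss_pct:
        broken.append("loss")
    return broken


def out_of_policy(policy: Policy, m: Measurement) -> bool:
    """True if at least one threshold is broken."""
    return bool(violations(policy, m))


# The example policy from the text: delay < 50 ms, loss < 0.01%.
voice_policy = Policy(max_delay_ms=50.0, max_loss_pct=0.01)

healthy = Measurement(delay_ms=32.0, loss_pct=0.0)
congested = Measurement(delay_ms=80.0, loss_pct=0.5)

print(out_of_policy(voice_policy, healthy))    # in policy, nothing to do
print(violations(voice_policy, congested))     # both thresholds broken
```

A path is out of policy as soon as any one threshold is broken, which is when the MC would start looking for a better exit.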

Functionally speaking, PfR measures link performance either passively (i.e., by observing the traffic already on the link) or actively (i.e., by generating test traffic). In passive mode, PfR uses NetFlow to gather traffic statistics. Specifically, PfR monitors TCP flows, since delay and reachability can be inferred from TCP flow statistics. NetFlow is configured automatically when you enable passive monitoring, so you don’t have to do it manually.
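As a rough illustration of how delay and reachability could be inferred from TCP flows, the Python sketch below times the gap between an outbound SYN and the returning SYN-ACK, and counts SYNs that never get an answer. This is an assumption made for illustration, not a description of PfR’s internal calculation; the function names and the tuple layout are hypothetical.

```python
from typing import Optional


def handshake_delay_ms(syn_ms: int, synack_ms: Optional[int]) -> Optional[int]:
    """Round-trip delay in ms, or None when no SYN-ACK was ever seen."""
    if synack_ms is None:
        return None
    return synack_ms - syn_ms


def summarize(flows):
    """Summarize flows toward one destination prefix.

    flows is a list of (syn_ms, synack_ms_or_None) tuples, with timestamps
    in milliseconds as seen at the border router.
    """
    delays = [
        d for d in (handshake_delay_ms(s, a) for s, a in flows) if d is not None
    ]
    unanswered = len(flows) - len(delays)  # candidates for "unreachable"
    avg = sum(delays) / len(delays) if delays else None
    return {"avg_delay_ms": avg, "unanswered": unanswered}


# Three flows: two complete handshakes and one SYN that got no reply.
sample = [(1000, 1040), (2000, 2060), (3000, None)]
print(summarize(sample))
```

Nothing here generates traffic. The numbers come entirely from flows your users were already sending, which is what makes the measurement passive.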

PfR can also actively measure performance. PfR uses the IP SLA feature of IOS to generate test probes. As in the passive mode, IP SLA is configured automatically, except when measuring jitter or MOS. In those cases you must manually configure IP SLA responder on the remote ends.

By default, PfR uses both active and passive modes. Normally, PfR passively monitors traffic, but when traffic is out of policy, PfR begins to generate test traffic to monitor the links.

So how does PfR actually control data flow? PfR can use either BGP or static routing. I’ll concentrate on static routing first, since that is more likely to be used.

For static routing, you first create a “parent” static route for each of the possible paths. The parent route defines all the possible destinations that are reachable via that path. For example, if you have two Internet connections, you would create two equal-cost default routes, each pointing out a different exit path. (This is the sort of configuration you would use if you were not using PfR.)

Now, when PfR is configured, the BR collects data on each data flow (using NetFlow) and sends it to the MC.

Let’s assume that, due to some problem at ISP-A, packets are being dropped on flows to Site 1. The MC detects this condition and sends commands to the BR to add a static route directing that traffic through ISP-B. The static route has a longer mask than the parent route (by default, PfR uses a /24 mask), so it takes precedence over the parent route for those destinations. So the MC would tell the BR to add:

    ip route 3.3.3.0 255.255.255.0 2.2.2.1
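The reason the longer mask wins is longest-prefix matching: the router always forwards on the most specific route that covers the destination. Here is a small Python sketch of that lookup using the standard ipaddress module. The 3.3.3.0/24 prefix and the 2.2.2.1 next hop come from the example above; the 1.1.1.1 next hop for ISP-A and the host addresses are made up for illustration, as are the helper names.

```python
import ipaddress


def lookup(dest, table):
    """Return the next hops of the longest-prefix match for dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, nh) for net, nh in table if addr in net]
    if not matches:
        return []
    longest = max(net.prefixlen for net, _ in matches)
    # Several routes with the same prefix length are equal-cost exits.
    return [nh for net, nh in matches if net.prefixlen == longest]


def covering_prefix(dest, prefixlen=24):
    """The /24 (by default) that contains dest."""
    return ipaddress.ip_network(f"{dest}/{prefixlen}", strict=False)


def ios_static_route(net, next_hop):
    """Format a prefix and next hop as an IOS 'ip route' line."""
    return f"ip route {net.network_address} {net.netmask} {next_hop}"


# Parent routes: two equal-cost defaults, one per provider.
parents = [
    (ipaddress.ip_network("0.0.0.0/0"), "1.1.1.1"),  # ISP-A (hypothetical next hop)
    (ipaddress.ip_network("0.0.0.0/0"), "2.2.2.1"),  # ISP-B
]
print(lookup("3.3.3.10", parents))  # both exits are candidates

# The more specific route steers Site 1 traffic to ISP-B only.
site1 = covering_prefix("3.3.3.10")
table = parents + [(site1, "2.2.2.1")]
print(lookup("3.3.3.10", table))
print(lookup("8.8.8.8", table))     # other destinations still use both defaults
print(ios_static_route(site1, "2.2.2.1"))
```

Only traffic inside the /24 is moved. Everything else keeps following the parent routes, which is why the parents must already cover every destination you care about.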

Now, traffic to Site 1 will use ISP-B instead of ISP-A. The MC will now begin sending probes through ISP-A to test the path. If, after a period of time, conditions improve, the MC will command the BR to remove the static route.
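The add-then-remove cycle can be sketched as a small state machine. The Python below is a simplified model under stated assumptions, not PfR’s real logic: the class name is invented, only packet loss is checked, and “after a period of time” is modeled as a hypothetical count of consecutive good probe results (good_probes_needed).

```python
class MasterControllerSketch:
    """Toy model of the MC deciding when to add or remove a static route."""

    def __init__(self, max_loss_pct, good_probes_needed=3):
        self.max_loss_pct = max_loss_pct
        self.good_probes_needed = good_probes_needed  # hypothetical hold-down
        self.injected = {}     # prefix -> next hop of the route we added
        self.good_streak = {}  # prefix -> consecutive in-policy results

    def on_measurement(self, prefix, primary_loss_pct, alternate_next_hop):
        """Process one loss reading for the primary path.

        Returns ("add" | "remove", prefix, next_hop) when the BR must act,
        or None when nothing changes.
        """
        in_policy = primary_loss_pct < self.max_loss_pct

        if prefix not in self.injected:
            if in_policy:
                return None
            # Primary path is out of policy: steer traffic to the alternate.
            self.injected[prefix] = alternate_next_hop
            self.good_streak[prefix] = 0
            return ("add", prefix, alternate_next_hop)

        # A route is already injected; readings now come from probes.
        if not in_policy:
            self.good_streak[prefix] = 0
            return None
        self.good_streak[prefix] += 1
        if self.good_streak[prefix] >= self.good_probes_needed:
            next_hop = self.injected.pop(prefix)
            del self.good_streak[prefix]
            return ("remove", prefix, next_hop)
        return None


mc = MasterControllerSketch(max_loss_pct=0.01, good_probes_needed=2)
for loss in [0.0, 5.0, 5.0, 0.0, 0.0]:
    print(loss, mc.on_measurement("3.3.3.0/24", loss, "2.2.2.1"))
```

Requiring several good results in a row before removing the route keeps traffic from flapping back and forth while the primary path is still recovering.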

In this way, the MC adds specific static routes to the BR to direct traffic through a particular path based on the path’s performance. This allows your network to adjust to temporary conditions and route around problems.

In the next article, I’ll go into more detail on setting up policies and how PfR interacts with your internal routing protocols.

[Note: the Sept. 22 and 23 C-MUG meeting will discuss Performance Routing. For more details and registration information, please review the C-MUG User Group page.]
