This article continues the discussion started in a prior blog, titled “Configuring the Customer Side of an MPLS VPN WAN, Part 1” (of course). It can be found at https://netcraftsmen.com/blogs/entry/configuring-the-customer-side-of-an-mpls-vpn-wan-part-1.html.
In that article, I listed three flavors or types of MPLS WAN customers (scenarios):
- Legacy WAN with IGP (EIGRP or OSPF) routing plus Layer 3 (routed) MPLS VPN WAN
- Dual-carrier L3 MPLS VPN WAN
- Legacy + single- or dual-carrier L2 MPLS VPN
The first article covered the first of these, mixing a legacy IGP-routed WAN with an MPLS VPN WAN, and specifically the routing implications of doing so.
This article looks at the other two scenarios. We’ll also talk briefly about IP multicast in these settings.
Dual-carrier L3 MPLS VPN WAN
With dual carriers, life is actually a good bit simpler — unless you go out of your way to make it hard.
The simplest approach is to decide how much you like BGP. If the answer is “not very much”, then try to get both carriers to provide you with your favorite IGP, be it EIGRP or OSPF. This may involve a tradeoff on your part between perceived carrier quality and choice of routing protocol. There’s a serious business rationale for choosing to use an IGP: it integrates seamlessly with your existing IGP, with no new skills required. The possible counter-argument is perceived risk: some carriers really manage to give the impression that they don’t want to do it, or don’t have any customers doing it.
If you’re a BGP aficionado, well, doing dual-carrier EBGP for MPLS WAN will give you more BGP than you may have bargained for. Seriously, consider how many people on staff know BGP, how well it’s going to integrate with your IGP, etc. If part of your answer is “it’ll help me prep for my CCIE”, then just don’t go there.
Don’t mix IGP with one carrier and EBGP with the other. See Part 1 for why. The short answer is “load balancing will be very hard”. See the following diagram.
If you do decide to do dual EBGP, you have some other choices to make. If you have two routers at each site, do you run IBGP between them? Do you use the same AS number at every site, or a different one at each site (carriers generally support either)? Do you use one AS number, say 65000, for the routers connecting to carrier #1, and another, say 65001, for the routers connecting to carrier #2? We have also seen a couple of enterprises that needed different private AS numbers, in some cases for international MPLS where they had to do some regional filtering or routing policy. For instance: an MPLS WAN in China, another in the US, a couple of long-haul connections between the two, and a mix of technologies.
I personally like one AS everywhere; it’s simpler. So unless you expect routing policy needs (steering traffic to a preferred link), or have some other reason, just go with one AS number at every site. Your provider should know how to make that work, and if they don’t, you need a more clueful carrier.
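Why does one AS everywhere need the provider’s cooperation? EBGP loop prevention makes a router discard any route whose AS_PATH already contains its own AS, so site B (AS 65000) would normally drop routes originated at site A (also AS 65000). The Python sketch below models that check, plus a PE-side rewrite along the lines of what Cisco calls `as-override`. It is a simplified model, not router code; the provider AS 64600 and the prefix are made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

CUSTOMER_AS = 65000  # one private AS at every site
PROVIDER_AS = 64600  # hypothetical carrier AS, for illustration only


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]


def ce_accepts(route: Route, local_as: int) -> bool:
    """EBGP loop prevention: drop any route whose AS_PATH contains our own AS."""
    return local_as not in route.as_path


def pe_advertise(route: Route, provider_as: int, customer_as: int,
                 as_override: bool) -> Route:
    """PE sends a route learned from one CE on to another CE.

    With as_override, occurrences of the customer AS are replaced by the
    provider AS; the PE then prepends its own AS as any EBGP speaker does.
    """
    path = route.as_path
    if as_override:
        path = tuple(provider_as if asn == customer_as else asn for asn in path)
    return Route(route.prefix, (provider_as,) + path)


if __name__ == "__main__":
    from_site_a = Route("10.1.0.0/16", (CUSTOMER_AS,))
    for override in (False, True):
        seen_at_b = pe_advertise(from_site_a, PROVIDER_AS, CUSTOMER_AS, override)
        # False -> path (64600, 65000), rejected; True -> (64600, 64600), accepted
        print(override, seen_at_b.as_path, ce_accepts(seen_at_b, CUSTOMER_AS))
```

The CE-side alternative, `allowas-in` on Cisco IOS, relaxes the check on the receiving router instead.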
The next question is site routing: do you route between the two Customer Edge (CE) MPLS-connected routers? I did some lab work recently, and ended up choosing to redistribute bidirectionally between EBGP and EIGRP, with route tagging. As routes were redistributed from EBGP into EIGRP, a route map tagged them. Redistribution in the other direction was then controlled by redistributing only the untagged routes, which were the site prefixes. The major advantage is that you don’t have to build, for each site, a prefix list or access list (ACL) of all the site prefixes, and then maintain those lists as sites change. It was also only about 8 lines of configuration. See the following diagram:
Here’s what the configuration for that might look like (lab tested):
Note: you do need to consistently apply the tags, or a high-bandwidth site may become transit for traffic between the providers.
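The tagging logic can be modelled in a few lines. The Python sketch below is not router configuration; it just mimics the two route maps: tag everything going from EBGP into EIGRP, and let only untagged routes go from EIGRP into EBGP. The tag value 65000 and the prefixes are assumptions for illustration. Passing `tag=None` shows what happens if one CE forgets to tag: carrier #1’s routes leak toward carrier #2, which is how a site ends up as transit.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional

WAN_TAG = 65000  # hypothetical tag marking routes learned from either carrier


@dataclass(frozen=True)
class Route:
    prefix: str
    tag: Optional[int] = None  # None means untagged (a local site prefix)


def bgp_to_eigrp(bgp_routes: Iterable[Route],
                 tag: Optional[int] = WAN_TAG) -> List[Route]:
    """Route map on EBGP -> EIGRP redistribution: set the tag on every route."""
    return [replace(r, tag=tag) for r in bgp_routes]


def eigrp_to_bgp(eigrp_routes: Iterable[Route]) -> List[Route]:
    """Route map on EIGRP -> EBGP redistribution: deny tagged, permit the rest."""
    return [r for r in eigrp_routes if r.tag is None]


if __name__ == "__main__":
    site = [Route("10.20.1.0/24")]       # local subnet, learned via EIGRP
    carrier1 = [Route("10.30.0.0/16")]   # learned by CE1 from carrier #1
    # CE2 sees the site subnet plus whatever CE1 redistributed into EIGRP.
    ce2_eigrp = site + bgp_to_eigrp(carrier1)
    print([r.prefix for r in eigrp_to_bgp(ce2_eigrp)])  # ['10.20.1.0/24']
    leaky = site + bgp_to_eigrp(carrier1, tag=None)     # CE1 forgot to tag
    print([r.prefix for r in eigrp_to_bgp(leaky)])      # both prefixes leak
```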
Another consideration is failure modes. You might deliberately omit the filtering above at a main high-bandwidth site. That way, the site can be used for transit, e.g. if one site gets disconnected from carrier #1, another from carrier #2, and there is traffic between the two sites. On the other hand, if you have a main site advertising default, you don’t need to pass prefixes between the providers at all. If a prefix goes missing from one WAN cloud, the default route (with “ip classless”) will cause packets to go to the main site, which has the prefixes from both carriers. You can advertise a default route with a worse metric from a “backup main site”.
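The default-route fallback is just longest-prefix matching. The sketch below uses Python’s standard `ipaddress` module to model a remote site’s routing table as learned over carrier #1, with one site’s prefix missing; it behaves like a router with “ip classless” in effect. The prefixes and next-hop labels are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def lookup(table: Dict[str, str], dest: str) -> Optional[str]:
    """Return the next hop of the most specific route containing dest, if any."""
    addr = ipaddress.ip_address(dest)
    best_net, best_hop = None, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


if __name__ == "__main__":
    # Site B (10.2.0.0/16) has lost its carrier #1 link, so its prefix is gone
    # from this table; the main site still advertises default into carrier #1.
    via_carrier1 = {
        "10.1.0.0/16": "site A via carrier 1",
        "0.0.0.0/0": "main site via carrier 1",
    }
    print(lookup(via_carrier1, "10.1.2.3"))  # site A via carrier 1
    print(lookup(via_carrier1, "10.2.5.5"))  # main site via carrier 1
```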
I also tried IBGP between the two CE routers, with an AS_PATH filter that only allowed local routes (those with an empty AS_PATH) to be advertised out. That too worked fine, but I was still redistributing bidirectionally or using site-specific network statements, because I wanted very granular load balancing (see below). My conclusion is that the IBGP approach doesn’t seem to buy much. The one place it might help is if you have no site L3 switches or routers other than the CEs, hence no IGP. The CEs would then be directly connected to all subnets at the site, and IBGP would provide a clean way to re-route between the CE routers if a prefix is only reachable via one of the two MPLS clouds.
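On Cisco IOS, a local-routes-only filter like this is usually written as an AS-path access list matching the regular expression `^$`, that is, an empty AS_PATH. The Python sketch below applies the same regular expression to AS_PATHs rendered as space-separated strings, roughly the way the router matches them; the prefixes and AS numbers are made up.

```python
import re
from typing import List, Tuple

LOCAL_ONLY = re.compile(r"^$")  # empty AS_PATH: originated in our own AS


def as_path_string(path: Tuple[int, ...]) -> str:
    """Render an AS_PATH as space-separated AS numbers, e.g. '64600 65000'."""
    return " ".join(str(asn) for asn in path)


def advertise_to_carrier(routes: List[Tuple[str, Tuple[int, ...]]]) -> List[str]:
    """Keep only prefixes whose AS_PATH matches the local-only expression."""
    return [prefix for prefix, path in routes
            if LOCAL_ONLY.match(as_path_string(path))]


if __name__ == "__main__":
    bgp_table = [
        ("10.20.1.0/24", ()),              # site prefix, originated locally
        ("10.30.0.0/16", (64600, 64600)),  # carrier route, via the other CE over IBGP
    ]
    print(advertise_to_carrier(bgp_table))  # ['10.20.1.0/24']
```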
Another thing to consider is outbound load balancing. I’m assuming the WAN links are equally sized here. If you have no other site routers, you can run GLBP for source-based load sharing, which isn’t bad. If you have large multi-threaded flows between a single source and destination, like server backups, you might want full (or “finely granular”) CEF load balancing with source/destination/port-based hashing. That requires an L3 switch behind the CE routers, so that traffic from workstations first hits the L3 switch, which has a choice of equal-cost routes. Note that the CE routers themselves do not consider the route through the other CE to be equal cost, so none of the FHRPs (HSRP, GLBP, VRRP) is going to get the finely granular job done.
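To see why source-based sharing disappoints for something like a server backup, here is a toy Python comparison. `pick_by_source` stands in for GLBP (one gateway per host), and `pick_by_flow` stands in for per-flow hashing on addresses and ports. The hash is a deliberately simple sum, not Cisco’s CEF algorithm, and the addresses, ports, and link names are assumptions.

```python
import ipaddress
from collections import Counter
from typing import List, NamedTuple

LINKS = ["CE1 to carrier 1", "CE2 to carrier 2"]  # two equal-cost paths


class Flow(NamedTuple):
    src: str
    dst: str
    src_port: int
    dst_port: int


def pick_by_source(flow: Flow, links: List[str]) -> str:
    """GLBP-like: the choice depends only on the source host."""
    return links[int(ipaddress.ip_address(flow.src)) % len(links)]


def pick_by_flow(flow: Flow, links: List[str]) -> str:
    """Per-flow hashing over addresses and ports (toy hash, not CEF's)."""
    key = (int(ipaddress.ip_address(flow.src)) + int(ipaddress.ip_address(flow.dst))
           + flow.src_port + flow.dst_port)
    return links[key % len(links)]


if __name__ == "__main__":
    # One backup server pushing eight parallel TCP streams to one target.
    backup = [Flow("10.20.1.50", "10.30.5.10", 50000 + i, 445) for i in range(8)]
    print(Counter(pick_by_source(f, LINKS) for f in backup))  # all on one link
    print(Counter(pick_by_flow(f, LINKS) for f in backup))    # split 4 / 4
```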
By the way, if you are doing dual IGP, you don’t necessarily need the L3 switch in front of the CE routers to achieve full CEF load balancing. If you run HSRP, you can tweak the delay on the WAN side of the HSRP primary so that it sees equal-cost EIGRP routes: one via its own WAN link and one via the other CE. This approach is a delicate balancing act, and perhaps a little fragile. So while it is a neat trick, I’m not convinced it’s the best basis to build network routing on.
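The arithmetic behind the delay trick can be checked with the classic EIGRP composite metric, which with default K values is 256 × (10^7 / minimum bandwidth in kbit/s + total delay in tens of microseconds). The Python sketch below compares the HSRP primary’s (CE1’s) direct path with its path via CE2. The bandwidths and delays are assumed values, and it assumes the carrier portions of both paths contribute the same metric, so they are left out.

```python
from typing import List, Tuple

# Each hop is (bandwidth in kbit/s, delay in tens of microseconds), the same
# units the IOS "bandwidth" and "delay" interface commands use.
Hop = Tuple[int, int]

WAN = (10_000, 100)      # assumed 10 Mbit/s WAN link, 1000 us delay
LAN = (1_000_000, 1)     # assumed Gigabit link between the two CEs, 10 us
REMOTE = (1_000_000, 1)  # assumed remote-site LAN, same via either carrier


def eigrp_metric(path: List[Hop]) -> int:
    """Classic EIGRP metric with default K values (K1 = K3 = 1, others 0)."""
    min_bw = min(bw for bw, _ in path)
    total_delay = sum(dly for _, dly in path)
    return 256 * (10_000_000 // min_bw + total_delay)


def ce1_paths(ce1_wan_delay: int) -> Tuple[int, int]:
    """Metrics on CE1 for (its own WAN link, the path through CE2)."""
    direct = eigrp_metric([(WAN[0], ce1_wan_delay), REMOTE])
    via_ce2 = eigrp_metric([LAN, WAN, REMOTE])
    return direct, via_ce2


if __name__ == "__main__":
    print(ce1_paths(100))  # (281856, 282112): only the direct path is used
    print(ce1_paths(101))  # (282112, 282112): equal cost, both paths usable
```

The path via CE2 also satisfies EIGRP’s feasibility condition here: CE2’s reported distance (281856) is below CE1’s feasible distance (282112). The fragility shows up quickly, though: change the delay on the CE-to-CE link by one unit and the tie is broken again.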
Legacy + single- or dual-carrier L2 MPLS VPN
Basically, ditto, except that you get to do your own routing, and so you are almost certainly going to use your favorite IGP.
What changes, however, is that with L2 MPLS VPN or VPLS you may have a lot of routers on a common WAN subnet. While most IGPs have been tested with large numbers of peers (under quiet conditions, etc., etc.), it is not, repeat not, a good idea for a router to have many peers. I consider 25 to be a good number, 50 pushing it, but YMMV (“Your Mileage May Vary”). The worst situation for such networks is when the carrier network fails and then recovers: all of a sudden you have many routers establishing adjacencies and trying to exchange routes. They can get stuck in that state, because the workload is much higher than in steady state.
OSPF behaves somewhat better than EIGRP with many routers on a LAN. The OSPF Designated Router (DR) mechanism reduces the amount of adjacency formation and LSA flooding, to some extent anyway.
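A quick count shows what the DR buys you. On one shared segment of n routers, EIGRP forms a neighbor relationship between every pair, while OSPF routers go to FULL adjacency only with the DR and BDR. The Python sketch below counts both; note that OSPF routers still exchange hellos with every neighbor (they sit in 2-WAY state), which is part of why the saving is only “to some extent”.

```python
def eigrp_neighbor_pairs(n: int) -> int:
    """EIGRP on a shared segment: every router peers with every other router."""
    return n * (n - 1) // 2


def ospf_full_adjacencies(n: int) -> int:
    """OSPF broadcast segment: DROthers reach FULL only with the DR and BDR."""
    if n < 2:
        return 0
    # The DR is adjacent to all n - 1 others; the BDR to the n - 2 besides the DR.
    return (n - 1) + (n - 2)


if __name__ == "__main__":
    for n in (10, 25, 50):
        print(n, eigrp_neighbor_pairs(n), ospf_full_adjacencies(n))
```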
One alternative is to pool routers (regionally?) by using different VLANs on your L2 “WAN cloud”. That lets you break up a larger number of peers into smaller groupings, scaling better.
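To get a feel for how much pooling helps, the Python sketch below computes the largest neighbor count on any one VLAN when spoke routers are split evenly across VLANs. It assumes, hypothetically, that each VLAN also has its own pair of regional hub routers and that every router on a VLAN peers with every other (as with EIGRP); the 25-peer target echoes the rule of thumb above.

```python
import math


def max_peers_per_router(spokes: int, vlans: int, hubs_per_vlan: int = 2) -> int:
    """Largest neighbor count on one VLAN with spokes spread evenly over VLANs."""
    routers_on_vlan = math.ceil(spokes / vlans) + hubs_per_vlan
    return routers_on_vlan - 1


def vlans_needed(spokes: int, target: int = 25, hubs_per_vlan: int = 2) -> int:
    """Smallest number of VLANs that keeps every router at or under target peers."""
    if target < hubs_per_vlan:
        raise ValueError("target must allow at least the hub peers")
    vlans = 1
    while max_peers_per_router(spokes, vlans, hubs_per_vlan) > target:
        vlans += 1
    return vlans


if __name__ == "__main__":
    print(max_peers_per_router(200, 1))  # 201 peers on one flat VLAN
    print(vlans_needed(200))             # 9 VLANs of at most 23 spokes each
```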
IP multicast
IP multicast in MPLS VPN (or VPLS) is a bit of a contrivance. It is not a given; it is a service you have to ask for. The carrier may not provide it, may charge more for it, and may cap the amount of IP multicast (IPmc) traffic.
The methods a carrier uses to support IPmc, for either L3 MPLS VPN or L2 VPN, are in a state of flux. The older Layer 3 method is adequate for most enterprise uses. With L2 MPLS VPN or VPLS, the carrier may have to limit multicast traffic to a known amount, since their PE device may be replicating frames into many pseudo-wires, a real CPU burden. There’s also the L2 issue of protecting one customer from another’s bad Spanning Tree day by capping the amount of broadcast and multicast allowed. (A Spanning Tree loop would flood the customer’s L2 VPN, impacting all the trunks its traffic crosses. It might also make the edge router CPU busy, depending on the nature of the traffic.)
There is a further issue to consider with IPmc over an L2 WAN. IPmc on a LAN is not selective: it floods. So while the PIM routers on a LAN may be tracking which of them need a given IPmc flow, the flow is sent on the LAN to all of them. Where the LAN is an L2 WAN VPN, that means any IPmc consumes bandwidth out to every router, many of which may then just discard the flow.
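The cost of that flooding is easy to estimate. The Python sketch below assumes, as described above, that the L2 WAN service floods every IPmc stream to every router on the segment; the stream rate and router counts are made-up examples.

```python
from typing import NamedTuple


class McastLoad(NamedTuple):
    copies: int       # routers that receive the stream
    total_kbps: int   # bandwidth used across all their access links
    needed_kbps: int  # bandwidth actually wanted by interested routers
    wasted_kbps: int  # delivered only to be discarded


def flooded_stream_load(stream_kbps: int, routers: int, interested: int) -> McastLoad:
    """One IPmc stream flooded across an L2 WAN segment of `routers` routers.

    Every router except the one attached to the sender gets a copy; only
    `interested` of them have receivers downstream.
    """
    if not 0 <= interested <= routers - 1:
        raise ValueError("interested must be between 0 and routers - 1")
    copies = routers - 1
    total = stream_kbps * copies
    needed = stream_kbps * interested
    return McastLoad(copies, total, needed, total - needed)


if __name__ == "__main__":
    # A 2 Mbit/s stream on a 40-router L2 WAN, wanted at only 5 remote sites:
    # 39 copies, 78000 kbps delivered, 10000 kbps needed, 68000 kbps wasted.
    print(flooded_stream_load(2_000, 40, 5))
```

Put another way, every small site’s access link carries the full 2 Mbit/s whether or not anyone there is watching.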
(Caveat: I’m not tracking this totally closely, so newer developments may have improved the situation. I do doubt that, but there might be a new technology I haven’t noticed yet.)
There are a couple of good books that may give you a better idea of what’s going on under the hood with carrier MPLS, and with the IPmc issues and technologies mentioned above. They also cover how providers can make BGP work with the same AS number at all your sites, a technique for EIGRP called Site of Origin, some of the special carrier features that support customer OSPF, etc. (Caution: the material is fairly technical.)
MPLS and VPN Architectures by Ivan Pepelnjak and Jim Guichard (Hardcover – Nov. 10, 2000), http://www.amazon.com/MPLS-VPN-Architectures-Ivan-Pepelnjak/dp/1587050021/.
MPLS and VPN Architectures, Volume II by Ivan Pepelnjak, Jim Guichard, and Jeff Apcar (Hardcover – June 16, 2003), http://www.amazon.com/MPLS-VPN-Architectures-Ivan-Pepelnjak/dp/1587051125/.
MPLS-Enabled Applications: Emerging Developments and New Technologies (Wiley Series on Communications Networking & Distributed Systems) by Ina Minei and Julian Lucek (Paperback – June 3, 2008), http://www.amazon.com/MPLS-Enabled-Applications-Developments-Technologies-Communications/dp/0470986441/
I’ll also repeat a link from Part 1. Cisco has a good posting about buying MPLS services, the Enterprise Consumer Guide, at http://www.cisco.com/en/US/docs/solutions/Enterprise/WAN_and_MAN/L3VPNCon.html. It’s a good source of questions to ask vendors in an RFP, although I recently thought of a number of things I wanted to ask that might not be in there. (And, in an amazing first in my WAN provider discussions, we actually got some darn good answers. I hereby award the prize for best technical answers in a customer document to Level 3.) There is a longer (more detailed?) version available as a book from Cisco Press. See http://www.amazon.com/Selecting-MPLS-Services-Chris-Lewis/dp/1587051915.
And for those who need to think about such things as MPLS security, you might look at MPLS VPN Security by Michael H. Behringer and Monique J. Morrow (Paperback – June 18, 2005), http://www.amazon.com/MPLS-VPN-Security-Michael-Behringer/dp/1587051834/. The short free version is at http://www.cisco.com/en/US/tech/tk436/tk428/technologies_white_paper09186a00800a85c5.shtml. Ivan Pepelnjak has a good blog post about it, titled “True or false: MPLS VPNs offer equivalent security to Frame Relay”, at http://blog.ioshints.info/2009/04/true-or-false-mpls-vpns-offer.html. Basically, an attacker has to gain access to the carrier network and also breach the security of core devices or configuration tools. The security issue comes down to how much you trust your provider to comply with well-known security best practices in hardening their MPLS core and routers. There are differing opinions on that, particularly since providers always seem to be under-staffed from any point of view, let alone for “optional” non-revenue-producing activities (which are very costly at such large scales). Deciding you have to encrypt your MPLS traffic certainly increases costs for the customer!