I’ve seen gradually increasing interest in Cisco’s FabricPath technology, so it seems time to talk about designing for FabricPath. I’m going to provide some opinions and an overview, and then point you at some CiscoLive 2012 presentations. I see little point in re-inventing details that have been done well elsewhere, and I hope I’m helping by pointing out resources people might not be aware of.
What is FabricPath?
FabricPath is a routed alternative to Spanning Tree Protocol (STP) and Cisco virtual port-channel (vPC) technology. The reasons for using FabricPath are that it provides routed protection against spanning tree loops, keeps all links forwarding (unlike STP), and is a bit easier to configure than vPC, especially as your datacenter becomes larger. The arguments against FabricPath: it might not yet be mature, and it is Cisco-proprietary, whereas TRILL will be the standard for FabricPath-like behavior.
Scott Lowe has a nice basic writeup about FabricPath at http://blog.scottlowe.org/2011/07/12/brkdct-2081-cisco-fabricpath-technology-and-design/
What Does FabricPath Do?
FabricPath does MAC-in-MAC encapsulation to transport Layer 2 frames across a FabricPath network. The transport is based on routed forwarding to another FabricPath switch. As with SAN FSPF routing, FabricPath routing uses a link-state protocol, which tracks how to get data to each of the participating switches.
When an L2 frame arrives on a “classic Ethernet” port, a LAN MAC switching lookup occurs. If the switching lookup indicates that the destination MAC is reached via FabricPath, it will also indicate which FabricPath edge switch to send the frame to. That switch ID can then be looked up in the FabricPath routing table. A path is chosen, and the frame is MAC-in-MAC encapsulated and routed over to the destination FabricPath switch. That switch de-encapsulates the frame and forwards it in normal L2 fashion.
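To make the lookup sequence concrete, here is a minimal Python sketch of it. This is only a toy model of the steps just described, not Cisco's implementation: the table contents, switch IDs, port names, starting TTL, and the CRC-based path choice are all assumptions I made up for illustration, and unknown destinations are simply marked "flood".

```python
import zlib
from dataclasses import dataclass


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    payload: bytes = b""


@dataclass
class FabricPathFrame:
    """Outer (MAC-in-MAC style) header wrapped around the original frame."""
    src_switch_id: int
    dst_switch_id: int
    ttl: int
    inner: Frame


LOCAL_SWITCH_ID = 11  # hypothetical ID of this edge switch
INITIAL_TTL = 32      # assumed starting TTL, for illustration only

# Step 1 table: destination MAC -> ("local", port) or ("fabricpath", switch ID)
mac_table = {
    "00:00:00:00:00:aa": ("local", "Eth1/1"),
    "00:00:00:00:00:bb": ("fabricpath", 12),
}

# Step 2 table: remote switch ID -> equal-cost FabricPath interfaces
route_table = {12: ["Eth2/1", "Eth2/2"]}


def forward(frame):
    """Return (action, outgoing port, frame to send)."""
    entry = mac_table.get(frame.dst_mac)
    if entry is None:
        # Simplification: unknown destinations are just flagged for flooding.
        return ("flood", None, frame)
    kind, target = entry
    if kind == "local":
        # Ordinary Layer 2 switching, no encapsulation needed.
        return ("switch", target, frame)
    # Destination sits behind another edge switch: look up the switch ID,
    # pick one of the equal-cost paths per flow, and encapsulate.
    next_hops = route_table[target]
    flow_key = (frame.src_mac + frame.dst_mac).encode()
    out_port = next_hops[zlib.crc32(flow_key) % len(next_hops)]
    outer = FabricPathFrame(LOCAL_SWITCH_ID, target, INITIAL_TTL, frame)
    return ("route", out_port, outer)
```

Calling forward(Frame("00:00:00:00:00:aa", "00:00:00:00:00:bb")) returns a "route" action with the original frame carried inside an outer header addressed to switch 12. The point of the sketch is that the MAC table only has to name the remote edge switch; how to reach that switch is a separate, routed decision.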
FabricPath allows for multiple “topologies”, i.e. separate layers of FabricPath operation. It also does multi-pathing, up to 16 paths, each of which can be a 16-fold 10 Gbps port-channel. FabricPath uses a time to live (TTL) to protect against short-lived or other routing problems (bugs?) that might somehow cause a routing loop. The underlying routing is based on IS-IS, as is TRILL. (Brocade used the program code they have, so their TRILL implementation is based on FSPF.)
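The value of the TTL is easy to see in a toy simulation. The Python sketch below assumes a frame caught in a forwarding loop between a set of switches, with each switch decrementing the TTL and discarding the frame when it reaches zero. The switch names and the starting TTL are hypothetical, and the decrement-then-check order is my assumption rather than a statement about the actual hardware.

```python
def simulate_loop(initial_ttl, loop_switches):
    """Bounce a frame around a loop until its TTL runs out.

    Returns (switches that forwarded the frame, switch that dropped it).
    """
    ttl = initial_ttl
    forwarded_by = []
    i = 0
    while True:
        switch = loop_switches[i % len(loop_switches)]
        ttl -= 1                 # each hop decrements the TTL
        if ttl <= 0:
            return forwarded_by, switch  # frame is discarded here
        forwarded_by.append(switch)
        i += 1


# A frame stuck between two hypothetical switches, starting with TTL 4.
forwarded_by, dropped_at = simulate_loop(4, ["S1", "S2"])
print(forwarded_by, dropped_at)  # ['S1', 'S2', 'S1'] S2
```

Without a TTL, a looping frame circulates until something else breaks the loop, which is the classic spanning tree failure mode. With one, the damage is bounded by the starting TTL no matter what caused the loop.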
Why FabricPath, Not TRILL?
FabricPath appears to have scaling benefits compared to TRILL. One is conversational learning, i.e. an edge device learns MAC address / switch mappings only for the MAC addresses some locally-attached system behind the FabricPath edge device is actually conversing with. The edge devices do not learn all source MACs seen via ARP flooding. Per the article at http://lamejournal.com/2011/05/16/layer-2-routing-sort-of-and-trill/, it sounds like TRILL can optionally learn all MAC addresses from edge devices. This seems rather undesirable to me. The article draws a comparison with Cisco OTV, which tracks reachability of MAC addresses. Fair enough, that may be a limiting factor for OTV. Which raises the question: if I’m criticizing TRILL for promiscuous MAC learning, shouldn’t I do the same for OTV? Probably.
FabricPath allows for vPC+, which enables dual-active FHRP behavior at the edge. This is useful for scaling up routing off of FabricPath VLANs.
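Here is a small Python sketch of the conversational learning idea, under the assumption that the rule is "learn a remote source MAC only when the frame is addressed to a MAC I already know locally." The class and method names are hypothetical and chosen only for illustration.

```python
class EdgeSwitch:
    """Toy model of conversational MAC learning on a FabricPath edge switch."""

    def __init__(self):
        self.local_macs = {}   # MAC -> classic Ethernet port
        self.remote_macs = {}  # MAC -> remote FabricPath switch ID

    def learn_local(self, mac, port):
        # Hosts on classic Ethernet ports are learned in the usual way.
        self.local_macs[mac] = port

    def receive_from_fabric(self, src_mac, dst_mac, src_switch_id):
        """Learn the remote source only if it is talking to a local host."""
        if dst_mac in self.local_macs:
            self.remote_macs[src_mac] = src_switch_id
            return True
        return False  # broadcasts and unrelated traffic are not learned


edge = EdgeSwitch()
edge.learn_local("00:00:00:00:00:aa", "Eth1/1")

# A flooded ARP request from a remote host: not learned.
edge.receive_from_fabric("00:00:00:00:00:bb", "ff:ff:ff:ff:ff:ff", 12)

# A unicast frame to a local host: the remote source is learned.
edge.receive_from_fabric("00:00:00:00:00:cc", "00:00:00:00:00:aa", 13)

print(edge.remote_macs)  # {'00:00:00:00:00:cc': 13}
```

The scaling benefit is that the remote table grows with the number of active conversations behind the edge switch, not with the total number of MAC addresses in the Layer 2 domain.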
FabricPath peers only on point-to-point links. To me, that’s a distinct plus for bandwidth tracking and preserving the routing model end-to-end. I see only risk from having switches interconnecting FabricPath routing peers.
Other than that, the web seems to have a lot of noise but little signal on the FabricPath versus TRILL topic.
I’m amused by what I’m seeing in print. Most FabricPath designs show a spine-edge approach, as in the following diagram.
Note: the heavier links are dual-link vPC peer-link port-channels, drawn this way to reduce visual clutter.
I like this design. It is a Clos fabric, an optimal structure for maximizing bandwidth between arbitrary (or selected) endpoints. If you want more bandwidth, you can either add links or add spine nodes. If you start exceeding the 16-fold multi-pathing limit, you can port-channel links between the same switch pairs to add bandwidth without pushing beyond 16 paths.
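The arithmetic behind that is simple enough to sketch in Python. The function below assumes a symmetric design in which every edge switch has the same port-channel to every spine, so each spine contributes exactly one path. The function name is hypothetical, and the limits are the 16 paths and 16-link 10 Gbps port-channels mentioned earlier.

```python
MAX_PATHS = 16                  # multi-pathing limit
MAX_LINKS_PER_PORT_CHANNEL = 16  # links per port-channel


def edge_to_edge_gbps(spines, links_per_spine, link_gbps=10):
    """Bandwidth between two edge switches in a symmetric spine-edge fabric.

    Each spine contributes one path; a port-channel to a spine still
    counts as a single path, however many links it bundles.
    """
    if not 1 <= spines <= MAX_PATHS:
        raise ValueError("number of paths must be between 1 and 16")
    if not 1 <= links_per_spine <= MAX_LINKS_PER_PORT_CHANNEL:
        raise ValueError("port-channel must have between 1 and 16 links")
    return spines * links_per_spine * link_gbps


print(edge_to_edge_gbps(4, 1))    # 40: four spines, single 10 Gbps uplinks
print(edge_to_edge_gbps(4, 2))    # 80: same four paths, 2-link port-channels
print(edge_to_edge_gbps(16, 16))  # 2560: the ceiling under these limits
```

Adding a spine adds a path, while adding links to an existing port-channel adds bandwidth without consuming another path, which is why the two can be combined once the path limit is reached.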
What we do is turn the middle into a FabricPath routed domain. We do that by configuring the interfaces shown in red to be fabricpath links.
In case you’re wondering, the top and bottom pairs don’t need to be interconnected directly, since the purpose of most datacenter networks is to support either North-South (user to server, top to bottom) traffic, or server to server (East-West, left to right) traffic. You go from left to right across the top or bottom by taking one intermediate routed hop in the diagram.
The other configuration steps are to specify which VLANs are connected across the FabricPath “red zone” above, and to configure a low root bridge priority on the FabricPath switches, making them all equal as root bridge. In effect, the switches and red links above form one giant root bridge switch, interconnecting whatever edge switches are not shown at the bottom of the diagram. The following diagram may visually suggest that better:
Concerning TRILL design, a small percentage of what I’ve seen seems to have diagrams like the above. The rest seem to be thinking based on Radia Perlman’s RBridge concept, which I would describe as “oatmeal with raisins”: a gluey blob of Layer 2 oatmeal with RBridge “raisins” scattered throughout. For various flows, different RBridges forward between VLANs. How you troubleshoot that sort of design is what puzzles me. You have a mix of Layer 2 and encapsulated routing in which it might be challenging to identify which device encapsulates a given flow, and working that out requires lucid thinking and a good understanding of both Layer 2 forwarding and TRILL.
So maybe those lumpy diagrams are just conceptual and nobody really intends to do TRILL that way? Brocade does have pictures that look mighty familiar (and structured): http://www.brocade.com/company/news-events/newsletters/BA1209/0912_technology_showcase.html. Juniper doesn’t like TRILL, but shows a structured diagram as well, in http://www.juniper.net/us/en/local/pdf/whitepapers/2000408-en.pdf.
Congestion is easily managed in the above diagram, in the sense that you monitor a relatively small set of links between spine and edge, and add bandwidth where needed. Load balancing should take care of unevenness, unless there are small numbers of flows of vastly different magnitude.
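That caveat about flow sizes can be illustrated with a quick Python sketch. It assumes per-flow hashing onto links (here a CRC of a made-up flow name, standing in for whatever fields the hardware really hashes on) and compares many small flows against the same traffic plus one very large flow. All of the flow names and rates are invented for illustration.

```python
import zlib


def place_flows(flows, num_links):
    """Assign each (flow_id, gbps) to a link by hashing the flow ID."""
    load = [0.0] * num_links
    for flow_id, gbps in flows:
        link = zlib.crc32(flow_id.encode()) % num_links
        load[link] += gbps
    return load


# 1000 small "mouse" flows of 10 Mbps each: roughly 10 Gbps in total.
mice = [(f"mouse-{i}", 0.01) for i in range(1000)]

# The same mice plus a single 8 Gbps "elephant" flow.
skewed = mice + [("elephant", 8.0)]

for name, flows in (("mice only", mice), ("with elephant", skewed)):
    load = place_flows(flows, 4)
    print(name, [round(x, 2) for x in load])
```

With only small flows, the per-link totals tend to come out close to one another. The elephant flow always lands whole on a single link, so that link carries at least 8 Gbps no matter how well the hash spreads everything else. That is the case where watching the individual spine-edge links matters most.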
Migrating to FabricPath
One of the drawbacks to Juniper’s QFabric is that it is apparently all-or-nothing. You can start with a small QFabric and then expand. If you buy it and don’t like it, what’s your alternative?
I see FabricPath as being incremental. You can migrate vPC edge pairs to FabricPath one pair at a time. So you might try something like running FabricPath to a pod with two Nexus 5500s and some servers, and then gradually dial up the size of the FabricPath domain.
There was a good talk at CiscoLive 2012 on this topic. It has a lot of diagrams, includes a couple of things I hadn’t thought about (not that I’d worked through a FabricPath migration in detail), and includes cutover timing information so you can plan how long each step should take. The presentation can be found at https://ciscolive365.com/connect/search.ww#loadSearch%searchPhrase=fabricpath&searchType=session&tc=0&value(profileItem_10017)=10173. It includes topics like moving your vPC peer link from M1 ports to F1 ports to support FabricPath and vPC+. (That was session BRKDCT-2202. Also, session BRKDCT-2081 may be of interest for more fundamentals, e.g. how FabricPath works.)
In general, CiscoLive 365 (Virtual) sessions are at https://ciscolive365.com/connect/search.ww#loadSearch%searchPhrase=&searchType=session&tc=0. Registration is free this year, as far as I know. And the San Diego CiscoLive presentations do seem to already be posted!