Thanks to Bob Harper for his follow-up emails concerning my CMUG presentation on vPC, FEX, and Datacenter Virtualization. If you haven’t read the presentation, it can be found at Follow-Up to vPC, FEX, and Datacenter Virtualization CMUG. (Thanks to Cisco for permission to use and even re-arrange a ton of their slides to convey my message.) I’m posting an edited version of our discussion thread with Bob’s permission, in the hopes it’ll be of interest and perhaps stir up some debate. Or at least provide amusement.
BH: I attended the February CMUG: Nexus 1000v/1010v and VM-FEX technologies. Thanks again! It was a great discussion about all the elusive FEX technologies that are coming. I totally agree with your points on HP’s 802.1qbr. It does not really address, let alone solve, virtualizing x86 network components. I am looking forward to adapter-FEX technology and am eager to see how it plays out. I had one question for clarification from the beginning of the CMUG presentation. Did you say that we should support the ‘any server anywhere’ network design model? Or should we have definitive ‘edge | DMZ | TestLab | DataCenter | Campus’ networks? In other words, should we do inter-VRF routing across multiple physical routers in lieu of traditional layer 2 segmentation? There are, of course, multiple caveats to doing inter-VRF routing, such as multi-tenant networks, etc.
PJW: I hate to say it, but “it depends.” I’m also not sure I’m fully understanding your question.

I want L2 contained strictly within the datacenter, since I consider it a major risk. OK, OTV or something between datacenters; I expect that to turn out to just move the risk goalposts (i.e., lower risk but a bigger outage when it hits the fan?). [Ed: to mix metaphors wildly.] I think we need to be prepared for any server anywhere in the datacenter. In a large setting, I might want to partition the datacenter, but it depends on politics, how much one can control server placement, etc. [Added later: That might be like King Canute and the tide, of course.]

I never, ever want L2 extended from the datacenter to the closets, nor servers in closets. I prefer L3 to the closet: it costs for L3 licensing (unless doing EIGRP stub), but it is well worth it in that closets are never a (STP) problem again. That may not be possible in hospitals with apps that require L2 adjacency. [Added later: you can do L3 to the closet cheaply (pseudo-L3 to the closet) by having one-closet VLANs that terminate at the distribution layer in L3 SVIs, and do NOT extend the VLAN between the two distribution switches. This is only a good idea if you have the discipline to keep the VLAN to a single closet; if it expands, you’ll be in danger of black-holing packets.]

DMZ and Lab/Dev probably need isolation. L2 VLANs (everywhere in the datacenter) might do that. L3 VRF-Lite does it more strongly, but imposes constraints on VMotion etc. (but not on VXLAN!). Routing into/out of VRF-Lite requires good design or it gets messy fast. We think we’ve got a good design using VRF-Lite, for example, for one major hospital that needs its FW-protected Epic servers in datacenter A to be able to route to similar ones in B without passing through a firewall.

If I have multiple edge “perimeter modules,” I have repeatedly considered VRF-Lite to isolate the many partner routes from the main routing, particularly in OSPF environments. For some reason we rarely seem to end up doing it that way in practice. I’ve also considered running EIGRP for the perimeter as a simple way to keep the perimeter routes out of OSPF in the core, just originating a default into OSPF instead. [Ed note: And for some reason, like complexity or ugliness, my NetCraftsmen peers didn’t like those approaches.]

[Added later: The recently re-announced/announced Cisco Easy Virtual Network (EVN) might be an easier way to tackle this than VRF-Lite, in part because it addresses the routing into/out of issue more cleanly. From some MPLS designs I’ve seen, I’m going to guess that using it heavily could turn around and bite you, complexity- and security-wise.]
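[Ed: To make the “pseudo-L3 to the closet” idea a bit more concrete, here is a minimal IOS-style sketch. It assumes one VLAN (110) serving a single closet, terminated in an SVI on one distribution switch and deliberately not carried on the trunk to the other distribution switch; the VLAN number, interface names, and addressing are all hypothetical.]

    ! Distribution switch A (hypothetical names and addressing)
    ! VLAN 110 serves exactly one closet and is NOT allowed on the
    ! trunk to distribution switch B, so this SVI is its only gateway.
    vlan 110
     name CLOSET-110
    !
    interface Vlan110
     ip address 10.1.110.1 255.255.255.0
    !
    interface GigabitEthernet1/0/1
     description Uplink to the closet-110 access switch
     switchport mode trunk
     switchport trunk allowed vlan 110
    !
    ! If VLAN 110 ever creeps onto another closet or onto the
    ! inter-distribution trunk, you risk black-holing traffic.

[Ed: And for the EIGRP-perimeter idea, a similarly hedged sketch of the border router, with made-up process numbers and prefixes: the partner routes live in EIGRP, and only a default route is originated into the OSPF core, with no redistribution.]

    ! Border router between the EIGRP perimeter and the OSPF core
    router eigrp 100
     network 192.168.0.0 0.0.255.255
    !
    router ospf 1
     network 10.0.0.0 0.255.255.255 area 0
     default-information originate always
    !
    ! Deliberately no "redistribute eigrp 100" under OSPF: partner
    ! routes stay in the perimeter, and the core sees only a default.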
BH: Thanks for the prompt and thorough reply. It always depends. I feel that, when possible, KISS is a great network design philosophy. Network hardware is cheap compared to L2 outages and troubleshooting costs. Just because you can does not mean that you should. [Added later (PJW): Exactly, emphasis mine!]
PJW: That’s about where my thoughts are. OTOH, L2 is here; to me it’s a matter of where you should and where you CAN draw the line, and of trying to get a dialog started that accommodates everyone’s needs and concerns.
Follow-up thoughts:
That “just because you can” resonates with me. I repeatedly see VLANs all over creation, or tunnel spaghetti, sometimes both. I do see a lot of datacenter sprawl, and I chalk some of it up to entropy, and some of it up to lack of planning and/or to server folks or management either not receiving the message or not understanding the risk trade-offs for the convenience. If networking were a card game, apparently convenience would be the trump suit every time!
What do you think? Is it even possible to exert some control over VLAN sprawl in the datacenter? Is it useful? Or is it just burning time being neat, tidy, and safe when those don’t buy you much in the real world? (Says those who have yet to be burned?)
While we’re at it, it seems like OTV solves some real-world problems for people. Are we at risk of extending our VLAN sprawl to multiple datacenters? What can or should be done about that, if anything?
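[Ed: For what it’s worth, the knob that controls how far L2 is extended with OTV is explicit. A minimal NX-OS sketch, assuming multicast transport, with made-up interface names, groups, site values, and VLAN ranges: only the VLANs named in otv extend-vlan cross the DCI, so sprawl between datacenters is at least a conscious, auditable choice.]

    ! OTV edge device in datacenter A (illustrative values only)
    feature otv
    otv site-vlan 99
    otv site-identifier 0x1
    !
    interface Overlay1
      otv join-interface Ethernet1/1
      otv control-group 239.1.1.1
      otv data-group 232.1.1.0/28
      ! Only the VLANs listed here are extended between datacenters
      otv extend-vlan 100-110
      no shutdown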