Yes, I’ve been lurking and watching discussions of SDN (Software Defined Networking) for quite some time. I’ve been quietly assessing what I read and hear, what I believe, where I’m skeptical, and so on. The level of chatter is deafening, in terms of the time it takes to read articles or watch videos, so I try to be picky about what I read and watch. With all the VC money flowing into this potentially disruptive technology, there is both a lot of content out there and a lot of hype.
One thing that has concerned me is that some of the technology described seems to violate performance and scaling limits by several orders of magnitude. Another, somewhat related concern is that some of the technology seems to be missing lessons of the past, among them OpenFlow’s matching on a complex n-tuple rather than on a label, as MPLS does; labels allow much simpler switching in the middle of a network. Admittedly, that topic perhaps gets into TCAM capabilities as well, something I have no intention of tracking.
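To make the label-versus-n-tuple contrast concrete, here is a minimal Python sketch, purely illustrative and not how any real forwarding pipeline is coded (hardware does the wide match in TCAM): a label switch needs only an exact-match lookup on a small integer, while an OpenFlow-style classifier has to evaluate priority-ordered, wildcarded matches across many header fields. All tables and field names below are invented for illustration.

```python
# Illustrative only: contrast exact-match label forwarding with
# priority-ordered n-tuple (wildcard) matching.

# MPLS-style: one exact-match lookup on the incoming label.
label_table = {100: ("swap", 200, "port2"), 101: ("pop", None, "port3")}

def forward_by_label(label):
    return label_table.get(label)          # simple exact match on a narrow key

# OpenFlow-style: walk wildcarded rules in priority order and compare
# every field each rule cares about.
flow_table = [
    # (priority, match fields, action)
    (200, {"ip_dst": "10.1.1.5", "tcp_dst": 80}, "send-to-SLB"),
    (100, {"ip_dst": "10.1.1.5"},                "port7"),
    (0,   {},                                    "controller"),
]

def forward_by_ntuple(packet_fields):
    for _prio, match, action in sorted(flow_table, key=lambda r: r[0], reverse=True):
        if all(packet_fields.get(k) == v for k, v in match.items()):
            return action
    return None
```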
Some recent reading and discussions with Terry Slattery have somewhat changed my perspective.
In particular, SDN is a work in progress, so it isn’t all going to be right the first time around. For some reason, this reminds me of the old saying that 90% of almost anything is crap. See also Terry’s CMUG slides, posted at links off https://netcraftsmen.com/blogs/entry/software-defined-networking-sdn-at-cmug.html. We debated some, and I pretty much agree with Terry. I too particularly like Scott Shenker’s presentation “A Gentle Introduction to SDN”. And Brad Hedlund is usually an interesting read, see the article at http://bradhedlund.com/2013/01/28/network-virtualization-a-next-generation-modular-platform-for-the-virtual-network/
There are at least a few things I can get pretty excited about concerning SDN’s potential:
- Central configuration of a pile o’ gear (who couldn’t get excited about that?)
- Central GUI policy configuration, i.e. one GUI rather than the firewall GUI and the Server Load Balancer / Application Delivery Controller and the WAAS and the QoS tool and the … (you get the idea)
- Overlays allowing me to build a robust L3 partitioned datacenter while providing virtual L2 adjacency
- SDN as an abstraction vehicle simplifying design / buildout
One thing I really get is the idea of service chaining virtual appliances (see the last CMUG I presented, Network Virtualization in the Data Center), and the idea of virtualizing, e.g., security and server load balancing tasks. Ivan Pepelnjak’s recent blogs got into how that might work with VLANs and routing (for instance, Virtual Tenant Networks with NEC ProgrammableFlow). The idea there is that flows can do header rewrites, e.g. changing the VLAN, or imitating the rewrites a router does (new MAC headers, updated TTL, etc.). The next logical step is incorporating drop actions (e.g. firewalling) and SLB-type functionality or NAT (IP header rewriting). From that perspective, a frame or packet is transformed in various ways as it crosses the network; can we, and should we, combine those functions?
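As a thought experiment, and not any particular product’s API, here is a rough Python sketch of that “packet transformed as it crosses the network” view: each step in a service chain is just another header rewrite or drop decision, so VLAN changes, router-style MAC/TTL rewrites, firewall drops, and SLB/NAT rewrites all become entries in one action list. The field names and the chain itself are invented for illustration.

```python
# Conceptual sketch of service chaining as a list of header-rewrite steps.

def set_vlan(pkt, vlan):                          # L2 virtualization step
    pkt["vlan"] = vlan
    return pkt

def route_rewrite(pkt, next_hop_mac, out_mac):    # what a router does per hop
    pkt["eth_dst"] = next_hop_mac
    pkt["eth_src"] = out_mac
    pkt["ttl"] -= 1
    return pkt

def firewall(pkt, blocked_ports):                 # drop action
    return None if pkt["tcp_dst"] in blocked_ports else pkt

def dnat(pkt, vip, real_ip):                      # SLB / NAT style IP rewrite
    if pkt["ip_dst"] == vip:
        pkt["ip_dst"] = real_ip
    return pkt

def apply_chain(pkt, chain):
    for step in chain:
        pkt = step(pkt)
        if pkt is None:                           # dropped by a firewall step
            return None
    return pkt

# One tenant's service chain: firewall -> NAT to a real server -> route -> re-VLAN.
chain = [
    lambda p: firewall(p, blocked_ports={23}),
    lambda p: dnat(p, vip="10.1.1.100", real_ip="10.1.1.5"),
    lambda p: route_rewrite(p, next_hop_mac="aa:bb:cc:00:00:05",
                            out_mac="aa:bb:cc:00:00:01"),
    lambda p: set_vlan(p, 210),
]

packet = {"eth_src": "de:ad:be:ef:00:01", "eth_dst": "aa:bb:cc:00:00:01",
          "vlan": 100, "ip_dst": "10.1.1.100", "ttl": 64, "tcp_dst": 443}
print(apply_chain(packet, chain))
```

Whether one flow table (or one box) should really absorb all of those functions is exactly the open question.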
There are also some SDN / OpenFlow-related things I don’t get so excited about:
- L2 outside the datacenter (since L2 is pretty much unnecessary there, with the occasional exception)
- IP multicast tunneling (VXLAN) (is your vendor’s IPmc forwarding as robust as their unicast routing?)
- Packet replication (NVGRE, unicast-based VXLAN)
Why is a mostly L2 campus a good thing? (If you say “security”, I’m going to say “ISE / security group tags”, or ID-based traffic flow controls.) And if we’re mostly going to be doing 802.11ac wireless for access in 5 years, why invest in new technology for a part of the network that may be about to diminish radically? Yes, data centers are different beasts.
I can see using VXLAN for multi-tenant environments and for situations where speed of deployment / development is the driving concern. For others, I would presently lean towards doing FabricPath and letting the hardware do the work without the extra mapping. That is, add the abstraction where it buys you something, but recognize the inherent increase in complexity and troubleshooting difficulty. And don’t do it where it doesn’t buy you much and where it widens the communications (documentation?) gap between server and network staff. (“It’s in the VMware or other GUI, I don’t need any documentation.”)
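For a sense of what “the extra mapping” actually adds, here is a simplified Python sketch of VXLAN-style encapsulation. The header sizes and UDP port come from the VXLAN spec (a 24-bit VNI in an 8-byte header over UDP 4789, roughly 50 bytes of outer overhead); the frame dictionaries and mapping tables are invented stand-ins, not real packet handling.

```python
# Simplified illustration of the VXLAN overlay mapping and overhead.

VXLAN_UDP_PORT = 4789
OUTER_OVERHEAD_BYTES = 14 + 20 + 8 + 8   # outer Ethernet + IP + UDP + VXLAN headers

# The mapping the overlay adds: tenant segment -> 24-bit VNI,
# and VM MAC -> remote VTEP IP (learned, or pushed by a controller).
segment_to_vni = {"tenant-a-web": 10010, "tenant-b-db": 20020}
mac_to_vtep = {"00:50:56:aa:bb:01": "192.0.2.11",
               "00:50:56:aa:bb:02": "192.0.2.12"}

def vxlan_encap(inner_frame, segment, local_vtep_ip):
    """Wrap an inner L2 frame for transport across the L3 fabric."""
    remote_vtep = mac_to_vtep[inner_frame["eth_dst"]]
    return {
        "outer_ip_src": local_vtep_ip,
        "outer_ip_dst": remote_vtep,
        "udp_dst": VXLAN_UDP_PORT,
        "vni": segment_to_vni[segment],
        "payload": inner_frame,
        "added_bytes": OUTER_OVERHEAD_BYTES,
    }

frame = {"eth_src": "00:50:56:aa:bb:01", "eth_dst": "00:50:56:aa:bb:02"}
print(vxlan_encap(frame, "tenant-a-web", local_vtep_ip="192.0.2.10"))
```

Every one of those mapping tables is something the troubleshooter (and the documentation) now has to account for; FabricPath keeps more of that inside the hardware’s own control plane.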
Getting Physical
There are also some topics I’d like to see discussed in the context of SDN. Maybe I’m not looking in the right places, or maybe I’m looking at the hard stuff that represents later steps in the solution. (Hush, don’t talk about the hard stuff, you might scare away the VC funding.) Or hung up on the problems of the past.
Is network virtualization rather different from server / memory / disk virtualization? Network cabling and devices are distributed, making them inherently different from chip-, chassis-, or rack-level virtualization. Long-haul fiber isn’t as reliable as in-datacenter fiber, something that concerns me when talking about the wisdom of L2 clustering between datacenters. Physical switches aren’t as reliable as single-host virtual switches. Port queues, QoS, and IPmc implementations vary between vendors, and they do affect overall performance in a major way. Do we in fact have suitable “generic hardware” yet? Is that on the near-term horizon?
Does SDN or OpenFlow assume a perfect physical world? Abstraction does that, as part of hiding complexity. Does that matter?
At almost every site where I’ve used the NetMRI tool, we have found hundreds if not thousands of duplex mismatches and other cabling-related problems on active user and server ports. And often, application problem resolution seems to boil down to mapping the devices to the physical path and then looking for errors, drops, duplex problems, congestion, and misconfiguration along the path. Which means you’d better have a network management tool to help you spot such problems. Yet many sites buy expensive tools whose licensing precludes ubiquitous coverage of ports and interfaces, CPUs, disks, etc. Oops!
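As a back-of-the-envelope illustration, the kind of check a tool like NetMRI automates is not complicated once you actually collect duplex settings and error counters for every active port; the data below is invented, and the hard part in practice is licensing and gathering it everywhere.

```python
# Hypothetical polled data: the check itself is simple; collecting duplex
# and error counters for *every* active port is the real prerequisite.
interfaces = [
    {"device": "sw1", "port": "Gi1/0/1", "duplex": "half",
     "peer_duplex": "full", "in_errors": 15234, "in_packets": 1_200_000},
    {"device": "sw1", "port": "Gi1/0/2", "duplex": "full",
     "peer_duplex": "full", "in_errors": 3, "in_packets": 9_800_000},
]

ERROR_RATE_THRESHOLD = 0.001   # flag anything over 0.1% errored packets

def audit(interfaces):
    findings = []
    for intf in interfaces:
        if intf["duplex"] != intf["peer_duplex"]:
            findings.append((intf["device"], intf["port"], "duplex mismatch"))
        rate = intf["in_errors"] / max(intf["in_packets"], 1)
        if rate > ERROR_RATE_THRESHOLD:
            findings.append((intf["device"], intf["port"],
                             f"error rate {rate:.2%}"))
    return findings

for finding in audit(interfaces):
    print(finding)
```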
Is this a hidden prerequisite for SDN or OpenFlow in the near term?
I’m also a bit hung up on mapping virtual to physical. Tracking customer services back to physical servers and then to switch ports can be time-consuming. Add in VMs doing vMotion, and figuring out which devices and links the flows between them were mapped to at the time of the problem, and the complexity level goes way up. Dare I hope for good tools to help people deal with that? Right now, just the simple physical-server problem can take a lot of time to track down. (I shouldn’t gripe too much; people ask for our consulting help when the complexity defeats them.)
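Here is a minimal sketch of the lookup I’d like those tools to do, with a data model invented purely for illustration: keep time-stamped records of VM-to-host moves plus host-to-switch-port cabling, so “where was this VM’s traffic at 2:14 AM?” becomes a query rather than an archaeology project.

```python
from datetime import datetime

# Invented records purely for illustration: vMotion history plus cabling facts.
vm_location_history = [
    # (vm, host, from_time, to_time or None if current)
    ("app-vm-7", "esx-12", datetime(2013, 2, 1, 0, 0), datetime(2013, 2, 1, 2, 30)),
    ("app-vm-7", "esx-04", datetime(2013, 2, 1, 2, 30), None),
]
host_to_switch_port = {"esx-12": ("n5k-1", "Eth1/17"), "esx-04": ("n5k-2", "Eth1/3")}

def physical_path_at(vm, when):
    """Which host and switch port carried this VM's traffic at time 'when'?"""
    for name, host, start, end in vm_location_history:
        if name == vm and start <= when and (end is None or when < end):
            switch, port = host_to_switch_port[host]
            return host, switch, port
    return None

print(physical_path_at("app-vm-7", datetime(2013, 2, 1, 2, 14)))
# -> ('esx-12', 'n5k-1', 'Eth1/17')
```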
All that is in principle solvable. But darn costly to program?
Suppose you have this wonderful SDN overlay and/or central control tool. Can it tell that the physical link from device A to B is dropping 1% of all traffic? Or is on a switch blade that is 8:1 oversubscribed? Or that el cheapo switch C reboots every 24 hours and really needs a software upgrade?
Ideally, such problems would be known and factored into the flow logic — as in, taken out of the active link topology. Or should they be?
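Here is one way “taken out of the active link topology” could look in a controller’s path computation, sketched with the networkx graph library and made-up link-health numbers; a real controller would get these from telemetry and would have to decide how aggressively to prune.

```python
import networkx as nx   # assumes the networkx library is installed

# Made-up link-health data; in practice this comes from monitoring/telemetry.
links = [
    ("A", "B", {"drop_rate": 0.01,   "oversub": 8}),   # the lossy, 8:1 oversubscribed link
    ("A", "C", {"drop_rate": 0.0,    "oversub": 2}),
    ("C", "B", {"drop_rate": 0.0001, "oversub": 2}),
]

MAX_DROP_RATE = 0.001
MAX_OVERSUB = 4

def healthy_topology(links):
    """Build a graph containing only links that pass the health checks."""
    graph = nx.Graph()
    for a, b, health in links:
        if health["drop_rate"] <= MAX_DROP_RATE and health["oversub"] <= MAX_OVERSUB:
            graph.add_edge(a, b)
    return graph

graph = healthy_topology(links)
print(nx.shortest_path(graph, "A", "B"))   # -> ['A', 'C', 'B'], avoiding the bad link
```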
Maybe that just says you should be monitoring all that stuff and proactively fixing physical layer problems.
Closing Thoughts
When I think about it, Cisco is doing a lot of experimenting and learning, in terms of products embedding a lot of this stuff. In particular, the Nexus 1000v seems to be somewhat paralleling the direction VMware and the Nicira acquisition are taking. (I’m not going to attempt to call that particular contest.) Their Meraki, Cariden, Cloupia, BroadHop, and some earlier acquisitions share the central control theme in various forms.
I’m not sure people give them enough credit for that.
I’m looking forward to Network Field Day #5 next week, presentations, sharing thoughts, and interesting debates. And who knows, maybe some insider scoop on cool new technologies!
Disclosure
Since the vendors for NFD 5 are paying my travel expenses and perhaps small items, I need to disclose that in my blogs now. I’d like to think that my blogs aren’t influenced by that. Admittedly, the time spent in presentations and discussion gets me and the other attendees looking at and thinking about the various vendors’ products, marketing spin, and points of view. I intend to try to remain as objective as possible. Stay tuned!
Pete,
I’ve been a big fan for most…no, scratch that…ALL of my networking career. I figured that you’d have retired by now man. 🙂 Glad (for all of our sakes) that you’re still hanging around teaching us new things.
This was an excellent read. Thanks for sharing.
Given something you mentioned, I wanted to share as well…
Aerohive is just now introducing Identity-based, unified policy routing across wired (our switches and routers) and wireless (our APs and wireless routers). It’s neat stuff. Give me a shout if you want one of our product managers to go through its functionality with you.
Thanks,
Devin Akin
Chief Wi-Fi Architect
Devin [at] Aerohive.com
Excellent write-up, Pete. You’ve nailed the current landscape perfectly. I look forward to sharing our solution with you at NFD next week; I think you’ll find a lot of alignment in thinking.