In my quest to really understand SDN, I’ve been reading a number of research papers and watching presentations by industry researchers. It has been quite educational, and I thought it would be useful to share the references.
I’ve found that the traditional three-tier network design (core, distribution, access) has been replaced by designs that provide full bandwidth between any two nodes (i.e., that maximize bisection bandwidth, described below) with only a modest amount of extra network equipment. These designs do not rely on oversubscription, which means that workloads can be placed anywhere in the network without concern that an uplink will become overloaded.
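Oversubscription compounds across tiers: when traffic has to climb through both the access and aggregation layers, the per-tier ratios multiply. The sketch below uses hypothetical port counts and speeds chosen only for illustration (they are not taken from any of the papers), and the helper name `end_to_end_oversubscription` is likewise made up for this example.

```python
from fractions import Fraction

def end_to_end_oversubscription(tiers):
    """Multiply per-tier oversubscription ratios (hypothetical helper).

    tiers: list of (downstream_gbps, upstream_gbps) for each switching tier
    that traffic must climb through. A result of 1 means no oversubscription.
    """
    ratio = Fraction(1)
    for downstream_gbps, upstream_gbps in tiers:
        ratio *= Fraction(downstream_gbps, upstream_gbps)
    return ratio

# Hypothetical three-tier path: access switch, then aggregation switch.
three_tier = [
    (48 * 1, 2 * 10),   # access: 48 x 1G host ports, 2 x 10G uplinks -> 2.4:1
    (20 * 20, 4 * 40),  # aggregation: 400G from access, 4 x 40G to core -> 2.5:1
]
# Hypothetical leaf: 48 x 10G host ports, 12 x 40G uplinks (one per spine).
leaf_spine = [(48 * 10, 12 * 40)]

print(float(end_to_end_oversubscription(three_tier)))  # 6.0
print(float(end_to_end_oversubscription(leaf_spine)))  # 1.0
```

If every host in this hypothetical three-tier example sent across the core at once, each would get roughly one sixth of its link speed, while the leaf with as much uplink as host-facing bandwidth has no oversubscription at all.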
You’ll encounter the term “bisectional bandwidth” or “bisection bandwidth” in a number of the papers that I’ll reference. If you divide a network’s nodes into two equal halves, the bisection bandwidth is the total bandwidth of the links crossing between the halves, taken for the worst-case (minimum) division. See the Wikipedia article on bisection bandwidth. You can think of it as the aggregate bandwidth available in the worst case when half of the nodes communicate with the other half. A network with high bisection bandwidth will therefore perform better for an arbitrary workload.
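To make the definition concrete, here is a brute-force sketch for tiny topologies. It assumes the leaf (or access) switches are the endpoints to be split into equal halves, and lets every other switch sit on whichever side minimizes the cut. Both example topologies and their 10 Gbps links are hypothetical, and because brute force grows exponentially this only suits toy examples.

```python
from itertools import combinations, product

def bisection_bandwidth(links, endpoints):
    """Brute-force bisection bandwidth for a small network.

    links: dict mapping (node_a, node_b) -> link capacity in Gbps
    endpoints: nodes split into two equal halves (e.g. leaf or access switches)
    Every other node (spine, aggregation, core) may go on either side;
    the smallest total capacity crossing the cut over all splits is returned.
    """
    nodes = {n for link in links for n in link}
    transit = sorted(nodes - set(endpoints))
    best = float("inf")
    for half in combinations(endpoints, len(endpoints) // 2):
        for placement in product((True, False), repeat=len(transit)):
            side_a = set(half) | {t for t, in_a in zip(transit, placement) if in_a}
            # Sum capacities of links whose two ends are on different sides.
            cut = sum(cap for (u, v), cap in links.items()
                      if (u in side_a) != (v in side_a))
            best = min(best, cut)
    return best

# Hypothetical leaf-spine: 4 leaves, 2 spines, every leaf linked to every spine.
leaves = ["L1", "L2", "L3", "L4"]
leaf_spine = {(leaf, spine): 10 for leaf in leaves for spine in ["S1", "S2"]}

# Hypothetical tree: 4 access switches, 2 aggregation switches, 1 core switch.
access = ["A1", "A2", "A3", "A4"]
tree = {("A1", "D1"): 10, ("A2", "D1"): 10, ("A3", "D2"): 10, ("A4", "D2"): 10,
        ("D1", "C"): 10, ("D2", "C"): 10}

print(bisection_bandwidth(leaf_spine, leaves))  # 40
print(bisection_bandwidth(tree, access))        # 10
```

With the same number of edge switches, the leaf-spine example keeps 40 Gbps across its worst-case split, while the tree is limited by a single 10 Gbps uplink to the core.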
The first paper is the Cisco Massively Scalable Data Center (MSDC) Design and Implementation Guide. It discusses the traditional three-tier network design and describes the Leaf-Spine network topology, which is actually a Clos network design (see below). It explains the type of blocking that can occur due to oversubscription in a traditional design and then shows how a Leaf-Spine network (which the MSDC guide also calls a Folded CLOS design) avoids that blocking. It also discusses other types of networks, such as Fat Trees. I found it to be good reading on the basics, and web searches turned up other interesting papers and blog posts about Clos topologies and their characteristics.
What is a Clos Network and a Leaf-Spine Network?
Charles Clos was a Bell Labs researcher who described a multistage switching topology that can be made non-blocking, allowing full bandwidth between any two arbitrary nodes. A Leaf-Spine network is a Clos network drawn in folded form: Ivan Pepelnjak’s post Full Mesh is the Worst Possible Fabric Architecture shows how a Leaf-Spine network is just a Clos network, only pictured differently. Note that in some cases an existing flow between two nodes must be moved to another path in order to provide full, non-blocking bandwidth for a new flow between two other nodes; such a network is called rearrangeably non-blocking. But for any set of node pairs in which each node talks to only one other node, there is always an assignment of paths that lets every pair communicate at full link speed. I will use the term “Leaf-Spine” in the rest of this post for consistency and because it seems to be the more widely accepted term.
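Clos’s classic results can be stated for a symmetric three-stage network in which each ingress switch has n inputs and one link to each of m middle-stage switches: the network is strictly non-blocking when m ≥ 2n − 1 (Clos’s 1953 result) and rearrangeably non-blocking when m ≥ n (the Slepian-Duguid theorem). Mapping this onto a Leaf-Spine fabric, with n taken as a leaf’s host-facing ports and m as its number of spines, assumes all links run at the same speed; the function name and example values are illustrative only.

```python
def clos_blocking_class(n, m):
    """Classify a symmetric three-stage Clos network (illustrative helper).

    n: inputs per ingress switch (host-facing ports per leaf, in the folded view)
    m: middle-stage switches (spines), with one link from each ingress switch
    Assumes every link has the same capacity.
    """
    if m >= 2 * n - 1:
        return "strictly non-blocking"       # a new connection never needs rerouting
    if m >= n:
        return "rearrangeably non-blocking"  # may need to move existing flows
    return "blocking"                        # too few middle switches for n inputs

for n, m in [(4, 7), (4, 4), (4, 2)]:
    print(n, m, clos_blocking_class(n, m))
```

The middle case corresponds to the situation described above: every set of node pairings can be served at full speed, but admitting a new flow may require moving an existing one.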
Next on my reading list was a pair of papers from Stanford and Berkeley: On the Optimality and Interconnection of Valiant Load-Balancing Networks and Designing a Predictable Internet Backbone Network.
These papers describe a slightly overbuilt Leaf-Spine design that randomly distributes traffic across the spine nodes, so that an arbitrary traffic mix can be supported at very high link utilization, even when network devices and links fail. In most three-tier networks, by contrast, there are conditions under which congestion at aggregation points prevents some traffic mixes from being supported. As a result, three-tier networks tend to run at relatively low overall utilization so that they can handle the peak mix of arbitrary traffic loads, and QoS is often used to manage the congestion that results from bursts. One of the key pieces of technology is Valiant Load Balancing (VLB, described below), which distributes traffic over multiple spine switches in order to achieve full bisection bandwidth.
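How much overbuilding is needed can be estimated with simple arithmetic. This is a minimal sketch, assuming each leaf has one uplink to every spine, all uplinks run at the same speed, traffic is spread evenly over the surviving spines, and a spine failure removes exactly one uplink from each leaf. The helper names and port counts are hypothetical.

```python
import math

def spines_for_full_bandwidth(host_gbps_per_leaf, uplink_gbps, tolerated_failures=0):
    """Spines needed so each leaf's uplink capacity still covers its host-facing
    capacity after `tolerated_failures` spines are lost (hypothetical helper)."""
    return math.ceil(host_gbps_per_leaf / uplink_gbps) + tolerated_failures

def uplink_utilization(host_gbps_per_leaf, uplink_gbps, spines, failed=0):
    """Worst-case uplink load as a fraction of capacity if every host sends at
    line rate to other leaves; a value above 1.0 means congestion."""
    return host_gbps_per_leaf / (uplink_gbps * (spines - failed))

# Hypothetical leaf: 48 x 10G host ports, 40G uplinks.
host_gbps = 48 * 10
print(spines_for_full_bandwidth(host_gbps, 40))                        # 12
print(spines_for_full_bandwidth(host_gbps, 40, tolerated_failures=1))  # 13
print(uplink_utilization(host_gbps, 40, spines=13, failed=1))          # 1.0
```

In this example, one extra spine (13 instead of 12) is the “slight overbuild” that keeps the fabric free of oversubscription after any single spine failure.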
What is Valiant Load Balancing?
Valiant Load Balancing is named after Leslie Valiant, a professor at Harvard University. His paper, A scheme for fast parallel communication, describes the load-balancing mechanism and is cited by a number of the other papers I read. It is available from the ACM (Association for Computing Machinery) for a $15 fee. The idea is to send each packet first to a randomly chosen intermediate node and only then on to its destination, which, on average, spreads any traffic pattern evenly over the available paths. Distributing traffic across the spine switches of a Leaf-Spine network, with each spine acting as the intermediate node, is a form of Valiant Load Balancing (VLB).
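Here is a minimal fluid-model sketch of VLB on a Leaf-Spine fabric. It assumes each leaf splits its outgoing traffic exactly evenly across all spines (the random choice averaged out) rather than picking one spine per packet or flow. It illustrates why VLB tolerates an arbitrary traffic mix: the load on every leaf-to-spine and spine-to-leaf link depends only on how much each leaf sends and receives, not on the shape of the traffic matrix. The function name and traffic matrices are hypothetical.

```python
from fractions import Fraction

def vlb_link_loads(traffic, num_spines):
    """Fluid model of Valiant Load Balancing on a leaf-spine fabric.

    traffic[i][j] is the rate (Gbps) leaf i sends to leaf j.
    Phase 1: leaf i spreads all of its traffic evenly over the spines.
    Phase 2: each spine forwards its share on to the destination leaf.
    Returns two dicts: uplink loads keyed (leaf, spine) and
    downlink loads keyed (spine, leaf).
    """
    leaves = range(len(traffic))
    spines = range(num_spines)
    share = Fraction(1, num_spines)
    uplink = {(i, k): sum(traffic[i]) * share for i in leaves for k in spines}
    downlink = {(k, j): sum(traffic[i][j] for i in leaves) * share
                for k in spines for j in leaves}
    return uplink, downlink

# Two hypothetical traffic matrices; every leaf sends and receives 40 Gbps.
permutation = [[0, 40, 0, 0], [0, 0, 40, 0], [0, 0, 0, 40], [40, 0, 0, 0]]
t = Fraction(40, 3)
uniform = [[0, t, t, t], [t, 0, t, t], [t, t, 0, t], [t, t, t, 0]]

for name, matrix in [("permutation", permutation), ("uniform", uniform)]:
    up, down = vlb_link_loads(matrix, num_spines=4)
    print(name, float(max(up.values())), float(max(down.values())))
```

Both matrices put at most 10 Gbps on any link. More generally, with m spines and no leaf sending or receiving more than R, every link carries at most R/m, whatever the traffic pattern. Real fabrics approximate the even split by choosing a spine per flow or per packet, so actual link loads fluctuate around these averages.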
Data center networks based on the three-tier (Core, Distribution, Access) design should be migrated to the newer designs at the next design opportunity. Fortunately, the replacement doesn’t have to be done all at once: a Leaf-Spine (Clos) design can start with part of a data center row and grow gradually, easing the transition to the new designs. I suspect that the flexibility and efficiency of the new designs will accelerate the pace of conversion in some businesses.
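As a rough illustration of incremental growth, the sketch below sizes a hypothetical two-tier Leaf-Spine fabric built from fixed-size switches with equal-speed ports. It assumes each leaf uses one port per spine as an uplink and the rest for hosts, and each spine uses one port per leaf, so spine port count caps the number of leaves. Adding leaves adds host ports; adding spines reduces oversubscription. The helper name and switch sizes are made up for the example.

```python
from fractions import Fraction

def leaf_spine_fabric(leaf_ports, spine_ports, leaves, spines):
    """Size a two-tier leaf-spine fabric of equal-speed ports (hypothetical helper).

    Each leaf uses one port per spine as an uplink and the rest for hosts;
    each spine uses one port per leaf. Returns (host_ports, oversubscription).
    """
    if leaves > spine_ports:
        raise ValueError("not enough spine ports: one port per leaf is required")
    if spines >= leaf_ports:
        raise ValueError("leaf has no ports left for hosts")
    hosts_per_leaf = leaf_ports - spines
    return leaves * hosts_per_leaf, Fraction(hosts_per_leaf, spines)

# Hypothetical 48-port leaves and 32-port spines, grown in stages.
for leaves, spines in [(2, 4), (8, 12), (32, 24)]:
    hosts, ratio = leaf_spine_fabric(48, 32, leaves, spines)
    print(leaves, spines, hosts, float(ratio))
```

In this example the fabric starts as two leaves and four spines (88 host ports at 11:1) and grows to 32 leaves and 24 spines (768 host ports at 1:1) without replacing any switches already installed.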