Best practice datacenter design architectures have changed: there are different and new datacenter designs now that can save you money. So why am I telling you this? Two reasons. First, I've been seeing a fair number of people and sites proceeding as if it's business as usual, as in "replace my old box with the modern equivalent." While you can still do that to a fair degree if you really want to, you may be missing the point. The second reason is skills: there's a lot of new technology coming along, and if you aren't keeping up, how can you evaluate it, let alone be in a position to design for it?
Ok, the "save you money" part is a bit of Cisco (and other vendor) Kool-Aid, but it happens to be true, at least in some or most ways. Concerning skills, I'm sub-contract teaching the Nexus class roughly once a month for FireFly, and keeping up on not only the Nexus products but the virtualization suite (1000v and related blades, VMWare/vSphere) and other technology (e.g. reading about Juniper QFabric, OpenFlow, VXLAN, and so on). I plan to write about some of it, e.g. VXLAN, shortly. In the meantime, I highly recommend Ivan Pepelnjak's IOS Hints blog, and also Tony Bourke's Data Center Overlords (Tony teaches for FireFly). Amusingly, when I searched a bit, I quickly hit upon the specific article linked here, which starts out with almost the exact same theme as my starting point: "that data center landscape is changing rapidly".
By the way, if your organization is grappling with this, and would like to bring me in for a couple of days consulting and discussion facilitation, well, that’s my idea of fun!
The biggest reason "it ain't just box replacement" anymore is the Cisco FEX technology … and I suppose other vendors' attempts to "flatten the datacenter." (Which always struck me as a Bad Thing: isn't that what a tornado might do?) The Nexus 2000 (N2K) products let you do manageable Top of Rack switching. That is, you don't suffer "death by small switches," with tens to hundreds of small switches to manage, one or two per rack. At present I'm mildly mixed about the N2232 (4:1 oversubscribed ports), but imagine the possibility of something like that with 40 or 100 Gbps uplinks. (Another article I intend to write: all the ways Cisco might take the FEX technology.)
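To make the "manageable Top of Rack" point concrete, here is a minimal NX-OS sketch of what attaching an N2K to a parent switch looks like; the FEX number, interfaces, and VLAN are invented for illustration. The key idea is that once the N2K is associated, its host ports show up on the parent as ordinary interfaces (ethernet 101/1/x below), so the rack's ports are configured and monitored centrally rather than on yet another little switch.

    feature fex

    ! Parent switch fabric uplinks down to the N2K
    interface ethernet 1/17-18
      switchport mode fex-fabric
      fex associate 101
      channel-group 101

    fex 101
      description Rack-4-N2232

    ! The N2K host ports now appear like line-card ports on the parent
    interface ethernet 101/1/1
      description server1-nic0
      switchport access vlan 100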
But it isn't just the N2K. The FEX technology is coming to NIC cards with tight VMWare integration. Who currently manages NIC connectivity? Who is managing the VMWare vSwitch or dvSwitch in an ESXi host? How about getting that back into network turf, where the people controlling it understand the network and security implications of what they're doing, and have the visibility they need to troubleshoot? Oh, and by the way, let's offload the switching to hardware to preserve CPU cycles for applications.
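As a sketch of what "back into network turf" can look like today: on the Nexus 1000v, the network team defines port-profiles in familiar CLI, and each profile appears in vCenter as a port-group the server team simply assigns to VMs. The profile name and VLAN below are made up for illustration.

    port-profile type vethernet Web-Servers
      vmware port-group
      switchport mode access
      switchport access vlan 100
      no shutdown
      state enabled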
The N7K is a doggone big switch, big enough that for medium-to-large enterprises (up to 10,000-20,000 people?) the 7010 easily provides enough ports to serve as campus core, distribution, and datacenter core as well. Yes, VDCs can modularize that, but it's still a lot of eggs in one basket. As speeds go up, one might end up wanting separate campus and datacenter core N7Ks at that scale.
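For reference, carving a 7010 into VDCs is roughly this simple (names and interface ranges below are illustrative); each VDC gets its own interfaces, configuration, and management plane, which is what makes "one big box, several logical switches" plausible, eggs-in-one-basket concerns aside.

    vdc DC-CORE id 2
      allocate interface ethernet 3/1-16

    vdc CAMPUS-CORE id 3
      allocate interface ethernet 4/1-16

    ! From the default VDC, move into a VDC and configure it like a separate switch
    switchto vdc DC-CORE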
The N5K plays nicely with the N2K to provide the "pod" approach Cisco's been talking about. I like the Just In Time provisioning aspect: build a pod of 4, 8, or 16 racks using N5K and N2K to minimize and localize cabling. As time goes on, decrease the racks per pod to increase 10 G port density, or use newer N5K models as they become available. This is one place where "saves money" may or may not apply; you can end up with a bunch of N5Ks, which aren't that cheap. On the other hand, is the total cost more or less than book-end 6500s at the ends of a long row? It's hard to tell, but it's likely cheaper than high-end 6500s with all the newest tech trying to get close to wire speed on a lot of 10 G ports. Heck, how does a single 5596 stack up against a 6500 performance-wise?
FCoE has the potential to reduce access layer cabling by half, if your SAN team is willing to co-own the FCoE switches (which can be a barrier). Less cabling = a win!
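The mechanics on an N5K are compact enough to show; here is a hedged sketch with invented VLAN, VSAN, and interface numbers. The server's CNA carries FCoE on a dedicated VLAN, and a virtual Fibre Channel (vfc) interface bound to the Ethernet port handles the FC side, so one converged cable replaces a NIC cable plus an HBA cable.

    feature fcoe

    vsan database
      vsan 100

    vlan 1000
      fcoe vsan 100

    ! Converged server-facing port: data VLAN plus the FCoE VLAN
    interface ethernet 1/10
      switchport mode trunk
      switchport trunk allowed vlan 10,1000

    interface vfc 110
      bind interface ethernet 1/10
      no shutdown

    vsan database
      vsan 100 interface vfc 110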
Another item that is easy to overlook: swapping 6-8 x 1 G ports to each server for 10 Gbps ports greatly reduces cabling. Installing cabling in the first place, labelling it, and maintaining the cabling plant are all more costly than you'd think (and time consuming!). Think about using VLANs and VRFs instead of re-cabling to new switches, e.g. when server ownership or security zone changes. Wouldn't that be a win?
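As a sketch of that idea: moving a server to a different security zone becomes a matter of changing its access VLAN, with that VLAN's SVI living in the appropriate VRF, instead of physically moving a cable to a different switch. The VRF name, VLANs, and addressing below are purely illustrative.

    feature interface-vlan

    vrf context DMZ

    vlan 210
      name dmz-servers

    interface vlan 210
      vrf member DMZ
      ip address 10.2.10.1/24
      no shutdown

    ! Re-zoning the server is now a one-line change on its access port
    interface ethernet 1/20
      switchport access vlan 210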
I see the UCS as a game-changer too. The memory mapping aspect means more memory per socket, cheaper than HP's approach, which is limited by standard DIMM slot count to using more expensive, denser memory. More memory per socket means more VMs per blade server socket, hence higher VM density. Less space and less power overall.
Facilities are changing too. In some recent datacenter tours I’ve been seeing more sites with things like:
- No raised floor
- Heat or cool containment aisles
- A lot more attention to cooling air flow, placement of floor tiles in raised floor buildings, etc.
- All cabling in cable trays
- All power and localized power distribution overhead, in a 2nd layer of trays (sometimes)
- Generally less space for servers, more space taken up by storage arrays
The one thing I haven't noticed much of (yet) is use of twinax for inexpensive 10 G server connections. And by the way, don't put your N5Ks at the ends of long rows: if you put a pair of N5Ks something like 8 or 9 racks apart, you can use the much less costly twinax. For a row of 16-20 racks, go with N5Ks in, say, racks 4-5 and 16-17; that is, break the row in half and put the N5Ks in or near the middle of the two pods of 8 racks, so that all server-to-N5K distances stay within the 10 meter max for twinax. See the diagram below.
I see we sort of passed over the dark side of this. It’s not that bad, but it’s hard — I’m talking about operational procedures and shared ownership. The technology stovepipes have to become less rigid for this to work optimally. Personally, I see that as a great career opportunity for people — having combined network / server / SAN skills will make you very employable going forward.
Here are a few thoughts about that:
- Fewer boxes = less $.
- Use VLANs and VDCs and other virtualization techniques for security zones, to reduce box count.
- If your site security people or someone else insists on separate switches in each row for production, DMZ/perimeter, and backup, have a chat with them. That's expensive and inflexible! And it takes a lot of labor to cable and maintain.
- 10G on twinax copper to servers is a lot cheaper than optical transceivers and fiber.
- The Cisco bundled FET (Fabric Extender Transceiver) for N2K to N5K or N7K connectivity is also relatively low cost, and it works over fiber at quite adequate datacenter distances (25-100 meters).
- 10 G versus many 1 Gbps connections: less cable to manage, less money on patch cables, less switch real estate, and less aggregate power to drive switch ports.
- If you want control over and visibility into how the servers connect, the 1000v can replace a physical switch at less cost. The coming VM-FEX NIC and software should allow you to use an adapter that is in effect a small N2K. HP is already selling a Cisco N2K that goes into their blade server chassis, the "Blade Fabric Extender". I have yet to compare costs, but I like the idea of a remote-controlled FEX in the chassis (in many chassis) rather than having the server team squander money on HP VirtualConnect (which has never struck me as very useful; it does the processor-to-external-interface plumbing that UCS Manager does intrinsically).
- Cutting down on internal and external hardware and connections: less heat and power, less to manage = lower cost.
- FabricPath (when mature) has the potential to mean more VLANs throughout your datacenter without STP risk and without having to exercise as much control; see the sketch after this list. Although I've written elsewhere that you might still want to keep some VLAN discipline going. How to do so, I'm still grappling with, as I suspect most of us are.
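For what it's worth, the configuration side of FabricPath is refreshingly small. A hedged sketch follows, assuming FabricPath-capable hardware and licensing, with the switch-id, VLAN range, and ports invented: VLANs placed in fabricpath mode are carried over the IS-IS-based fabric on the core-facing links instead of relying on spanning tree there.

    install feature-set fabricpath
    feature-set fabricpath

    fabricpath switch-id 11

    ! These VLANs ride the FabricPath fabric rather than classic STP trunks
    vlan 100-150
      mode fabricpath

    interface ethernet 1/1-4
      switchport mode fabricpath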
This article is getting a bit long on me. I’m going to leave it to you readers to add to this (CHALLENGE!) — please add comments with other ways datacenter changes lead to cost savings.
Well, there is all that Nexus stuff. Are you sharp on what the Nexus line can do for you? Are you up on FCoE and FIP? Do you know enough SAN basics to at least talk to your SAN cousins?
We’ll skip the shameless plug for the FireFly classes, which I think are tops. They really invest in trying to ensure the best possible instruction.
And there are classes on 1000v, VMWare, Cisco UCS, etc. too. A chance to express your server side personality! <grin>
For the more exotic, there’s the somewhat less-Cisco-flavored stuff: VXLAN, EVB, OpenFlow, OpenStack.
I should add Data Center Interconnect to the list. Although that’s becoming more of a known art, the “optimal routing” aspect (and/or using LISP) is still pretty new stuff.
So for those who, like me, enjoy the technology and making this stuff work: happy reading!