Data Center and Network Architectural Standards

Author
Peter Welcher
Architect, Operations Technical Advisor

I’ve been learning all I can about VMware, vSphere, and the Cisco Nexus 1000V, along with soaking up Cisco’s ideas of data center virtualization. And reading up on cloud computing. Meanwhile, I’m going through a switch modernization project at a site that will remain nameless. Really, it could be anywhere; I’ve seen everything I’m about to mention before, usually several times in several different places.

I’ll comment on the quick wins I can see for Cisco’s virtualization story in another blog. It all comes together, at least in my own mind. The theme for me is “hand-crafting” versus “automation”.

Brief explanation: we have a lot of one-off servers, application architectures, and network application architectures. Building them and supporting them is labor-intensive. Virtualization and cloud computing offer promise. However, a cloud computing vendor is probably going to have standardized offerings, e.g. a virtual mail, desktop, or database server. Or non-virtualized, their choice, your choice. Certain RAM, CPU, and OS combinations. (With fries and soft drink?) Their costs go up if they have to custom engineer (build) each VM to your specs. They’ll want to support a few standard sorts of VMs, to keep things simpler and keep costs down.

How many organizations think this way? That one-offs create cost?

Example: if you save money by not having a patch panel in a server rack, or not having them all the same (for some value of “same”), you’ve created a one-off. The cost of learning and then dealing with “oh, that’s cabled to the next rack over”, multiplied across several staff or consultants over several years, is probably far higher than the patch panel hardware and supporting back-end cabling.

Example: virtual machine servers. I’m now seeing IBM, HP, Dell, and other camps. Will we soon have some teams building with Microsoft virtual server tools, and others with VMware? And the Linux folks off doing open-source VMs?

Example: server load balancers (SLBs). What I sometimes see is particularly endemic to project-oriented sites, but I’ve also seen it with different vintages or vendors of hardware and software in large data centers. They have some DNS round robin for load balancing, some F5 or Alteon or Cisco CSS here, some Linux software-based load balancing there, something else in another rack… No two the same. Supporting this is not fun. Knowing how failover is supposed to work, let alone testing it as upgrades come out, just doesn’t happen.

Cisco would love to sell you an ACE for that. What I like about their marketing pitch is the idea that each new project doesn’t actually have to buy SLB hardware, which saves a full procurement cycle plus cost. With contexts, it’s almost like having your own SLB. And best (to me) is that you only have to understand and test failover behavior once. Which means it is do-able, and can be repeated before doing software upgrades, if you feel the need. For the Anything But Cisco (ABC) crowd, fine, substitute your favorite vendor here. I’ve been calling this “Enterprise Load Balancer” or “The Honking Big SLB” design. It gets you out of the onesy-twosy business (aka “hand-crafted”). If you haven’t noticed, I like that idea! 
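As a rough sketch of what that looks like (names, numbers, and resource limits invented here, and syntax from memory of the ACE virtualization guide), carving out a per-project context from the ACE Admin context goes something like:

    ! Admin context: define a resource slice, then a context per project
    resource-class GOLD
      limit-resource all minimum 10.00 maximum equal-to-min
    !
    context PROJECT-A
      ! VLAN the project's real servers sit on
      allocate-interface vlan 211
      ! cap this context at the GOLD resource slice
      member GOLD

Each project team then builds its own VIPs and server farms inside its context, while the network team understands and tests failover once, for the whole box.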

I’ve been noticing for years that large enterprises and government sometimes have software architecture committees. Some seem to be of the “study it for 5 years” variety. I believe in doing something now or soon, then making it better, or you’ll never really get started, let alone finish. My favorite is the Very Large site with the 10-year plan to do DR properly. Isn’t DR (Disaster Recovery) something you need yesterday? Or ASAP? Enough with the telling social commentary!

I have yet to see a “network architecture standard” at any site. Let’s look at what I see that might fit under that umbrella. Hmm, Enterprise SLB! (Surprise).

One of the fun things with migrating servers to new switches is figuring out what the ports are and what they’re doing. Gee, there are 4 (or 6, or 8) NICs. Which ones are VMware management (console), which data, which VMotion, which backup? If they vary from box to box, you’re in hand-crafted land again!

Which ones are doing auto-negotiate? Which legacy boxes are hard-coded 100/full or 10/half? (And should have been set to auto-negotiate or upgraded years ago, but “don’t mess with it if it works”?)
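On the switch side, the two cases look something like this (interface numbers and descriptions are placeholders). And remember the classic gotcha: if the server is hard-coded 100/full and the switch is left at auto, the switch falls back to half duplex and you get late collisions and CRC errors.

    ! Legacy box hard-coded on the server side: the switch must match it
    interface GigabitEthernet1/0/10
     description Legacy-App-Server (hard-coded NIC)
     speed 100
     duplex full
    !
    ! What we'd rather converge on: auto-negotiation at both ends
    interface GigabitEthernet1/0/11
     description New-Server (auto)
     speed auto
     duplex auto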

How about 802.1Q VLAN tagging? Which ports require it? Does the server send some traffic untagged on the native VLAN, or are all VLANs tagged?
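For reference, the switch side of a tagged server port looks roughly like this; the VLAN numbers and the choice to leave management untagged on the native VLAN are illustrative assumptions, not a recommendation.

    ! Trunk to a server that tags everything except its management traffic
    interface GigabitEthernet1/0/12
     description ESX-Host (tagged data and VMotion, untagged mgmt)
     switchport trunk encapsulation dot1q
     switchport mode trunk
     ! untagged management/console VLAN
     switchport trunk native vlan 10
     ! data and VMotion VLANs arrive tagged
     switchport trunk allowed vlan 10,20,30
     spanning-tree portfast trunk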

How about EtherChannel? Which two ports are data, and are they set up as “teaming” in the active-passive sense, or are they “teamed” or “bonded” as in EtherChannel / LACP? 
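The distinction matters on the switch side: active-passive teaming needs no channel configuration at all (two ordinary ports), while a bonded pair needs a channel-group. Something like the following, with made-up port numbers and VLANs:

    ! Active-passive teaming: two plain access ports, no channel needed.
    ! LACP bonding: the switch has to be told to channel the pair.
    interface range GigabitEthernet1/0/13 - 14
     description DB-Server bonded pair
     switchport mode access
     switchport access vlan 20
     ! mode active = LACP; mode on = static EtherChannel
     channel-group 5 mode active
    !
    interface Port-channel5
     switchport mode access
     switchport access vlan 20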

Note that all of these are fun from the point of view of setting up a switch or supporting the server. If they’re not documented well, it’s kind of hard to tell what the server is doing from the switch side. Passive interfaces don’t transmit, so all you see is that the link is up, with no MAC or IP address associated with the port. Yup, that’s somebody’s passive side. Whose?
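The switch-side symptom is easy to reproduce, for what it’s worth (interface number made up):

    ! Link shows connected, with speed, duplex, and VLAN...
    switch# show interfaces GigabitEthernet1/0/15 status
    ! ...but no MAC entries, because the passive NIC never transmits
    switch# show mac address-table interface GigabitEthernet1/0/15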

If you’ve ever been through this, I think you’ll buy my pitch for standards. Also, if you’ve been over this territory, you’ll appreciate the difficulties you can sometimes have communicating with server admins. The terminology varies with hardware vendor and/or operating system.

The term “teaming” has something like 5 or 6 meanings (variations) in a nicely written HP documentation page. 

http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c01415139/c01415139.pdf

Here’s a Dell document (not as lucid): http://support.dell.com/support/edocs/network/r35278/broadcom nic teaming_1.1_final.doc

“Bonding” is the usual Linux term, though it covers more than EtherChannel / LACP: bonding mode 4 (802.3ad) is the LACP flavor, while mode 1 is plain active-backup. Linux teaming (bonding): http://www.cyberciti.biz/tips/linux-bond-or-team-multiple-network-interfaces-nic-into-single-interface.html

Good Cisco-VMware reference document: http://www.cisco.com/application/pdf/en/us/guest/netsol/ns304/c649/ccmigration_09186a00807a15d0.pdf

The latter adds another terminological issue. “Active-active” can mean two links actively carrying traffic, or it can mean both are up but only one is in use for a given MAC or port (or IP). Just as Cisco ASA or ACE hardware can be active-active per context, where each context is only active on one side. I’d like to suggest the term “pseudo-active-active” for such a situation.

I recommend the above to help establish clear communication with server admins, especially where there is an English-to-native-language barrier as well. Written docs can help in such situations.

Caution: you may experience extreme attenuation of the communications channel with those few server admins who think NICs and networking are plumbing, beneath their notice (or trivial?).

I hope you’re still reading, maybe wondering “what did that rant have to do with standards?”

Well, suppose Server Model A uses Onbd1 and Onbd2 to dual-home a server in active-passive fashion, and Server Model B is for VMware servers, using Onbd1 and PCI1 for console/mgmt and data, and Onbd2 and PCI2 for VMotion, both pairs tagged, both active-passive. And all ports auto-negotiated.

If those are your only two configurations of servers, then you only have to communicate “Model A” or “Model B”. You no longer need as much dialog to establish which NICs are in use, what they’re for, which are tagged, what “teaming” is in use, etc. You still need to know what VLAN a non-tagged port should be in; tying the VLAN number to the third octet of the subnet (say, VLAN 20 for subnets with 20 in the third octet) helps with VLAN-unaware server admins. (“What is your IP address?” is a lot simpler to get an answer to.)

If you like EtherChannel / LACP to servers, add Model C, where the dual-homing does channeling, say to the two chassis in a Cisco VSS pair of 6500s.
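To make the models concrete, the switch side of each might look something like this. The interface numbers, VLANs, and descriptions are invented for illustration; the point is that there are exactly three patterns to know.

    ! Model A: two plain access ports, server does active-passive teaming
    interface GigabitEthernet2/0/1
     description ServerX Onbd1 (Model A)
     switchport mode access
     switchport access vlan 20
     spanning-tree portfast
    ! (Onbd2 goes to a second switch, configured identically)
    !
    ! Model B: VMware host, 802.1Q tagged pairs, active-passive per pair
    interface GigabitEthernet2/0/3
     description ESX-HostY Onbd1 (Model B: console/mgmt + data)
     switchport trunk encapsulation dot1q
     switchport mode trunk
     switchport trunk allowed vlan 10,20
    !
    ! Model C: LACP channel split across the two chassis of a VSS pair
    interface range GigabitEthernet1/2/5 , GigabitEthernet2/2/5
     description ServerZ bonded pair (Model C)
     switchport mode access
     switchport access vlan 20
     channel-group 21 mode active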

This gets rid of the 2^N problem, where N is the number of settings that can each vary: with just four yes/no settings per server you’re looking at 16 possible combinations, versus three standard models. It allows all the admins to leverage prior server experience, prior practices, and prior testing. Internal Best Practices!

Think about support. You need a spreadsheet tracking each server and its Network Architecture Server Interface Model (i.e. A, B, or C above). Both teams now know how to set things up. Troubleshooting means staff rapidly get to where they’ve seen the problems for each Model, rather than each server having different sorts of problems. If you have one way to do SLB, then that part’s simpler too. Ditto SSL acceleration and other components. OK, firewalling is a subject for another blog article.

By the way, if you can roll all the NICs up into one or two 10 Gbps NICs, that cuts the cabling complexity (tangle). Another potential cost savings.

The tie-in to Cloud Computing, virtualization, or the Cisco Nexus 1000V? Think port profiles, bundling up port settings into a few standardized variations. Think automation of server-to-network connections! For Cloud, think “vendor standard VM models or offerings”. What would you be buying from the cloud computing vendor? If your needs are all over the place (not standardized), how easy is it going to be to use a generic cloud server?
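On the Nexus 1000V, one of our standard models could become a port profile that shows up in vCenter as a port group, so the server team picks “Model B data” instead of hand-entering VLANs and teaming settings. A minimal sketch, with a made-up name and VLAN number (syntax roughly per recent 1000V software):

    port-profile type vethernet Model-B-Data
      ! publish to vCenter as a port group
      vmware port-group
      switchport mode access
      switchport access vlan 20
      no shutdown
      state enabled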
