The topic of Data Center Interconnect at L2, and failover between data centers, seems to be a hot topic! I’d like to briefly note some of the great interaction that’s been occurring with some of my prior blogs, and also note an interesting article by Ivan Pepelnjak, whose technical skills I highly respect.
Concerning Data Center Interconnect (DCI), I really like the new Cisco OTV technology. (OK, you probably guessed that from all the prior articles about it.) Whatever technique you use for DCI (interconnecting data centers at L2), you still have the problem of managing and optimizing inbound and outbound traffic to use the shortest path to or from the active virtual machine (VM) or cluster member(s).
Comments and discussion about that can be found in several of my blogs. I appreciate the feedback and the chance to learn and discuss! See in particular the article on Understanding Layer 2 over Layer 3 (Part 2). I’ve listed the recent prior blogs below, along with how many comments each drew, for those inclined to go spelunking around this topic.
Other Takes on DCI and Optimal Routing
I was interested and amused to see a blog by Ivan Pepelnjak about Long Distance vMotion, at http://searchnetworking.techtarget.com/feature/Long-distance-vMotion-traffic-trombone-so-why-go-there?asrc=EM_NLN_13283529&track=NL-79&ad=813626.
Ivan makes a good case for multiple servers with Server Load Balancers in front of them, especially as a way of avoiding data center bridging. He also mentions “traffic tromboning”, which refers to the sub-optimal traffic flows I’ve blogged about.
I’m not as allergic to data center bridging as he seems to be, despite having seen my share of data center meltdowns due to spanning tree loops. I hear good things about traffic storm control, and with L2 over L3 it seems like the L3 encapsulation will fail or reach capacity before the problem spreads very widely (now that would be an interesting lab experiment). On the other hand, I like the idea of not using a risky technology unless you’ve got a darn good reason (Ivan raises one: some sort of server or application that has to run on a single platform), and also the idea of keeping management complexity down.
Overdoing It?
Frankly, with the DCI techniques, and especially with OTV, I worry about the “beer principle”. (One beer might be a good thing. Too many beers lead to a headache.) Applied to OTV: you start doing it on a small scale, then it grows, and then one day you discover you’ve pushed it too far and it has become unstable or is having some problem … and you have a headache.
I particularly worry about this in medical settings, and perhaps federal government datacenters. Hospitals tend to use real estate for medical needs, in good part because, frankly, that produces revenue. Management usually views IT as a necessary evil, a cost center. Funding for new hospitals and clinics takes priority over a new datacenter. Consequently, you end up with little datacenters scattered all over. A variant happens with the federal government. The current data center gets outgrown, then there isn’t funding to build a new one that can hold all the servers (and after all, the old one is still working, albeit perhaps maxed out on power, cooling, or space), so another smallish one gets added on. Then maybe some space shows up in some other agency’s datacenter (after the recent consolidation push), so that gets tacked on too. When you view the overhead costs of operating small datacenters compared to one big one, this may be vastly suboptimal — but there’s apparently no good political and financial way out of it.
Over time, this scattered data center approach leads to design problems. One generally brings WAN links into the datacenter, but because the “main and backup” datacenters changed over time, the WAN links come in all over the place. Ditto Internet connections. With L2 between the sites, one might end up with stateful firewall pairs split across data centers, as well as L2 server clusters.
Split Firewall Pairs or Server Clusters
I do have a different concern about that situation, which is the robustness of the cluster or stateful firewall pair. Links between datacenters might be L2 (or L2 over L3), but they are generally not as reliable and error-free as links within a single datacenter. What happens to your firewall pair, or your Server Load Balancer pair, or your cluster, if the link between sites goes flaky but not down? I’ve heard horror stories about CheckPoint firewalls and Microsoft Exchange clusters where both sides thought they were primary. The issue seems to be that packet loss is not something the vendor expected, and when the link comes back up … SURPRISE! In the case of CheckPoint, the two units both update each other’s policy, which can lead to a corrupted policy (i.e. missing ACL rules). In the case of Exchange, I’ve heard from one person who experienced it that the servers started to re-synch their email databases, but then did not respond to email clients for 6 hours. I freely admit that I do not know, nor have I tested, current versions of either product.
If you have experience with situations like this (whether the app continued to work correctly, or there were problems), please add a comment. More data is needed on this phenomenon, and collectively we have a lot more experience than I do by myself!
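To make the failure mode concrete, here is a minimal sketch (Python, purely illustrative, with made-up timers, and not modeled on any particular vendor’s protocol) of the kind of naive heartbeat logic that gets split pairs into trouble: each node promotes itself to primary after missing a few heartbeats, so a link that is lossy but not down can leave both sides active at once.

```python
import random

# Hypothetical illustration: two cluster nodes that each promote themselves
# to primary after missing MISSED_LIMIT consecutive heartbeats. The constants
# are assumptions for the sketch, not any vendor's defaults.

MISSED_LIMIT = 3     # consecutive missed heartbeats tolerated before takeover
LOSS_RATE = 0.5      # fraction of heartbeats the flaky inter-DC link drops
INTERVALS = 20       # heartbeat intervals to simulate

class Node:
    def __init__(self, name, primary):
        self.name = name
        self.primary = primary
        self.missed = 0

    def heartbeat_received(self):
        self.missed = 0

    def heartbeat_missed(self):
        self.missed += 1
        if not self.primary and self.missed >= MISSED_LIMIT:
            # Naive takeover: assumes a silent peer is a dead peer.
            self.primary = True
            print(f"{self.name} promotes itself to primary")

a, b = Node("DC-A", primary=True), Node("DC-B", primary=False)

for _ in range(INTERVALS):
    for node in (a, b):
        if random.random() < LOSS_RATE:
            node.heartbeat_missed()
        else:
            node.heartbeat_received()

if a.primary and b.primary:
    print("Split brain: both sides are primary and will clobber each other's state on re-synch")
```

Real products mitigate this with quorum, witness nodes, or a second heartbeat path, which is exactly what I would want to verify before stretching a pair across a WAN link.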
For some prior musings about this issue, see
- https://netcraftsmen.com/user-group/c-mug-archive/343-solving-real-data-center-design-problems.html
- https://netcraftsmen.com/archived-documents/petes-archives/doc_download/187-solving-real-data-center-design-problems-.html
Prior DCI and OTV Articles
Here are just the recent ones; see also the archives.
- Cisco Overlay Transport Virtualization (OTV) — 2 comments
- First Hop Routing Protocol (FHRP) Info — 4 comments
- OTV Optimal Routing — 2 comments
- Exploiting Layer 2 over Layer 3 — 4 comments
- Understanding Layer 2 over Layer 3 (Part 2) — 2 comments
- Understanding Layer 2 over Layer 3 (Part 1)
- Working with EoMPLS Part 2 — 5 comments
- Working with EoMPLS — 5 comments
Comments

Hi Pete,
Thanks for mentioning my article (in case your readers are interested in more details, they can find them in my blog @ http://blog.ioshints.info/search/label/Data Center). There are two reasons I’m so opposed to L2 DCI:
* It cannot be made reliable. Bridging was not designed for long distances or unreliable/low(er)-speed links. As you pointed out, the WAN link becomes the weakest link in the system and can (with ‘proper’ design) bring down the whole data center;
* There are other, way better, architectural options, including proper application architecture and load balancers. L2 DCI is just a kludge that networking vendors are promoting to make us spend more money.
Last but definitely not least, I do have a customer that experienced a split-brain cluster. Net result: two active database servers writing to the disk and totally corrupting the database. They spent several hours restoring the data from backups and lost a significant number of transactions.
Thanks for the comments and the info about the DB corruption, confirming my fears.
I do have some sympathy for folks wanting to do VMware Fault Tolerance. I take as a given that server admins face too many demands and have too little time. The world may be stuck with "do something that’s a bit sub-optimal" because re-engineering the app to do Server Load Balancing may be too difficult, if not next to impossible.
There are all too many 3rd party apps where you don’t have the source code and don’t have the ability to modify the program, which may contain 20 year old legacy code. (In the medical field, maybe 30 year old code with malpractice programming such as embedded IP addresses and subnet masks.) So I factor that into my thinking: there are a lot of sloppily written programs by mediocre programmers that businesses depend on. Getting a program written with "I’m the only program writing to *my* DB" as an assumption to work behind a Server Load Balancer (SLB) could be … challenging?
I think we agree to the extent that if one can use SLBs, one probably should. Unfortunately, managers will make the decision based on the cries of pain from the server admins, at least until they get burned.
As a case in point, I have a customer who several years ago started spanning VLANs throughout the data center. Recently, a malfunctioning 10 Gbps HP NIC has caused three STP loops within several months, each bringing down the entire datacenter. From what I’ve heard, a blast of just about anything clobbers the Sup2s in the completely flat backup VLAN, causing the loop to start. One starts with "I made the VLAN bigger and it didn’t cause a problem", and unfortunately people then push it further than they should.
I see overdoing L2 DCI as another case where this WILL happen. I did drink the Cisco OTV Kool-Aid, so I’m game to try controlled amounts of L2 where "necessary", but the human dynamic is "gee, here’s a solution that works, let’s do more of it" rather than applying a mix of solutions.
I guess I’m agreeing, at great length!
> When you view the overhead costs of operating small datacenters compared to one big one, this may be vastly suboptimal — but there’s apparently no good political and financial way out of it.
What about involving a "cloudy services" specialist provider, with big fat DCs and interconnects, and migrating your workloads there? And of course, the provider must be able to meet all the necessary regulatory and governance requirements before being considered.
The cloud would be nice. I think, however, that you’ve perhaps assumed away two of the biggest problems with cloud: (1) regulatory / security due diligence, and being able to demonstrate the cloud provider meets all requirements, and (2) the big fat pipe.
I expect a growing business around (2), since naively moving servers to the cloud when they carry on extensive conversations with servers remaining on campus will quite possibly incur (a) bandwidth concerns and (b) latency-induced slowness. And while 1 Gbps TLS service isn’t totally unaffordable, I don’t see 10 to 40 Gbps MAN/WAN links being cheap anytime soon. As usual, WAN bandwidth is likely to be a bottleneck. And if you think about it, moving some servers to the cloud basically extends some mix of your switching uplinks and switch backplanes across the MAN. Can the MAN provide equal capacity?
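To put a rough number on the latency part, consider a chatty application that makes many small sequential calls back to servers left on campus; every call pays the MAN/WAN round-trip time no matter how fat the pipe is. A back-of-the-envelope sketch in Python, with numbers I am assuming purely for illustration:

```python
# Rough estimate of "why is the new cloud server so slow?".
# All values are illustrative assumptions, not measurements.

lan_rtt_ms = 0.5              # round trip within a single data center
man_rtt_ms = 20.0             # round trip from cloud back to campus over the MAN/WAN
calls_per_transaction = 200   # sequential DB/LDAP/AD calls a chatty app might make

before_s = calls_per_transaction * lan_rtt_ms / 1000
after_s = calls_per_transaction * man_rtt_ms / 1000

print(f"Network wait per transaction before the move: {before_s:.2f} s")
print(f"Network wait per transaction after the move:  {after_s:.2f} s")
# With these assumptions, a transaction that spent 0.1 s waiting on the network
# now spends about 4 s, and buying more bandwidth does not help at all.
```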
Since most server admins and app developers are unaware of their data flows and where they might have embedded latency sensitivity, I expect a booming business in analyzing why the new cloud server is so much slower. Ultimately I think we need to identify "tight groups" of servers that talk to each other a lot, but not much outside the group, and move each group to the cloud as a single entity. I think we’ll also see that services such as DNS, LDAP, Active Directory, etc. need to be replicated to the cloud, since apps may make frequent calls to such services and any latency will drastically affect app performance. (I’ve seen this with Lotus Notes and an overwhelmed LDAP server, for instance.)
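One way to get at those "tight groups" is to treat flow data (NetFlow or similar) as a graph, drop the low-volume conversations, and take the connected components of what remains. A minimal sketch, assuming you already have per-pair byte counts; the server names, traffic volumes, and threshold below are all invented for illustration:

```python
from collections import defaultdict

# Hypothetical per-pair traffic volumes, e.g. summarized from NetFlow exports.
flows = {
    ("web1", "app1"): 4_000_000_000,
    ("app1", "db1"): 9_000_000_000,
    ("web2", "app2"): 3_000_000_000,
    ("app2", "db2"): 7_000_000_000,
    ("app1", "ldap1"): 50_000_000,
    ("app2", "ldap1"): 40_000_000,
}

THRESHOLD = 1_000_000_000   # ignore pairs exchanging less than ~1 GB in the sample period

# Keep only the heavy conversations as an undirected adjacency list.
adj = defaultdict(set)
for (a, b), volume in flows.items():
    if volume >= THRESHOLD:
        adj[a].add(b)
        adj[b].add(a)

def components(adj):
    """Connected components of the heavy-traffic graph = candidate migration groups."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj.get(node, set()) - seen)
        groups.append(group)
    return groups

for group in components(adj):
    print("Candidate group to move together:", sorted(group))
```

Note that ldap1 falls below the threshold and drops out of both groups even though it is called frequently, which is exactly the kind of shared service (DNS, LDAP, Active Directory) that ends up needing a replica in the cloud.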
The final barrier to cloud that I see folks overlooking is human: everyone needs to hand-walk their key apps into the cloud, verify performance, verify backup, etc. That already induces significant lag in being able to move stuff to VMs, let alone to the cloud. Like years. (Perhaps add in some mix of: server admins rushing to put themselves out of business?)
Don’t get me wrong: I like the idea of the cloud; I just worry that the practical aspects aren’t all resolved yet. The cloud is great for rapid scaling changes. It is not so great for one-off, obscure, legacy, insecure, and fragile applications. Guess what medical centers have a lot of?