Avoid Server Downtime By Managing Your Load Balancers: Part 2

Last week, we covered a few basics on load balancing to lay the foundation for the details we’ll go over today. We covered the steps your web page request takes when a load balancer is used in two different load-balancing models, and the points you need to consider when deciding how to set up your server farms.

Today we get into the information a load balancer uses to decide how to handle your connection within the tiered model. This will help you understand the best setup for your environment and how efficiently (or inefficiently) it scales as your business grows.

We’ll cover health monitors, load-balancing methods, and persistence.

Health monitors are used not only to see whether a server is up or down but also to gauge how much load it can handle. You choose their settings when you create the server farm, and they range from a simple ping at a set interval to requesting a web page and examining the content to be sure the server is delivering valid content. There are roughly 30 types of health checks to choose from; we’ll cover only a couple to illustrate the difference in value they provide:

Choosing ICMP as a health monitor is probably the riskiest option of all, because it only tests whether the server is alive at the OS level and does nothing to ensure that it’s serving your pages. A server can answer a ping even when its web service has stopped working.
A port-specific session test is better. There are several choices, such as HTTP/HTTPS, TCP half open, and many other protocol-specific monitors. You can even check for expected content to be sure a real connection is established.
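To make the difference concrete, here is a minimal Python sketch of a content-checking HTTP monitor. It is not how an F5 LTM implements its monitors; the function name `http_content_check`, the `/health` path, and the expected string are hypothetical. The check passes only when the server answers with a 200 and the page contains the content you expect, which is exactly what a ping can’t tell you.

```python
import http.client


def http_content_check(host: str, port: int, path: str, expected: str,
                       timeout: float = 5.0) -> bool:
    """Hypothetical HTTP content monitor: True only for a 200 response whose
    body contains `expected`; any connection error or other status is down."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status == 200 and expected in body
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, or malformed reply: treat as down.
        return False
    finally:
        conn.close()
```

A monitor would run a check like `http_content_check("10.0.0.11", 80, "/health", "status: OK")` (hypothetical address) on an interval against each pool member and stop sending new connections to any member that fails it.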

There are effectively two types of load-balancing methods used to rank server availability: static and dynamic. Static methods use parameters you define manually; dynamic methods use calculations against metrics learned while handling connections. They range from a simple cyclic rotation to complex methods that measure a subset of performance metrics such as response time, current connections, the duration of each connection, ratios between those times, and other variables. Some methods work best when all servers have similar capabilities, and others when they need to account for differences.

Round Robin is the default load-balancing method on F5 LTMs. The next server is simply the next one in the rotation, regardless of its current connection count or load capabilities, so it works best when all servers can handle the same load. Think of a line of kids waiting for ice cream: after each one gets a scoop, they go to the back of the line to get more. That’s fair only if they can all eat the same amount of ice cream.
Least Connections forwards the next incoming request to the server with the fewest active connections. It, too, takes little account of a server’s processing capabilities.
Observed and Least Connections are very close in the way they decide where to forward new connections. Least Connections is based mostly on connection counts, while Observed uses connection counts but also adds the length of each connection as a parameter.
Predictive applies trend analysis to metrics similar to those used by Observed and Least Connections to decide where the next connection gets forwarded, and it can be the most accurate of these methods at predicting the next available server.
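Here is a rough Python sketch of the difference between a static and a dynamic method, using the two simplest ones above. The class names are hypothetical, the sketch assumes every server is healthy, and it leaves out Observed and Predictive, whose exact F5 calculations aren’t described here.

```python
from itertools import cycle


class RoundRobin:
    """Static method: hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self, active):
        # `active` is ignored on purpose: the rotation never looks at load.
        return next(self._rotation)


class LeastConnections:
    """Dynamic method: pick the server with the fewest active connections."""

    def __init__(self, servers):
        self._servers = list(servers)

    def pick(self, active):
        # Servers missing from `active` count as 0; ties go to the first listed.
        return min(self._servers, key=lambda s: active.get(s, 0))


servers = ["app1", "app2", "app3"]
active = {"app1": 40, "app2": 2, "app3": 15}  # hypothetical live counts
rr = RoundRobin(servers)
print([rr.pick(active) for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
print(LeastConnections(servers).pick(active))  # app2
```

Round Robin keeps sending work to `app1` even though it is the busiest server; Least Connections reacts to the live counts, which is what makes it dynamic.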

Persistence determines how long your session stays stuck to one server and the criteria used to track which server that is. Without persistence, any connection after your initial request could be forwarded to a different server that isn’t aware of your session state. This is extremely important for applications with a series of forms, where you drill into a subset of choices as you move from one page to the next.
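The “how long” part is usually a timeout on a persistence record. Below is a minimal Python sketch of such a table, assuming a record expires after a fixed number of idle seconds and is refreshed on every hit; the class name and the 180-second value are hypothetical, not a specific product’s defaults.

```python
import time


class PersistenceTable:
    """Map a persistence key (source IP, cookie value, ...) to a server and
    forget the entry after `timeout` seconds without traffic."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self._entries = {}  # key -> (server, last_seen)

    def lookup(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        server, last_seen = entry
        if now - last_seen > self.timeout:
            del self._entries[key]  # expired: next request is load-balanced again
            return None
        self._entries[key] = (server, now)  # refresh the idle timer on every hit
        return server

    def record(self, key, server, now=None):
        now = time.monotonic() if now is None else now
        self._entries[key] = (server, now)


table = PersistenceTable(timeout=180)  # hypothetical timeout
table.record("203.0.113.7", "gw2", now=0)
print(table.lookup("203.0.113.7", now=100))  # gw2
print(table.lookup("203.0.113.7", now=250))  # gw2 (idle only 150 s)
print(table.lookup("203.0.113.7", now=500))  # None (idle 250 s, expired)
```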

A common persistence type that tends to get overused is Source IP, which forwards all traffic from a given source IP to the same server. Every new connection from that IP sticks to the very same server, even though many other servers may be available within the farm. Consider this a static persistence type.
Cookies are another persistence type, and they overcome the “stickiness” of source IP persistence. Many connections can come from behind a remote NAT, and each client can still be balanced to a different available server even though they all share the same source IP. The cookie is unique to each client and created dynamically, so clients behind the same IP aren’t bound to the same server.
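The following Python sketch shows why cookies spread clients behind a NAT while source IP persistence doesn’t. It is a simplification with hypothetical class names: both types pick a first server by round robin, and the cookie version keeps a lookup table, whereas real devices may instead encode the chosen server in the cookie itself.

```python
import secrets
from itertools import cycle


class SourceIPPersistence:
    """Static: every connection from one source IP sticks to one server."""

    def __init__(self, servers):
        self._next = cycle(servers)  # round robin for first-time sources
        self._table = {}             # source IP -> server

    def route(self, source_ip):
        if source_ip not in self._table:
            self._table[source_ip] = next(self._next)
        return self._table[source_ip]


class CookiePersistence:
    """Dynamic: each client gets its own cookie; the source IP plays no part."""

    def __init__(self, servers):
        self._next = cycle(servers)
        self._table = {}  # cookie value -> server

    def route(self, cookie=None):
        """Return (server, cookie); issue a new cookie to unknown clients."""
        if cookie in self._table:
            return self._table[cookie], cookie
        cookie = secrets.token_hex(8)
        self._table[cookie] = next(self._next)
        return self._table[cookie], cookie


# Three new clients behind one remote NAT address (hypothetical).
nat_ip = "198.51.100.10"
servers = ["gw1", "gw2", "gw3"]
by_ip = SourceIPPersistence(servers)
print([by_ip.route(nat_ip) for _ in range(3)])   # ['gw1', 'gw1', 'gw1']
by_cookie = CookiePersistence(servers)
print([by_cookie.route()[0] for _ in range(3)])  # ['gw1', 'gw2', 'gw3']
```

A returning client presents its cookie, so `by_cookie.route(cookie)` sends it back to the server it started on while new clients from the same IP keep spreading out.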

Any of these can be changed after the server farm is active.

One level talking to another

Your multi-tiered server farm makes calls from one tier to the next, each through its own load-balancing level. After your client request hits the GW servers, the GW servers call the APP servers, and the APP servers call the DB servers.

So let’s examine the popcorn trail.

We’ll focus on persistence and its two extremes. Regardless of health checks and load-balancing methods, this is the one choice that can make or break your ability to scale.

Let’s look at Source IP persistence (static) first.

You already know that many connections from the same source IP get stuck to the same server. So your remote clients get stuck to the same GW server, and that GW server makes a call to the APP server farm. Your APP server farm is set up for source IP persistence as well, so the single GW server is now stuck to one APP server.

There is effectively no load balancing taking place in this scenario. You’ve got just one GW and one APP server handling a multitude of connections from a single remote source IP.

Now let’s look at using cookies (dynamic) to identify source clients and track their connection to determine which server they’re stuck to.

When a client makes its initial connection, it gets a cookie from the load balancer, and the cookie is used just like the source IP to track the connection and keep it on the correct server. The next incoming client gets a unique cookie of its own and is load-balanced to the next available server, even if it comes from the same source IP.

So now you have two clients connected to two different GW servers, and the GW servers connect to the APP server farm. Assuming the load balancer for the APP servers also uses a dynamic persistence type, new connections from the GW servers get dispersed across the APP servers rather than following the preceding connection.
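A small Python simulation can follow the trail through both tiers. It assumes a hypothetical setup where every client arrives from one NAT address, each tier picks new servers by round robin, and, in cookie mode, each GW server holds a separate APP-tier cookie for each client session it serves. Under source IP persistence the APP load balancer sees only the GW server’s address.

```python
from collections import Counter
from itertools import cycle


def simulate(n_clients, gw_servers, app_servers, mode):
    """Count hits per GW and APP server for clients sharing one NAT IP.

    mode="source_ip": each tier keys persistence on the caller's address.
    mode="cookie":    each tier keys persistence on a per-session cookie.
    """
    nat_ip = "198.51.100.10"  # hypothetical public address all clients share
    gw_rr, app_rr = cycle(gw_servers), cycle(app_servers)
    gw_table, app_table = {}, {}
    gw_hits, app_hits = Counter(), Counter()
    for client in range(n_clients):
        # Tier 1: the GW load balancer sees the NAT address or a client cookie.
        gw_key = nat_ip if mode == "source_ip" else f"client-{client}"
        if gw_key not in gw_table:
            gw_table[gw_key] = next(gw_rr)
        gw = gw_table[gw_key]
        gw_hits[gw] += 1
        # Tier 2: the GW server calls the APP tier. Its name stands in for its
        # address; in cookie mode it carries a cookie for this client's session.
        app_key = gw if mode == "source_ip" else f"{gw}-session-{client}"
        if app_key not in app_table:
            app_table[app_key] = next(app_rr)
        app_hits[app_table[app_key]] += 1
    return dict(gw_hits), dict(app_hits)


gws, apps = ["gw1", "gw2"], ["app1", "app2", "app3"]
print(simulate(6, gws, apps, "source_ip"))  # ({'gw1': 6}, {'app1': 6})
print(simulate(6, gws, apps, "cookie"))
# ({'gw1': 3, 'gw2': 3}, {'app1': 2, 'app2': 2, 'app3': 2})
```

With source IP persistence, all six clients land on one GW server and one APP server; with cookies they spread across both tiers.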

Do I use static or dynamic persistence?

Source IP persistence was created because it fits a given use case. I’m not saying you shouldn’t use it, but know when to use it. It really depends on your business and where your clients are coming from.

If the connections are coming from inside your business environment, most if not all source IPs will be unique. You don’t have to worry about 500+ client connections getting stuck to a single server because they appear to come from one IP, so Source IP persistence may be the simplest choice.

If your business sells service subscriptions to other companies, you can bet all of each customer’s users will reach you from a single public IP. In a case like this, avoid source IP persistence. An exception would be if you provide an internal connection over a VPN and there is no overlap in private IP address space that would force you to use NAT at the VPN ingress/egress point; each client then keeps its own source IP.
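Before choosing, it can help to check how concentrated your source IPs actually are. This hypothetical Python helper reports the busiest source IP and the share of clients behind it; under source IP persistence, that whole share lands on one server. The sample addresses are made up.

```python
from collections import Counter


def largest_source_ip_share(client_ips):
    """Return (busiest source IP, fraction of clients behind it).

    Under source IP persistence every client behind that IP lands on the
    same server, so a large fraction means one server carries that load.
    """
    if not client_ips:
        raise ValueError("need at least one client")
    ip, count = Counter(client_ips).most_common(1)[0]
    return ip, count / len(client_ips)


# Hypothetical samples: 500 internal users with their own addresses, and a
# subscriber company whose 500 users share one NAT address.
internal = [f"10.1.{i // 250}.{i % 250 + 1}" for i in range(500)]
partner = ["203.0.113.5"] * 500 + ["198.51.100.9"] * 20
for name, ips in (("internal", internal), ("partner", partner)):
    ip, share = largest_source_ip_share(ips)
    print(f"{name}: {ip} carries {share:.1%} of clients")
# internal: 10.1.0.1 carries 0.2% of clients
# partner: 203.0.113.5 carries 96.2% of clients
```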

Avoid the sprawl

This is the kind of thing that’s important to understand when implementing load balancers in your datacenter for the first time. When expected loads are minimal, it’s usually not all that critical. Over time, though, you’ll likely deploy dozens if not hundreds of “Virtual Servers,” and if your business grows to the point where the setup won’t let it scale properly, correcting it will mean a lot of work and potential downtime.
