The hosting world’s bread & butter solution for providing high availability and redundancy is load balancing. There are many different use cases for a Load Balancer (LB), and it is important to know how to manage your LB configuration effectively so that it performs optimally in your environment. The following article reviews some of the common practices that, when adhered to, provide a smooth, highly available website or application through the use of load balancing.
This article focuses on load balancing concepts and uses typical web services (HTTP, HTTPS) as examples. These services are independent of the LB device itself, so they can run on any combination of operating system and server software, e.g., Linux, Windows, Apache, Nginx, IIS, etc. Load balancing is not limited to web services; any type of traffic with a client/server relationship can take advantage of it.
What is Load Balancing?
Load balancing is the practice of using a network device, called a Load Balancer (LB), to distribute traffic among a back-end cluster of servers, called nodes. These nodes are virtually identical, each running the same software, services, and configurations. Broken nodes can be easily replaced, and additional nodes can be added to handle traffic growth over time. The LB uses a specific load balancing algorithm to determine which node handles each request, preventing any single node from becoming overwhelmed. Load balancing is virtually invisible to the end-user, operating behind the scenes to allow a farm of servers to function as a single service or application. It forms the backbone of most high availability solutions due to its flexibility, redundancy, and extensibility.
OSI Layer Optimization
All load balancing occurs on one of two layers of the OSI Model. These layers allow for balancing configurations based on the different information contained in the network packet at each layer. The two layers involved are L4, the Transport layer, and L7, the Application layer.
- L4 – Transport Layer: The fourth layer allows balancing rules based on transport protocols. Balancing traffic on details like IP address or TCP port happens at L4.
- L7 – Application Layer: The Application Layer provides many additional details that can be inspected from the packet for balancing rules. The L7 layer is where rules can be constructed based on information from HTTP Headers, SSL Session ID, HTML Form Data, Cookies, etc.
Load Balancing Algorithmic Methods
There are several common balancing algorithmic methods which can be used in a load-balanced configuration. Selecting the right algorithm for your infrastructure is critical to load balancing optimization. There is no general, one-size-fits-all, method for every situation. Choosing the correct approach will depend heavily upon the services, traffic, and software used in the load-balanced cluster. Below are some of the common methods used and their strengths/weaknesses from an optimization perspective.
Round-Robin – A simple load balancing approach. Traffic is sent to each node in series, one after the other, returning to the beginning of the list once the end is reached. (e.g., Node 1 → Node 2 → Node 3 → Repeat)
Least Connections – A smart balancing method that uses connection tracking to determine which node has the fewest active network connections. The load balancer itself manages this tracking, distributing every new connection to the back-end node with the fewest active connections.
Fastest Response Time – A smart balancing method that tracks each node's network response time from health checks and routes new connections to the server with the quickest response time, regardless of any other factors.
Random Node – A niche method that assigns each new connection to a randomly chosen node. It is only used in very specific scenarios and is almost never appropriate from an optimization standpoint.
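To make the first two methods concrete, here is a minimal sketch of how a round-robin and a least-connections selection could work. The node names and connection counts are purely illustrative; a real load balancer tracks live sockets rather than static counters.

```python
import itertools

# Hypothetical node names for illustration.
nodes = ["node1", "node2", "node3"]

# Round-Robin: cycle through the node list in order, wrapping at the end.
rr = itertools.cycle(nodes)
def round_robin():
    return next(rr)

# Least Connections: pick the node with the fewest active connections.
# In a real LB these counts change as connections open and close.
active = {"node1": 12, "node2": 7, "node3": 9}
def least_connections():
    return min(active, key=active.get)

print([round_robin() for _ in range(4)])  # → ['node1', 'node2', 'node3', 'node1']
print(least_connections())                # → 'node2' (fewest active connections)
```

Note how round-robin ignores node load entirely, while least-connections reacts to it on every new connection, which is why the latter is called a "smart" method above.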
Weighted Load Balancing
Another useful load balancing feature to consider is node weighting, used alongside the balancing method. Weighted load balancing assigns more connections to specific nodes than others depending on each node's weight value in the configuration. This feature is generally used when back-end nodes are not identical hardware or when certain nodes receive special traffic beyond regular load-balanced traffic. Assigning a lower weight value, in the form of a ratio, to the weaker nodes allows them to participate in the workload at their individual capacity limits without getting overwhelmed. For example, a weight ratio of 2:2:1 would assign two connections to Node 1 and Node 2 for every one connection assigned to Node 3.
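The 2:2:1 ratio from the example can be sketched as a simple weighted round-robin. This naive version just expands the node list by weight; production load balancers typically interleave the schedule (smooth weighted round-robin) to avoid sending bursts to one node. Node names and weights are illustrative.

```python
import itertools

# Hypothetical weights matching the 2:2:1 example ratio.
weights = {"node1": 2, "node2": 2, "node3": 1}

# Expand each node into the schedule "weight" times, then cycle through it.
schedule = [n for node, w in weights.items() for n in [node] * w]
wrr = itertools.cycle(schedule)

# Over one full cycle, node1 and node2 each receive two connections
# for every one connection node3 receives.
print([next(wrr) for _ in range(5)])
# → ['node1', 'node1', 'node2', 'node2', 'node3']
```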
Choosing an Optimal Balancing Algorithm
When considering load balancing optimization, stick with the smarter balancing methods like Least Connections or, in some cases, Fastest Response Time. Least Connections, in particular, is good at distributing workload across all available hardware without focusing too much on a single node. However, this balancing method requires a beefier load balancing device as the number of nodes and the amount of traffic ramp up. Fastest Response Time can be a great alternative when the hardware available for the load balancer is more limited. Each has its pros and cons as listed above, but both work well as optimized balancing methods for busy server farms.
Traffic Pinning & Session Persistence
Some configurations take advantage of pinning certain traffic to specific back-end nodes. Session Persistence is a widely used form of Traffic Pinning: it routes a given client's requests to the same server, bypassing the load balancing method. Another common practice is assigning administration or upload traffic to a specific node that propagates the changes to the rest of the server farm. It is important to consider these connection types when designing or optimizing your server cluster.
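One common way to implement session persistence is source-IP pinning: hash the client's IP address so the same client always maps to the same node. This is a minimal hash-mod sketch with hypothetical node names; real load balancers often use consistent hashing instead, so that adding or removing a node remaps as few clients as possible.

```python
import hashlib

# Hypothetical back-end nodes.
nodes = ["node1", "node2", "node3"]

def pin_node(client_ip: str) -> str:
    """Map a client IP to a fixed node so its session stays on one server."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# The same client always lands on the same node, regardless of node load:
assert pin_node("203.0.113.7") == pin_node("203.0.113.7")
```

The trade-off is visible in the assertion: persistence guarantees stickiness, but it also means pinned traffic ignores the balancing algorithm, which is why pinned connection types must be accounted for when sizing the cluster.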
Server Farm/Cluster Scope
Some important considerations should be accounted for when designing or upgrading your server cluster. A commonly overlooked mistake is the size of the cluster. The number of nodes in the cluster should not only be enough to handle regular workloads; it is also necessary to account for both surges in workload and failures in the cluster. Ideally, a cluster should be large enough to remain fully operational even when a surge in workload occurs and a critical event throws a node offline.
Redundancy: Spare vs. Fail-over Node
Quite simply, redundancy is not effective without an extra node to handle the workload in the event that a critical issue occurs with another node. There are essentially two methods of handling redundancy in a load-balanced cluster.
Spare Node – An extra node in the cluster that is not needed for the cluster to function. It resides active in the cluster, handling workload just like any other node, but it is there in case another node fails. The cluster can carry on without interruption while the broken node is addressed as needed.
Fail-over Node – An extra node that has been configured and tested to work normally within the cluster. Once tested, the extra node is assigned as a fail-over node and taken out of the active configuration. The fail-over node then sits on standby, waiting for a critical event to occur within the cluster, at which point it is automatically activated and starts handling traffic.
Either method provides redundancy and keeps the application/site alive during critical times. A combination of both methods can be used as well to have the benefit of both options. It boils down to preference and cost. However, adhering to at least one of these methods should keep you running at an optimal state even during hard times.
Front-loaded Permanent Redirects
One way to improve response times of a load-balanced setup is to front-load permanent redirects: handle them on the load balancer device directly instead of relying on the back-end nodes to issue a redirect to the client. This can reduce connection counts on both the load balancer and the back-end nodes by eliminating one segment of the redirect process. The following illustrates this using the common practice of forcing HTTPS via redirects.
When a back-end node is configured to force HTTPS connections, the HTTP request comes into the load balancer, is processed by the balancing algorithm, and is finally sent on to the chosen back-end node as normal. The back-end node then issues the redirect, which instructs the client to reconnect over HTTPS. The new HTTPS request is also balanced by the algorithm as needed and sent to a back-end node. The net processing result of this type of redirect is:
- x2 front-end requests (HTTP & HTTPS)
- x2 back-end requests
- x2 load balancing algorithmic checks
Now compare this to a front-loaded HTTPS redirect.
In the front-loaded configuration, the HTTP request hits the load balancer but is not processed by the balancing algorithm. Instead, the load balancer immediately issues the redirect back to the client. The client then reconnects over HTTPS, which invokes the algorithm as needed, and the request is sent on to a back-end node as required. The net processing result of this configuration is:
- x2 front-end requests (HTTP & HTTPS)
- x1 back-end request (HTTPS only)
- x1 load balancing algorithmic check
This example only illustrates a single request. However, load balancers typically handle hundreds or even thousands of requests concurrently, and the overall benefit of a change like this is easier to see when scaling up the example.
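A quick back-of-the-envelope comparison makes the savings concrete. The request volume here (1,000 clients) is purely illustrative.

```python
# 1,000 clients each forced from HTTP to HTTPS via a permanent redirect.
clients = 1000  # illustrative request volume

# Node-issued redirect: every client costs 2 back-end requests and
# 2 balancing decisions (one for HTTP, one for HTTPS).
backend_node = 2 * clients
decisions_node = 2 * clients

# Front-loaded redirect: the LB answers the HTTP request itself, so only
# the HTTPS request reaches the algorithm and a back-end node.
backend_front = 1 * clients
decisions_front = 1 * clients

print(backend_node - backend_front)      # → 1000 back-end requests saved
print(decisions_node - decisions_front)  # → 1000 balancing checks saved
```

Per the per-request breakdown above, the front-end request count is unchanged, but back-end requests and algorithm invocations are halved, and that saving grows linearly with traffic.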
SSL Encryption Handling
There are a handful of ways that load balancers can be configured to handle SSL-encrypted connections like HTTPS. Encrypted connections are more cumbersome for the load balancer device than non-encrypted connections. The process of validating the certificate chain and then decrypting the content adds workload to every request handled by the load balancer. This can be mitigated depending on the configuration of the back-end cluster servers and the application/website's encryption requirements. The following are the common configuration scenarios and their pros/cons.
SSL Passthrough – The LB is configured to pass any encrypted connections through to the necessary back-end nodes. Decryption is handled by the back-end nodes only and not the LB device.
Decryption Only – The LB device itself will perform the necessary decryption. Traffic is then balanced to the back-end nodes after decryption. Due to security concerns, this configuration is only recommended when the back-end nodes and load balancer device have an isolated LAN to communicate through.
Decryption + Re-encryption – The LB device decrypts the traffic, then re-encrypts it and sends it to the back-end node as needed. The back-end node also performs decryption, providing full end-to-end encryption. This method is for high-security setups that also require L7 load balancing rules.
From a load balancing optimization standpoint, the less workload the LB performs, the faster it responds to requests. This makes SSL Passthrough the ideal choice, as it requires the least amount of work from the LB device. However, it is not always a practical solution, particularly for applications/sites that require L7 load balancing rules or other packet inspection. Decryption Only requires less workload than Decryption + Re-encryption, but it should only be leveraged when a dedicated LB shares a private LAN with the back-end nodes, preventing outside entities from intercepting the unencrypted back-end traffic.