Load Balancing

Load balancing distributes incoming traffic across multiple servers to ensure high availability, reliability, and optimal resource utilization.

Load Balancer Types

Layer 4 (Transport Layer)

Layer 4 Load Balancing Characteristics:

Protocol Awareness: Operates at transport layer (TCP/UDP) without inspecting application data
Performance: Faster processing with minimal latency (typically < 1ms overhead)
Decision Making: Routes based on IP address and port number
Use Cases: Ideal for TCP-based applications, databases, and simple web services
Limitations: Cannot make routing decisions based on HTTP headers or content

Layer 7 (Application Layer)

Layer 7 Load Balancing Features:

Content Awareness: Inspects HTTP headers, URLs, cookies, and application data
Smart Routing: Directs traffic based on content type, request parameters, or API endpoints
Advanced Features: Supports SSL termination, caching, and request manipulation
Use Cases: Microservices architectures, content-based routing, and web application firewalls
Trade-offs: Higher processing overhead but more intelligent traffic distribution
Common Implementations: Application Delivery Controllers (ADCs) and cloud load balancers
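
The difference between the two layers can be illustrated with a minimal sketch. The pool names, ports, and URL prefixes below are hypothetical and chosen only for illustration; a real load balancer would read them from its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    client_ip: str
    port: int
    path: str = "/"


# Hypothetical pools and routes, used only for illustration.
L4_POOLS = {443: "web-pool", 5432: "db-pool"}
L7_ROUTES = [("/api/", "api-pool"), ("/static/", "static-pool")]


def route_l4(req: Request) -> str:
    """Layer 4: decide using only transport-level data (here, the port)."""
    return L4_POOLS[req.port]


def route_l7(req: Request) -> str:
    """Layer 7: decide using the URL path, with a default pool as fallback."""
    for prefix, pool in L7_ROUTES:
        if req.path.startswith(prefix):
            return pool
    return "web-pool"
```

The Layer 4 function never looks at the path, so every request on port 443 lands in the same pool, while the Layer 7 function can split the same port across several pools.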

Load Balancing Algorithms

1. Round Robin

Round Robin Algorithm:

Simple Distribution: Distributes requests sequentially across servers in a circular pattern
Minimal State: Tracks only its position in the rotation, with no memory of previous connections or server load
Equal Distribution: Assumes all servers have equal capacity and performance
Best For: Homogeneous server environments with similar hardware and request processing times
Limitations: Doesn't account for server load or varying request complexities
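
As a rough illustration, round robin can be sketched in a few lines. The class name and server names are assumptions made for this example.

```python
from itertools import cycle


class RoundRobin:
    """Hands out servers in a fixed circular order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(list(servers))

    def next_server(self):
        return next(self._servers)


rr = RoundRobin(["app-1", "app-2", "app-3"])
picks = [rr.next_server() for _ in range(5)]
```

Here `picks` wraps back to the first server after the third request, regardless of how busy any server is.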

2. Least Connections

Least Connections Algorithm:

Dynamic Allocation: Routes new requests to the server with the fewest active connections
Load Awareness: Real-time monitoring of connection counts across all servers
Adaptive Distribution: Automatically adjusts to varying server loads and request durations
Best For: Environments with long-lived connections or variable request processing times
Implementation Considerations: Requires connection tracking and state management
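
The connection tracking mentioned above might look like the following sketch. It assumes the caller reports when each connection opens and closes; the class and method names are hypothetical.

```python
class LeastConnections:
    """Picks the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties are broken by the order in which servers were registered.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection closes so the count stays accurate.
        if self.active[server] > 0:
            self.active[server] -= 1
```

If `release` is never called, counts only grow and the algorithm degrades into round robin, which is why connection state management matters here.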

3. IP Hash

IP Hash Algorithm:

Consistent Mapping: Uses client IP address to determine server assignment through hashing
Session Persistence: Ensures clients consistently connect to the same server
Deterministic Routing: Same IP always routes to the same server (unless server pool changes)
Best For: Stateful applications that lack session replication, and caching scenarios where repeat requests from a client benefit from hitting the same server
Potential Issues: Can cause uneven distribution if client IPs cluster naturally
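
A minimal sketch of IP-based selection follows. It uses plain modulo hashing over a SHA-256 digest, which is an assumption for this example rather than what any particular load balancer does. With this approach, changing the pool size remaps many clients; a consistent-hash ring would reduce that churn.

```python
import hashlib


def pick_by_ip(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server deterministically."""
    # A stable hash is used because Python's built-in hash() of a str
    # varies between processes.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Calling `pick_by_ip` twice with the same IP and the same server list always returns the same server.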

4. Weighted Round Robin

Weighted Round Robin Algorithm:

Capacity-Based Distribution: Assigns more requests to servers with higher capacity (weight)
Proportional Allocation: Server weight determines the percentage of total requests
Heterogeneous Environments: Ideal for servers with different hardware specifications
Configuration: Weights can be adjusted based on server performance metrics
Calculation: If weights sum to N, a server with weight W receives W/N of total requests
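
The W/N rule can be checked with a short sketch. This example uses the "smooth" weighted round robin variant, which interleaves picks instead of sending bursts to the heaviest server; that choice, along with the class and server names, is an assumption for illustration.

```python
from collections import Counter


class SmoothWeightedRoundRobin:
    def __init__(self, weights):
        self.weights = dict(weights)
        self.total = sum(self.weights.values())
        self.current = {server: 0 for server in self.weights}

    def next_server(self):
        # Every server earns credit equal to its weight on each pick.
        for server, weight in self.weights.items():
            self.current[server] += weight
        # The server with the most credit is chosen and pays back the total.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


wrr = SmoothWeightedRoundRobin({"big": 3, "small": 1})
counts = Counter(wrr.next_server() for _ in range(8))
```

With weights 3 and 1 (N = 4), `counts` shows "big" receiving 6 of 8 requests (3/4) and "small" receiving 2 of 8 (1/4).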

Health Checks and Failover

Health Check Mechanisms

Health Check Implementation:

Active Monitoring: Load balancer periodically sends health check requests to all backend servers
Check Types: HTTP status checks, TCP connection tests, or custom application-specific probes
Frequency Configuration: Balance between detection speed and server overhead (typically 10-60 seconds)
Failure Detection: Multiple consecutive failures required before marking server as unhealthy
Recovery Detection: Multiple successful checks needed to restore server to healthy status
Grace Period: Configurable delay between server startup and health check initiation
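
The failure and recovery thresholds can be sketched as a small state tracker. The default values of 3 failures and 2 successes are assumptions for this example, as are the names `fall` and `rise`.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    fall: int = 3   # consecutive failures before marking unhealthy (assumed)
    rise: int = 2   # consecutive successes before marking healthy (assumed)
    healthy: bool = True
    _streak: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the current status."""
        if check_passed == self.healthy:
            # Result agrees with the current state, so reset the counter.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if check_passed else self.fall
            if self._streak >= threshold:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy
```

A single failed probe leaves the server in rotation; only an unbroken run of failures flips the state, and the same applies in reverse for recovery.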

Failover Process

Failover Strategies:

Automatic Failover: Removes servers from rotation as soon as they are marked unhealthy, without human intervention
Graceful Degradation: System continues operating with reduced capacity rather than complete failure
Circuit Breaker Pattern: Temporarily stops sending requests to failing servers to allow recovery
Health Thresholds: Configure sensitivity levels to prevent flapping between healthy/unhealthy states
Fallback Mechanisms: Error pages, cached responses, or redirect to alternative endpoints
Recovery Process: Gradual reintroduction of recovered servers with limited traffic (often called slow start, similar in spirit to a canary release)
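
The circuit breaker pattern from the list above could be sketched as follows. The thresholds, the cool-down length, and the injectable `clock` parameter are assumptions made so the example is easy to test; production implementations usually track more state.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            # Open (or re-open) the circuit and restart the cool-down.
            self.opened_at = self.clock()
```

The caller checks `allow_request()` before forwarding to a backend and reports the outcome afterward, so a failing server gets a pause instead of a steady stream of doomed requests.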

Session Persistence

Session Affinity Strategies

Session Affinity Approaches:

Cookie-based Persistence:

Load Balancer Cookies: Load balancer inserts a cookie identifying the assigned server
Application Cookies: Application sets a session cookie that the load balancer reads to make routing decisions
Cookie Duration: Configurable expiration times (session-based vs. persistent)
Advantages: Works with NAT and proxy clients, transparent to end users
Limitations: Requires cookie support, can interfere with application cookie policies
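
A minimal sketch of load-balancer-inserted cookies is shown below. The cookie name `LB_SERVER`, the server names, and the round robin fallback are all hypothetical; real products typically encode or sign the server identifier rather than exposing it in plain text.

```python
from http.cookies import SimpleCookie
from itertools import cycle

COOKIE_NAME = "LB_SERVER"  # hypothetical cookie name
SERVERS = ["app-1", "app-2"]
_fallback = cycle(SERVERS)


def route(cookie_header: str):
    """Return (server, set_cookie_value); set_cookie_value is None if unchanged."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in SERVERS:
        # Returning client: honor the existing assignment.
        return morsel.value, None
    # New client, or the cookie names a server no longer in the pool.
    server = next(_fallback)
    return server, f"{COOKIE_NAME}={server}; Path=/; HttpOnly"
```

The first request without a cookie gets an assignment and a `Set-Cookie` value; later requests carrying that cookie return to the same server.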

Source IP Persistence:

IP-based Mapping: Uses client IP address to determine server assignment
Hashing Algorithm: Consistent hash of client IP ensures stable routing
Use Cases: Useful for applications that don't support cookies or have strict security requirements
Challenges: In NAT environments, many clients share one public IP, which can funnel a disproportionate share of traffic to a single server
Fallback Behavior: Often combined with other methods for reliability

Session Replication vs. Persistence

Session Management Strategies:

Session Persistence (Sticky Sessions):

Server-local Storage: Session data stored on individual server's memory or disk
Affinity Requirement: Client must always return to the same server for session continuity
Single Point of Failure: Server failure results in session loss for affected users
Simplicity: Easy to implement with minimal infrastructure requirements
Load Distribution Issues: Can create uneven load distribution over time

Session Replication:

Shared or Replicated Storage: Session data is copied between servers or kept in a shared database, cache, or distributed store
Server Agnosticism: Any server can handle any request regardless of previous interactions
High Availability: Session data survives individual server failures
Implementation Complexity: Requires additional infrastructure and synchronization mechanisms
Performance Trade-offs: Additional network latency for session access but better load distribution
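
The shared-storage approach can be illustrated with a small sketch. The in-memory dictionary below is only a stand-in for an external store such as a cache or database, and the class names are assumptions for this example.

```python
import uuid


class SharedSessionStore:
    """In-memory stand-in for an external session store."""

    def __init__(self):
        self._data = {}

    def create(self, payload):
        session_id = uuid.uuid4().hex
        self._data[session_id] = dict(payload)
        return session_id

    def get(self, session_id):
        return self._data.get(session_id)


class AppServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # every server points at the same store

    def handle(self, session_id):
        session = self.store.get(session_id)
        user = session["user"] if session else "anonymous"
        return f"{self.name} served {user}"
```

Because both servers read from the same store, a session created through one server is visible to the other, and losing a server does not lose the session.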

Global Server Load Balancing (GSLB)

Geographic Load Balancing

Geographic Load Balancing Features:

DNS-based Routing: Uses DNS responses to direct users to optimal data centers
Proximity Optimization: Routes users to geographically nearest infrastructure
Latency Reduction: Minimizes network latency by serving content from closer locations
Disaster Recovery: Automatic failover between different geographic regions
Compliance Considerations: Helps meet data sovereignty and regulatory requirements
Implementation Methods: DNS-based, anycast IP addresses, or HTTP redirects

GSLB Decision Flow

GSLB Decision Logic:

Health Monitoring: Continuous monitoring of all geographic data centers
Priority-based Routing: Multiple criteria weighted for optimal routing decisions
Load Awareness: Considers current load and capacity of each data center
Failover Hierarchy: Configurable fallback chain for disaster scenarios
Performance Metrics: Real-time latency and availability measurements influence routing
Business Rules: Custom routing based on business requirements or user segments
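
One possible ordering of these criteria is sketched below: filter by health, prefer data centers with spare capacity, then pick the lowest latency. The ordering, the 0.9 load ceiling, and the data center names are assumptions for illustration; real GSLB products let operators configure these rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCenter:
    name: str
    healthy: bool
    latency_ms: float  # measured latency from the client's region
    load: float        # utilization between 0.0 and 1.0


def choose_data_center(candidates, max_load=0.9):
    healthy = [dc for dc in candidates if dc.healthy]
    if not healthy:
        raise RuntimeError("no healthy data center available")
    # Prefer sites with headroom; fall back to any healthy site otherwise.
    with_headroom = [dc for dc in healthy if dc.load < max_load]
    pool = with_headroom or healthy
    return min(pool, key=lambda dc: dc.latency_ms)
```

In this sketch a nearby but overloaded site loses to a more distant site with capacity, and an unhealthy site is never chosen regardless of its latency.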

Load Balancer Deployment Patterns

Active-Passive

Active-Passive Deployment:

Primary-Secondary Setup: One load balancer handles all traffic while the other remains on standby
Heartbeat Monitoring: Continuous health checks between active and passive units
Automatic Failover: Passive unit takes over when active unit fails or becomes unresponsive
Failover Time: Typically 2-30 seconds depending on configuration and detection mechanisms
Configuration Synchronization: Settings and rules must be kept consistent between units
Use Cases: Smaller deployments, cost-constrained environments, or simple redundancy requirements
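
The heartbeat-driven takeover can be sketched from the passive unit's point of view. The 3-second timeout and the class name are assumptions, and timestamps are passed in explicitly to keep the example deterministic. The sketch ignores concerns such as both units becoming active at once.

```python
class StandbyMonitor:
    """Runs on the passive unit and promotes it when heartbeats stop."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.last_heartbeat = 0.0
        self.role = "passive"

    def on_heartbeat(self, now):
        # Called whenever a heartbeat from the active unit arrives.
        self.last_heartbeat = now

    def tick(self, now):
        # Called periodically; promotes this unit once the timeout elapses.
        if self.role == "passive" and now - self.last_heartbeat > self.timeout:
            self.role = "active"
        return self.role
```

The timeout is the main contributor to failover time: a shorter value detects failures sooner but raises the risk of a false takeover during a brief network hiccup.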

Active-Active

Active-Active Deployment:

Dual Active Configuration: Both load balancers simultaneously handle traffic
Traffic Distribution: DNS round robin or geographic routing between load balancers
Session Synchronization: Cross-replication of session data and state information
Increased Capacity: Provides higher throughput and better resource utilization
Complexity: Requires careful configuration to prevent conflicts and ensure consistency
Geographic Distribution: Ideal for multi-region deployments with latency optimization
Load Balancer Health Monitoring: Mutual health checks between active units

Performance Metrics

Key Load Balancer Metrics

Metric	Description	Target
Request Rate	Requests per second	Scale with capacity
Response Time	Time to process request	< 100ms
Error Rate	Failed requests percentage	< 0.1%
Connection Count	Active connections	Monitor thresholds
Health Check Pass Rate	Successful health checks	> 99.9%

Essential Performance Indicators:

Throughput: Measure of total requests processed per second, indicating overall capacity
Latency Distribution: Track not just average but p50, p95, and p99 response times
Connection Efficiency: Ratio of successful connections to total connection attempts
Resource Utilization: CPU, memory, and network usage on load balancer instances
Backend Performance: Individual backend server response times and health status
SSL Termination Overhead: Additional latency introduced by SSL/TLS processing
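
To see why the latency distribution matters more than the average, consider this sketch. It uses the nearest-rank percentile definition, and the sample latencies are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up response times in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 14]
mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

For these samples the mean is 37 ms, the p50 is 13 ms, and the p95 is 250 ms: the average describes neither the typical request nor the slow one.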

Load Balancer Decision Flow

Load Balancing Decision Process:

Algorithm Selection: Chooses appropriate distribution method based on configuration
Health Verification: Confirms server availability before forwarding requests
Failover Logic: Iterates through server list until healthy server is found
Connection Management: Maintains connection pools and handles session persistence
Monitoring and Logging: Records decision outcomes for performance analysis
Adaptive Behavior: Some advanced load balancers adjust algorithms based on performance metrics
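
The health verification and failover steps can be combined in a short sketch. The function name, the `is_healthy` callback, and the `start` index (which would come from the configured algorithm) are assumptions for this example.

```python
def select_backend(servers, is_healthy, start=0):
    """Return the first healthy server at or after `start`, wrapping around.

    Returns None when no server is healthy, so the caller can fall back
    to an error page or cached response.
    """
    count = len(servers)
    for offset in range(count):
        candidate = servers[(start + offset) % count]
        if is_healthy(candidate):
            return candidate
    return None
```

Each server is checked at most once per request, so the loop terminates even when the whole pool is down.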

Load Balancing Pitfalls

Sticky Sessions: Can create uneven load distribution
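Health Check Aggressiveness: Overly frequent checks add load to backend servers, while infrequent checks delay failure detection
DNS TTL: A low TTL increases DNS query volume, while a high TTL slows failover because clients keep using cached addresses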
Configuration Drift: Inconsistent settings across load balancers

Choosing Load Balancing Algorithm

Round Robin: Good for identical servers with similar capabilities
Least Connections: Best for variable request processing times
Weighted: When servers have different capacities
IP Hash: Required for stateful applications without session replication
