Load Balancing

Load balancing strategies (round robin, least connections, etc.).

Load balancing distributes incoming traffic across multiple servers to ensure high availability, reliability, and optimal resource utilization.

Load Balancer Types

Layer 4 (Transport Layer)

Layer 4 Load Balancing Characteristics:

  • Protocol Awareness: Operates at transport layer (TCP/UDP) without inspecting application data
  • Performance: Faster processing with minimal latency (typically < 1ms overhead)
  • Decision Making: Routes based on IP address and port number
  • Use Cases: Ideal for TCP-based applications, databases, and simple web services
  • Limitations: Cannot make routing decisions based on HTTP headers or content

Layer 7 (Application Layer)

Layer 7 Load Balancing Features:

  • Content Awareness: Inspects HTTP headers, URLs, cookies, and application data
  • Smart Routing: Directs traffic based on content type, request parameters, or API endpoints
  • Advanced Features: Supports SSL termination, caching, and request manipulation
  • Use Cases: Microservices architectures, content-based routing, and web application firewalls
  • Trade-offs: Higher processing overhead but more intelligent traffic distribution
  • Common Implementations: Application Delivery Controllers (ADCs) and cloud load balancers
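
The difference between the two layers can be sketched as two routing functions that see different parts of the same request. This is a minimal illustration, not a real balancer: the `Request` type, the pool names, and the route tables are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    client_ip: str
    dest_port: int
    path: str = "/"  # application data, only inspected at Layer 7


# Hypothetical pool names, for illustration only.
L4_POOLS = {443: "web-pool", 5432: "db-pool"}
L7_ROUTES = [("/api/", "api-pool"), ("/static/", "static-pool")]


def route_l4(req: Request) -> str:
    # Layer 4: the decision uses only the destination port here;
    # the HTTP path is never looked at.
    return L4_POOLS.get(req.dest_port, "default-pool")


def route_l7(req: Request) -> str:
    # Layer 7: the decision can depend on the URL path.
    for prefix, pool in L7_ROUTES:
        if req.path.startswith(prefix):
            return pool
    return "web-pool"
```

Given the same HTTPS request for `/api/users`, `route_l4` can only send it to the generic web pool, while `route_l7` can pick the API pool.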

Load Balancing Algorithms

1. Round Robin

Round Robin Algorithm:

  • Simple Distribution: Distributes requests sequentially across servers in a circular pattern
  • Minimal State: Tracks only its position in the rotation; keeps no record of previous connections or server load
  • Equal Distribution: Assumes all servers have equal capacity and performance
  • Best For: Homogeneous server environments with similar hardware and request processing times
  • Limitations: Doesn't account for server load or varying request complexities
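
A minimal sketch of the rotation described above, assuming a fixed server list. The class name `RoundRobin` and the server names are illustrative.

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed circular order."""

    def __init__(self, servers):
        # itertools.cycle repeats the sequence indefinitely.
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


rr = RoundRobin(["a", "b", "c"])
picks = [rr.pick() for _ in range(5)]  # a, b, c, then wraps back to a, b
```

Note that nothing in `pick` looks at how busy a server is, which is exactly the limitation listed above.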

2. Least Connections

Least Connections Algorithm:

  • Dynamic Allocation: Routes new requests to the server with the fewest active connections
  • Load Awareness: Real-time monitoring of connection counts across all servers
  • Adaptive Distribution: Automatically adjusts to varying server loads and request durations
  • Best For: Environments with long-lived connections or variable request processing times
  • Implementation Considerations: Requires connection tracking and state management
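
The connection tracking this algorithm needs can be sketched as a counter per server. This is an illustrative single-threaded version; the class and method names are assumptions, and a real balancer would also need locking and cleanup of dropped connections.

```python
class LeastConnections:
    """Tracks active connections and picks the least-loaded server."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Choose the server with the fewest active connections.
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to `server` closes.
        self.active[server] -= 1
```

Every `acquire` must be paired with a `release`, otherwise the counts drift and the balancer starts avoiding servers that are actually idle.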

3. IP Hash

IP Hash Algorithm:

  • Consistent Mapping: Uses client IP address to determine server assignment through hashing
  • Session Persistence: Ensures clients consistently connect to the same server
  • Deterministic Routing: Same IP always routes to the same server (unless server pool changes)
  • Best For: Stateful applications that lack session replication, and scenarios that benefit from per-server cache affinity
  • Potential Issues: Can cause uneven distribution if client IPs cluster naturally
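
A minimal sketch of IP hashing using a plain hash-modulo mapping. The function name and the choice of SHA-256 are assumptions made for illustration; any stable hash would do.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    # Use a stable hash (Python's built-in hash() is randomized per
    # process for strings, so it would not survive a restart).
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

With this modulo scheme, adding or removing a server changes `len(servers)` and can remap many clients at once. Consistent hashing (a hash ring) is a common way to limit that remapping; it is not shown here.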

4. Weighted Round Robin

Weighted Round Robin Algorithm:

  • Capacity-Based Distribution: Assigns more requests to servers with higher capacity (weight)
  • Proportional Allocation: Server weight determines the percentage of total requests
  • Heterogeneous Environments: Ideal for servers with different hardware specifications
  • Configuration: Weights can be adjusted based on server performance metrics
  • Calculation: If weights sum to N, a server with weight W receives W/N of total requests
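
The W/N share can be checked with a short sketch. This uses the "smooth" weighted variant, which interleaves picks instead of sending a burst to the heaviest server; it is one possible implementation, not necessarily the one a given load balancer uses. The class name and weights are illustrative.

```python
class SmoothWeightedRoundRobin:
    """Each server is picked in proportion to its weight, interleaved."""

    def __init__(self, weights):
        self.weights = dict(weights)            # server -> weight W
        self.total = sum(self.weights.values()) # N, the sum of weights
        self.current = {server: 0 for server in self.weights}

    def pick(self):
        # Every server earns credit equal to its weight...
        for server, weight in self.weights.items():
            self.current[server] += weight
        # ...the server with the most credit is chosen and pays N back.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen
```

With weights of 3 and 1 (N = 4), eight picks give the heavier server six requests and the lighter one two, matching the 3/4 and 1/4 shares from the formula.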

Health Checks and Failover

Health Check Mechanisms

Health Check Implementation:

  • Active Monitoring: Load balancer periodically sends health check requests to all backend servers
  • Check Types: HTTP status checks, TCP connection tests, or custom application-specific probes
  • Frequency Configuration: Balance between detection speed and server overhead (typically 10-60 seconds)
  • Failure Detection: Multiple consecutive failures required before marking server as unhealthy
  • Recovery Detection: Multiple successful checks needed to restore server to healthy status
  • Grace Period: Configurable delay between server startup and health check initiation
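
The consecutive-failure and consecutive-success rules above can be sketched as a small state tracker. The parameter names `fall` and `rise` and their default values are assumptions for this example; the probe itself (HTTP, TCP, or custom) is left out and only its pass/fail result is recorded.

```python
class HealthTracker:
    """Flips state only after enough consecutive contradicting results."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall      # consecutive failures before marking unhealthy
        self.rise = rise      # consecutive successes before marking healthy
        self.healthy = True
        self._streak = 0      # consecutive results that contradict `healthy`

    def record(self, check_passed):
        if check_passed == self.healthy:
            # Result agrees with the current state: reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if check_passed else self.fall
            if self._streak >= threshold:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy
```

A single failed probe therefore never removes a server, and a single lucky success never restores one.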

Failover Process

Failover Strategies:

  • Automatic Failover: Immediate removal of unhealthy servers from rotation without human intervention
  • Graceful Degradation: System continues operating with reduced capacity rather than complete failure
  • Circuit Breaker Pattern: Temporarily stops sending requests to failing servers to allow recovery
  • Health Thresholds: Configure sensitivity levels to prevent flapping between healthy/unhealthy states
  • Fallback Mechanisms: Error pages, cached responses, or redirect to alternative endpoints
  • Recovery Process: Gradual reintroduction of recovered servers with limited traffic (slow start), ramping up as they prove stable
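
The circuit breaker idea from the list above can be sketched as follows. This is a simplified illustration: the thresholds, the method names, and the injectable `clock` parameter (used so the behavior can be exercised without waiting) are all assumptions.

```python
import time


class CircuitBreaker:
    """Stops sending requests to a server after repeated failures."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown      # seconds to wait before retrying
        self._clock = clock
        self._failures = 0
        self._opened_at = None        # None means the circuit is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        # After the cooldown, let trial requests through again.
        return self._clock() - self._opened_at >= self.cooldown

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.max_failures:
            # Open (or re-open) the circuit and restart the cooldown.
            self._opened_at = self._clock()
```

The caller checks `allow_request()` before forwarding and reports the outcome afterwards. A failure during the trial phase re-opens the circuit, which gives the server another full cooldown to recover.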

Session Persistence

Session Affinity Strategies

Session Affinity Approaches:

Cookie-based Persistence:

  • Load Balancer Cookies: Load balancer inserts a cookie identifying the assigned server
  • Application Cookies: The application sets a session cookie that the load balancer reads and uses for routing
  • Cookie Duration: Configurable expiration times (session-based vs. persistent)
  • Advantages: Works with NAT and proxy clients, transparent to end users
  • Limitations: Requires cookie support, can interfere with application cookie policies
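
A minimal sketch of the load balancer cookie approach, assuming cookies are available as a plain dictionary. The cookie name `LB_SERVER` and the function signature are hypothetical, and storing the raw server name in the cookie is a simplification; production balancers typically encode or obscure it.

```python
COOKIE_NAME = "LB_SERVER"  # hypothetical cookie name


def route_with_cookie(cookies, servers, pick_new):
    """Return (server, cookies_to_set) for one request."""
    assigned = cookies.get(COOKIE_NAME)
    if assigned in servers:
        # Returning client: honor the existing assignment.
        return assigned, {}
    # New client, or the assigned server left the pool: pick again
    # and tell the client to remember the new assignment.
    server = pick_new()
    return server, {COOKIE_NAME: server}
```

The `pick_new` callback is where any of the algorithms above plugs in, so persistence and distribution stay separate concerns.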

Source IP Persistence:

  • IP-based Mapping: Uses client IP address to determine server assignment
  • Hashing Algorithm: Consistent hash of client IP ensures stable routing
  • Use Cases: Useful for applications that don't support cookies or have strict security requirements
  • Challenges: Breaks down behind NAT, where many clients share one IP address and are all routed to the same server
  • Fallback Behavior: Often combined with other methods for reliability

Session Replication vs. Persistence

Session Management Strategies:

Session Persistence (Sticky Sessions):

  • Server-local Storage: Session data stored on individual server's memory or disk
  • Affinity Requirement: Client must always return to the same server for session continuity
  • Single Point of Failure: Server failure results in session loss for affected users
  • Simplicity: Easy to implement with minimal infrastructure requirements
  • Load Distribution Issues: Can create uneven load distribution over time

Session Replication:

  • Shared Storage: Session data is replicated between servers or kept in a shared database, cache, or distributed store
  • Server Agnosticism: Any server can handle any request regardless of previous interactions
  • High Availability: Session data survives individual server failures
  • Implementation Complexity: Requires additional infrastructure and synchronization mechanisms
  • Performance Trade-offs: Additional network latency for session access but better load distribution

Global Server Load Balancing (GSLB)

Geographic Load Balancing

Geographic Load Balancing Features:

  • DNS-based Routing: Uses DNS responses to direct users to optimal data centers
  • Proximity Optimization: Routes users to geographically nearest infrastructure
  • Latency Reduction: Minimizes network latency by serving content from closer locations
  • Disaster Recovery: Automatic failover between different geographic regions
  • Compliance Considerations: Helps meet data sovereignty and regulatory requirements
  • Implementation Methods: DNS-based, anycast IP addresses, or HTTP redirects

GSLB Decision Flow

GSLB Decision Logic:

  • Health Monitoring: Continuous monitoring of all geographic data centers
  • Priority-based Routing: Multiple criteria weighted for optimal routing decisions
  • Load Awareness: Considers current load and capacity of each data center
  • Failover Hierarchy: Configurable fallback chain for disaster scenarios
  • Performance Metrics: Real-time latency and availability measurements influence routing
  • Business Rules: Custom routing based on business requirements or user segments
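
One way to picture how these criteria combine is a filter-then-rank function. This is a sketch under simplifying assumptions: the `DataCenter` fields, the `max_load` cutoff of 0.8, and the region names are invented for illustration, and business rules are omitted.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    healthy: bool
    latency_ms: float  # measured latency from the user's region
    load: float        # current utilization, 0.0 to 1.0


def choose_datacenter(datacenters, max_load=0.8):
    # 1. Health monitoring: drop unhealthy sites.
    healthy = [dc for dc in datacenters if dc.healthy]
    if not healthy:
        return None
    # 2. Load awareness: prefer sites with headroom, but fall back
    #    to any healthy site rather than failing outright.
    with_headroom = [dc for dc in healthy if dc.load < max_load]
    candidates = with_headroom or healthy
    # 3. Performance: lowest latency wins among the candidates.
    return min(candidates, key=lambda dc: dc.latency_ms).name
```

The ordering of the steps acts as the priority scheme: health always beats load, and load beats latency.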

Load Balancer Deployment Patterns

Active-Passive

Active-Passive Deployment:

  • Primary-Secondary Setup: One load balancer handles all traffic while the other remains on standby
  • Heartbeat Monitoring: Continuous health checks between active and passive units
  • Automatic Failover: Passive unit takes over when active unit fails or becomes unresponsive
  • Failover Time: Typically 2-30 seconds depending on configuration and detection mechanisms
  • Configuration Synchronization: Settings and rules must be kept consistent between units
  • Use Cases: Smaller deployments, cost-constrained environments, or simple redundancy requirements
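
The heartbeat and takeover logic on the passive unit can be sketched like this. The class name, the 3-second timeout, and the injectable `clock` are assumptions for the example; real deployments usually rely on an established failover protocol and must also guard against both units becoming active at once (split brain), which is not handled here.

```python
import time


class StandbyMonitor:
    """Runs on the passive unit and promotes it when heartbeats stop."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout            # seconds of silence tolerated
        self._clock = clock
        self.last_heartbeat = clock()
        self.role = "passive"

    def on_heartbeat(self):
        # Called whenever a heartbeat arrives from the active unit.
        self.last_heartbeat = self._clock()

    def tick(self):
        # Called periodically; takes over if the active unit went silent.
        silent_for = self._clock() - self.last_heartbeat
        if self.role == "passive" and silent_for > self.timeout:
            self.role = "active"
        return self.role
```

The failover time listed above is roughly the heartbeat timeout plus however long the takeover itself takes.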

Active-Active

Active-Active Deployment:

  • Dual Active Configuration: Both load balancers simultaneously handle traffic
  • Traffic Distribution: DNS round robin or geographic routing between load balancers
  • Session Synchronization: Cross-replication of session data and state information
  • Increased Capacity: Provides higher throughput and better resource utilization
  • Complexity: Requires careful configuration to prevent conflicts and ensure consistency
  • Geographic Distribution: Ideal for multi-region deployments with latency optimization
  • Load Balancer Health Monitoring: Mutual health checks between active units

Performance Metrics

Key Load Balancer Metrics

Metric                  Description                  Target
Request Rate            Requests per second          Scale with capacity
Response Time           Time to process request      < 100ms
Error Rate              Failed requests percentage   < 0.1%
Connection Count        Active connections           Monitor thresholds
Health Check Pass Rate  Successful health checks     > 99.9%

Essential Performance Indicators:

  • Throughput: Measure of total requests processed per second, indicating overall capacity
  • Latency Distribution: Track not just average but p50, p95, and p99 response times
  • Connection Efficiency: Ratio of successful connections to total connection attempts
  • Resource Utilization: CPU, memory, and network usage on load balancer instances
  • Backend Performance: Individual backend server response times and health status
  • SSL Termination Overhead: Additional latency introduced by SSL/TLS processing
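
To illustrate why the latency distribution matters more than the average, here is a small nearest-rank percentile sketch. The function name and the sample values are made up for the example; monitoring systems often use interpolation or histogram approximations instead.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering p% of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# 98 fast requests and 2 slow ones, in milliseconds.
latencies = [10] * 98 + [1000] * 2
mean = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
```

Here the mean is 29.8 ms and p50 is 10 ms, both of which look healthy, while p99 is 1000 ms and exposes the slow tail.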

Load Balancer Decision Flow

Load Balancing Decision Process:

  • Algorithm Selection: Chooses appropriate distribution method based on configuration
  • Health Verification: Confirms server availability before forwarding requests
  • Failover Logic: Iterates through server list until healthy server is found
  • Connection Management: Maintains connection pools and handles session persistence
  • Monitoring and Logging: Records decision outcomes for performance analysis
  • Adaptive Behavior: Some advanced load balancers adjust algorithms based on performance metrics
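
The health verification and failover steps can be sketched as a scan over the server list starting from wherever the chosen algorithm pointed. The function name and the `is_healthy` callback are assumptions for this example.

```python
def pick_healthy(servers, is_healthy, start):
    """Return the first healthy server at or after index `start`."""
    count = len(servers)
    for offset in range(count):
        # Wrap around so every server is tried exactly once.
        candidate = servers[(start + offset) % count]
        if is_healthy(candidate):
            return candidate
    # No healthy server: the caller should use a fallback response.
    return None
```

Returning `None` rather than raising keeps the decision about fallback behavior (error page, cached response, redirect) with the caller.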

Load Balancing Pitfalls

  1. Sticky Sessions: Can create uneven load distribution
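  2. Health Check Aggressiveness: Overly frequent checks add load to backend servers; overly sparse checks delay failure detection
  3. DNS TTL: Low TTL increases DNS query volume; high TTL delays failover because clients keep using cached records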
  4. Configuration Drift: Inconsistent settings across load balancers

Choosing Load Balancing Algorithm

  • Round Robin: Good for identical servers with similar capabilities
  • Least Connections: Best for variable request processing times
  • Weighted Round Robin: When servers have different capacities
  • IP Hash: Suited to stateful applications without session replication