AWS Elastic Load Balancing for System Architects

What Problems Elastic Load Balancing Solves

AWS Elastic Load Balancing (ELB) automatically distributes incoming application traffic across multiple targets to improve availability, scalability, and fault tolerance. It solves critical challenges for production applications:

Availability Problems:

  • Single EC2 instance cannot provide high availability (instance failure = complete outage)
  • Manual failover is slow (5-30 minutes to detect failure and reroute traffic)
  • Applications cannot survive Availability Zone (AZ) outages without manual intervention
  • Health check monitoring requires custom scripts and external services

Scalability Problems:

  • Fixed capacity cannot handle traffic spikes (Black Friday, product launches, viral events)
  • Manual scaling requires provisioning instances and updating DNS (10-30 minute delay)
  • DNS-based scaling has caching issues (TTL delays can be hours)
  • No automatic scaling based on real-time traffic patterns

Security Problems:

  • Applications must handle TLS/SSL termination, consuming CPU resources
  • Certificate management across dozens of instances is complex and error-prone
  • No central point for Web Application Firewall (WAF) integration
  • Backend instances directly exposed to internet without protection layer

Operational Problems:

  • Traffic distribution is manual and error-prone
  • No built-in connection draining for graceful instance removal
  • Monitoring individual instances requires complex aggregation
  • Blue/green deployments require DNS changes and scripting

ELB’s Solution:

  • Three load balancer types: Application Load Balancer (ALB) for HTTP/HTTPS, Network Load Balancer (NLB) for TCP/UDP, Gateway Load Balancer (GWLB) for security appliances
  • Automatic scaling: Scales to handle millions of requests per second with no manual intervention
  • Built-in health checks: Automatically detects and routes around unhealthy targets
  • Multi-AZ by default: Distributes traffic across multiple Availability Zones for 99.99% availability
  • TLS/SSL termination: Offloads encryption/decryption from backend instances
  • Integration with Auto Scaling: Automatically adds/removes targets based on demand
  • Pay-as-you-go pricing: A small hourly charge per load balancer plus usage-based LCUs (Load Balancer Capacity Units), with no upfront cost or minimum fee

ELB integrates with VPC, Auto Scaling, Route 53, CloudWatch, and AWS Certificate Manager (ACM) to provide a complete high-availability solution.

Load Balancer Types Comparison

AWS offers three modern load balancer types, each optimized for specific use cases:

| Feature | Application Load Balancer (ALB) | Network Load Balancer (NLB) | Gateway Load Balancer (GWLB) |
|---|---|---|---|
| OSI Layer | Layer 7 (Application) | Layer 4 (Transport) | Layer 3 (Network) + Layer 4 (flow-based) |
| Protocols | HTTP, HTTPS, HTTP/2, gRPC, WebSocket | TCP, UDP, TLS | IP (all protocols) |
| Routing | Path, host, header, query string, method | IP address + port | Transparent (GENEVE) |
| Latency | ~10-50 ms (connection termination) | <1 ms (pass-through) | ~5-15 ms (inspection overhead) |
| Throughput | Millions of requests/sec | Millions of packets/sec, 100s of Gbps | Depends on appliances |
| Static IP | No (DNS-based) | Yes (Elastic IP per AZ) | Yes (via GWLB endpoints) |
| TLS Termination | Yes (built-in) | Yes (since 2019) | No (transparent pass-through) |
| WebSocket | Yes | Yes (TCP mode) | No |
| Target Types | EC2, IP, Lambda | EC2, IP, ALB | EC2, IP (appliances) |
| Health Checks | HTTP, HTTPS, gRPC | TCP, HTTP, HTTPS | TCP, HTTP, HTTPS |
| Cross-Zone LB | Always enabled (no charge) | Optional ($0.01/GB processed) | Optional (disabled by default) |
| Pricing Model | $0.0225/hour + LCU | $0.0225/hour + NLCU | $0.0125/hour + GLCU |
| Use Case | Web apps, microservices, APIs | Gaming, IoT, extreme performance | Firewall, IDS/IPS, DPI |

When to Use Each Load Balancer

Use Application Load Balancer (ALB) when:

  • You need HTTP/HTTPS routing (path-based, host-based, header-based)
  • You want to route to Lambda functions or containers
  • You need AWS WAF integration for security
  • You need authentication via Cognito or OIDC
  • You want advanced request/response manipulation
  • Latency tolerance is 10-50ms (typical web application)

Use Network Load Balancer (NLB) when:

  • You need ultra-low latency (<1ms) and high throughput
  • You require static IP addresses (Elastic IP per AZ)
  • You need to preserve source IP addresses
  • You’re load balancing TCP/UDP traffic (databases, game servers, IoT)
  • You want to expose services via AWS PrivateLink
  • You need TLS termination for non-HTTP protocols

Use Gateway Load Balancer (GWLB) when:

  • You need to inspect traffic with third-party security appliances
  • You’re deploying firewalls, IDS/IPS, or deep packet inspection (DPI) tools
  • You want transparent traffic interception (bump-in-the-wire)
  • You need centralized security inspection for multiple VPCs
  • You’re using third-party appliances from AWS Marketplace

Default recommendation: Use ALB for most HTTP/HTTPS applications. Use NLB only when you need low latency, static IPs, or PrivateLink. Use GWLB only for security appliance integration.

Application Load Balancer (ALB)

Routing Capabilities

ALB provides sophisticated Layer 7 routing based on request attributes:

1. Path-Based Routing:

  • /api/* → API target group (ECS containers)
  • /images/* → Image server target group (EC2 instances)
  • /admin/* → Admin target group (separate security group)

2. Host-Based Routing:

  • api.example.com → API target group
  • www.example.com → Web server target group
  • admin.example.com → Admin target group

3. Header-Based Routing:

  • X-Forwarded-For: 10.0.*.* → Internal target group
  • User-Agent: Mobile* → Mobile-optimized target group

4. Query String Routing:

  • ?version=2 → New version target group
  • ?version=1 → Old version target group

5. HTTP Method Routing:

  • GET → Read-only replica target group
  • POST/PUT/DELETE → Primary database target group

6. Source IP Routing:

  • 10.0.0.0/8 → Internal target group
  • 0.0.0.0/0 → External target group

Example: Microservices Routing:

ALB Listener (HTTPS:443)
├─ Rule 1: Host = api.example.com AND Path = /users/* → Users Service (ECS)
├─ Rule 2: Host = api.example.com AND Path = /orders/* → Orders Service (ECS)
├─ Rule 3: Host = api.example.com AND Path = /inventory/* → Inventory Service (Lambda)
├─ Rule 4: Host = www.example.com → Web Frontend (EC2 Auto Scaling)
└─ Default: Return 404
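
These rules can also be created programmatically. A minimal boto3 sketch of Rule 1 above; the listener and target group ARNs are placeholders:

import boto3

elbv2 = boto3.client("elbv2")

# Rule 1: Host = api.example.com AND Path = /users/* -> Users Service
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",  # placeholder
    Priority=1,
    Conditions=[
        {"Field": "host-header", "HostHeaderConfig": {"Values": ["api.example.com"]}},
        {"Field": "path-pattern", "PathPatternConfig": {"Values": ["/users/*"]}},
    ],
    Actions=[{"Type": "forward",
              "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/users-svc/..."}],  # placeholder
)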

Target Groups

Target Group Types:

1. Instance Target Type (most common):

  • Targets are EC2 instances registered by instance ID
  • Traffic routed to primary private IP of instance
  • Health checks performed against instance
  • Use case: Auto Scaling groups, traditional applications

2. IP Target Type:

  • Targets are IP addresses (can be in VPC or on-premises)
  • Traffic routed to specific IP address and port
  • Supports IPv4 and IPv6
  • Use case: Containers (ECS/EKS), on-premises servers via Direct Connect/VPN (Lambda uses its own target type; see below)

3. Lambda Target Type:

  • Targets are Lambda functions
  • ALB invokes Lambda function with HTTP request as JSON event
  • Lambda returns HTTP response
  • Use case: Serverless APIs, event-driven applications

4. ALB Target Type (NLB only):

  • NLB can forward traffic to ALB
  • Combines NLB static IPs with ALB Layer 7 routing
  • Use case: PrivateLink services that need content-based routing

Target Group Attributes:

| Attribute | Default | Purpose |
|---|---|---|
| Deregistration delay | 300 seconds | Connection draining timeout |
| Slow start duration | 0 seconds (disabled) | Ramp-up time for new targets |
| Stickiness | Disabled | Session affinity via cookies |
| Load balancing algorithm | Round robin | Request distribution method |
| HTTP/2 | Enabled | Client-facing protocol (set on the load balancer, listed here for completeness) |
| gRPC | Disabled | Protocol version selected at target group creation |

Deregistration Delay (Connection Draining):

  • When target is deregistered, ALB stops sending new requests
  • Existing connections continue for up to deregistration delay (default 300s)
  • After delay, ALB forcibly closes remaining connections
  • Best practice: Set to 30-60 seconds for most applications

Slow Start:

  • New targets receive gradually increasing share of traffic
  • Example: 30-second slow start gives target time to warm up caches
  • Linear ramp-up from 0% to full share over slow start duration
  • Use case: Applications with significant warm-up time (JVM, caches)

Stickiness (Session Affinity):

  • ALB sets a load balancer-generated cookie (AWSALB) to bind a user to a specific target
  • Duration: 1 second to 7 days (default 1 day)
  • Use case: Stateful applications that store session data locally
  • Better alternative: Use external session store (ElastiCache, DynamoDB)
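
All three attributes are plain key/value settings on the target group. A hedged boto3 sketch (the target group ARN and values are illustrative):

import boto3

elbv2 = boto3.client("elbv2")

elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/web/...",   # placeholder
    Attributes=[
        {"Key": "deregistration_delay.timeout_seconds", "Value": "60"},     # connection draining
        {"Key": "slow_start.duration_seconds", "Value": "30"},              # ramp-up for new targets
        {"Key": "stickiness.enabled", "Value": "true"},                     # session affinity
        {"Key": "stickiness.type", "Value": "lb_cookie"},                   # AWSALB cookie
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"}, # 1 day
    ],
)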

Health Checks

ALB performs active health checks to detect unhealthy targets:

Health Check Configuration:

| Parameter | Range | Default | Recommendation |
|---|---|---|---|
| Protocol | HTTP, HTTPS | HTTP | Match target protocol |
| Path | Any URI | / | Use dedicated health endpoint |
| Port | 1-65535 | Traffic port | Use traffic port unless custom |
| Interval | 5-300 seconds | 30 seconds | 10-30 s for faster detection |
| Timeout | 2-120 seconds | 5 seconds | 5 s for most apps |
| Healthy threshold | 2-10 | 5 | 2-3 for faster recovery |
| Unhealthy threshold | 2-10 | 2 | 2 for faster detection |
| Success codes | 200-499 | 200 | 200 for most apps |

Health Check Example:

  • Interval: 10 seconds
  • Timeout: 5 seconds
  • Healthy threshold: 2
  • Unhealthy threshold: 2

Behavior:

  • Unhealthy detection: 2 consecutive failures = 20 seconds (2 × 10s interval)
  • Healthy recovery: 2 consecutive successes = 20 seconds
  • Best practice: Use dedicated health endpoint that checks dependencies (database, cache)
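
The example above maps one-to-one onto target group parameters. A boto3 sketch, assuming a placeholder VPC ID and a dedicated /health endpoint like the one described later:

import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_target_group(
    Name="web-targets",                 # illustrative name
    Protocol="HTTP",
    Port=8080,
    VpcId="vpc-0123456789abcdef0",      # placeholder
    TargetType="instance",
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/health",          # dedicated endpoint that checks dependencies
    HealthCheckIntervalSeconds=10,
    HealthCheckTimeoutSeconds=5,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=2,
    Matcher={"HttpCode": "200"},        # success codes
)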

Advanced Health Check Features (2024):

Target Group Zonal Health Thresholds:

  • Minimum number of healthy targets per AZ (percentage or count)
  • If healthy targets fall below threshold, AZ is marked unhealthy
  • DNS failover routes traffic only to healthy AZs
  • Use case: Prevent cascading failures when AZ is degraded

Example:

  • Target group has 10 instances across 2 AZs (5 per AZ)
  • Zonal health threshold: 50% (minimum 2.5 instances, rounds to 3)
  • If AZ-A has only 2 healthy instances, mark AZ-A unhealthy
  • Route 53 DNS health check fails for AZ-A, traffic goes only to AZ-B

TLS/SSL Termination

Certificate Management:

  • ALB integrates with AWS Certificate Manager (ACM)
  • ACM provides free SSL/TLS certificates with automatic renewal
  • Supports Server Name Indication (SNI) for multiple certificates
  • Up to 25 certificates per ALB (default quota)

SNI (Server Name Indication):

  • Single ALB can host multiple HTTPS websites with different certificates
  • Client sends hostname in TLS handshake
  • ALB selects appropriate certificate based on hostname
  • No additional cost for SNI

Example:

ALB Listener (HTTPS:443)
├─ api.example.com → Certificate A (ACM)
├─ www.example.com → Certificate B (ACM)
└─ admin.example.com → Certificate C (ACM)

TLS Policies:

  • ALB supports TLS 1.0, 1.1, 1.2, 1.3
  • Default policy: ELBSecurityPolicy-TLS13-1-2-2021-06 (TLS 1.2/1.3 only)
  • Recommended: Use default policy (TLS 1.2/1.3) unless legacy clients required
  • FIPS 140-3 validated: Use ELBSecurityPolicy-TLS13-1-2-FIPS-2023-04 for compliance

Cipher Suites:

  • Modern policies support forward secrecy (ECDHE)
  • AES-GCM for encryption (faster than CBC)
  • ChaCha20-Poly1305 for mobile devices
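
Putting SNI and the TLS policy together: a boto3 sketch that creates an HTTPS listener with a default certificate and attaches a second certificate for SNI. All ARNs are placeholders:

import boto3

elbv2 = boto3.client("elbv2")

listener = elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/app/my-alb/...",  # placeholder
    Protocol="HTTPS",
    Port=443,
    SslPolicy="ELBSecurityPolicy-TLS13-1-2-2021-06",       # default TLS 1.2/1.3 policy
    Certificates=[{"CertificateArn": "arn:aws:acm:...:certificate/cert-www"}],
    DefaultActions=[{"Type": "forward",
                     "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/web/..."}],
)

# Extra certificates served via SNI (e.g., api.example.com)
elbv2.add_listener_certificates(
    ListenerArn=listener["Listeners"][0]["ListenerArn"],
    Certificates=[{"CertificateArn": "arn:aws:acm:...:certificate/cert-api"}],
)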

Authentication and Authorization

AWS Cognito Integration:

  • Offload authentication to Cognito User Pools
  • ALB verifies JWT tokens automatically
  • Redirects unauthenticated users to Cognito login page
  • Use case: Web applications needing user authentication

OIDC (OpenID Connect) Integration:

  • Integrate with third-party identity providers (Google, Facebook, Okta, Auth0)
  • ALB performs OAuth 2.0 authorization code flow
  • Sets headers (X-Amzn-Oidc-Data, X-Amzn-Oidc-Identity) for backend
  • Use case: Enterprise SSO, social login

Configuration Example:

Listener Rule:
  IF Path = /app/*
  THEN
    1. Authenticate via Cognito User Pool
    2. Forward to Application Target Group
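
The same rule as a boto3 sketch; the Cognito user pool identifiers are placeholders:

import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",  # placeholder
    Priority=10,
    Conditions=[{"Field": "path-pattern", "PathPatternConfig": {"Values": ["/app/*"]}}],
    Actions=[
        {   # step 1: authenticate via Cognito
            "Type": "authenticate-cognito",
            "Order": 1,
            "AuthenticateCognitoConfig": {
                "UserPoolArn": "arn:aws:cognito-idp:...:userpool/us-east-1_EXAMPLE",  # placeholder
                "UserPoolClientId": "example-client-id",                              # placeholder
                "UserPoolDomain": "example-domain",                                   # placeholder
            },
        },
        {   # step 2: forward authenticated requests
            "Type": "forward",
            "Order": 2,
            "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/app/...",
        },
    ],
)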

Integration with AWS WAF

WAF Capabilities on ALB:

  • Filter requests based on IP address, geographic location, HTTP headers, body content
  • Block SQL injection and cross-site scripting (XSS) attacks
  • Rate limiting (requests per IP address per 5 minutes)
  • Managed rule groups (OWASP Top 10, known bad inputs, bot control)

Pricing:

  • Web ACL: $5/month
  • Rules: $1/month per rule
  • Requests: $0.60 per 1 million requests

Example Rules:

  • Block traffic from specific countries (geo-blocking)
  • Rate limit: Max 2,000 requests per IP per 5 minutes
  • AWS Managed Rules: Core Rule Set (CRS) for OWASP Top 10
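
Associating an existing web ACL with an ALB is a single WAFv2 call. A boto3 sketch with placeholder ARNs:

import boto3

wafv2 = boto3.client("wafv2", region_name="us-east-1")  # regional scope for ALB

wafv2.associate_web_acl(
    WebACLArn="arn:aws:wafv2:us-east-1:123456789012:regional/webacl/my-acl/...",  # placeholder
    ResourceArn="arn:aws:elasticloadbalancing:...:loadbalancer/app/my-alb/...",   # placeholder
)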

Cross-Zone Load Balancing

ALB Behavior:

  • Cross-zone load balancing is always enabled for ALB
  • No additional charges for cross-zone traffic
  • Traffic distributed evenly across all registered targets in all enabled AZs

Example:

  • AZ-A: 2 instances
  • AZ-B: 8 instances
  • Without cross-zone: AZ-A gets 50% traffic (each instance gets 25%), AZ-B gets 50% (each instance gets 6.25%)
  • With cross-zone (ALB default): All 10 instances get 10% traffic each (balanced)

Cost Model

ALB Pricing (US East, January 2025):

| Component | Price |
|---|---|
| Load balancer hour | $0.0225 per hour (~$16.20/month) |
| LCU (Load Balancer Capacity Unit) | $0.008 per LCU-hour |

LCU Dimensions (charged for highest dimension):

  1. New connections: 25 connections/second
  2. Active connections: 3,000 connections/minute
  3. Processed bytes: 1 GB/hour (0.4 GB/hour for Lambda targets)
  4. Rule evaluations: 1,000 rule evaluations/second

Example Cost Calculation:

Scenario: Web application with moderate traffic

  • New connections: 100/second = 4 LCUs
  • Active connections: 5,000/minute = 1.67 LCUs
  • Processed bytes: 10 GB/hour = 10 LCUs ← Highest dimension
  • Rule evaluations: 500/second = 0.5 LCUs

Monthly Cost:

  • Load balancer: $16.20
  • LCUs: 10 LCUs × $0.008 × 730 hours = $58.40
  • Total: $74.60/month
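
The highest-dimension billing rule is easy to sanity-check in code. A small Python sketch of the calculation above (it uses 730 hours for both components, so the total lands slightly above the rounded $16.20 + $58.40 in the text):

HOURLY_RATE, LCU_RATE, HOURS = 0.0225, 0.008, 730

def alb_monthly_cost(new_conn_per_sec, active_conn_per_min, gb_per_hour, rule_evals_per_sec):
    lcus = max(
        new_conn_per_sec / 25,       # 25 new connections/sec per LCU
        active_conn_per_min / 3000,  # 3,000 active connections/min per LCU
        gb_per_hour / 1.0,           # 1 GB/hour per LCU
        rule_evals_per_sec / 1000,   # 1,000 rule evaluations/sec per LCU
    )
    return HOURLY_RATE * HOURS + lcus * LCU_RATE * HOURS

print(alb_monthly_cost(100, 5000, 10, 500))  # ~74.8; processed bytes dominate at 10 LCUs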

Cost Optimization:

  • Consolidate multiple ALBs using host-based routing (save $16.20/month per ALB removed)
  • Reduce rule complexity (fewer rule evaluations)
  • Enable HTTP/2 for clients (better compression, fewer connections)
  • Use CloudFront in front of ALB (cache static content, reduce processed bytes by 60-90%)

Network Load Balancer (NLB)

Performance Characteristics

Ultra-Low Latency:

  • <1 millisecond latency (vs. ALB’s 10-50ms)
  • Connection pass-through (no termination)
  • Single-packet processing for UDP

High Throughput:

  • Millions of requests per second
  • Handles sudden traffic spikes with no pre-warming
  • 100s of Gbps throughput per AZ

Millions of Concurrent Connections:

  • Supports millions of simultaneous TCP connections
  • No connection limits (unlike ALB)

Example Use Cases:

  • Online gaming: Sub-millisecond latency critical for real-time gameplay
  • IoT: Millions of device connections
  • Financial trading: Ultra-low latency requirements
  • Video streaming: High throughput for live streams

Static IP Addresses

Elastic IP Support:

  • NLB provides one static IP address per enabled AZ; internet-facing NLBs can use Elastic IPs you own
  • Static IP addresses remain constant even if NLB scales
  • Supports IP whitelisting for firewalls
  • Use case: Partners/clients require IP whitelisting

Example:

NLB in 3 AZs:
  us-east-1a: 54.123.45.67 (Elastic IP A)
  us-east-1b: 54.123.45.68 (Elastic IP B)
  us-east-1c: 54.123.45.69 (Elastic IP C)

DNS record: nlb-example.elb.us-east-1.amazonaws.com
  → Resolves to all 3 Elastic IPs

Private IP Addresses:

  • NLB can use private IPs instead of Elastic IPs
  • Use case: Internal load balancers

Source IP Preservation

Client IP Preservation:

  • NLB preserves the client source IP address by default for instance targets (no X-Forwarded-For header needed; configurable for IP targets)
  • Backend sees actual client IP in connection
  • Use case: IP-based access control, logging, analytics

Behavior:

  • ALB: Backend sees ALB’s IP, client IP in X-Forwarded-For header
  • NLB: Backend sees client’s IP directly

Proxy Protocol v2:

  • Optional for TCP listeners
  • Prepends connection metadata (source IP, destination IP, ports) to TCP stream
  • Use case: Applications that need client IP but accept connections through proxy

TLS Termination (2019+)

NLB TLS Features:

  • Terminate TLS connections at NLB
  • Offload TLS from backend instances (80% CPU reduction observed)
  • Supports SNI for multiple certificates
  • Access logs include TLS protocol version, cipher suite, handshake time

Performance Impact:

  • Without TLS termination: EC2 instances at 80%+ CPU for TLS handshakes
  • With TLS termination: EC2 instances at ~6% CPU (NLB handles TLS)

Configuration:

Listener: TLS:443
  Default SSL certificate: ACM certificate
  SSL policy: ELBSecurityPolicy-TLS13-1-2-2021-06
  Target Group: TCP:8080 (unencrypted to backend)

Use case: Non-HTTP protocols that need TLS (MQTT, custom TCP applications)
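
The listener configuration above as a boto3 sketch; ARNs are placeholders, and the target group is plain TCP on port 8080:

import boto3

elbv2 = boto3.client("elbv2")

# TLS terminates at the NLB; targets receive decrypted TCP traffic
elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/net/my-nlb/...",  # placeholder
    Protocol="TLS",
    Port=443,
    SslPolicy="ELBSecurityPolicy-TLS13-1-2-2021-06",
    Certificates=[{"CertificateArn": "arn:aws:acm:...:certificate/..."}],            # ACM certificate
    DefaultActions=[{"Type": "forward",
                     "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/mqtt-tcp/..."}],
)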

QUIC Protocol Support (2024)

What is QUIC:

  • Modern transport protocol built on UDP
  • Faster connection establishment than TCP+TLS (0-RTT)
  • Improved performance on mobile networks (connection migration)
  • Used by HTTP/3

NLB QUIC Support (announced 2024):

  • Forward QUIC traffic (UDP) to targets
  • Session stickiness using QUIC Connection IDs
  • Ultra-low latency (sub-millisecond)
  • Use case: Mobile-first applications, HTTP/3, video conferencing

Cross-Zone Load Balancing

NLB Behavior:

  • Cross-zone load balancing is optional (disabled by default)
  • If enabled, charged $0.01 per GB processed (cross-AZ data transfer)
  • If disabled, traffic distributed only to targets in same AZ as NLB node

When to Enable Cross-Zone LB:

  • Uneven target distribution across AZs
  • Need even traffic distribution regardless of AZ
  • Cross-AZ data transfer cost ($0.01/GB) is acceptable

When to Disable Cross-Zone LB:

  • Targets evenly distributed across AZs
  • Want to minimize cross-AZ data transfer costs
  • Prefer AZ isolation (traffic stays in same AZ)

Example:

  • AZ-A: 2 instances
  • AZ-B: 8 instances
  • Without cross-zone: AZ-A instances get 50% each (overloaded), AZ-B instances get 6.25% each
  • With cross-zone: All 10 instances get 10% each (balanced) but $0.01/GB for cross-AZ traffic
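
On NLB, cross-zone load balancing is toggled through a load balancer attribute. A boto3 sketch:

import boto3

elbv2 = boto3.client("elbv2")

# Enabling this incurs $0.01/GB for cross-AZ data transfer
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/net/my-nlb/...",  # placeholder
    Attributes=[{"Key": "load_balancing.cross_zone.enabled", "Value": "true"}],
)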

PrivateLink Integration

AWS PrivateLink:

  • Expose services to other VPCs without VPC peering or Transit Gateway
  • NLB is required for PrivateLink endpoints
  • Consumers access service via VPC endpoint (private IP)
  • Traffic never traverses public internet

Use Case:

  • SaaS provider exposing service to customer VPCs
  • Sharing internal services across AWS accounts
  • Accessing AWS Marketplace partner services

Example:

Provider VPC:
  Application running on EC2 instances
  NLB in front of instances
  VPC Endpoint Service pointing to NLB

Consumer VPC:
  VPC Endpoint interface in consumer's subnet
  Consumer accesses service via endpoint's private IP
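
On the provider side, exposing the NLB as an endpoint service is one EC2 API call. A boto3 sketch with a placeholder NLB ARN:

import boto3

ec2 = boto3.client("ec2")

service = ec2.create_vpc_endpoint_service_configuration(
    NetworkLoadBalancerArns=["arn:aws:elasticloadbalancing:...:loadbalancer/net/my-nlb/..."],
    AcceptanceRequired=True,  # provider approves each consumer connection request
)

# Consumers create an interface VPC endpoint against this service name
print(service["ServiceConfiguration"]["ServiceName"])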

Cost Model

NLB Pricing (US East, January 2025):

| Component | Price |
|---|---|
| Load balancer hour | $0.0225 per hour (~$16.20/month) |
| NLCU (NLB Capacity Unit) | $0.006 per NLCU-hour |
| Cross-zone data transfer | $0.01 per GB (if enabled) |

NLCU Dimensions (charged for highest dimension):

  1. New connections: 800 connections/second (TCP), 400 flows/second (UDP)
  2. Active connections: 100,000 connections/minute (TCP), 50,000 flows/minute (UDP)
  3. Processed bytes: 1 GB/hour

Example Cost Calculation:

Scenario: Gaming server with high connection rate

  • New TCP connections: 5,000/second = 6.25 NLCUs
  • Active connections: 200,000/minute = 2 NLCUs
  • Processed bytes: 50 GB/hour = 50 NLCUs ← Highest dimension

Monthly Cost:

  • Load balancer: $16.20
  • NLCUs: 50 NLCUs × $0.006 × 730 hours = $219
  • Cross-zone (100 TB/month): $1,000
  • Total: $1,235.20/month

Cost Optimization:

  • Disable cross-zone load balancing if targets are evenly distributed across AZs (save $0.01/GB)
  • Consolidate NLBs where possible (each NLB carries its own hourly charge)

Gateway Load Balancer (GWLB)

Transparent Traffic Inspection

Bump-in-the-Wire Architecture:

  • GWLB operates transparently (does not modify packets)
  • Preserves source/destination IP addresses
  • Uses GENEVE protocol (UDP port 6081) to encapsulate traffic
  • Security appliances see original packet headers

GENEVE Encapsulation:

Original Packet:
  Source IP: 203.0.113.5 (client)
  Destination IP: 10.0.1.50 (application server)

GWLB Encapsulates:
  Outer Header: GWLB → Security Appliance
  GENEVE Header: Flow metadata
  Inner Packet: Original packet (unchanged)

Security Appliance:
  Decapsulates GENEVE
  Inspects original packet
  Re-encapsulates and returns to GWLB

GWLB Forwards:
  Original packet to application server (source IP preserved)

Supported Security Appliances

Third-Party Appliances (AWS Marketplace):

  • Firewalls: Palo Alto Networks, Fortinet, Check Point, Cisco
  • IDS/IPS: Suricata, Snort, Trend Micro
  • Deep Packet Inspection (DPI): Gigamon, Ixia, cPacket
  • Network Analytics: Kentik, Flowmon

Appliance Requirements:

  • Must support GENEVE encapsulation (UDP port 6081)
  • Must be deployed in VPC as EC2 instances or Auto Scaling groups
  • Must have health check endpoint

Deployment Patterns

1. Centralized Inspection VPC:

  • Dedicated inspection VPC hosts security appliances
  • GWLB endpoints in each application VPC
  • Traffic flows: App VPC → GWLB Endpoint → Inspection VPC → GWLB → Appliances → GWLB → GWLB Endpoint → App VPC

Architecture:

Application VPC A                Inspection VPC              Application VPC B
─────────────────                ───────────────             ─────────────────
EC2 Instances                    GWLB                        EC2 Instances
      ↓                            ↓                                ↓
GWLB Endpoint A   ←─────→    Security Appliances    ←─────→   GWLB Endpoint B
                                    ↓
                              Firewall/IDS/IPS
                            (Auto Scaling Group)

Use Case: Centralized security team manages appliances for multiple application teams

2. Distributed Ingress Inspection:

  • Each VPC has own GWLB endpoint
  • Internet Gateway routes traffic to GWLB endpoint first
  • Traffic inspected before reaching application

Architecture:

Internet Gateway
      ↓
VPC Ingress Route Table: 0.0.0.0/0 → GWLB Endpoint
      ↓
GWLB Endpoint → Inspection VPC → Security Appliances
      ↓
Application Subnet

Use Case: Inspect all inbound internet traffic for threats

3. Egress Inspection:

  • Application instances route outbound traffic to GWLB endpoint
  • Appliances inspect and allow/block based on policy
  • Clean traffic exits via NAT Gateway

Architecture:

Application Instances
      ↓
Route Table: 0.0.0.0/0 → GWLB Endpoint
      ↓
GWLB → Security Appliances (data loss prevention, malware scanning)
      ↓
GWLB Endpoint → NAT Gateway → Internet

Use Case: Data loss prevention (DLP), prevent exfiltration

Transit Gateway Appliance Mode

Problem:

  • Without appliance mode, forward and reverse flows can go to different appliance instances
  • Stateful appliances (firewalls) need both flows to same instance

Solution: Transit Gateway Appliance Mode

  • Ensures symmetric routing (both directions go to same appliance instance)
  • Enabled per attachment on Transit Gateway
  • Use case: Stateful firewalls, IDS/IPS requiring full flow visibility

Configuration:

Transit Gateway Attachment to Inspection VPC:
  Appliance Mode Support: Enabled

Behavior:

  • Flow hash based on 5-tuple (source IP, dest IP, source port, dest port, protocol)
  • Forward and reverse flow hash to same appliance instance
  • Appliance instance sees complete bidirectional flow
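
Appliance mode is set on the Transit Gateway VPC attachment. A boto3 sketch with a placeholder attachment ID:

import boto3

ec2 = boto3.client("ec2")

# Pin both directions of a flow to the same appliance instance
ec2.modify_transit_gateway_vpc_attachment(
    TransitGatewayAttachmentId="tgw-attach-0123456789abcdef0",  # placeholder
    Options={"ApplianceModeSupport": "enable"},
)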

Auto Scaling for Appliances

Integration with Auto Scaling Groups:

  • Security appliances deployed in Auto Scaling groups
  • GWLB automatically registers new instances
  • Scales based on CPU, network throughput, or custom metrics

Example Scaling Policy:

Auto Scaling Group: Security Appliances
  Min: 2 instances (high availability)
  Max: 20 instances (burst capacity)
  Desired: 4 instances

Scaling Policies:
  Scale out: CPU > 70% for 5 minutes
  Scale in: CPU < 30% for 15 minutes
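
A target-tracking policy is often a simpler alternative to the scale-out/scale-in threshold pair above. A boto3 sketch, assuming a placeholder Auto Scaling group name:

import boto3

autoscaling = boto3.client("autoscaling")

# Keep average appliance CPU near 50%; Auto Scaling adds/removes instances as needed
autoscaling.put_scaling_policy(
    AutoScalingGroupName="security-appliances",  # placeholder ASG name
    PolicyName="appliance-cpu-target",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 50.0,
    },
)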

Cost Benefit:

  • Pay only for appliances needed (scale down during off-peak)
  • Auto Scaling reduces costs by 40-60% vs. fixed capacity

Cost Model

GWLB Pricing (US East, January 2025):

| Component | Price |
|---|---|
| Load balancer hour | $0.0125 per hour (~$9.13/month) |
| GLCU (Gateway LCU) | $0.004 per GLCU-hour |

GLCU Dimensions (charged for highest dimension):

  1. Processed bytes: 1 GB/hour
  2. New connections/flows: 600 per second
  3. Active flows: 60,000 per minute

Example Cost Calculation:

Scenario: Centralized inspection for 3 VPCs

  • Processed bytes: 200 GB/hour = 200 GLCUs ← Highest dimension
  • New connections: 1,000/second = 1.67 GLCUs

Monthly Cost:

  • GWLB: $9.13
  • GLCUs: 200 GLCUs × $0.004 × 730 hours = $584
  • Security appliances: 4 × c5.xlarge × $0.17/hour × 730 = $496.40
  • Total: $1,089.53/month

Plus: AWS Marketplace software costs for third-party appliances (varies by vendor)

Advanced Features

Connection Draining (Deregistration Delay)

How It Works:

  1. Target is deregistered (instance terminated, Auto Scaling scale-in, manual deregistration)
  2. Load balancer marks target as “draining”
  3. Stops sending new connections/requests to draining target
  4. Existing connections continue for up to deregistration delay (default 300 seconds)
  5. After delay, forcibly closes remaining connections

Configuration:

  • ALB: 0-3600 seconds (default 300)
  • NLB: 0-3600 seconds (default 300)
  • GWLB: 0-3600 seconds (default 300)

Best Practice:

  • Web applications: 30-60 seconds (most requests complete in <30s)
  • Long-lived connections: 300-600 seconds (WebSocket, streaming)
  • Batch processing: 3600 seconds (allow jobs to complete)

Target Group Weighting

Use Case: Weighted traffic distribution for blue/green deployments, canary releases

How It Works:

  • Attach multiple target groups to same listener rule
  • Assign weight to each target group (0-999)
  • Traffic distributed proportionally based on weights

Example: Canary Deployment

Listener Rule: Path = /api/*
  Target Group 1 (Blue - Current Version): Weight 90
  Target Group 2 (Green - New Version): Weight 10

Result: 90% traffic to Blue, 10% to Green
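
Weights live inside the rule's forward action. A boto3 sketch of the 90/10 canary split above, with placeholder ARNs:

import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",  # placeholder
    Priority=5,
    Conditions=[{"Field": "path-pattern", "PathPatternConfig": {"Values": ["/api/*"]}}],
    Actions=[{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": "arn:...:targetgroup/blue/...", "Weight": 90},   # current version
                {"TargetGroupArn": "arn:...:targetgroup/green/...", "Weight": 10},  # canary
            ],
        },
    }],
)

Shifting the canary forward is then just a modify_rule call with new weights.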

Cost Benefit:

  • Gradual rollout reduces risk (only 10% users affected by bugs)
  • Can validate new version with real traffic before full rollout

Access Logs

ALB Access Logs:

  • Captures detailed information about requests
  • Stored in S3 bucket (GZIP compressed)
  • Includes: timestamp, client IP, latencies, request paths, server responses, TLS cipher, user agent

NLB Access Logs:

  • Captures TLS connection details (logs are generated only for TLS listeners)
  • Includes: TLS protocol version, cipher suite, connection time, handshake time

GWLB Access Logs:

  • GWLB does not generate ELB access logs; use VPC Flow Logs on the GWLB endpoints and appliance network interfaces to capture flow information
  • Flow logs include: source/destination IPs, ports, protocol, bytes transferred

Cost:

  • Access logs: Free
  • S3 storage: $0.023/GB per month (Standard)
  • Best practice: Use S3 Intelligent-Tiering or lifecycle policies to reduce storage costs

Use Case:

  • Security analysis (detect anomalies, brute force attacks)
  • Performance monitoring (identify slow endpoints)
  • Compliance auditing (access logs for PCI DSS, HIPAA)

Integration with CloudWatch

Metrics (published automatically, no charge):

ALB Metrics:

  • ActiveConnectionCount: Current connections
  • TargetResponseTime: Latency from target
  • RequestCount: Total requests
  • HTTPCode_Target_4XX_Count, HTTPCode_Target_5XX_Count: Error rates
  • TargetConnectionErrorCount: Failed connections to targets
  • UnHealthyHostCount, HealthyHostCount: Target health

NLB Metrics:

  • ActiveFlowCount: Current TCP/UDP flows
  • NewFlowCount: New flows per minute
  • ProcessedBytes: Bytes processed
  • TCP_Client_Reset_Count: Client resets
  • TCP_Target_Reset_Count: Target resets

GWLB Metrics:

  • ActiveFlowCount: Current flows
  • NewFlowCount: New flows per minute
  • ProcessedBytes: Bytes processed

Alarms (recommended):

  • UnHealthyHostCount > 0 for 5 minutes → Page on-call engineer
  • TargetResponseTime > 1 second for 10 minutes → Investigate performance
  • HTTPCode_Target_5XX_Count > 100 for 5 minutes → Investigate application errors
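
The first alarm above as a boto3 sketch; the dimension values (ARN suffixes) and SNS topic are placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="alb-unhealthy-hosts",
    Namespace="AWS/ApplicationELB",
    MetricName="UnHealthyHostCount",
    Dimensions=[
        {"Name": "LoadBalancer", "Value": "app/my-alb/50dc6c495c0c9188"},      # placeholder
        {"Name": "TargetGroup", "Value": "targetgroup/web/73e2d6bc24d8a067"},  # placeholder
    ],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=5,                        # 5 consecutive minutes
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",  # alarm when any host is unhealthy
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall"],  # placeholder SNS topic
)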

Cost:

  • Standard metrics: Free
  • Alarms: $0.10 per alarm per month (first 10 alarms free)

Common Pitfalls and How to Avoid Them

1. Not Using Dedicated Health Check Endpoints

Problem: Health checks against application root (/) don’t verify dependencies (database, cache, external APIs).

Example: Application returns 200 OK from / even when database is down. Load balancer considers target healthy and sends traffic. Users get 500 errors.

Impact: False positives (unhealthy targets marked healthy), 5-20% error rate during database outages.

Solution:

  • Create dedicated health check endpoint (/health or /healthz)
  • Check critical dependencies (database connectivity, cache availability)
  • Return 200 only if all dependencies healthy, 503 if any dependency unhealthy

Example Implementation:

using Microsoft.AspNetCore.Mvc;

public class HealthController : ControllerBase
{
    // Dedicated health endpoint: verifies dependencies, not just process liveness
    [HttpGet("/health")]
    public IActionResult HealthCheck()
    {
        bool dbHealthy = CheckDatabaseConnection();
        bool cacheHealthy = CheckCacheConnection();

        if (dbHealthy && cacheHealthy)
        {
            return Ok(new { status = "healthy", database = "ok", cache = "ok" });
        }

        // 503 takes this target out of load balancer rotation
        return StatusCode(503, new
        {
            status = "unhealthy",
            database = dbHealthy ? "ok" : "degraded",
            cache = cacheHealthy ? "ok" : "degraded"
        });
    }

    private bool CheckDatabaseConnection() => true;  // stub: ping the database here
    private bool CheckCacheConnection() => true;     // stub: ping the cache here
}

Cost Impact: Reduces error rate from 5-20% to <0.1% during partial outages.

2. Setting Deregistration Delay Too High

Problem: Long deregistration delay (default 300 seconds) delays Auto Scaling scale-in and deployments.

Example: Auto Scaling wants to terminate instance during scale-in. Waits 300 seconds for connections to drain. During this time, paying for idle instance.

Impact: 5-10 minutes deployment delays, $0.10-$0.50 wasted per scale-in event.

Solution:

  • Set deregistration delay to match application’s typical request duration
  • Web applications: 30-60 seconds
  • API services: 30 seconds
  • Long-lived WebSocket: 300-600 seconds

Cost Impact: Reduce deployment time from 10 minutes to 2 minutes. Save $0.30 per scale-in event × 100 events/month = $30/month.

3. Not Enabling Cross-Zone Load Balancing for Uneven Target Distribution

Problem: Targets unevenly distributed across AZs, cross-zone LB disabled, some targets overloaded.

Example (NLB):

  • AZ-A: 2 instances
  • AZ-B: 8 instances
  • Cross-zone LB disabled
  • AZ-A instances get 50% of traffic each (overloaded), AZ-B instances get 6.25% each (underutilized)

Impact: AZ-A instances at 90% CPU (slow responses), AZ-B instances at 15% CPU (wasted capacity).

Solution:

  • ALB: Cross-zone always enabled (no action needed)
  • NLB: Enable cross-zone load balancing if target distribution is uneven
  • Cost: $0.01/GB for cross-AZ data transfer (NLB only)

Cost Impact:

  • Cross-zone cost: 100 GB/hour × 730 hours × $0.01 = $730/month
  • But: Eliminates need to over-provision AZ-A instances (save $200/month on EC2)
  • Net: $530/month additional spend, traded for balanced load, better performance, and lower EC2 costs

4. Choosing ALB When PrivateLink Is Required

Problem: PrivateLink requires NLB, but ALB was deployed. Must rebuild infrastructure to switch.

Example: SaaS provider wants to expose service via PrivateLink for customer VPCs. Deployed ALB. Discovers PrivateLink requires NLB.

Impact: 1-2 week project to migrate from ALB to NLB, redeploy VPC endpoint services.

Solution:

  • If you plan to use PrivateLink, start with NLB
  • If you need ALB features (Layer 7 routing) AND PrivateLink, use NLB → ALB chaining

Configuration (NLB → ALB):

PrivateLink VPC Endpoint Service → NLB (static IPs) → ALB (Layer 7 routing) → Target Group

Cost Impact: NLB → ALB chaining adds $16.20/month (NLB cost) but enables both PrivateLink and Layer 7 routing.

5. Not Setting Slow Start for High-Warmup Applications

Problem: New targets receive full traffic immediately, overwhelmed during cache warm-up.

Example: Java application with 60-second JVM warm-up. New instance added to target group, immediately receives 10% of traffic. Instance slow to respond (cache cold, JVM not optimized). Users experience 2-5 second response times.

Impact: 50% of requests to new instances have 2-5 second latency during first 60 seconds.

Solution:

  • Enable slow start with duration matching warm-up time (30-120 seconds)
  • New targets ramp from 0% to full share over slow start duration

Configuration:

Target Group:
  Slow start duration: 60 seconds

Cost Impact: Eliminates latency spikes during scale-out events. Improves user experience (P99 latency: 5 seconds → 500ms).

6. Forgetting to Enable HTTP/2 for Better Performance

Problem: HTTP/2 disabled, clients forced to use HTTP/1.1 (multiple connections, head-of-line blocking).

Example: Web application serves 50 assets per page. Browsers open up to 6 parallel HTTP/1.1 connections per domain; HTTP/2 multiplexes all requests over 1 connection.

Impact: 6x more connections, higher latency (100ms vs. 50ms page load time).

Solution:

  • Enable HTTP/2 on ALB listeners (enabled by default for new ALBs)
  • Verify client applications support HTTP/2

Cost Impact: Reduces LCU usage (fewer connections). Save 20-40% on LCU costs. Example: $60/month → $40/month.

7. Not Using CloudFront in Front of ALB for Static Content

Problem: ALB serves all content including static assets, high processed bytes cost.

Example: Application serves images, CSS, JS directly from ALB. 500 GB/hour processed bytes = 500 LCUs × $0.008 × 730 = $2,920/month.

Impact: High LCU costs for content that could be cached.

Solution:

  • Deploy CloudFront in front of ALB
  • CloudFront caches static assets (images, CSS, JS) at edge locations
  • ALB serves only dynamic content (APIs, authenticated requests)

Cost Impact:

  • Before: 500 GB/hour × 730 hours = 365 TB processed bytes = $2,920/month (LCUs)
  • After: 50 GB/hour × 730 hours = 36.5 TB processed bytes = $292/month (LCUs) + $200/month (CloudFront) = $492/month
  • Savings: $2,428/month (83% reduction)

8. Not Configuring Stickiness for Stateful Applications

Problem: Stateful application stores session data locally. Requests from same user routed to different instances. User loses session (logged out).

Example: E-commerce application stores shopping cart in instance memory. User adds item to cart (instance A). Next request goes to instance B (no cart data). User sees empty cart.

Impact: Poor user experience, lost revenue (users abandon checkout).

Solution:

  • Enable stickiness (session affinity) on target group
  • ALB sets AWSALB cookie to bind user to specific instance
  • Duration: 1 second to 7 days (default 1 day)

Better Solution:

  • Use external session store (ElastiCache, DynamoDB)
  • Stateless instances can handle any request
  • Better scalability and resilience

Cost Impact: Enabling stickiness is free. Using ElastiCache costs $30-100/month but enables better scaling and resilience.

9. Using Multiple ALBs Instead of Host-Based Routing

Problem: Separate ALB for each microservice increases costs and complexity.

Example: 5 microservices, each with own ALB. 5 × $16.20/month = $81/month just for load balancer hours.

Impact: Unnecessary costs, complex management (5 ALBs to monitor).

Solution:

  • Use single ALB with host-based or path-based routing
  • Route to different target groups based on hostname or path

Configuration:

ALB Listener (HTTPS:443)
├─ Host = api.example.com → API Target Group
├─ Host = admin.example.com → Admin Target Group
├─ Host = web.example.com → Web Target Group
├─ Host = mobile.example.com → Mobile Target Group
└─ Default → 404

Cost Impact:

  • Before: 5 ALBs × $16.20/month = $81/month
  • After: 1 ALB × $16.20/month = $16.20/month
  • Savings: $64.80/month (80% reduction in LB costs)

10. Not Setting Health Check Interval and Thresholds Appropriately

Problem: Default health check settings (30s interval, 5 healthy threshold) delay recovery by 150 seconds.

Example: Target becomes healthy. Load balancer requires 5 consecutive successes at 30s interval = 150 seconds before routing traffic.

Impact: 2.5 minute delay before recovered instance receives traffic.

Solution:

  • Reduce interval to 10 seconds
  • Reduce healthy threshold to 2
  • Recovery time: 2 × 10s = 20 seconds

Configuration:

Health Check:
  Interval: 10 seconds
  Healthy threshold: 2
  Unhealthy threshold: 2

Cost Impact: Faster recovery reduces error rate during rolling deployments from 5% to 1%. Improves user experience.

11. Not Using TLS Termination on NLB for Non-HTTP Protocols

Problem: Backend instances handle TLS for custom TCP applications, consuming 80% CPU.

Example: MQTT broker over TLS. Each EC2 instance at 80% CPU just for TLS handshakes. Only 20% CPU available for application logic.

Impact: Need 4x more instances to handle same application load (high costs).

Solution:

  • Enable TLS termination on NLB
  • Backend instances receive unencrypted TCP traffic
  • CPU usage drops from 80% to <10% for TLS

Cost Impact:

  • Before: 16 × c5.xlarge instances ($0.17/hour × 730 hours) = $1,986/month
  • After: 4 × c5.xlarge instances (TLS offloaded) = $496/month
  • Savings: $1,490/month (75% reduction)

12. Deploying Security Appliances Without Auto Scaling (GWLB)

Problem: Fixed number of security appliances can’t handle traffic spikes, becomes bottleneck.

Example: 4 firewall instances handle normal traffic (50 Gbps). Traffic spike to 200 Gbps during DDoS attack. Firewalls overwhelmed, drop packets.

Impact: Legitimate traffic dropped during attack, application unavailable.

Solution:

  • Deploy security appliances in Auto Scaling group
  • Scale based on CPU, network throughput, or custom metrics
  • GWLB automatically registers new instances

Configuration:

Auto Scaling Group:
  Min: 2 (HA)
  Max: 20 (burst capacity)
  Scaling Policy: Network In > 10 Gbps per instance → Scale out

Cost Impact:

  • Fixed capacity sized for peaks: 16 instances × $1,000/month = $16,000/month
  • Auto Scaling baseline: 4 instances × $1,000/month = $4,000/month
  • Bursts: 12 extra instances × ~$1.40/hour × ~60 hours/month ≈ $1,000/month
  • Total: ~$5,000/month vs. $16,000/month fixed
  • Savings: ~$11,000/month (~69% reduction) with better burst capacity

13. Not Configuring Target Group Zonal Health Thresholds

Problem: Degraded AZ continues receiving traffic even when most targets are unhealthy, causing cascading failures.

Example: AZ-A has 10 instances. Database in AZ-A fails. 8 instances become unhealthy (can’t reach database). Load balancer still routes 20% of traffic to AZ-A (2 healthy instances overwhelmed).

Impact: 2 healthy instances in AZ-A overwhelmed, become unhealthy. Cascading failure spreads to other AZs.

Solution:

  • Configure target group zonal health threshold (2024 feature)
  • If healthy targets in AZ fall below threshold, mark entire AZ unhealthy
  • Route 53 DNS failover routes traffic only to healthy AZs

Configuration:

Target Group:
  Zonal health threshold: 70% (minimum 7 healthy targets)

Behavior:

  • AZ-A has 10 targets, 8 unhealthy → 20% healthy (below 70% threshold)
  • Mark AZ-A unhealthy, stop routing traffic to AZ-A
  • AZ-B and AZ-C handle all traffic
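
The threshold is exposed as a target group attribute. A hedged boto3 sketch (attribute key per the target group health settings; the ARN is a placeholder):

import boto3

elbv2 = boto3.client("elbv2")

# Fail DNS away from an AZ when fewer than 70% of its targets are healthy
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/web/...",  # placeholder
    Attributes=[{
        "Key": "target_group_health.dns_failover.minimum_healthy_targets.percentage",
        "Value": "70",
    }],
)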

Cost Impact: Prevents cascading failures that can cause complete outages (cost of outage: $10,000-$100,000+ per hour for e-commerce sites).

14. Using Public ALB for Internal Microservices Communication

Problem: Public ALB for internal microservices exposes services to internet, increases costs.

Example: 5 microservices communicate via a public ALB that processes 100 GB/hour of internal traffic. 100 LCUs × $0.008 × 730 hours = $584/month in LCU charges, plus data transfer.

Impact: Unnecessary costs, security risk (services exposed to internet).

Solution:

  • Use internal (private) ALB for microservices communication
  • Internal ALB has no internet-facing IP, only accessible from VPC
  • No data transfer charges for traffic within same AZ

Cost Impact:

  • Before: Public ALB + cross-AZ data transfer = $584/month (LCUs) + $730/month (data transfer) = $1,314/month
  • After: Internal ALB + same-AZ targets = $584/month (LCUs) + $0 (data transfer) = $584/month
  • Savings: $730/month (56% reduction)

15. Not Testing Failover Behavior During AZ Outages

Problem: Assume multi-AZ setup provides HA, but never test failover. Real AZ outage causes unexpected failures.

Example: Application deployed in 2 AZs with ALB. AZ-A fails. ALB routes all traffic to AZ-B. AZ-B targets overwhelmed (capacity planning assumed 50/50 split).

Impact: AZ-B targets at 200% expected load, slow responses, some requests timeout.

Solution:

  • Regularly test AZ failover (monthly or quarterly)
  • Simulate AZ failure by manually deregistering all targets in one AZ
  • Verify remaining AZs can handle 100% of traffic
  • Right-size capacity planning (each AZ should handle 100% load, not just 50%)

Cost Impact: Over-provision to 2x capacity costs $1,000/month extra, but prevents outages costing $10,000-$100,000/hour.

Key Takeaways

Load Balancer Selection:

  • Use ALB for HTTP/HTTPS applications needing Layer 7 routing, WAF integration, Lambda targets, or authentication
  • Use NLB when you need ultra-low latency (<1ms), static IP addresses, source IP preservation, or PrivateLink integration
  • Use GWLB only for transparent traffic inspection with third-party security appliances (firewalls, IDS/IPS, DPI)
  • Default choice: ALB for 80% of use cases

High Availability:

  • Deploy load balancers in at least 2 AZs (3 AZs recommended for production)
  • Enable cross-zone load balancing for even traffic distribution (ALB: always enabled free, NLB: optional $0.01/GB)
  • Configure target group zonal health thresholds to prevent cascading failures during AZ degradation
  • Test AZ failover regularly (monthly) to verify capacity planning

Health Checks:

  • Use dedicated health check endpoints that verify critical dependencies (database, cache, external APIs)
  • Set aggressive intervals and thresholds for faster detection and recovery (10s interval, 2 thresholds)
  • Return 503 (Service Unavailable) when dependencies are unhealthy, not 200 OK

Performance Optimization:

  • Set deregistration delay to match application request duration (30-60s for web apps, 300-600s for WebSocket)
  • Enable slow start for applications with warm-up time (JVM, cache initialization)
  • Use HTTP/2 on ALB for better performance and lower costs (reduces connections by 6x)
  • Enable TLS termination on load balancer to offload CPU from backend instances (80% CPU reduction)

Cost Optimization:

  • Consolidate multiple ALBs using host-based or path-based routing (save $16.20/month per ALB removed)
  • Use CloudFront in front of ALB to cache static content (reduce processed bytes by 60-90%)
  • Disable cross-zone load balancing on NLB if targets are evenly distributed (save $0.01/GB)
  • Use internal (private) load balancers for microservices communication (avoid data transfer charges)
  • Enable Auto Scaling for GWLB appliances to avoid over-provisioning (save 40-60% vs. fixed capacity)

Security Best Practices:

  • Always use HTTPS listeners with ACM certificates (free, automatic renewal)
  • Use latest TLS policy (TLS 1.2/1.3 only) unless legacy clients required
  • Integrate WAF with ALB for Layer 7 threat protection (SQL injection, XSS, rate limiting)
  • Use security groups to restrict backend instances to only accept traffic from load balancer
  • Enable access logs for compliance auditing and security analysis (stored in S3)

Architecture Patterns:

  • NLB → ALB chaining: Combines NLB static IPs + PrivateLink with ALB Layer 7 routing
  • CloudFront → ALB: Cache static content at edge, ALB serves dynamic content only
  • GWLB centralized inspection: Dedicated inspection VPC with security appliances serves multiple application VPCs
  • Internal ALB: Use for microservices communication to avoid internet exposure and data transfer costs

Monitoring and Observability:

  • Set CloudWatch alarms for UnHealthyHostCount, TargetResponseTime, HTTPCode_Target_5XX_Count
  • Monitor cache hit ratio if using CloudFront in front of ALB (target >85%)
  • Enable access logs for detailed request analysis and troubleshooting
  • Use X-Ray integration for distributed tracing across ALB and backend services

When NOT to Use Load Balancers:

  • Single instance applications with no HA requirements (use Elastic IP instead)
  • Internal services with <10 requests/second (use Route 53 DNS-based routing)
  • Cost-sensitive dev/test environments (use single instance, deploy load balancer for production only)
  • Very low traffic (<1000 requests/day) where $16.20/month LB cost is significant
