Performance and Scalability Patterns
These patterns optimize system performance, handle increased load, and ensure systems can scale efficiently as demand grows.
Throttling and Rate Limiting
Rate limiting controls the rate of incoming requests to prevent system overload and ensure fair resource usage; it is essential for API protection and multi-tenant systems. The algorithm you choose determines whether you allow bursts or enforce smooth traffic.
Use When:
- Protecting against traffic spikes
- Ensuring fair usage among clients
- Preventing abuse or DDoS attacks
- Managing resource costs (e.g., cloud API calls)
- Implementing tiered pricing (free vs paid tiers)
Implementation Strategies:
Token Bucket Algorithm:
- Bucket holds tokens, refilled at fixed rate
- Each request consumes a token
- Pros: Allows bursts up to bucket capacity, smooth long-term rate
- Cons: A full bucket permits a large instantaneous burst
- Used by: AWS, Google Cloud, Stripe
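A minimal in-process token bucket might look like the sketch below (class and parameter names are illustrative, not any particular library's API):

```python
import time

class TokenBucket:
    """Bucket holds up to `capacity` tokens, refilled at `refill_rate` tokens/second."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # starting full allows an initial burst
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost         # each request consumes a token
            return True
        return False
```

With capacity=100 and refill_rate=1000/3600 (about 0.28 tokens per second), this reproduces the API gateway example at the end of this subsection: bursts of up to 100 requests, but only 1000 requests per hour over the long run.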
Leaky Bucket Algorithm:
- Requests enter queue (bucket), processed at fixed rate
- Overflow requests rejected
- Pros: Smooth, predictable output rate
- Cons: No burst allowance, can delay requests
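A leaky bucket can be sketched as a bounded queue drained at a fixed rate (the names and the single-threaded drain loop are simplifications):

```python
import collections
import time

class LeakyBucket:
    """Requests enter a bounded queue; a drain loop processes them at a fixed rate."""
    def __init__(self, capacity: int, drain_per_second: float):
        self.capacity = capacity
        self.interval = 1.0 / drain_per_second
        self.queue = collections.deque()

    def submit(self, request) -> bool:
        if len(self.queue) >= self.capacity:
            return False                 # overflow: request is rejected
        self.queue.append(request)
        return True

    def drain(self, handler) -> None:
        """Run in a worker thread: yields the smooth, predictable output rate."""
        while self.queue:
            handler(self.queue.popleft())
            time.sleep(self.interval)
```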
Fixed Window Counter:
- Count requests per fixed time window (e.g., per minute)
- Pros: Simple, low memory
- Cons: Burst at window boundaries (2x rate possible at edges)
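With Redis, a fixed window counter is often just an INCR plus an expiry, as in this sketch (key layout and limits are assumptions; uses the redis-py client):

```python
import time
import redis  # assumes the redis-py client

r = redis.Redis()

def allow_request(client_id: str, limit: int = 60, window_seconds: int = 60) -> bool:
    window = int(time.time()) // window_seconds   # fixed window index
    key = f"ratelimit:{client_id}:{window}"
    count = r.incr(key)                           # atomic per-window counter
    if count == 1:
        r.expire(key, window_seconds)             # first hit in the window sets the TTL
    return count <= limit
```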
Sliding Window Log:
- Track timestamp of each request
- Pros: Most accurate, no boundary issues
- Cons: High memory usage
Sliding Window Counter:
- Hybrid approach: fixed windows with weighted count
- Pros: Good accuracy, lower memory than log
- Used by: Cloudflare, Redis
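The sliding window counter estimates the current rate by weighting the previous window's count by how much of it the sliding window still covers. A minimal in-memory sketch (a production version would keep the counters in Redis):

```python
import time

class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}                 # window index -> request count

    def allow(self) -> bool:
        now = time.time()
        current = int(now // self.window)
        # Fraction of the previous window still inside the sliding window
        overlap = 1.0 - (now % self.window) / self.window
        estimate = (self.counts.get(current - 1, 0) * overlap
                    + self.counts.get(current, 0))
        if estimate >= self.limit:
            return False
        self.counts[current] = self.counts.get(current, 0) + 1
        return True                      # (stale window entries can be pruned)
```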
Algorithm Selection
Token bucket is the most common choice for APIs, as it allows bursts while ensuring long-term rate compliance. Use leaky bucket when you need predictable, smooth output rates.
Example: An API gateway limits each API key to 1000 requests per hour using a token bucket, with a bucket capacity of 100 to absorb short bursts.
Client → API Gateway (Rate Limiter) → Backend
Request 1-100: ALLOWED (burst using bucket tokens)
Request 101-1000: ALLOWED (within hourly limit)
Request 1001: REJECTED (429 Too Many Requests, Retry-After: 3600)
Cache-Aside
Application manages cache explicitly, loading data from cache first and falling back to database if not found.
Use When:
- Need fine-grained control over caching
- Cache and database can become inconsistent temporarily
- Read-heavy workloads with predictable access patterns
How It Works:
- Check cache for data
- If cache miss, load from database
- Store data in cache for future requests
- Handle cache invalidation on updates
Example: User profile service that checks Redis cache first, loads from PostgreSQL on cache miss, and stores result in cache.
GET /user/123:
1. Check Redis for user:123
2. If miss, query PostgreSQL
3. Store in Redis with TTL
4. Return to client
UPDATE /user/123:
1. Update PostgreSQL
2. Invalidate Redis cache for user:123
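The flow above, sketched with redis-py; db.fetch_user and db.update_user are hypothetical stand-ins for your data-access layer:

```python
import json
import redis  # assumes the redis-py client

cache = redis.Redis()
TTL_SECONDS = 300

def get_user(db, user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                     # cache hit
    user = db.fetch_user(user_id)                     # miss: query PostgreSQL (hypothetical helper)
    cache.set(key, json.dumps(user), ex=TTL_SECONDS)  # store with TTL
    return user

def update_user(db, user_id: int, fields: dict) -> None:
    db.update_user(user_id, fields)                   # update PostgreSQL first (hypothetical helper)
    cache.delete(f"user:{user_id}")                   # then invalidate the cached entry
```

Invalidating (rather than updating) the cache on writes avoids racing writers leaving stale values behind; the next read repopulates the entry.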
Cache-Through
Cache sits between application and database, automatically loading and storing data.
Use When:
- Want automatic cache management
- Can tolerate the added latency of routing every request through the cache layer
- Prefer consistency over performance
Example: Application server with cache-through layer that automatically manages product catalog caching between application and database.
Application → Cache Layer → Database
Read: Cache handles fetching from DB if not cached
Write: Cache handles updating both cache and DB
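One way to sketch such a layer is a wrapper that owns both reads and writes; loader and writer are assumed callables standing in for database access:

```python
import time

class CacheThrough:
    """Read-through on get(), write-through on put(); TTL-based expiry."""
    def __init__(self, loader, writer, ttl_seconds: float = 300.0):
        self.loader = loader             # e.g. fetch a row from the database
        self.writer = writer             # e.g. persist a row to the database
        self.ttl = ttl_seconds
        self.store = {}                  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                            # served from cache
        value = self.loader(key)                       # cache fetches from the DB itself
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

    def put(self, key, value):
        self.writer(key, value)                        # write-through: DB first, then cache
        self.store[key] = (value, time.monotonic() + self.ttl)
```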
Sharding
Splits a single database into multiple smaller databases (shards), each holding a portion of the data. Requests are routed to the appropriate shard based on a sharding key. This allows horizontal scaling when a single database can't handle the load.
How It Works:
Before Sharding:           After Sharding:
┌─────────────────┐        ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Single Database │        │ Shard 0 │ │ Shard 1 │ │ Shard 2 │
│    100M users   │   →    │  Users  │ │  Users  │ │  Users  │
│  (overloaded)   │        │   A-H   │ │   I-P   │ │   Q-Z   │
└─────────────────┘        └─────────┘ └─────────┘ └─────────┘
Application:
user = getUser("john_doe")
shard = routeToShard("john_doe") // → Shard 1 (I-P)
return shard.query("SELECT * FROM users WHERE username = ?", "john_doe")
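A routeToShard for this range scheme might look like the following sketch (the shard client objects are assumed to exist):

```python
def route_to_shard(username: str, shards: list):
    """Range-based routing on the first letter: A-H -> 0, I-P -> 1, Q-Z -> 2."""
    first = username[0].upper()
    if "A" <= first <= "H":
        return shards[0]
    if "I" <= first <= "P":
        return shards[1]
    return shards[2]

# route_to_shard("john_doe", shards) returns shards[1]: "J" falls in I-P
```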
Use When:
- Single database cannot handle load (CPU, connections, IOPS)
- Data set is too large for one server (storage limits)
- Need to scale beyond vertical limits (can't buy bigger hardware)
Sharding Strategies:
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Range-based | Partition by value ranges (A-H, I-P, Q-Z) | Simple, range queries work | Uneven distribution, hot spots |
| Hash-based | hash(key) % num_shards | Even distribution | Range queries hit all shards |
| Directory-based | Lookup table maps keys to shards | Flexible placement | Lookup overhead, single point of failure |
Range-based example (by date):
Shard 1: orders from 2023
Shard 2: orders from 2024
Problem: Shard 2 gets all new traffic (hot spot)
Hash-based example:
Shard = hash(user_id) % 3
user_id=100 → hash=7834 → 7834 % 3 = 1 → Shard 1
user_id=101 → hash=2945 → 2945 % 3 = 2 → Shard 2
Evenly distributed, but "get orders from Jan 2024" hits all shards
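One implementation note: use a stable hash such as MD5 or CRC32 rather than a language's built-in hash, which can differ between processes (Python's hash(), for instance, is randomized per process). A sketch:

```python
import hashlib

def shard_for(user_id: int, num_shards: int = 3) -> int:
    # MD5 is stable across processes and machines, unlike Python's built-in hash()
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards
```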
Hot Spot Problem and Mitigation:
A hot spot occurs when one shard receives disproportionate traffic. Causes include popular users, trending content, or time-based keys.
Hot Spot Example:
Shard by user_id, celebrity user has 10M followers
All "get celebrity's posts" queries hit one shard
That shard is overloaded, others are idle
Mitigations:
1. Add random suffix to hot keys: user_123 → user_123_0, user_123_1
(scatter reads across shards, aggregate in application)
2. Dedicated shard for known hot entities
3. Caching layer in front of hot shards
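Mitigation 1 can be sketched as scatter-on-write, gather-on-read; the suffix count and the shard client methods (insert, fetch) are illustrative assumptions:

```python
import random

NUM_SUFFIXES = 4  # how many sub-keys one hot key is spread across (tunable)

def write_post(route_to_shard, user_id: str, post) -> None:
    # Scatter: each write lands on one randomly chosen sub-key
    key = f"{user_id}_{random.randrange(NUM_SUFFIXES)}"
    route_to_shard(key).insert(key, post)             # hypothetical shard client

def read_posts(route_to_shard, user_id: str) -> list:
    # Gather: query every sub-key and aggregate in the application
    posts = []
    for i in range(NUM_SUFFIXES):
        key = f"{user_id}_{i}"
        posts.extend(route_to_shard(key).fetch(key))  # hypothetical shard client
    return posts
```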
Resharding (Adding/Removing Shards):
When you add or remove shards, data must be rebalanced. This is complex and risky.
Before: 3 shards, hash(key) % 3
After: 4 shards, hash(key) % 4
Problem: Most keys now map to different shards
key=100: hash % 3 = 1, hash % 4 = 2 (must move)
key=101: hash % 3 = 2, hash % 4 = 1 (must move)
Solution: Consistent Hashing
Keys are placed on a ring, each shard owns a portion
Adding a shard only moves keys from adjacent shards
Ring before:          Ring after:
┌─────────┐           ┌─────────┐
│ Shard 0 │           │ Shard 0 │
├─────────┤           ├─────────┤
│ Shard 1 │    →      │ Shard 1 │
├─────────┤           ├──NEW────┤
│ Shard 2 │           │ Shard 3 │ ← Only takes some of Shard 2's keys
└─────────┘           ├─────────┤
                      │ Shard 2 │
                      └─────────┘
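A minimal consistent-hash ring, omitting the virtual nodes that production implementations add to even out the distribution:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, shard_names):
        self._points = sorted((self._hash(n), n) for n in shard_names)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        h = self._hash(key)
        hashes = [p for p, _ in self._points]
        i = bisect.bisect_right(hashes, h) % len(self._points)  # next point clockwise
        return self._points[i][1]

    def add_shard(self, name: str) -> None:
        # Only keys between the new point and its predecessor change owner
        bisect.insort(self._points, (self._hash(name), name))
```

Adding a shard reassigns only the keys whose hash falls between the new point and the preceding point on the ring; every other key keeps its shard.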
Cross-Shard Queries:
Queries that span multiple shards are expensive. They require scatter-gather: query all shards, aggregate results.
Query: "SELECT COUNT(*) FROM orders WHERE date > '2024-01-01'"
Without sharding: Single query, fast
With sharding:
1. Send query to all 4 shards in parallel
2. Each shard returns its count
3. Application sums counts: 1000 + 2500 + 1800 + 700 = 6000
ORDER BY + LIMIT queries are even worse:
1. Each shard returns top N sorted results
2. Application merges and re-sorts all results
3. Significant memory and CPU overhead
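A scatter-gather COUNT sketched with a thread pool; shard.query_scalar is a hypothetical helper that runs a query and returns a single value:

```python
from concurrent.futures import ThreadPoolExecutor

def count_orders_since(shards, cutoff_date: str) -> int:
    """Scatter the COUNT to every shard in parallel, then sum in the application."""
    sql = "SELECT COUNT(*) FROM orders WHERE date > %s"
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        counts = pool.map(lambda s: s.query_scalar(sql, (cutoff_date,)), shards)
    return sum(counts)   # e.g. 1000 + 2500 + 1800 + 700 = 6000
```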
Sharding is a Last Resort
Sharding adds significant complexity. Before sharding: optimize queries, add read replicas, implement caching, scale vertically. Shard only when these options are exhausted.
Quick Reference
Pattern Comparison
| Pattern | Purpose | Complexity | Performance Gain |
|---|---|---|---|
| Throttling | Prevent overload | Low | N/A (protection) |
| Cache-Aside | Reduce DB load | Low | High for reads |
| Cache-Through | Simplify caching | Low | Medium for reads |
| Sharding | Horizontal scaling | High | High for reads/writes |
Decision Tree
| Question | Pattern |
|---|---|
| Protecting against overload? | Throttling and Rate Limiting |
| Read-heavy workload? | Cache-Aside or Cache-Through |
| Single DB can't handle load? | Sharding |
| Need fine-grained cache control? | Cache-Aside |
| Want simple cache management? | Cache-Through |
Cache Strategies
Cache-Aside (Lazy Loading)
Pros:
- Only cache what's needed
- Simple to implement
Cons:
- Cache misses impact performance
- Manual invalidation required
Cache-Through (Write-Through)
Pros:
- Data always in cache
- Simpler application code
Cons:
- Every write pays cache penalty
- Unused data gets cached
Sharding Considerations
- Range-based: Easy to implement, risk of hot spots
- Hash-based: Even distribution, complex range queries
- Directory-based: Flexible, additional lookup overhead