Performance and Scalability Patterns
These patterns optimize system performance, handle increased load, and ensure systems can scale efficiently as demand grows.
Throttling and Rate Limiting
Controls the rate of requests to prevent system overload and ensure fair resource usage. Essential for API protection and multi-tenant systems.
Use When:
- Protecting against traffic spikes
- Ensuring fair usage among clients
- Preventing abuse or DDoS attacks
- Managing resource costs (e.g., cloud API calls)
- Implementing tiered pricing (free vs paid tiers)
Implementation Strategies:
Token Bucket Algorithm:
- Bucket holds tokens, refilled at fixed rate
- Each request consumes a token
- Pros: Allows bursts up to bucket capacity, smooth long-term rate
- Cons: Bursts of up to bucket capacity can hit the backend all at once
- Used by: AWS, Google Cloud, Stripe
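A minimal single-process sketch of the token bucket; the class and parameter names are illustrative, and a production limiter would typically keep this state in shared storage such as Redis:

```python
import time

class TokenBucket:
    """Token bucket: refills at a fixed rate, allows bursts up to capacity."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens = burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False
```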
Leaky Bucket Algorithm:
- Requests enter queue (bucket), processed at fixed rate
- Overflow requests rejected
- Pros: Smooth, predictable output rate
- Cons: No burst allowance, can delay requests
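A sketch of the counter ("bucket as meter") formulation of the same idea, where a water level stands in for the queue depth; names are illustrative:

```python
import time

class LeakyBucket:
    """Leaky bucket: drains at a constant rate; overflow is rejected."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # max queued requests
        self.leak_rate = leak_rate  # requests processed per second
        self.water = 0.0            # current queue depth
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket at the fixed leak rate since the last check.
        self.water = max(0.0, self.water - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.water + 1 <= self.capacity:
            self.water += 1
            return True
        return False  # bucket full: reject the request
```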
Fixed Window Counter:
- Count requests per fixed time window (e.g., per minute)
- Pros: Simple, low memory
- Cons: Burst at window boundaries (2x rate possible at edges)
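A per-client sketch; in a distributed setup this is often a single Redis INCR plus EXPIRE per window:

```python
import time

class FixedWindowCounter:
    """Fixed window: reset the counter at each window boundary."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.window_id = -1
        self.count = 0

    def allow(self) -> bool:
        window_id = int(time.monotonic() // self.window)
        if window_id != self.window_id:  # entered a new window: reset
            self.window_id = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```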
Sliding Window Log:
- Track timestamp of each request
- Pros: Most accurate, no boundary issues
- Cons: High memory usage
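A sketch showing why the log is accurate but memory-hungry: one timestamp is kept per request in the window.

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log: one timestamp per request, O(limit) memory per client."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log: deque[float] = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have aged out of the sliding window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```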
Sliding Window Counter:
- Hybrid approach: fixed windows with weighted count
- Pros: Good accuracy, lower memory than log
- Used by: Cloudflare; commonly implemented on top of Redis
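A sketch of the weighted count: the previous fixed window's total is scaled by how much of it still overlaps the sliding window.

```python
import time

class SlidingWindowCounter:
    """Sliding window counter: approximate the sliding count
    from two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.window_id = -1
        self.current = 0
        self.previous = 0

    def allow(self) -> bool:
        now = time.monotonic()
        window_id = int(now // self.window)
        if window_id != self.window_id:
            # Roll over; anything older than the previous window counts as zero.
            self.previous = self.current if window_id == self.window_id + 1 else 0
            self.current = 0
            self.window_id = window_id
        elapsed = (now % self.window) / self.window  # position in current window
        estimated = self.previous * (1 - elapsed) + self.current
        if estimated < self.limit:
            self.current += 1
            return True
        return False
```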
Example: API gateway limiting each API key to 1000 requests per hour, with burst allowance of 100 requests per minute using token bucket.
Client → API Gateway (Rate Limiter) → Backend
Requests 1-100: ALLOWED (burst using bucket tokens)
Requests 101-1000: ALLOWED (within hourly limit)
Request 1001: REJECTED (429 Too Many Requests, Retry-After: 3600)
Cache-Aside
Application manages cache explicitly, loading data from cache first and falling back to database if not found.
Use When:
- Need fine-grained control over caching
- Temporary inconsistency between cache and database is acceptable
- Read-heavy workloads with predictable access patterns
How It Works:
- Check cache for data
- If cache miss, load from database
- Store data in cache for future requests
- Handle cache invalidation on updates
Example: User profile service that checks Redis cache first, loads from PostgreSQL on cache miss, and stores result in cache.
GET /user/123:
1. Check Redis for user:123
2. If miss, query PostgreSQL
3. Store in Redis with TTL
4. Return to client
UPDATE /user/123:
1. Update PostgreSQL
2. Invalidate Redis cache for user:123
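A minimal sketch of both paths, assuming redis-py, psycopg2, a hypothetical users(id, name) table, and default local connections:

```python
import json
import redis      # assumes redis-py is installed
import psycopg2   # assumes psycopg2 is installed

cache = redis.Redis()                 # default localhost:6379
db = psycopg2.connect("dbname=app")   # hypothetical DSN
db.autocommit = True
CACHE_TTL = 300                       # seconds

def get_user(user_id: int) -> dict | None:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:            # 1. cache hit
        return json.loads(cached)
    with db.cursor() as cur:          # 2. miss: query PostgreSQL
        cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    if row is None:
        return None
    user = {"id": row[0], "name": row[1]}
    cache.setex(key, CACHE_TTL, json.dumps(user))  # 3. store with TTL
    return user                       # 4. return to client

def update_user(user_id: int, name: str) -> None:
    with db.cursor() as cur:          # 1. update PostgreSQL
        cur.execute("UPDATE users SET name = %s WHERE id = %s", (name, user_id))
    cache.delete(f"user:{user_id}")   # 2. invalidate the cache entry
```

Deleting the cache entry on update, rather than rewriting it, avoids caching a value that a concurrent writer is about to overwrite; the next read simply repopulates it.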
Cache-Through
Cache sits between application and database, automatically loading data on reads (read-through) and writing to both on updates (write-through).
Use When:
- Want automatic cache management
- Can tolerate the added write latency of updating cache and database together
- Prefer consistency over performance
Example: Application server with cache-through layer that automatically manages product catalog caching between application and database.
Application → Cache Layer → Database
Read: Cache handles fetching from DB if not cached
Write: Cache handles updating both cache and DB
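A toy sketch of the idea, with a plain dict standing in for the cache; the load and save callables are hypothetical wrappers around the database:

```python
class CacheThroughStore:
    """Cache layer the application talks to directly: read-through on
    misses, write-through on updates."""

    def __init__(self, cache: dict, load, save):
        self.cache = cache  # dict stands in for a real cache (e.g. Redis)
        self.load = load    # hypothetical callable: fetch one key from the DB
        self.save = save    # hypothetical callable: persist one key to the DB

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.load(key)  # read-through: fetch on miss
        return self.cache[key]

    def put(self, key, value):
        self.save(key, value)    # write-through: database first...
        self.cache[key] = value  # ...then cache, so both stay in sync

# Illustrative wiring with stand-in callables:
catalog = CacheThroughStore({}, load=lambda k: f"product-{k}", save=lambda k, v: None)
```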
Sharding
Horizontally partitions data across multiple database instances based on a sharding key.
Use When:
- Single database cannot handle load
- Data set is too large for one server
- Need to scale beyond vertical limits
Sharding Strategies:
- Range-based: Partition by value ranges
- Hash-based: Use hash function on sharding key
- Directory-based: Lookup service maps keys to shards
Challenges:
- Complex queries across shards
- Rebalancing when adding/removing shards
- Handling hot spots
Example: Social media platform sharding user data by user ID hash, distributing users evenly across database shards.
User ID 1234 → hash(1234) % 4 = 2 → Shard 2
User ID 5678 → hash(5678) % 4 = 1 → Shard 1
User ID 9012 → hash(9012) % 4 = 0 → Shard 0
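A sketch of hash-based routing; the shard names are hypothetical, and a stable digest such as MD5 keeps routing consistent across processes and restarts (Python's built-in hash() is salted per process for strings):

```python
import hashlib

SHARD_COUNT = 4
SHARDS = [f"db-shard-{i}" for i in range(SHARD_COUNT)]  # hypothetical shard names

def shard_for(user_id: int) -> str:
    # Stable digest -> integer -> modulo shard count.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % SHARD_COUNT]

print(shard_for(1234), shard_for(5678), shard_for(9012))
```

Note that modulo routing remaps most keys whenever the shard count changes, which is the rebalancing challenge noted above; consistent hashing is the usual mitigation.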
Quick Reference
Pattern Comparison
| Pattern | Purpose | Complexity | Performance Gain |
| --- | --- | --- | --- |
| Throttling | Prevent overload | Low | N/A (protection) |
| Cache-Aside | Reduce DB load | Low | High for reads |
| Cache-Through | Simplify caching | Low | Medium for reads |
| Sharding | Horizontal scaling | High | High for reads/writes |
Decision Tree
- Protecting against overload? → Throttling and Rate Limiting
- Read-heavy workload? → Cache-Aside or Cache-Through
- Single DB can't handle load? → Sharding
- Need fine-grained cache control? → Cache-Aside
- Want simple cache management? → Cache-Through
Cache Strategies
Cache-Aside (Lazy Loading):
- Pros: Only cache what’s needed, simple to implement
- Cons: Cache misses impact performance, manual invalidation
Cache-Through (Write-Through):
- Pros: Data always in cache, simpler application code
- Cons: Every write pays cache penalty, unused data cached
Sharding Considerations
- Range-based: Easy to implement, risk of hot spots
- Hash-based: Even distribution, complex range queries
- Directory-based: Flexible, additional lookup overhead