AWS API Gateway for System Architects
What Problems API Gateway Solves
AWS API Gateway is a fully managed service that makes it easy to create, publish, maintain, monitor, and secure APIs at any scale. It solves critical challenges for modern application architectures:
Routing and Integration Problems:
- Backend services (Lambda, EC2, ECS) require HTTP endpoints but lack built-in API infrastructure
- No standard way to route requests to multiple backend services based on path or method
- Integrating with AWS services (DynamoDB, S3, Step Functions) requires custom API code
- Managing API versions and staging environments is manual and error-prone
Security Problems:
- Each backend service must implement authentication and authorization independently
- No centralized API key management or rate limiting
- CORS configuration requires custom code in each service
- SSL/TLS certificate management is manual across multiple services
Scalability Problems:
- Backend services must scale to handle peak traffic (over-provisioned 90% of the time)
- No built-in request throttling to protect backends from traffic spikes
- Caching requires separate infrastructure (ElastiCache, CloudFront)
- No automatic DDoS protection
Monitoring and Observability Problems:
- Request/response logging requires custom implementation in each service
- No centralized metrics for API usage, latency, error rates
- Debugging distributed systems requires correlation IDs and custom tracing
- No built-in request validation or transformation
API Gateway’s Solution:
- Two API types: REST API (full-featured) and HTTP API (up to 71% cheaper and, per AWS, up to 60% lower latency)
- Multiple authorization mechanisms: IAM, Cognito User Pools, Lambda authorizers, JWT authorizers
- Built-in throttling: Request rate limiting (requests/second) and burst limits
- Response caching: 0.5 GB to 237 GB cache with TTL control
- Request validation: Schema-based validation before invoking backend
- Request/response transformation: Modify requests and responses without changing backend code
- CORS support: Enable cross-origin requests with simple configuration
- Custom domain names: Use your own domain (api.example.com) instead of default AWS domain
- Usage plans and API keys: Monetize APIs with per-customer throttling and quotas
- CloudWatch integration: Automatic metrics (requests, latency, errors, cache hits)
- X-Ray integration: Distributed tracing across API Gateway and backend services
API Gateway integrates with Lambda, ALB, EC2, ECS, DynamoDB, S3, Step Functions, and HTTP endpoints to provide a unified API layer.
REST API vs. HTTP API
AWS offers two types of API Gateway: REST API (full-featured) and HTTP API (optimized for cost and performance).
Feature Comparison
| Feature | REST API | HTTP API |
|---|---|---|
| Cost | $3.50 per million requests | $1.00 per million requests (71% cheaper) |
| Latency | Typical | Up to 60% lower (per AWS) |
| Request metering | Per request | Per 512 KB increment |
| Authorization | IAM, Cognito, Lambda authorizers, API keys | IAM, Lambda authorizers, JWT authorizers (Cognito works as a JWT issuer) |
| Per-client throttling | Yes (usage plans + API keys) | No |
| Request validation | Yes | No |
| Request transformation | Yes (VTL templates) | No |
| Response caching | Yes (0.5-237 GB, $0.02-$3.80/hour) | No |
| AWS WAF integration | Yes | No |
| Private endpoints | Yes (PrivateLink) | No |
| Usage plans and API keys | Yes | No |
| WebSocket | Separate API type | No |
| CORS configuration | Manual (per method) | Automatic (simple config) |
| Custom domain names | Yes | Yes (can share with REST APIs) |
| CloudWatch metrics | Detailed (caching, errors, latency) | Basic (requests, latency, errors) |
| X-Ray tracing | Yes | No |
When to Use REST API
Use REST API when you need:
- Per-client throttling: Different rate limits for different API consumers (SaaS tenants, partner tiers)
- Response caching: Reduce backend load and improve latency for cacheable responses
- AWS WAF: Protect against SQL injection, XSS, and other OWASP Top 10 threats
- Request validation: Validate request bodies against JSON Schema before invoking backend
- Request/response transformation: Modify payloads without changing backend code
- Private APIs: Expose APIs only to VPCs via PrivateLink (no public internet access)
- Usage plans and API keys: Monetize APIs with quotas, throttling, and API key management
Typical use cases:
- SaaS APIs: Multi-tenant APIs with per-customer throttling and quota enforcement
- Enterprise APIs: Private APIs accessible only from corporate VPC
- Public APIs with monetization: Usage plans for free/basic/pro tiers
- Legacy integration: Request/response transformation to integrate with existing systems
When to Use HTTP API
Use HTTP API when you need:
- Lower cost: 71% cheaper ($1.00 vs $3.50 per million requests)
- Lower latency: up to 60% lower latency than REST API (per AWS)
- Simple API: No need for caching, WAF, or advanced features
- JWT authorization: Built-in JWT authorizer for OAuth 2.0 or OIDC tokens
Typical use cases:
- Serverless APIs: Simple Lambda-backed APIs for internal services
- Microservices: Low-latency communication between services
- Prototyping: Fast development without complex features
- Mobile/web backends: OAuth 2.0 or OIDC authentication with JWT tokens
Decision Framework
Start with HTTP API if:
- Cost and latency are primary concerns
- You don’t need per-client throttling
- You don’t need response caching
- Simple authorization (JWT, Cognito, IAM) is sufficient
Use REST API if:
- You need per-client throttling (multi-tenant SaaS)
- You need response caching (high-traffic, cacheable responses)
- You need AWS WAF (public APIs with security threats)
- You need private endpoints (corporate VPC-only access)
- You need request validation or transformation
Critical limitation: HTTP APIs lack per-client throttling. Without a way to throttle per user or tenant, an HTTP API is a risky choice for SaaS or partner-facing APIs unless you add your own rate limiting.
Cost crossover: HTTP APIs meter requests in 512 KB increments. For large requests/responses (>1.5 MB), REST APIs may be cheaper. Example: 2 MB request counts as 4 requests for HTTP API ($0.000004) vs. 1 request for REST API ($0.0000035).
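The crossover arithmetic can be checked with a short calculation. This is a minimal sketch that assumes the first-tier prices quoted above ($3.50 and $1.00 per million) and counts request charges only; the function names are illustrative, not part of any AWS SDK.

```python
import math

REST_PRICE_PER_MILLION = 3.50   # USD, price quoted in the comparison above
HTTP_PRICE_PER_MILLION = 1.00   # USD, price quoted in the comparison above
HTTP_METER_BYTES = 512 * 1024   # HTTP API meters in 512 KB increments


def billed_units_http(payload_bytes):
    """Number of billable requests one HTTP API call of this size counts as."""
    return max(1, math.ceil(payload_bytes / HTTP_METER_BYTES))


def cost_per_call(payload_bytes):
    """Per-call request charge in USD for both API types."""
    http = billed_units_http(payload_bytes) * HTTP_PRICE_PER_MILLION / 1_000_000
    rest = REST_PRICE_PER_MILLION / 1_000_000
    return {"http": http, "rest": rest, "cheaper": "http" if http < rest else "rest"}


if __name__ == "__main__":
    for size_mb in (0.1, 1.5, 2.0):
        print(size_mb, "MB ->", cost_per_call(int(size_mb * 1024 * 1024)))
```

At 1.5 MB the HTTP API call is billed as three requests and is still cheaper; at 2 MB it is billed as four and REST API becomes the cheaper option.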
Authorization Mechanisms
IAM Authorization
How It Works:
- Clients sign requests using AWS Signature Version 4 (SigV4)
- API Gateway verifies signature using IAM
- Caller must have IAM permissions for the `execute-api:Invoke` action
Use Case:
- Internal AWS service-to-service communication
- Temporary credentials via STS
- Cross-account API access
Example IAM Policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "execute-api:Invoke",
"Resource": "arn:aws:execute-api:us-east-1:123456789012:abcdef123/prod/GET/users"
}
]
}
Cost: Free
Pros:
- No additional infrastructure (uses IAM)
- Fine-grained permissions per API method
- Supports temporary credentials (STS, EC2 instance profiles)
Cons:
- Clients must have AWS credentials
- Not suitable for public APIs or mobile/web clients
Cognito User Pool Authorizers
How It Works:
- Users authenticate with Cognito User Pool
- Cognito returns JWT access token or ID token
- Client sends token in the `Authorization` header
- API Gateway validates token signature against Cognito public key
- If valid, API Gateway invokes backend with user claims
Use Case:
- Web and mobile applications with user authentication
- Social login (Google, Facebook, Amazon, Apple)
- Multi-factor authentication (MFA)
Configuration:
{
"authorizationType": "COGNITO_USER_POOLS",
"authorizerId": "abc123",
"userPoolArn": "arn:aws:cognito-idp:us-east-1:123456789012:userpool/us-east-1_ABC123DEF"
}
Cost:
- API Gateway: Free
- Cognito User Pool: First 50,000 MAUs free, then $0.0055 per MAU
Pros:
- Fully managed user directory
- Built-in UI for sign-up, sign-in, password reset
- Social identity providers (Google, Facebook, etc.)
- MFA support (SMS, TOTP)
Cons:
- Tied to Cognito (vendor lock-in)
- Limited customization of authentication flow
JWT Authorizers (HTTP API Only)
How It Works:
- Identity provider (Okta, Auth0, Keycloak) issues JWT token
- Client sends token in the `Authorization` header
- API Gateway validates token signature, issuer, audience, expiration
- If valid, API Gateway invokes backend with claims
Use Case:
- Bring your own identity provider (BYOIDP)
- Enterprise SSO (Okta, Azure AD, Google Workspace)
- OAuth 2.0 or OIDC standard authentication
Configuration:
{
"identitySource": "$request.header.Authorization",
"issuerUrl": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_ABC123DEF",
"audience": ["client-id-1", "client-id-2"]
}
Cost: Free
Pros:
- No Lambda invocation cost (unlike Lambda authorizers)
- Fast validation (sub-millisecond)
- Works with any OIDC-compliant identity provider
Cons:
- HTTP API only (not available for REST API)
- No custom authorization logic (token validation only)
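To make the validation steps concrete, the sketch below reproduces the claim checks (issuer, audience, expiration) in plain Python. It is an illustration only: it deliberately skips signature verification, which the real authorizer performs against the issuer's public keys, and the function names and the example issuer URL are assumptions.

```python
import base64
import json
import time


def decode_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_are_acceptable(claims, issuer, audiences, now=None):
    """Check issuer, audience, and expiration the way the list above describes."""
    now = time.time() if now is None else now
    aud = claims.get("aud", [])
    token_audiences = [aud] if isinstance(aud, str) else list(aud)
    return (
        claims.get("iss") == issuer
        and any(a in audiences for a in token_audiences)
        and claims.get("exp", 0) > now
    )
```

A token that passes these checks but carries a forged signature would still be rejected by API Gateway, so this code is not a substitute for a real JWT library.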
Lambda Authorizers
How It Works:
- Client sends request with authorization token or request parameters
- API Gateway invokes Lambda authorizer function
- Lambda validates credentials (query database, call external service, custom logic)
- Lambda returns IAM policy document (Allow/Deny) and optional context
- API Gateway caches policy for TTL duration (default 300 seconds)
Authorizer Types:
1. TOKEN Authorizer (REST API):
- Receives bearer token (e.g., JWT, OAuth token) from the `Authorization` header
- Lambda validates token and returns IAM policy
- Use case: Custom JWT validation, OAuth token introspection
2. REQUEST Authorizer (REST API and HTTP API):
- Receives full request context (headers, query strings, path parameters, source IP)
- Lambda validates request and returns IAM policy
- Use case: Complex authorization logic, multiple identity sources
Example Lambda Response:
{
"principalId": "user123",
"policyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Action": "execute-api:Invoke",
"Effect": "Allow",
"Resource": "arn:aws:execute-api:us-east-1:123456789012:abcdef123/prod/GET/users/*"
}
]
},
"context": {
"userId": "user123",
"userRole": "admin"
}
}
Caching:
- Default TTL: 300 seconds (5 minutes)
- Range: 0-3600 seconds
- Cache key: Token (TOKEN authorizer) or configured identity sources (REQUEST authorizer)
- Critical: Cached policies apply to all requests with same cache key
Cost:
- Lambda invocations: $0.20 per 1 million invocations
- Lambda duration: $0.0000166667 per GB-second
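- Example: 1 million requests with a 128 MB, 100ms authorizer. If caching cuts invocations to about 3,333 (roughly 1 in 300 requests misses the cache), the request charge is 3,333 × $0.20 / 1M ≈ $0.00067, plus about $0.00069 in duration charges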
Pros:
- Custom authorization logic (query database, call external APIs)
- Works with any authentication mechanism
- Cache reduces invocation costs
Cons:
- Adds latency on every authorizer cache miss (10-50ms)
- Lambda invocation cost (though caching helps)
- Must implement IAM policy logic
Best Practice:
- Use REQUEST authorizer (supports multiple identity sources)
- Set appropriate TTL (300s for most use cases, 0s for frequently changing permissions)
- Handle both Allow and Deny explicitly
- Include user context for backend to consume
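A REQUEST authorizer following these practices could be sketched as below. The event shape is assumed to be the REST API one (with `headers` and `methodArn`), and `KNOWN_TOKENS` is a hypothetical stand-in for whatever lookup a real authorizer would perform (database query, token introspection, and so on).

```python
# Hypothetical token table standing in for a real credential lookup.
KNOWN_TOKENS = {"valid-token": {"userId": "user123", "userRole": "admin"}}


def build_policy(principal_id, effect, resource, context=None):
    """Build the response document API Gateway expects from an authorizer."""
    policy = {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }
    if context:
        policy["context"] = context  # forwarded to the backend for consumption
    return policy


def handler(event, context):
    """Authorizer entry point: returns an explicit Allow or an explicit Deny."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    raw = headers.get("authorization", "")
    token = raw[7:] if raw.startswith("Bearer ") else raw
    user = KNOWN_TOKENS.get(token)
    resource = event["methodArn"]
    if user is None:
        return build_policy("anonymous", "Deny", resource)
    return build_policy(user["userId"], "Allow", resource, context=user)
```

Because a cached policy applies to every request with the same cache key, a production authorizer would usually return a `Resource` pattern covering all routes the caller may use rather than the single `methodArn` used here.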
Throttling and Rate Limiting
API Gateway implements throttling at multiple levels to protect backend services from traffic spikes.
Throttling Levels
1. AWS Account Level (Default):
- Rate limit: 10,000 requests per second across all APIs in account-region
- Burst limit: 5,000 concurrent requests
- Cannot be disabled, can be increased via AWS Support
2. Per-API, Per-Stage Level:
- Configure custom rate and burst limits for specific API stage
- Takes precedence over the account default for that stage, but cannot exceed the account-level limit
- Use case: Different limits for prod vs dev stages
3. Per-Method Level:
- Configure custom rate and burst limits for specific API method
- Takes precedence over the stage-level limit for that method
- Use case: Protect expensive operations (e.g., POST /orders)
4. Per-Client Level (REST API Only):
- Configure custom rate, burst, and quota limits per API key
- Requires usage plans and API keys
- Use case: Multi-tenant SaaS, partner API tiers
Rate Limit vs. Burst Limit
Rate Limit (Steady State):
- Number of requests per second allowed
- Example: 1,000 RPS = 1,000 requests evenly distributed over 1 second
Burst Limit (Spike Handling):
- Maximum number of requests that can be admitted at once during a traffic spike
- Uses token bucket algorithm (the burst limit is the bucket capacity)
- Example: 2,000 burst limit = absorb up to 2,000 requests arriving at the same moment
Token Bucket Algorithm:
- Bucket capacity = burst limit (e.g., 2,000 tokens)
- Tokens refill at rate limit per second (e.g., 1,000 tokens/second)
- Each request consumes 1 token
- If bucket empty, request is throttled (429 response)
Example Scenario:
- Rate limit: 1,000 RPS
- Burst limit: 2,000
- Normal traffic: 500 RPS → Tokens accumulate (bucket fills to 2,000)
- Traffic spike: 3,000 requests arrive at nearly the same instant
- First 2,000 requests: Consume all tokens (allowed)
- Next 1,000 requests: Bucket empty, throttled with 429 error (tokens then refill at 1,000 per second)
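The scenario can be reproduced with a small token bucket simulation. This is a simplified model, not API Gateway's actual implementation: it assumes all requests in a spike arrive at the same timestamp, and the class and function names are illustrative.

```python
class TokenBucket:
    def __init__(self, rate_limit, burst_limit):
        self.rate = rate_limit        # tokens added per second
        self.capacity = burst_limit   # maximum tokens the bucket can hold
        self.tokens = burst_limit     # start with a full bucket
        self.last = 0.0               # time of the last refill, in seconds

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the gateway would answer 429 here


def simulate_spike(requests, at, bucket):
    """Send `requests` requests at time `at`; return (allowed, throttled)."""
    allowed = sum(bucket.allow(at) for _ in range(requests))
    return allowed, requests - allowed


if __name__ == "__main__":
    bucket = TokenBucket(rate_limit=1000, burst_limit=2000)
    print(simulate_spike(3000, at=10.0, bucket=bucket))  # (2000, 1000)
    print(simulate_spike(3000, at=11.0, bucket=bucket))  # (1000, 2000)
```

The second spike, one second later, admits only 1,000 requests because the bucket has refilled at the rate limit rather than back to full capacity.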
Usage Plans and API Keys (REST API Only)
What They Provide:
- Per-client throttling: Different rate/burst limits for different customers
- Quotas: Monthly request limit per customer
- API key management: Generate, distribute, revoke API keys
Configuration:
Step 1: Create API Key
{
"name": "customer-abc",
"enabled": true
}
Step 2: Create Usage Plan
{
"name": "Gold Plan",
"throttle": {
"rateLimit": 5000,
"burstLimit": 10000
},
"quota": {
"limit": 1000000,
"period": "MONTH"
}
}
Step 3: Associate API Key with Usage Plan
Step 4: Client sends API key in header
x-api-key: abc123xyz456
Use Case: SaaS API with tiered pricing
- Free tier: 100 RPS, 200 burst, 100,000 requests/month
- Basic tier: 1,000 RPS, 2,000 burst, 5 million requests/month
- Pro tier: 5,000 RPS, 10,000 burst, 50 million requests/month
Cost: Free (no additional charge for usage plans or API keys)
Limitation: HTTP APIs do not support usage plans or API keys. Use Lambda authorizers for custom per-client logic.
Priority Order
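API Gateway applies throttling in this order, from most specific to least specific. A more specific limit takes precedence, but no setting can exceed the account-level limit: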
- Per-client throttling (usage plan for specific API key)
- Per-method throttling (specific method like GET /users)
- Per-stage throttling (entire API stage)
- Per-account throttling (all APIs in account-region)
Error Response
When throttled, API Gateway returns:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
{
"message": "Too Many Requests"
}
Best Practice: Implement exponential backoff with jitter in client applications.
Response Caching
API Gateway can cache responses to reduce backend load and improve latency.
How Caching Works
- Client sends request to API Gateway
- API Gateway checks cache using cache key
- Cache hit: Return cached response immediately (no backend invocation)
- Cache miss: Invoke backend, cache response, return to client
- Cached response expires after TTL (time-to-live)
Cache Configuration
Cache Size (REST API only):
| Cache Size | Hourly Cost | Use Case |
|---|---|---|
| 0.5 GB | $0.020 | Small APIs (<100 unique endpoints) |
| 1.6 GB | $0.038 | Medium APIs (100-500 unique endpoints) |
| 6.1 GB | $0.200 | Large APIs (500-2000 unique endpoints) |
| 13.5 GB | $0.250 | Very large APIs |
| 28.4 GB | $0.500 | |
| 58.2 GB | $1.000 | |
| 118 GB | $1.900 | |
| 237 GB | $3.800 | Massive APIs with high cardinality |
Cache TTL (Time-to-Live):
- Default: 300 seconds (5 minutes)
- Range: 0-3600 seconds
- Per-method override available
Cache Key:
- Default: Resource path only
- Optional: Query string parameters, headers, and path parameters, each of which must be explicitly marked as a cache key parameter
Example (with category, page, and userId marked as cache key parameters):
GET /products?category=electronics&page=2
Cache Key: /products?category=electronics&page=2
GET /users/{userId}
Cache Key: /users/123 (userId=123)
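The effect of choosing cache key parameters can be illustrated with a small helper. This is a conceptual sketch, not API Gateway's internal key format; `cache_key` and its arguments are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, key_params=(), headers=None, key_headers=()):
    """Build a key from the path plus only the parameters chosen as key parameters."""
    parts = urlsplit(url)
    # Keep only the selected query parameters; sort them so ordering cannot split the cache.
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in key_params)
    key = parts.path + ("?" + urlencode(kept) if kept else "")
    # Optionally fold selected headers (for example Authorization) into the key.
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    for name in key_headers:
        key += "|" + name.lower() + "=" + lowered.get(name.lower(), "")
    return key


if __name__ == "__main__":
    print(cache_key("/products?category=electronics&page=2", key_params={"category", "page"}))
    print(cache_key("/users/me", headers={"Authorization": "token-a"}, key_headers=("Authorization",)))
```

Parameters left out of the key are ignored, so two requests that differ only in an excluded parameter share one cache entry. That is desirable for noise such as timestamps and dangerous for anything that identifies the user.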
Cache Invalidation
Explicit Invalidation:
- Per-stage: Invalidate entire cache for API stage
- Requires `InvalidateCache` permission
- Use case: Deploy new version, clear stale data
Client-Initiated Invalidation:
- Client sends `Cache-Control: max-age=0` header
- Requires `InvalidateCache` permission in IAM policy
- Use case: Force fresh data for specific request
TTL Expiration:
- Entries expire automatically after TTL
- Most common invalidation mechanism
Cache Hit Ratio Optimization
Target: 70-90% cache hit ratio for optimal cost/performance balance
Strategies:
1. Optimize Cache Key:
- ❌ Include all query parameters: `/search?q=laptop&sort=price&page=1&timestamp=1234567890`
- ✅ Include only relevant parameters: `/search?q=laptop&sort=price&page=1`
2. Set Appropriate TTL:
- Static data (product catalog): 3600 seconds (1 hour)
- Semi-dynamic data (inventory): 300 seconds (5 minutes)
- Dynamic data (user profile): 60 seconds (1 minute)
- Real-time data (stock prices): 0 seconds (no caching)
3. Use Method Overrides:
- GET /products: Cache enabled, TTL 3600s
- POST /orders: Cache disabled (non-idempotent)
- GET /users/me: Cache disabled (user-specific)
4. Monitor Cache Metrics:
- `CacheHitCount`: Number of cache hits
- `CacheMissCount`: Number of cache misses
- Cache hit ratio = CacheHitCount / (CacheHitCount + CacheMissCount)
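The hit ratio formula is easy to apply to the two metric sums once they have been read from CloudWatch. The sketch below takes the counts as plain numbers; how they are fetched is left out, and the 70% default reflects the lower bound of the target range stated above.

```python
def cache_hit_ratio(hit_count, miss_count):
    """Cache hit ratio = hits / (hits + misses); 0.0 when there was no traffic."""
    total = hit_count + miss_count
    return hit_count / total if total else 0.0


def assess(hit_count, miss_count, target=0.70):
    """Compare the observed ratio with the target."""
    ratio = cache_hit_ratio(hit_count, miss_count)
    return {"ratio": round(ratio, 3), "meets_target": ratio >= target}


if __name__ == "__main__":
    print(assess(8_000_000, 2_000_000))
    print(assess(4_000_000, 6_000_000))
```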
Cost-Benefit Analysis
Scenario: API with 10 million requests/month, 80% cache hit ratio
Without Caching:
- API Gateway: 10M requests × $3.50 / 1M = $35.00
- Lambda invocations: 10M × $0.20 / 1M = $2.00
- Total: $37.00
With Caching (0.5 GB cache):
- API Gateway: 10M requests × $3.50 / 1M = $35.00
- Cache: $0.020/hour × 730 hours = $14.60
- Lambda invocations: 2M (20% cache miss) × $0.20 / 1M = $0.40
- Total: $50.00
Analysis: Caching costs $13/month more but reduces Lambda invocations by 80% and improves latency from ~100ms to ~10ms. Worth it for high-traffic APIs with performance requirements.
Breakeven: Caching becomes cost-effective when backend costs saved exceed cache costs. For expensive backends (EC2, RDS queries), caching saves money. For cheap backends (Lambda), caching improves latency but may increase costs.
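The comparison and the breakeven point can be computed directly. This sketch uses the figures from the scenario above (730 hours per month, $3.50 per million API requests, $0.20 per million backend invocations) and ignores Lambda duration charges, as the scenario does; the function names are illustrative.

```python
HOURS_PER_MONTH = 730


def monthly_cost(requests, hit_ratio=0.0, cache_hourly=0.0,
                 api_price_per_million=3.50, backend_price_per_million=0.20):
    """Return (api_gateway, cache, backend, total) in USD for one month."""
    api = requests / 1_000_000 * api_price_per_million
    cache = cache_hourly * HOURS_PER_MONTH
    backend = requests * (1 - hit_ratio) / 1_000_000 * backend_price_per_million
    return api, cache, backend, api + cache + backend


def breakeven_backend_price(requests, hit_ratio, cache_hourly):
    """Backend price per million calls above which the cache pays for itself."""
    avoided_millions = requests * hit_ratio / 1_000_000
    return cache_hourly * HOURS_PER_MONTH / avoided_millions


if __name__ == "__main__":
    print(round(monthly_cost(10_000_000)[3], 2))                   # without caching
    print(round(monthly_cost(10_000_000, 0.8, 0.020)[3], 2))       # with 0.5 GB cache
    print(round(breakeven_backend_price(10_000_000, 0.8, 0.020), 3))
```

For this scenario the breakeven is about $1.83 per million backend calls, which is why a $0.20-per-million Lambda backend does not recover the cache cost on invocation charges alone.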
Integration Patterns
Lambda Function Integration
Proxy Integration (Recommended):
- API Gateway passes entire request to Lambda as JSON event
- Lambda returns response with status code, headers, body
- No mapping templates required
Example Lambda Response:
{
"statusCode": 200,
"headers": {
"Content-Type": "application/json"
},
"body": "{\"message\":\"Hello World\"}"
}
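A handler that produces this response shape might look like the following. It is a minimal sketch that assumes the REST API proxy event format (fields such as `httpMethod` and `path`); the echoed fields are only there to show that the request details are available in the event.

```python
import json


def handler(event, context):
    """Minimal handler for a Lambda proxy integration."""
    method = event.get("httpMethod", "GET")
    path = event.get("path", "/")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so the payload is serialized explicitly.
        "body": json.dumps({"message": "Hello World", "method": method, "path": path}),
    }
```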
Custom Integration:
- API Gateway transforms request using VTL (Velocity Template Language)
- Lambda receives transformed input
- API Gateway transforms Lambda response
- Use case: Legacy Lambdas with non-standard response format
Cost: Standard API Gateway request charge plus Lambda invocation and duration charges (no extra fee for the integration itself)
HTTP Endpoint Integration
Proxy Integration:
- API Gateway forwards request to HTTP endpoint
- HTTP endpoint returns response
- No transformation
Example: Proxy to legacy API at https://api.example.com
GET /users → https://api.example.com/users
POST /orders → https://api.example.com/orders
Use Case: Fronting existing REST APIs with API Gateway for throttling, caching, authorization
AWS Service Integration
Direct Integration (No Lambda):
- API Gateway invokes AWS services directly
- Supported services: DynamoDB, S3, Step Functions, SNS, SQS, Kinesis
- Use mapping templates to transform request/response
Example: PUT object to S3
PUT /files/{filename}
→ S3 PutObject (Bucket=my-bucket, Key={filename}, Body=request.body)
Use Case: Simple CRUD operations without Lambda overhead
Cost Savings: Eliminates Lambda invocation cost ($0.20 per million requests)
VPC Link Integration (REST API)
Private Integration:
- API Gateway connects to resources in VPC via VPC Link
- Resources: ALB, NLB, EC2, ECS (private subnets)
- No public IP required on backend
Architecture:
Client → API Gateway → VPC Link → NLB → ECS Tasks (private subnet)
Cost:
- VPC Link: $0.01 per hour + $0.01 per GB processed
- Example: 100 GB/month = $7.30 (VPC Link, 730 hours) + $1.00 (data processing) = $8.30/month
Use Case: Expose private microservices to internet via API Gateway without public IPs
Cost Optimization Strategies
1. Use HTTP API for Simple Use Cases
Savings: 71% cost reduction ($1.00 vs $3.50 per million requests)
When Applicable:
- No need for per-client throttling
- No need for response caching
- No need for AWS WAF
- Simple authorization (JWT, Cognito, IAM)
Example:
- Before (REST API): 100M requests × $3.50 / 1M = $350/month
- After (HTTP API): 100M requests × $1.00 / 1M = $100/month
- Savings: $250/month (71% reduction)
2. Enable Caching for High-Traffic Cacheable Endpoints
When Cost-Effective:
- Backend invocation cost > cache cost
- High cache hit ratio (>70%)
- Expensive backend operations (database queries, external API calls)
Example:
- 50M requests/month to product catalog API
- 90% cache hit ratio
- Lambda cost: $0.000002 per invocation
Without Cache:
- Lambda: 50M × $0.000002 = $100/month
With Cache (1.6 GB, $0.038/hour):
- Cache: $0.038 × 730 = $27.74/month
- Lambda: 5M (10% miss) × $0.000002 = $10/month
- Total: $37.74/month (62% savings)
3. Implement Request Batching
Reduce Request Count:
- Instead of 100 individual requests, send 1 batch request with 100 items
- API Gateway charges per request, not per item processed
Example:
❌ Before: 100 requests × $3.50 / 1M = $0.00035
✅ After: 1 batch request × $3.50 / 1M = $0.0000035
Savings: 99% per batch
Caveat: HTTP API meters in 512 KB increments. Large batches may cost more.
4. Set Appropriate Throttling Limits
Prevent Unexpected Costs:
- Throttling limits protect against traffic spikes from DDoS or misbehaving clients
- Without tighter limits, API Gateway admits traffic up to the account-level limit, and request charges grow linearly with it
Example:
- Normal traffic: 1,000 RPS
- DDoS attack: 100,000 RPS for 1 hour
- If every request were admitted (account limit raised above 100,000 RPS): 360M requests × $3.50 / 1M = $1,260 (unexpected cost)
- With throttling (10,000 RPS): 36M requests × $3.50 / 1M = $126
- Savings: $1,134 (90% reduction)
5. Optimize Data Transfer with Compression
Enable Compression:
- API Gateway (REST API) can gzip responses larger than a configurable minimum size (for example, 1 KB); compression stays off until that threshold is set
- Client sends `Accept-Encoding: gzip` header
- API Gateway compresses response automatically
Savings:
- JSON responses compress 60-80%
- Data transfer: $0.09 per GB
- Example: 100 GB uncompressed → 20 GB compressed = $7.20/month savings
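The compression ratio for a given payload is easy to measure before relying on the 60-80% estimate. The sketch below uses Python's standard `gzip` module on a made-up JSON payload and applies the $0.09 per GB transfer price quoted above; the sample data and function names are assumptions.

```python
import gzip
import json


def compression_savings(payload):
    """Fraction of bytes saved by gzip for this payload (0.0 to 1.0)."""
    compressed = gzip.compress(payload)
    return 1 - len(compressed) / len(payload)


def monthly_transfer_savings(gb_uncompressed, savings_fraction, price_per_gb=0.09):
    """USD saved per month on data transfer at the given compression ratio."""
    return gb_uncompressed * savings_fraction * price_per_gb


if __name__ == "__main__":
    sample = json.dumps(
        [{"id": i, "name": f"product-{i}", "inStock": True} for i in range(500)]
    ).encode()
    print(f"gzip saved {compression_savings(sample):.0%} of {len(sample)} bytes")
    print(f"${monthly_transfer_savings(100, 0.80):.2f} saved per month")
```

Real savings depend on how repetitive the responses are, so it is worth running this against representative payloads rather than synthetic ones.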
6. Use Regional Endpoints When Possible
Endpoint Types:
- Edge-Optimized (default): Requests routed through CloudFront edge locations (global)
- Regional: Requests go directly to API Gateway in region (no CloudFront)
- Private: Accessible only from VPC (no internet access)
Cost Consideration:
- Edge-Optimized: No additional CloudFront charges for API Gateway traffic
- Regional: Slightly lower latency for clients in same region
- Private: Lowest latency, no data transfer charges within VPC
Use Regional When:
- All clients in same region as API (no benefit from CloudFront)
- Want to use own CloudFront distribution for advanced caching
7. Monitor and Right-Size Cache
Avoid Over-Provisioning:
- Start with 0.5 GB cache ($0.020/hour = $14.60/month)
- Monitor `CacheMissCount` for misses caused by cache evictions
- Increase cache size only if eviction rate is high (>10%)
Example:
- Provisioned 13.5 GB cache ($0.250/hour = $182.50/month) “just in case”
- Actual usage: 2 GB
- Right-size to 6.1 GB ($0.200/hour = $146/month)
- Savings: $36.50/month (20% reduction)
Common Pitfalls and How to Avoid Them
1. Using HTTP API for Multi-Tenant SaaS Without Per-Client Throttling
Problem: HTTP API lacks per-client throttling. Single abusive tenant can exhaust entire API capacity.
Example: SaaS API with 100 tenants. Tenant A sends 10,000 RPS (misbehaving or DDoS). Other 99 tenants get throttled (429 errors) because account-level limit is 10,000 RPS.
Impact: Service outage for 99% of customers, reputation damage, potential revenue loss.
Solution:
- Use REST API with usage plans for per-client throttling
- Or: Implement custom throttling in Lambda authorizer (query DynamoDB for tenant rate limits)
Cost Impact: REST API costs $3.50/M vs HTTP API $1.00/M, but prevents multi-tenant outages worth $10,000-$100,000+ in lost revenue.
2. Not Setting Method-Level Throttling for Expensive Operations
Problem: All API methods share same throttle limit. Expensive operations (e.g., generate report) can starve cheap operations (e.g., health check).
Example:
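- API has 10,000 RPS limit
- `POST /reports/generate` takes 30 seconds to complete
- Clients send 500 RPS to `/reports/generate`
- 500 requests/second × 30 seconds = about 15,000 requests in flight at once, enough to exhaust shared backend concurrency even though the request rate is far below the 10,000 RPS limit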
- Other endpoints get throttled
Impact: API unavailable for all operations due to one expensive endpoint.
Solution:
- Set method-level throttle for `POST /reports/generate`: 100 RPS, 200 burst
- Reserve capacity for other endpoints
Configuration:
{
"POST /reports/generate": {
"throttle": {
"rateLimit": 100,
"burstLimit": 200
}
}
}
Cost Impact: Prevents API outages costing $1,000-$10,000/hour in lost transactions.
3. Caching User-Specific Data Without Including User ID in Cache Key
Problem: Response cached for User A is returned to User B (data leakage).
Example:
- API endpoint: `GET /users/me` (returns current user profile)
- Cache key: `/users/me` (default, no user ID)
- User A requests `/users/me`, API Gateway caches response
- User B requests `/users/me`, API Gateway returns User A’s cached profile
Impact: Critical security vulnerability, GDPR violation, user data exposed.
Solution:
- Disable caching for user-specific endpoints
- Or: Include `Authorization` header in cache key (each user gets separate cache entry)
Configuration:
{
"GET /users/me": {
"caching": {
"enabled": false
}
}
}
Cost Impact: Data breach fines (GDPR: up to €20M or 4% of global revenue) far exceed any caching cost savings.
4. Not Implementing Exponential Backoff for 429 Errors
Problem: Clients retry immediately after 429 (Too Many Requests), amplifying load and extending throttling.
Example:
- Client sends 15,000 requests in 1 second (exceeds 10,000 RPS limit)
- API Gateway throttles 5,000 requests (429 response)
- Client retries all 5,000 immediately
- API Gateway throttles again (429 response)
- Retry storm continues for minutes
Impact: Extended API outage, backend overwhelmed by retry traffic.
Solution:
- Implement exponential backoff with jitter in client
- Example: First retry after 1s, second after 2s, third after 4s, fourth after 8s (max 60s)
Example Implementation (C#):
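using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static async Task<HttpResponseMessage> GetWithBackoffAsync(HttpClient client, string url)
{
    const int maxRetries = 5;
    const int baseDelayMs = 1000; // 1 second
    const int maxDelayMs = 60000; // cap each wait at 60 seconds
    HttpResponseMessage response = await client.GetAsync(url);
    for (int attempt = 0; attempt < maxRetries
         && response.StatusCode == HttpStatusCode.TooManyRequests; attempt++)
    {
        // Exponential backoff (1s, 2s, 4s, ...) capped at maxDelayMs, plus jitter
        int delay = Math.Min(baseDelayMs * (1 << attempt), maxDelayMs);
        int jitter = Random.Shared.Next(0, 1000); // 0-999 ms jitter
        await Task.Delay(delay + jitter);
        response.Dispose();
        response = await client.GetAsync(url);
    }
    return response; // Success, non-throttling error, or the final 429 after all retries
}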
Cost Impact: Reduces retry storm traffic by 80-95%, prevents extended outages.
5. Not Monitoring Cache Hit Ratio
Problem: Paying for cache but hit ratio is low (30-50%), wasting money.
Example:
- Provisioned 6.1 GB cache ($0.200/hour = $146/month)
- Cache hit ratio: 40%
- Most responses not cacheable or TTL too short
Impact: $146/month cache cost with minimal benefit.
Solution:
- Monitor `CacheHitCount` and `CacheMissCount` in CloudWatch
- Target >70% hit ratio
- If hit ratio low:
- Increase TTL for cacheable endpoints
- Disable cache for non-cacheable endpoints
- Optimize cache key to reduce cardinality
Cost Impact: Disabling an underutilized cache saves $146/month. Optimizing the cache key to raise the hit ratio from 40% to 80% doubles the backend savings.
6. Using REST API When HTTP API Would Suffice
Problem: Paying 3.5x the price for REST API features you don’t use.
Example:
- Simple Lambda-backed API for internal microservices
- No caching, WAF, usage plans, or request transformation
- 50M requests/month
Cost:
- REST API: 50M × $3.50 / 1M = $175/month
- HTTP API: 50M × $1.00 / 1M = $50/month
- Wasted: $125/month (71%)
Solution:
- Audit REST API features in use
- Migrate to HTTP API if only using basic features (IAM, Cognito, Lambda integration)
Migration Path:
- Create HTTP API with same routes
- Update DNS to point to HTTP API
- Monitor for errors
- Delete REST API after validation
Cost Impact: Save $125/month × 12 = $1,500/year for single API.
7. Not Setting TTL=0 for Non-Cacheable Endpoints
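Problem: If caching is enabled for POST/PUT/DELETE methods, API Gateway returns stale responses to write requests. Stage-level caching covers only GET methods by default, so this usually results from a method-level override.
Example:
- `POST /orders` creates order and returns order confirmation
- Response cached for default 300 seconds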
- User creates order, gets cached response from previous order (wrong order ID)
Impact: Users see incorrect data, customer support burden, potential refunds.
Solution:
- Set TTL=0 for non-idempotent methods (POST, PUT, PATCH, DELETE)
- Only cache GET requests
Configuration:
{
"POST /*": {
"caching": {
"ttl": 0
}
}
}
Cost Impact: Free (no cost to disable caching). Prevents customer support issues worth $500-$2,000/incident.
8. Not Using Lambda Authorizer Caching
Problem: Lambda authorizer invoked for every request, adding latency and cost.
Example:
- API receives 10M requests/month
- Lambda authorizer: 128 MB, 100ms
- No caching (TTL=0)
Cost:
- Lambda invocations: 10M × $0.20 / 1M = $2.00
- Lambda duration: 10M × 0.1s × 128 MB / 1024 MB × $0.0000166667 = $2.08
- Total: $4.08/month
With Caching (TTL=300s):
- Unique users: 100,000
- Average requests per user: 100
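- Cache hit ratio: 99% (only first request per user invokes Lambda, assuming each user's 100 requests fall within one TTL window)
- Lambda invocations: 100,000 × $0.20 / 1M = $0.02
- Lambda duration: 100,000 × 0.1s × 128 MB / 1024 MB × $0.0000166667 = $0.02
- Total: $0.04/month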
- Savings: $4.04/month (99% reduction)
Solution:
- Set TTL=300 seconds (5 minutes) for most use cases
- Set TTL=0 only for frequently changing permissions (admin role changes)
Cost Impact: Save 99% on Lambda authorizer costs. Also reduces latency from 100ms to <1ms for cached requests.
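The authorizer cost figures can be reproduced with a short calculation. This sketch uses the Lambda prices quoted in this section and treats the number of authorizer invocations (cache misses) as an input; the function name is illustrative.

```python
REQUEST_PRICE_PER_MILLION = 0.20   # USD per 1M Lambda invocations (from the text)
GB_SECOND_PRICE = 0.0000166667     # USD per GB-second (from the text)


def authorizer_cost(invocations, memory_mb=128, duration_s=0.1):
    """Monthly Lambda cost in USD for the given number of authorizer invocations."""
    request_charge = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    duration_charge = invocations * duration_s * (memory_mb / 1024) * GB_SECOND_PRICE
    return request_charge + duration_charge


if __name__ == "__main__":
    uncached = authorizer_cost(10_000_000)   # TTL=0: every request invokes Lambda
    cached = authorizer_cost(100_000)        # TTL=300s: one invocation per user
    print(round(uncached, 2), round(cached, 2), round(uncached - cached, 2))
```

The cached figure only holds if each user's requests fall inside a single TTL window; users who return after the TTL expires add further invocations.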
9. Not Validating Request Bodies Before Invoking Backend
Problem: Invalid requests reach backend, consuming compute resources and returning generic errors.
Example:
- API endpoint: `POST /users` (create user)
- Required fields: email, password, name
- Client sends invalid request: `{"email": "not-an-email"}`
- Backend Lambda invoked, validates request, returns 400 error
- Lambda cost: $0.000002 per invocation
Impact: 10M invalid requests/month × $0.000002 = $20/month wasted on invalid requests. Backend overload during attack scenarios.
Solution (REST API):
- Enable request validation with JSON Schema
- API Gateway returns 400 error before invoking backend
Example Schema:
{
"type": "object",
"required": ["email", "password", "name"],
"properties": {
"email": {"type": "string", "format": "email"},
"password": {"type": "string", "minLength": 8},
"name": {"type": "string", "minLength": 1}
}
}
Cost Impact: Save $20/month on invalid request processing. Prevent backend overload during attacks.
Limitation: HTTP API does not support request validation. Use Lambda for validation or accept invalid requests.
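When validation has to live in Lambda, the same rules can be enforced with a few lines of code. The sketch below is a deliberately small validator covering only `required`, `type: string`, `minLength`, and a regex `pattern`; it is not a JSON Schema implementation, and the email pattern is an assumed stand-in for `"format": "email"`.

```python
import re

USER_SCHEMA = {
    "required": ["email", "password", "name"],
    "properties": {
        "email": {"type": "string", "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
        "password": {"type": "string", "minLength": 8},
        "name": {"type": "string", "minLength": 1},
    },
}


def validate(body, schema):
    """Return a list of validation errors; an empty list means the body is valid."""
    errors = [f"missing required field: {f}" for f in schema["required"] if f not in body]
    for field, rules in schema["properties"].items():
        if field not in body:
            continue
        value = body[field]
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected string")
            continue
        if len(value) < rules.get("minLength", 0):
            errors.append(f"{field}: shorter than {rules['minLength']} characters")
        if "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append(f"{field}: does not match expected format")
    return errors
```

A handler would call `validate` first and return a 400 response with the error list when it is non-empty, so that invalid requests never reach the business logic.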
10. Using Edge-Optimized Endpoint for Regional Traffic
Problem: All traffic routed through CloudFront even when clients and API are in same region, adding latency.
Example:
- API deployed in `us-east-1`
- All clients in `us-east-1`
- Edge-Optimized endpoint routes traffic through CloudFront (adds 5-10ms latency)
Impact: 10ms added latency for every request, no benefit from CloudFront.
Solution:
- Use Regional endpoint for region-specific traffic
- Use Edge-Optimized for global traffic
Latency Comparison:
- Edge-Optimized: 20-30ms (CloudFront + API Gateway)
- Regional: 10-15ms (API Gateway only)
- Improvement: 30-50% latency reduction
Cost Impact: No direct cost savings, but latency improvement worth considering for performance-sensitive applications.
11. Not Using Custom Domain Names
Problem: Clients hardcode default API Gateway URL (abcdef123.execute-api.us-east-1.amazonaws.com). Difficult to migrate or change APIs.
Example:
- Mobile app hardcodes default URL in code
- Want to migrate from REST API to HTTP API (different URL)
- Must release new mobile app version, wait for user adoption (weeks to months)
Impact: Vendor lock-in, slow migration, dual-run APIs during transition (2x cost).
Solution:
- Use custom domain name from day 1 (api.example.com)
- Change backend API without changing client code
- Instant cutover via DNS
Cost:
- Custom domain: Free (API Gateway)
- ACM certificate: Free
- Route 53 hosted zone: $0.50/month
Cost Impact: $0.50/month vs. dual-running APIs for 3 months ($300-$1,000 extra costs).
12. Not Enabling CloudWatch Logs for Debugging
Problem: API errors occur, no logs to debug. Blind troubleshooting wastes engineering time.
Example:
- API returns 500 errors for some requests
- No CloudWatch logs enabled
- Must reproduce issue locally, add logging, redeploy (2-4 hour debugging cycle)
Impact: 2-4 hours engineering time ($200-$400 at $100/hour).
Solution:
- Enable CloudWatch Logs for API Gateway (INFO or ERROR level)
- Logs include request/response, integration latency, Lambda invocation details
Cost:
- CloudWatch Logs: $0.50 per GB ingested
- Typical API: 1-5 GB/month = $0.50-$2.50/month
Cost Impact: $2.50/month logging vs. $200-$400 debugging time per incident. ROI: 80x+
13. Not Using X-Ray for Distributed Tracing
Problem: Difficult to identify latency bottlenecks in multi-service architectures.
Example:
- API Gateway → Lambda → DynamoDB → S3
- Total latency: 500ms
- Don’t know which service is slow
Impact: Over-provision all services (2x cost) to ensure fast responses.
Solution:
- Enable X-Ray tracing on API Gateway and Lambda, and instrument the AWS SDK so DynamoDB and S3 calls appear in the trace
- Visualize service map, identify slow service (e.g., DynamoDB 400ms, others <50ms)
- Optimize DynamoDB (add index, increase capacity)
Cost:
- X-Ray: $5 per 1 million traces recorded, $0.50 per 1 million traces retrieved
- Example: 10M requests/month, every request traced and retrieved = $50 (recording) + $5 (retrieval) = $55/month
Cost Impact: $55/month tracing vs. $500/month over-provisioning. Savings: $445/month
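The tracing cost scales with how many requests are actually sampled. As a rough sketch, the arithmetic can be wrapped in a small function; `sampling_rate` and `retrieved_fraction` are hypothetical parameters introduced here for illustration, the prices are the ones quoted above, and any free tier is ignored.

```python
RECORD_PRICE_PER_MILLION = 5.00    # USD per 1M traces recorded (figure from the text)
RETRIEVE_PRICE_PER_MILLION = 0.50  # USD per 1M traces retrieved (figure from the text)


def xray_monthly_cost(requests, sampling_rate=1.0, retrieved_fraction=1.0):
    """Estimate monthly X-Ray cost in USD for a given request volume."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    if not 0.0 <= retrieved_fraction <= 1.0:
        raise ValueError("retrieved_fraction must be between 0 and 1")
    recorded = requests * sampling_rate
    retrieved = recorded * retrieved_fraction
    cost = (recorded / 1_000_000 * RECORD_PRICE_PER_MILLION
            + retrieved / 1_000_000 * RETRIEVE_PRICE_PER_MILLION)
    return round(cost, 2)


# 10M requests, all traced and retrieved, reproduces the $55 figure above.
full = xray_monthly_cost(10_000_000)
# Sampling 5% of the same traffic is proportionally cheaper.
sampled = xray_monthly_cost(10_000_000, sampling_rate=0.05)
```

Sampling a fraction of requests is usually enough to locate a consistently slow dependency, so the full-volume figure is best read as an upper bound.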
14. Not Implementing CORS Correctly
Problem: Browser blocks API requests due to missing CORS headers. Front-end developers blame API.
Example:
- Web app at https://app.example.com calls API at https://api.example.com
- API doesn’t return CORS headers
- Browser blocks request with CORS error
Impact: API appears broken to front-end, 1-2 day debugging cycle.
Solution (HTTP API - Simple):
{
"cors": {
"allowOrigins": ["https://app.example.com"],
"allowMethods": ["GET", "POST", "PUT", "DELETE"],
"allowHeaders": ["Content-Type", "Authorization"],
"maxAge": 86400
}
}
Solution (REST API - Manual):
- Enable CORS on each method
- API Gateway adds OPTIONS method automatically
- Returns Access-Control-Allow-Origin, Access-Control-Allow-Methods, etc.
Cost Impact: Free. Prevents 1-2 day debugging cycles worth $800-$1,600 engineering time.
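With a Lambda proxy integration on a REST API, the function's own response must also carry the CORS headers, because API Gateway passes the Lambda response through unchanged. A minimal sketch, assuming the origin from the example above; `cors_headers` and `handler` are hypothetical names.

```python
import json

ALLOWED_ORIGINS = {"https://app.example.com"}  # origin from the example above


def cors_headers(origin):
    """Return CORS headers for an allowed origin, or an empty dict otherwise."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET,POST,PUT,DELETE",
        "Access-Control-Allow-Headers": "Content-Type,Authorization",
        "Vary": "Origin",
    }


def handler(event, context):
    """Lambda proxy handler that echoes CORS headers for allowed origins."""
    headers = event.get("headers") or {}
    # Header names may arrive in any case depending on the client and API type.
    origin = next((v for k, v in headers.items() if k.lower() == "origin"), None)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json", **cors_headers(origin)},
        "body": json.dumps({"message": "ok"}),
    }
```

Echoing only allow-listed origins, rather than returning `*`, keeps the policy compatible with credentialed requests.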
15. Not Setting Appropriate API Gateway Timeout
Problem: The integration timeout defaults to its 29-second maximum. Long-running operations time out, and the client receives 504 Gateway Timeout.
Example:
- API endpoint generates PDF report (takes 45 seconds)
- API Gateway times out after 29 seconds
- Client receives 504 error, report generation continues in background (wasted compute)
Impact: User sees error, retries, multiple reports generated, wasted backend resources.
Solution:
- Use asynchronous pattern for long-running operations
- API returns 202 Accepted with job ID immediately
- Client polls for completion
- Backend uses Step Functions, SQS, or Lambda async invocation
Example:
1. POST /reports → 202 Accepted {"jobId": "abc123"}
2. GET /reports/abc123 → 200 OK {"status": "in_progress"}
3. GET /reports/abc123 → 200 OK {"status": "completed", "url": "s3://..."}
Cost Impact: Eliminates duplicate processing from retries. Improves user experience (no timeout errors).
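The submit-then-poll flow can be sketched as two Lambda proxy handlers. This is an illustration only: the in-memory `JOBS` dict stands in for a durable store such as DynamoDB, and `submit_report`, `get_report`, and `complete_report` are hypothetical names, as is the `jobId` path parameter.

```python
import json
import uuid

JOBS = {}  # in-memory stand-in for a durable store such as DynamoDB


def submit_report(event, context):
    """POST /reports: record the job and return 202 immediately."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "in_progress"}
    # A real handler would enqueue the work here (SQS, Step Functions,
    # or an asynchronous Lambda invocation).
    return {"statusCode": 202, "body": json.dumps({"jobId": job_id})}


def complete_report(job_id, url):
    """Called by the background worker when the report is ready."""
    JOBS[job_id] = {"status": "completed", "url": url}


def get_report(event, context):
    """GET /reports/{jobId}: return the current status of the job."""
    job_id = (event.get("pathParameters") or {}).get("jobId")
    job = JOBS.get(job_id)
    if job is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown job"})}
    return {"statusCode": 200, "body": json.dumps(job)}
```

Because the job ID is returned before any work starts, a client that retries the status call never triggers a second report.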
Key Takeaways
API Type Selection:
- Use HTTP API for 71% cost savings and 60% lower latency when you don’t need advanced features
- Use REST API for per-client throttling (multi-tenant SaaS), response caching, AWS WAF, request validation, or private endpoints
- Critical: HTTP API lacks per-client throttling, making it unsuitable for production multi-tenant SaaS without custom throttling logic
Authorization Strategy:
- IAM: Internal AWS service-to-service communication (free)
- Cognito User Pools: Web/mobile apps with user authentication ($0.0055 per MAU after 50K free)
- JWT Authorizers (HTTP API): Bring your own identity provider (free, fast)
- Lambda Authorizers: Custom logic, any authentication mechanism (caching critical for cost/latency)
Throttling Best Practices:
- Set account-level limits to prevent unexpected costs during DDoS
- Use method-level throttling for expensive operations
- Use per-client throttling (usage plans) for multi-tenant SaaS
- Implement exponential backoff with jitter in clients
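As one way to implement the last point, the sketch below uses the "full jitter" variant of exponential backoff: each delay is a random value between zero and an exponentially growing ceiling. The function names, the default `base` and `cap` values, and the use of a bare 429 status code as the throttle signal are assumptions for illustration.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    """Full-jitter delay in seconds for a zero-based retry attempt."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(func, max_attempts=5,
                      is_throttled=lambda result: result == 429,
                      sleep=time.sleep):
    """Call func() until it returns a non-throttled result or attempts run out."""
    result = None
    for attempt in range(max_attempts):
        result = func()
        if not is_throttled(result):
            return result
        if attempt < max_attempts - 1:
            # Wait a randomized, exponentially growing interval before retrying.
            sleep(backoff_delay(attempt))
    return result
```

The randomization spreads retries from many clients over time, so a throttled burst does not return as a synchronized second burst.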
Caching Strategy:
- Enable caching when backend cost > cache cost and cache hit ratio >70%
- Start with 0.5 GB cache ($14.60/month), monitor eviction rate, right-size
- Set TTL=0 for POST/PUT/DELETE, TTL=300-3600 for GET
- Never cache user-specific data without including user ID in cache key
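The first rule of thumb can be turned into a quick break-even check. This sketch reads "backend cost > cache cost" as "backend spend avoided by cache hits exceeds the cache's monthly price", which is one interpretation; the function name and the input values in the example are hypothetical.

```python
def caching_pays_off(monthly_requests, backend_cost_per_million, hit_ratio,
                     cache_monthly_cost=14.60, min_hit_ratio=0.70):
    """Return True when avoided backend spend exceeds the cache cost.

    cache_monthly_cost defaults to the 0.5 GB figure quoted above;
    min_hit_ratio is the 70% threshold from the same list.
    """
    avoided = monthly_requests / 1_000_000 * backend_cost_per_million * hit_ratio
    return hit_ratio > min_hit_ratio and avoided > cache_monthly_cost


# Hypothetical inputs: 100M GET requests/month at $0.20 backend cost per million.
worth_it = caching_pays_off(100_000_000, 0.20, hit_ratio=0.8)   # avoids $16.00
too_small = caching_pays_off(10_000_000, 0.20, hit_ratio=0.8)   # avoids $1.60
```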
Cost Optimization:
- HTTP API saves 71% vs REST API ($1.00 vs $3.50 per million requests)
- Caching reduces backend invocations by 70-90% (saves Lambda costs)
- Compression reduces data transfer by 60-80% ($0.09 per GB)
- Throttling prevents DDoS-induced cost spikes (can save $1,000+ per incident)
- Request validation eliminates invalid request processing costs
Integration Patterns:
- Lambda Proxy Integration: Most common, pass entire request to Lambda
- HTTP Endpoint Integration: Front existing APIs with API Gateway
- AWS Service Integration: Direct DynamoDB/S3/SQS integration without Lambda
- VPC Link: Expose private VPC resources via API Gateway
Monitoring and Debugging:
- Enable CloudWatch Logs (INFO level) for debugging ($0.50-$2.50/month)
- Enable X-Ray tracing for distributed systems ($55/month saves $445 in over-provisioning)
- Monitor cache hit ratio, throttle rates, latency, error rates
- Set alarms for 4XX/5XX error rates, latency spikes
When NOT to Use API Gateway:
- Very low traffic (<1000 requests/day) where $3.50-$100/month is significant
- Internal microservices with service mesh (use Envoy/Istio instead)
- WebSocket-only applications (use AppSync or ALB WebSocket support)
- Streaming or long-polling (API Gateway has 29s timeout limit)