AWS Container Services: ECS, EKS, and Fargate

What Problems Containers Solve

Containers address several key challenges compared to traditional EC2 instances and serverless Lambda functions.

vs. EC2 Instances

Consistency across environments: Containers package applications with all dependencies, eliminating “works on my machine” problems. The same container runs identically in development, staging, and production.

Resource efficiency: Multiple containers share the host OS kernel, using less memory and CPU than separate virtual machines. A single EC2 instance can run dozens of containers.

Faster deployment: Container startup takes seconds vs. minutes for EC2 instances. Immutable container images enable rapid rollbacks.

Portability: Containers abstract infrastructure details. Move workloads between ECS, EKS, on-premises, and other cloud providers without code changes.

vs. Lambda

Longer execution times: Lambda has a 15-minute maximum timeout. Containers run indefinitely for long-running services.

Language/framework flexibility: Lambda supports specific runtimes. Containers support any language or framework that runs on Linux/Windows.

State management: Containers handle stateful applications and long-running services better than Lambda’s ephemeral execution model.

Larger package sizes: Lambda limits zip deployment packages to 250 MB unzipped (and container images to 10 GB). Container images on ECS/EKS can be several GB.

When to Use Containers

Choose containers (ECS/EKS) when:

  • Running long-running applications (web servers, microservices, APIs)
  • Need specific languages/frameworks not well-supported by Lambda
  • Application runs longer than 15 minutes
  • Require full control over runtime environment
  • Migrating existing Docker-based applications
  • Running stateful applications (databases, message queues)

Choose Lambda when:

  • Event-driven workloads (S3 uploads, API Gateway requests)
  • Execution time under 15 minutes
  • Sporadic or variable traffic patterns
  • Want zero infrastructure management

Choose EC2 when:

  • Need full infrastructure control (custom OS, kernel modules)
  • Running specialized databases or data stores
  • GPU-intensive workloads requiring specific drivers
  • Migrating legacy applications requiring specific server configurations

Amazon ECS (Elastic Container Service)

ECS Architecture

ECS is AWS’s native container orchestration service, designed for simplicity and deep AWS integration.

Core components:

Cluster: Logical grouping of services and tasks. Region-specific. Can contain EC2 instances, Fargate capacity, or both.

Task Definition: JSON blueprint describing 1-10 containers that comprise your application. Specifies container images, CPU/memory requirements, port mappings, network modes, IAM roles, environment variables, and secrets.

Task: Instantiation of a task definition—smallest unit of execution in ECS. Can contain one or more containers running together on the same host.

Service: Manages desired number of tasks, ensuring they keep running. Handles scheduling, load balancer integration, auto scaling, and rolling deployments. Use services for long-running applications.
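
For illustration, a minimal Fargate task definition covering these fields; the family name, image URI, and role ARN are placeholders:

{
  "family": "my-app",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:1.0",
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "essential": true
    }
  ]
}

Registering it with aws ecs register-task-definition --cli-input-json file://taskdef.json creates a new revision each time the definition changes.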

Launch Types: EC2 vs. Fargate

ECS on EC2:

  • Control: Full control over instance types, OS, networking, storage
  • Cost: Most cost-effective for steady-state workloads (up to 3x cheaper than Fargate with Reserved Instances)
  • Responsibility: You manage patching, scaling, capacity planning
  • Use case: High-volume production workloads with predictable traffic

ECS on Fargate:

  • Management: AWS handles all infrastructure—no servers to manage
  • Cost: Pay only for vCPU and memory used by tasks (per-second billing)
  • Pricing: $0.04048 per vCPU-hour, $0.004492 per GB-memory-hour (us-east-1, 2024)
  • Use case: Variable workloads, dev/test environments, microservices

ECS Managed Instances (2024 Recommendation):

  • AWS fully manages EC2 instances (provisioning, patching, scaling)
  • Best combination of performance, cost optimization, and operational simplicity
  • Recommended for new workloads requiring EC2 launch type

Service Discovery and Load Balancing

AWS Cloud Map Service Discovery:

  • Defines custom DNS names for services
  • Maintains updated locations of dynamically changing resources
  • DNS-based discovery with configurable TTL
  • Simpler but slower failover (depends on DNS TTL)

ECS Service Connect (2024 Feature):

  • Built on Cloud Map with Envoy-based sidecar proxy
  • API-based discovery (faster than DNS)
  • Automatic failover detection and traffic routing
  • Built-in observability (logs, metrics)
  • Limitation: Cannot use CodeDeploy (no blue/green deployments with CodeDeploy)
  • Cost: Additional resources for sidecar containers
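
As a sketch, Service Connect is enabled per service in the create/update-service request; the namespace and names below are hypothetical, and portName must match a named port mapping in the task definition:

"serviceConnectConfiguration": {
  "enabled": true,
  "namespace": "internal",
  "services": [
    {
      "portName": "api",
      "clientAliases": [{ "port": 80, "dnsName": "orders.internal" }]
    }
  ]
}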

Load Balancer Integration:

  • Application Load Balancer (ALB): Best for HTTP/HTTPS, advanced routing (path-based, host-based, header-based)
  • Network Load Balancer (NLB): Best for TCP/UDP, ultra-low latency, high throughput, static IPs
  • awsvpc network mode: Use “ip” target type when tasks have elastic network interfaces

Auto Scaling

Target Tracking Scaling (Recommended):

  • Set target value for a metric (e.g., 70% CPU utilization)
  • ECS automatically creates CloudWatch alarms and adjusts task count
  • Metrics: CPU utilization, memory utilization, ALB request count per target
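
Target tracking for ECS is configured through Application Auto Scaling: register the service's DesiredCount as a scalable target, then attach a policy. A sketch of the policy configuration targeting 70% average CPU (cooldown values are illustrative):

{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
  },
  "ScaleOutCooldown": 60,
  "ScaleInCooldown": 120
}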

Step Scaling:

  • Define specific thresholds and scaling actions
  • React quickly to demand spikes
  • Multiple steps for different alarm severity
  • Example: Add 2 tasks at 70% CPU, add 5 tasks at 85% CPU

Predictive Scaling (November 2024):

  • Uses machine learning to analyze historical patterns
  • Scales proactively before demand spikes
  • Combines with target tracking for real-time adjustments

When to Use ECS vs. EKS

Choose ECS when:

  • Team has little/no Kubernetes experience
  • Deploying AWS-centric workloads
  • Want minimal operational overhead
  • Running simple to moderate complexity microservices
  • Prioritizing ease of use and deep AWS integration
  • Cost efficiency critical (no control plane costs)

Choose EKS when:

  • Team already has Kubernetes expertise
  • Need multi-cloud or hybrid deployments
  • Require fine-grained control over orchestration
  • Want access to Kubernetes ecosystem (Helm, operators, CNCF tools)
  • Portability is important
  • Complex distributed systems requiring advanced orchestration

Not a binary decision: Both services can coexist in the same AWS account. Containers ensure portability between them.

AWS Fargate

Serverless Container Execution

Fargate is a serverless compute engine that runs containers without managing servers. You define CPU, memory, and networking requirements; AWS handles provisioning, scaling, and patching.

Fargate vs. EC2 Launch Type

  • Management: Fargate requires zero infrastructure management; on EC2 you manage instances, patching, and capacity
  • Cost (steady-state): Fargate costs 3-9x more than EC2 Reserved Instances; EC2 is most cost-effective with Reserved Instances
  • Cost (variable): Fargate bills per second for task duration; EC2 bills for instances even if underutilized
  • Startup time: ~30-60 seconds on Fargate; seconds on EC2 (if instances are already running)
  • Control: limited (AWS-managed) on Fargate; full control over instances on EC2
  • Best for: Fargate suits variable workloads, batch jobs, and dev/test; EC2 suits high-volume production and GPU workloads

Real-world cost comparison:

  • Fargate: $0.04048/vCPU-hour + $0.004492/GB-hour (us-east-1)
  • EC2 on-demand: ~3x cheaper
  • EC2 Reserved Instances (1-year): ~6x cheaper than Fargate
  • EC2 Reserved Instances (3-year): ~9x cheaper than Fargate

Fargate Pricing and Resource Configurations

Pricing (2024):

  • vCPU: $0.04048 per vCPU-hour (us-east-1)
  • Memory: $0.004492 per GB-hour
  • Storage: 20 GB ephemeral storage included; $0.000111 per GB-hour for additional storage (up to 200 GB)
  • Billing: Per-second billing with 1-minute minimum
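
As a quick sanity check of these rates, one task with 1 vCPU and 2 GB of memory running 24/7 for a 30-day month (720 hours) costs roughly:

  vCPU:   $0.04048 × 1 × 720 h ≈ $29.15
  Memory: $0.004492 × 2 × 720 h ≈ $6.47
  Total:  ≈ $35.61 per task per month (≈ $356 for ten such tasks, matching the comparison later in this guide)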

Valid CPU/Memory Configurations:

  • 0.25 vCPU: 0.5 GB, 1 GB, 2 GB memory
  • 0.5 vCPU: 1 GB to 4 GB (increments of 1 GB)
  • 1 vCPU: 2 GB to 8 GB
  • 2 vCPU: 4 GB to 16 GB
  • 4 vCPU: 8 GB to 30 GB
  • 8 vCPU: 16 GB to 60 GB
  • 16 vCPU: 32 GB to 120 GB

Fargate Spot

Discount: Up to 70% off Fargate on-demand pricing

Interruption: 2-minute warning before termination when AWS needs capacity back

Availability: Capacity not guaranteed

Use cases:

  • Fault-tolerant workloads
  • Batch processing
  • Stateless services with built-in resilience
  • CI/CD pipelines

Combining optimizations:

  • Graviton + Spot: Up to 76% savings (20% from Graviton, 70% from Spot compounded)
  • Graviton pricing: ~20% cheaper than x86 (e.g., eu-west-1: $0.03238 vs. $0.04048 per vCPU-hour)
  • Graviton + Spot announcement: September 2024—Fargate Spot now supports Arm-based Graviton processors

Amazon EKS (Elastic Kubernetes Service)

Kubernetes Fundamentals on AWS

EKS runs upstream Kubernetes, ensuring compatibility with standard Kubernetes tooling and APIs. AWS manages the Kubernetes control plane (high availability, patching, upgrades) across multiple Availability Zones.

EKS Architecture

Control Plane (AWS-Managed):

  • Kubernetes API server, etcd, scheduler, controller manager
  • Automatically scaled and distributed across 3 AZs
  • AWS handles patching, upgrades, high availability
  • Cost: $0.10 per cluster-hour (~$73/month per cluster)

Worker Nodes (Customer-Managed Options):

Self-Managed Nodes:

  • Full control over EC2 instances
  • Manual scaling, patching, upgrades
  • Most flexible but highest operational burden

Managed Node Groups (Recommended):

  • AWS handles provisioning, scaling, patching
  • No extra cost (only pay for EC2 instances)
  • Automated updates with single operation
  • Integrates with Auto Scaling Groups

Fargate:

  • Serverless compute for pods
  • No node management
  • Per-pod pricing
  • Limited features (no DaemonSets, no hostPort)

Karpenter (Advanced):

  • Group-less autoscaling—works directly with EC2 Fleet API
  • Responds to workload demands in under 1 minute
  • Optimizes instance selection based on pod requirements
  • More flexible and faster than Cluster Autoscaler

EKS Auto Mode (December 2024):

  • Fully automates Kubernetes cluster management
  • Handles compute, storage, networking with a single click
  • Built on Karpenter
  • One-click migration from Managed Node Groups or Fargate

EKS vs. ECS Decision Framework

  • Learning curve: ECS low; EKS high (requires Kubernetes knowledge)
  • Operational complexity: ECS minimal; EKS moderate to high
  • Control plane cost: ECS free; EKS $73/month per cluster
  • AWS integration: ECS deep (native AWS service); EKS standard Kubernetes integration
  • Ecosystem: ECS limited; EKS rich (Helm, operators, CNCF tools)
  • Portability: ECS is AWS-only; EKS supports multi-cloud, hybrid, and on-premises
  • Use case: ECS for AWS-centric microservices; EKS for complex distributed systems and multi-cloud

When Kubernetes Complexity is Justified

Use EKS when:

  • Existing Kubernetes expertise in the team
  • Need for multi-cloud or hybrid deployments (EKS Hybrid Nodes GA December 2024)
  • Require advanced orchestration features (custom controllers, operators, StatefulSets)
  • Want vibrant ecosystem and community support (Helm, Prometheus, Istio)
  • Migrating from on-premises Kubernetes
  • Running complex stateful applications requiring persistent volumes

Example: A company running 100+ microservices with complex service mesh requirements, custom operators, and plans to migrate workloads to on-premises data centers benefits from EKS. A startup deploying 5 microservices exclusively on AWS is better served by ECS.

Container Networking

VPC Networking Modes

awsvpc (Recommended):

  • Each task/pod gets its own elastic network interface (ENI)
  • Tasks have their own private IP address
  • Full VPC networking features (security groups, NACLs)
  • Required for Fargate
  • Limitation: ENI limits per instance (e.g., m5.large supports 10 ENIs)
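
With awsvpc mode, each task is placed into specific subnets and security groups at launch; a sketch of the networkConfiguration block passed to RunTask or CreateService (IDs are placeholders):

"networkConfiguration": {
  "awsvpcConfiguration": {
    "subnets": ["subnet-0abc1234def567890"],
    "securityGroups": ["sg-0abc1234def567890"],
    "assignPublicIp": "DISABLED"
  }
}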

bridge (Docker Default):

  • Uses Docker’s virtual network bridge
  • Port mapping required (host port → container port)
  • Reduced security isolation
  • Not available on Fargate

host:

  • Container uses host’s network directly
  • No port mapping needed
  • Least isolation
  • Not available on Fargate

Service Mesh

ECS Service Connect (2024):

  • Managed Envoy sidecar
  • Faster failover than DNS-based service discovery
  • Observability built-in (CloudWatch Logs, metrics)
  • Only for ECS-to-ECS communication
  • Incompatible with blue/green deployments using CodeDeploy

Amazon VPC Lattice (2024 General Service Mesh):

  • Eliminates sidecar proxies
  • Works across ECS, EKS, Lambda, EC2
  • Simplified application networking with consistent connectivity, security, and monitoring
  • Preferred for cross-service communication

AWS App Mesh (Legacy):

  • New onboarding stopped September 2024
  • Migration paths: ECS Service Connect (ECS) or VPC Lattice (general)

Load Balancing

Application Load Balancer (ALB):

  • Layer 7 (HTTP/HTTPS)
  • Advanced routing (path-based, host-based, query string, header-based)
  • WebSocket support
  • SSL/TLS termination
  • Native integration with ECS/EKS

Network Load Balancer (NLB):

  • Layer 4 (TCP/UDP)
  • Ultra-low latency (microseconds)
  • Static IP addresses
  • Millions of requests per second
  • Preserves source IP

Multiple Target Groups (2024):

  • ECS services can attach to multiple target groups
  • Example: Internal NLB for private traffic + internet-facing ALB for public traffic
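
In the service definition this is expressed as one loadBalancers entry per target group; the ARNs below are placeholders:

"loadBalancers": [
  {
    "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/internal-nlb-tg/abc123",
    "containerName": "api",
    "containerPort": 8080
  },
  {
    "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/public-alb-tg/def456",
    "containerName": "api",
    "containerPort": 8080
  }
]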

Storage for Containers

Ephemeral Storage

ECS on Fargate:

  • Default: 20 GB included (free)
  • Maximum: 200 GB configurable
  • Cost: $0.000111 per GB-hour for additional storage
  • Encryption: AES-256 for tasks launched on platform version 1.4.0+ (May 28, 2020 or later)

EKS on Fargate:

  • Default: 20 GB
  • Maximum: 175 GB per pod

ECS on EC2:

  • Depends on instance storage (typically 10-30 GB root volume)

EBS Volumes for ECS Tasks

Major 2024 Update: ECS now supports native EBS volume integration (announced January 2024).

Use cases:

  • Data-intensive workloads requiring high performance, low latency
  • Block storage within a single Availability Zone
  • Applications needing persistent storage that doesn’t span tasks

Availability: US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Singapore, Sydney, Tokyo), Europe (Frankfurt, Ireland, Stockholm)

EFS for Shared Persistent Storage

Amazon EFS (Elastic File System):

  • Use case: Applications spanning many tasks needing concurrent access
  • Availability: Multi-AZ Regional availability
  • Access modes: ReadWriteMany (multiple pods/tasks can mount simultaneously)
  • Supported on: ECS (EC2 and Fargate), EKS
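
A minimal sketch of mounting an EFS file system in an ECS task definition (file system ID, volume name, image, and path are placeholders). The volume is declared at the task level, and each container that needs it adds a mountPoints entry:

"volumes": [
  {
    "name": "shared-data",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-0abc1234def567890",
      "transitEncryption": "ENABLED"
    }
  }
],
"containerDefinitions": [
  {
    "name": "app",
    "image": "my-app:latest",
    "mountPoints": [
      { "sourceVolume": "shared-data", "containerPath": "/mnt/shared" }
    ]
  }
]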

EBS vs. EFS:

  • EBS: Single-AZ, ReadWriteOnce, lower latency, higher IOPS
  • EFS: Multi-AZ, ReadWriteMany, regional availability, shared access

Best practice for EKS: Deploy Amazon EBS CSI driver or Amazon EFS CSI driver via EKS add-ons for security and efficiency.

Security Best Practices

IAM Roles for Tasks and Pods

ECS Task Roles:

  • Task Execution Role: Grants ECS agent permission to pull images from ECR and write logs to CloudWatch
  • Task IAM Role: Grants application code access to AWS services
  • Best practice: Separate roles for execution vs. application; apply least privilege
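
In a task definition the two roles are separate fields (ARNs are placeholders): the execution role is used by the ECS agent, the task role by your application code.

"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"taskRoleArn": "arn:aws:iam::123456789012:role/my-app-task-role"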

EKS IRSA (IAM Roles for Service Accounts):

  • Assigns IAM roles to Kubernetes service accounts
  • Pod-level permissions without sharing credentials
  • Leverages AWS STS for temporary credentials (auto-rotated)
  • 2024 Update: Continues to be supported alongside EKS Pod Identity

EKS Pod Identity (2023+):

  • Associates IAM roles with Kubernetes service accounts through an EKS API call, without per-cluster OIDC provider setup
  • Simpler trust management than IRSA
  • More fine-grained control

Security warning: Pods can still inherit instance profile permissions. Always block access to instance metadata when using IRSA or Pod Identity.

Secrets Management

AWS Secrets Manager and Parameter Store:

  • Store sensitive data (database passwords, API keys)
  • Reference in task definitions via ARN
  • Required permission: secretsmanager:GetSecretValue in task execution role

Example (ECS Task Definition):

"secrets": [
  {
    "name": "DB_PASSWORD",
    "valueFrom": "arn:aws:secretsmanager:region:account:secret:db-password"
  }
]

Best practices:

  • Never hardcode secrets in container images or environment variables
  • Use Secrets Manager for secrets requiring rotation
  • Use Parameter Store (SecureString) for static configuration
  • Grant minimal IAM permissions for secret access

Image Scanning

Amazon Inspector (ECR Integration):

  • Automatically scans images on push
  • Detects vulnerabilities in OS packages and application dependencies
  • Maps images to running containers (ECS tasks, EKS pods)
  • Prioritizes vulnerabilities based on whether images are currently running

Best practices:

  • Enable image tag immutability to prevent malicious overwrites
  • Use EventBridge to trigger actions (delete insecure images, trigger rebuilds)
  • Scan on every push
  • Block deployment of images with critical vulnerabilities

Network Security

Security Groups:

  • awsvpc mode: Assign security groups directly to tasks/pods
  • Control inbound/outbound traffic at task level
  • Stateful (return traffic automatically allowed)

NACLs (Network Access Control Lists):

  • Subnet-level firewall rules
  • Stateless (must configure inbound and outbound separately)
  • Defense-in-depth layer

GuardDuty Runtime Monitoring (2023):

  • Detects runtime security threats in ECS (EC2 and Fargate) and EKS
  • Identifies suspicious activity, malware, unauthorized access

Runtime Security

Pod Security Standards (EKS):

  • Kubernetes-native security policies (Restricted, Baseline, Privileged)
  • Enforce via admission controllers (OPA Gatekeeper, Kyverno)
  • Limit privileged containers (needed for system components like VPC CNI, but not application pods)

Container-Optimized OS:

  • Bottlerocket: AWS-managed, immutable, minimal attack surface
  • Automatically patched via managed node groups

CIS Benchmark Compliance:

  • Verify EKS/ECS configurations against CIS benchmarks
  • Tools: AWS Security Hub, third-party scanners

Observability

CloudWatch Container Insights

Features:

  • Collects, aggregates, and summarizes metrics and logs
  • Instance-level, cluster-level, and task/pod-level metrics
  • Pre-built dashboards (CPU, memory, network, disk)

Enhanced ECS Observability (December 2024):

  • Granular visibility into container workloads
  • Proactive monitoring and faster troubleshooting

Requirements:

  • ECS on EC2: Container agent 1.4.0+ (latest recommended)
  • EKS: Deploy CloudWatch agent via DaemonSet or Fargate logging

Logging

awslogs Log Driver:

  • Forwards stdout/stderr to CloudWatch Logs
  • Simple configuration in task definition
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "/ecs/my-app",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "ecs"
  }
}

FireLens (Fluent Bit/Fluentd):

  • Routes logs to third-party services (Datadog, Splunk, Elasticsearch)
  • Flexible log transformation and routing
  • Sidecar container pattern
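
As a sketch, FireLens adds a log-router sidecar to the task definition and application containers point their log driver at it; the image tag and destination-specific options are assumptions and depend on the output plugin you choose:

{
  "name": "log-router",
  "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
  "essential": true,
  "firelensConfiguration": { "type": "fluentbit" }
}

Application containers then set "logDriver": "awsfirelens" in their logConfiguration, with plugin-specific options for the chosen destination.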

Distributed Tracing

AWS X-Ray:

  • Traces requests across microservices
  • Identifies performance bottlenecks, errors
  • Integrates with ECS and EKS via sidecar container or daemon

ADOT (AWS Distro for OpenTelemetry):

  • Collects traces and metrics using OpenTelemetry
  • Sends data to CloudWatch, X-Ray, Prometheus
  • Vendor-neutral instrumentation

Prometheus and Grafana (EKS)

Amazon Managed Service for Prometheus:

  • Fully managed Prometheus-compatible monitoring
  • Agentless metric collection for EKS (2023)
  • Integrates with Grafana for visualization

Amazon Managed Grafana:

  • Fully managed Grafana for dashboards
  • Pre-built dashboards for EKS, ECS

Deployment Patterns

Blue/Green Deployments

ECS Native Blue/Green (2025):

  • Built-in blue/green without CodeDeploy
  • Can change deployment controller after service creation
  • Requires ALB
  • Validates new revision before routing production traffic
  • Instant rollback capability

EKS Blue/Green:

  • Use separate Kubernetes deployments or namespaces
  • Shift traffic via service selector or ingress controller
  • Tools: Flagger, Argo Rollouts

Rolling Updates

ECS Rolling Update:

  • Default deployment type
  • Gradually replaces tasks with new version
  • Configurable: minimumHealthyPercent and maximumPercent
  • Example: 50% minimum, 200% maximum = deploy new tasks before stopping old ones
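
Expressed in the service's deployment configuration, that example looks like:

"deploymentConfiguration": {
  "minimumHealthyPercent": 50,
  "maximumPercent": 200
}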

EKS Rolling Update:

  • Kubernetes-native via Deployment resources
  • Configurable: maxUnavailable, maxSurge

Canary Deployments

ECS Native Canary (October 2025):

  • Route small percentage of traffic to new revision
  • Monitor metrics during bake time
  • Gradually increase traffic
  • Automatic rollback on CloudWatch alarm breach

EKS Canary:

  • Use Flagger (progressive delivery tool)
  • Argo Rollouts (GitOps-based canary)

Deployment Circuit Breaker (ECS)

Deployment Circuit Breaker:

  • Monitors deployment health
  • Stops launching new tasks if service cannot reach steady state
  • Optionally rolls back to last successful deployment
  • Only works with rolling update deployment type
"deploymentConfiguration": {
  "deploymentCircuitBreaker": {
    "enable": true,
    "rollback": true
  }
}

Cost Optimization

EC2 vs. Fargate Cost Comparison

Scenario: Running 10 tasks, 1 vCPU, 2 GB memory each, 24/7

  • Fargate On-Demand (10 vCPU, 20 GB): $356/month, $4,272/year
  • Fargate Spot (10 vCPU, 20 GB): $107/month (70% savings), $1,284/year
  • Fargate Graviton (10 vCPU, 20 GB): $285/month (20% savings), $3,420/year
  • Fargate Graviton + Spot (10 vCPU, 20 GB): $86/month (76% savings), $1,032/year
  • EC2 Reserved, 1-year (m5.large × 5 instances): $284/month (35% savings vs. On-Demand), $3,408/year
  • EC2 Reserved, 3-year (m5.large × 5 instances): $189/month (57% savings vs. On-Demand), $2,268/year

Key takeaways:

  • Fargate Spot + Graviton: Most cost-effective for fault-tolerant workloads ($86/month)
  • EC2 Reserved (3-year): Best for steady-state, long-term workloads ($189/month)
  • Fargate On-Demand: Most expensive but simplest ($356/month)

Savings Plans

Compute Savings Plans:

  • 1-year: Up to 50% savings
  • 3-year: Up to 66% savings
  • Applies across EC2, Fargate, Lambda
  • Flexible across instance families, sizes, regions

Best practice: Use Compute Savings Plans for baseline capacity, Spot for fault-tolerant workloads, On-Demand for unpredictable spikes.

Right-Sizing Containers

AWS Compute Optimizer:

  • Uses machine learning to analyze utilization
  • Recommends optimal CPU and memory configurations
  • Customizable thresholds (CPU headroom, memory headroom)
  • Lookback periods: 14, 32, or 93 days

Best practices:

  • Monitor for 30 days to establish baseline
  • Rightsize if max memory utilization < 40% over 4 weeks
  • Use CloudWatch Container Insights for granular metrics
  • EKS: Use Vertical Pod Autoscaler (VPA) for automated rightsizing

Graviton Processors

AWS Graviton2/Graviton3:

  • ~20% lower cost than x86 (Intel/AMD)
  • Better performance per dollar
  • Supported by most popular software packages

Migration:

  • Rebuild container images for ARM64 architecture
  • Test compatibility (most modern software supports ARM)
  • Potential effort: Moderate (rebuilding images, testing)

Fargate Graviton + Spot (September 2024):

  • Combine 20% Graviton savings with 70% Spot discount
  • Total: Up to 76% savings vs. Fargate On-Demand

Service Selection Framework

Decision Matrix

  • Simple microservices, AWS-centric: ECS on Fargate (minimal management, deep AWS integration)
  • High-volume production, cost-critical: ECS on EC2 with Reserved Instances (most cost-effective for steady-state)
  • Kubernetes ecosystem required: EKS with Managed Node Groups (standard Kubernetes, rich tooling)
  • Multi-cloud, hybrid deployments: EKS with Hybrid Nodes (portability, unified management)
  • Variable workloads, dev/test: ECS on Fargate (pay only for usage, no idle costs)
  • Batch processing, fault-tolerant: Fargate Spot or EC2 Spot (up to 70-90% cost savings)
  • GPU workloads, custom kernels: ECS on EC2 or EKS on EC2 (full control over instances)

Specific Scenarios

E-commerce platform (200 tasks running 24/7):

  • Recommendation: ECS on EC2 with Reserved Instances
  • Rationale: Steady-state workload; EC2 Reserved (3-year) saves ~$20,000/year vs. Fargate

Startup with 10 microservices, unpredictable traffic:

  • Recommendation: ECS on Fargate
  • Rationale: No capacity planning, automatic scaling, pay only for usage

Financial services (300+ microservices, multi-cloud strategy):

  • Recommendation: EKS
  • Rationale: Kubernetes provides consistent experience across AWS, Azure, on-premises

Data processing pipeline (batch jobs):

  • Recommendation: Fargate Spot
  • Rationale: 70% cost savings, fault-tolerant workloads

Common Pitfalls

Over-Provisioning Resources

Problem: Allocating too much CPU/memory wastes money.

Example: A Fargate task configured with 2 vCPU but using only 0.5 vCPU wastes roughly $0.06/hour of unused vCPU, about $44/month per task at us-east-1 rates.

Solution:

  • Use AWS Compute Optimizer for rightsizing recommendations
  • Monitor actual utilization for 30 days
  • Start conservative, scale up as needed

Not Using Fargate Spot

Problem: Running fault-tolerant workloads on Fargate On-Demand pays 3x more than necessary.

Solution:

  • Identify workloads that tolerate interruptions (batch jobs, CI/CD, stateless services)
  • Use Fargate Spot for up to 70% savings
  • Implement retry logic for interrupted tasks

Gotcha: Fargate Spot capacity not guaranteed—have fallback to On-Demand if Spot unavailable.

Improper Health Checks

Problem: Missing or misconfigured health checks cause endless restart loops.

Common issues:

  • Health check command not included in container image
  • Timeout too short (check executing longer than timeout allows)
  • Retry count too low (transient failures mark container unhealthy)

Best practices:

  • Test health check commands locally
  • Set interval to 30 seconds, timeout to 5 seconds, retries to 3
  • Use /health or /healthz endpoints for HTTP-based checks
"healthCheck": {
  "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 60
}

Missing Auto-Scaling Configuration

Problem: Services cannot handle traffic spikes or waste resources during low traffic.

Example: Service configured with 10 tasks constantly, but traffic varies 5x throughout the day. Auto-scaling (min 2, max 20) saves $150/month.

Solution:

  • Configure target tracking scaling (70% CPU utilization)
  • Set reasonable min/max task counts
  • Test scaling behavior under load
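
A sketch of the min/max bounds, set when registering the service as an Application Auto Scaling scalable target (cluster and service names are placeholders):

{
  "ServiceNamespace": "ecs",
  "ResourceId": "service/my-cluster/my-service",
  "ScalableDimension": "ecs:service:DesiredCount",
  "MinCapacity": 2,
  "MaxCapacity": 20
}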

Kubernetes Over-Engineering

Problem: Choosing EKS for simple workloads adds unnecessary complexity and cost.

Costs of EKS:

  • Control plane: $73/month per cluster
  • Operational burden: Managing Kubernetes manifests, namespaces, RBAC, CRDs
  • Learning curve: Requires Kubernetes expertise

When to avoid EKS:

  • Team has no Kubernetes experience
  • Running simple microservices (fewer than 20 services)
  • No need for Kubernetes ecosystem
  • AWS-only deployment

Example: Team of 3 developers deploying 5 microservices chose EKS because “Kubernetes is the industry standard.” Spent 6 months learning Kubernetes, fighting YAML configuration errors, debugging networking issues. ECS would have taken 1 week to set up.

Not Blocking Instance Metadata Access

Problem: Pods/tasks inherit instance profile permissions, violating least privilege.

Solution:

  • Use IRSA (EKS) or Task IAM Roles (ECS) for fine-grained permissions
  • Block IMDS access via network policy or firewall rules
  • ECS: Set "disableNetworking": true for task definition
  • EKS: Use network policies to block 169.254.169.254

Using Untagged Images

Problem: Untagged images accumulate, wasting ECR storage costs.

Solution:

  • Implement ECR lifecycle policies
  • Expire untagged images after 30 days
  • Keep last 30 tagged images per repository
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 30 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}

Key Takeaways

1. Choose the right service based on expertise and requirements: ECS for simplicity and AWS integration ($0 control plane cost). EKS for Kubernetes ecosystem and portability ($73/month per cluster). Let team expertise and portability needs guide the decision.

2. Fargate vs EC2 depends on workload patterns: Fargate excels at variable workloads and eliminates infrastructure management (pay-per-second). EC2 with Reserved Instances is 3-9x cheaper for steady-state workloads. Use Fargate Spot + Graviton for up to 76% savings on fault-tolerant workloads.

3. Container networking matters for security and performance: Use awsvpc mode for task-level security groups (required for Fargate). Use ECS Service Connect or VPC Lattice for service-to-service communication. ALB for HTTP/HTTPS, NLB for TCP/UDP ultra-low latency.

4. Storage depends on access patterns and availability needs: Ephemeral storage (20-200 GB) for temporary data. EBS for high-performance single-AZ persistent storage. EFS for shared multi-AZ persistent storage accessible by multiple tasks.

5. Security requires multiple layers: Use IAM roles for tasks/pods (not instance profiles). Store secrets in Secrets Manager or Parameter Store (not environment variables). Enable ECR image scanning with Amazon Inspector. Block instance metadata access. Use GuardDuty Runtime Monitoring for threat detection.

6. Observability is critical for troubleshooting: Enable CloudWatch Container Insights for metrics. Use awslogs or FireLens for centralized logging. Use X-Ray or ADOT for distributed tracing. For EKS, integrate Amazon Managed Prometheus and Grafana.

7. Deployment patterns enable zero-downtime releases: Use ECS native blue/green or canary deployments (2025 features). Enable the deployment circuit breaker for automatic rollback on failures. Configure rolling updates with appropriate minimumHealthyPercent and maximumPercent.

8. Cost optimization requires multiple strategies: Use Compute Savings Plans (up to 66% savings) for baseline capacity. Use Fargate Spot (70% savings) or EC2 Spot (90% savings) for fault-tolerant workloads. Use Graviton processors (20% cheaper). Rightsize containers with Compute Optimizer. Implement ECR lifecycle policies.

9. Auto-scaling prevents both under-provisioning and waste: Use target tracking scaling (70% CPU recommended). Enable Predictive Scaling (November 2024) for machine learning-based forecasting. Test scaling behavior under load. Set reasonable min/max task counts.

10. Avoid common pitfalls: Don’t over-provision resources (use Compute Optimizer). Don’t ignore Fargate Spot for fault-tolerant workloads (70% savings). Don’t use shallow health checks (verify application health, not just instance responsiveness). Don’t choose EKS for simple workloads when ECS suffices. Don’t forget to block instance metadata access when using task/pod IAM roles.

Recent 2024-2025 improvements: ECS Managed Instances (recommended for new workloads). EKS Auto Mode (December 2024). ECS native blue/green and canary deployments (October 2025). Fargate Spot with Graviton support (September 2024). Enhanced ECS Observability (December 2024). EBS volume support for ECS tasks (January 2024). VPC Lattice general availability (cross-service networking).
