AWS ECR & Container Security for System Architects
Table of Contents
- What Problems ECR & Container Security Solve
- ECR Fundamentals
- Image Scanning & Vulnerability Management
- Lifecycle Policies & Image Management
- ECR Replication
- Runtime Security
- Secrets Management for Containers
- Cost Optimization Strategies
- Integration Patterns
- When to Use ECR vs Alternatives
- Common Pitfalls
- Key Takeaways
What Problems ECR & Container Security Solve
Container registries and security address critical production challenges:
Image Distribution Challenges:
- Slow deployments: Pulling images from external registries (Docker Hub) over the internet adds latency, especially for multi-GB images
- Rate limiting: Docker Hub's free tier caps anonymous IPs at 100 pulls per 6 hours and authenticated free users at 200 pulls per 6 hours - production deployments can exhaust this quickly
- Supply chain risk: External registries introduce dependency on third-party availability; Docker Hub outages have blocked production deployments
- Network costs: Pulling images from internet sources incurs data transfer charges ($0.09/GB from AWS to internet)
Security & Compliance Challenges:
- Vulnerability management: Without scanning, vulnerabilities in base images and dependencies go undetected until runtime exploitation
- Image provenance: Cannot verify image authenticity or track who built what image when
- Secrets exposure: Hardcoded credentials in images or passed via environment variables create security risks
- Compliance requirements: Regulations (PCI-DSS, HIPAA, SOC 2) require vulnerability scanning, image signing, and audit trails
Operational Challenges:
- Image sprawl: Old, unused images accumulate, consuming storage and complicating security scanning
- Multi-region deployments: Deploying globally requires image availability in multiple regions; pulling cross-region adds latency and cost
- Access control: Fine-grained control over who can push/pull which images across teams and environments
ECR and container security solve these problems:
- AWS ECR (Elastic Container Registry): Fully managed Docker registry with native AWS integration
- Image Scanning: Automated vulnerability detection in container images
- Lifecycle Policies: Automated cleanup of old images based on age or count
- Replication: Cross-region and cross-account image distribution
- Runtime Security: Protection against container escape, lateral movement, and runtime threats
- Secrets Management: Secure injection of credentials into containers without hardcoding
ECR Fundamentals
ECR Architecture
Registry hierarchy:
- Registry: AWS account-specific registry (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com)
- Repository: Named collection of related images (e.g., my-app/backend)
- Image: Specific version identified by tag or digest (e.g., my-app/backend:v1.2.3 or sha256:abc123...)
Key characteristics:
- Regional service: Each region has separate ECR endpoint (replicate for multi-region)
- Private by default: Requires IAM authentication to push/pull (no public access like Docker Hub)
- Public option: ECR Public for open-source projects (public.ecr.aws/… namespace)
- OCI-compliant: Supports Docker images and OCI (Open Container Initiative) artifacts
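Putting these pieces together, a minimal sketch of standing up a repository with the settings discussed below (repository name is illustrative):
# Create a repository with scan-on-push and immutable tags from day one
aws ecr create-repository \
  --repository-name my-app/backend \
  --image-tag-mutability IMMUTABLE \
  --image-scanning-configuration scanOnPush=true
# List the images (tags and digests) the repository holds
aws ecr describe-images --repository-name my-app/backend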
Authentication & Access Control
Docker authentication:
# Get login token (valid 12 hours)
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin \
123456789012.dkr.ecr.us-east-1.amazonaws.com
# Push image
docker tag my-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
IAM policies for ECR:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage"
],
"Resource": "arn:aws:ecr:us-east-1:123456789012:repository/my-app"
}
]
}
Push permissions (for CI/CD):
{
"Effect": "Allow",
"Action": [
"ecr:PutImage",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload"
],
"Resource": "arn:aws:ecr:us-east-1:123456789012:repository/my-app"
}
Repository policies (cross-account access):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPull",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::999999999999:root"
},
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:BatchGetImage",
"ecr:GetDownloadUrlForLayer"
]
}
]
}
Tagging Strategies
Effective tagging practices:
- Semantic versioning (production releases):
  - my-app:1.2.3 (immutable release)
  - my-app:1.2 (minor version alias, updated on patch releases)
  - my-app:1 (major version alias)
  - my-app:latest (points to latest stable release)
- Git-based tags (CI/CD):
  - my-app:main-abc123f (branch + short commit SHA)
  - my-app:pr-456 (pull request number)
  - my-app:build-789 (build number)
- Environment tags (deployment tracking):
  - my-app:staging-2024-11-15 (environment + date)
  - my-app:prod-current (currently deployed to production)
Tag immutability:
# Enable immutable tags (prevents overwriting tags)
aws ecr put-image-tag-mutability \
--repository-name my-app \
--image-tag-mutability IMMUTABLE
Why immutability matters: Prevents a tag from being silently re-pointed at a different image mid-deployment, ensures reproducible builds, and provides a clear audit trail.
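To audit which repositories actually enforce immutability, a quick check:
# Report the tag mutability setting for every repository in the registry
aws ecr describe-repositories \
  --query 'repositories[].{repo:repositoryName,mutability:imageTagMutability}' \
  --output table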
Image Scanning & Vulnerability Management
Scanning Types
ECR offers two scanning options:
- Basic scanning (Amazon ECR with Clair):
- Uses open-source Clair scanner
- Scans on push or on-demand
- Limited to OS package vulnerabilities (not application dependencies)
- Free (included in ECR pricing)
- Enhanced scanning (Amazon Inspector integration):
- Continuous scanning (rescans as new CVEs discovered)
- OS packages AND application dependencies (Java, Python, Node.js, .NET, Go, Ruby)
- CVSS scores, exploitability data, remediation guidance
- $0.09 per image scan (first scan per image per repo per month)
Enabling Enhanced Scanning
# Enable scan-on-push at the repository level (basic scanning)
aws ecr put-image-scanning-configuration \
--repository-name my-app \
--image-scanning-configuration scanOnPush=true
# Enable Inspector enhanced scanning (registry-wide)
aws ecr put-registry-scanning-configuration \
--scan-type ENHANCED \
--rules '[
{
"scanFrequency": "CONTINUOUS_SCAN",
"repositoryFilters": [{"filter": "*", "filterType": "WILDCARD"}]
}
]'
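You can confirm what actually took effect with:
# Show the registry-wide scan type and the rules currently in force
aws ecr get-registry-scanning-configuration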
Interpreting Scan Results
Findings structure:
{
"findings": [
{
"name": "CVE-2024-1234",
"severity": "HIGH",
"packageName": "openssl",
"packageVersion": "1.1.1k",
"fixedInVersion": "1.1.1m",
"cvss": {
"score": 7.5,
"vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"
},
"exploitAvailable": false,
"description": "OpenSSL denial of service vulnerability..."
}
]
}
Severity interpretation:
- CRITICAL (CVSS 9.0-10.0): Remote code execution, privilege escalation - block deployment
- HIGH (CVSS 7.0-8.9): High-impact vulnerabilities - require remediation plan
- MEDIUM (CVSS 4.0-6.9): Moderate risk - address in next sprint
- LOW (CVSS 0.1-3.9): Minimal risk - address during maintenance windows
- INFORMATIONAL: Not a vulnerability, but security best practice recommendation
Vulnerability Management Workflow
1. Scan on push (CI/CD integration):
# GitHub Actions example
- name: Build and scan image
run: |
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
- name: Wait for scan results
run: |
aws ecr wait image-scan-complete \
--repository-name $ECR_REPOSITORY \
--image-id imageTag=$IMAGE_TAG
- name: Get scan findings
run: |
aws ecr describe-image-scan-findings \
--repository-name $ECR_REPOSITORY \
--image-id imageTag=$IMAGE_TAG \
--query 'imageScanFindings.findingSeverityCounts'
2. Policy enforcement (fail builds on critical findings):
#!/bin/bash
# fail-on-critical.sh
CRITICAL_COUNT=$(aws ecr describe-image-scan-findings \
--repository-name my-app \
--image-id imageTag=latest \
--query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
--output text)
if [ "$CRITICAL_COUNT" != "None" ] && [ "$CRITICAL_COUNT" -gt 0 ]; then
echo "❌ Found $CRITICAL_COUNT critical vulnerabilities. Blocking deployment."
exit 1
fi
echo "✅ No critical vulnerabilities found. Proceeding."
3. Remediation workflow:
# Before (vulnerable base image)
FROM node:16.14.0
# After (patched base image)
# Updated to patch CVE-2024-1234
FROM node:16.19.1
4. Exception handling (for false positives):
{
"suppressions": [
{
"cve": "CVE-2024-5678",
"reason": "False positive - package not used in production code path",
"approvedBy": "security-team@company.com",
"expiresAt": "2025-01-01T00:00:00Z"
}
]
}
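ECR has no built-in suppression list for scan findings, so a CI gate has to apply one itself. A sketch using jq, assuming suppressions.json follows the hypothetical format shown above:
#!/bin/bash
# Count CRITICAL findings, ignoring CVEs on the suppression list
FINDINGS=$(aws ecr describe-image-scan-findings \
  --repository-name my-app \
  --image-id imageTag=latest \
  --query 'imageScanFindings.findings' --output json)
SUPPRESSED=$(jq '[.suppressions[].cve]' suppressions.json)
COUNT=$(echo "$FINDINGS" | jq --argjson sup "$SUPPRESSED" \
  '[.[] | select(.severity == "CRITICAL") | select(.name as $n | $sup | index($n) | not)] | length')
echo "Unsuppressed critical findings: $COUNT"
[ "$COUNT" -gt 0 ] && exit 1 || exit 0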
Continuous Scanning
Why continuous scanning matters:
- New CVEs discovered daily: Zero-day vulnerabilities announced after image built
- Aging images: Images deployed months ago may accumulate vulnerabilities
- Compliance: Regulations require ongoing vulnerability management, not just point-in-time
Continuous scan configuration:
# Configure continuous scanning for critical repositories
aws ecr put-registry-scanning-configuration \
--scan-type ENHANCED \
--rules '[
{
"scanFrequency": "CONTINUOUS_SCAN",
"repositoryFilters": [
{"filter": "prod-*", "filterType": "WILDCARD"}
]
},
{
"scanFrequency": "SCAN_ON_PUSH",
"repositoryFilters": [
{"filter": "*", "filterType": "WILDCARD"}
]
}
]'
Cost consideration: Continuous scanning rescans images whenever new CVE data becomes available. With enhanced scanning at $0.09 per scan, a repository with 100 active images might cost $9/month if each is rescanned once. Budget accordingly.
Lifecycle Policies & Image Management
Why Lifecycle Policies Matter
Image sprawl consequences:
- Storage costs: ECR charges $0.10/GB/month. Fifty old 2GB images = 100GB = $10/month wasted
- Scan costs: Enhanced scanning scans all images (including unused ones) = $0.09 × 50 = $4.50/month
- Security noise: Vulnerability findings in unused images distract from real risks
- Operational overhead: Developers scroll through hundreds of old tags to find the right one
Lifecycle policies automate cleanup:
Lifecycle Policy Rules
Rule types:
- Count-based (keep last N images):
- Keep last 10 images tagged with “prod-*”
- Keep last 100 images overall
- Age-based (expire after X days):
- Delete images older than 30 days
- Delete untagged images older than 1 day
Rule priority: Rules evaluated in priority order (1 = highest). First matching rule applies.
Example Lifecycle Policies
Policy 1: Keep production images, clean up dev:
{
"rules": [
{
"rulePriority": 1,
"description": "Keep last 10 production images",
"selection": {
"tagStatus": "tagged",
"tagPrefixList": ["prod-"],
"countType": "imageCountMoreThan",
"countNumber": 10
},
"action": {
"type": "expire"
}
},
{
"rulePriority": 2,
"description": "Keep last 5 staging images",
"selection": {
"tagStatus": "tagged",
"tagPrefixList": ["staging-"],
"countType": "imageCountMoreThan",
"countNumber": 5
},
"action": {
"type": "expire"
}
},
{
"rulePriority": 3,
"description": "Expire dev images older than 14 days",
"selection": {
"tagStatus": "tagged",
"tagPrefixList": ["dev-", "feature-"],
"countType": "sinceImagePushed",
"countUnit": "days",
"countNumber": 14
},
"action": {
"type": "expire"
}
},
{
"rulePriority": 4,
"description": "Delete untagged images after 1 day",
"selection": {
"tagStatus": "untagged",
"countType": "sinceImagePushed",
"countUnit": "days",
"countNumber": 1
},
"action": {
"type": "expire"
}
}
]
}
Applying the policy:
aws ecr put-lifecycle-policy \
--repository-name my-app \
--lifecycle-policy-text file://lifecycle-policy.json
Testing before applying (dry run):
# Preview what would be deleted
aws ecr start-lifecycle-policy-preview \
--repository-name my-app \
--lifecycle-policy-text file://lifecycle-policy.json
# Then fetch the results once the preview completes
aws ecr get-lifecycle-policy-preview \
--repository-name my-app
Policy 2: Semantic versioning retention:
{
"rules": [
{
"rulePriority": 1,
"description": "Keep all semantic version tags (1.2.3)",
"selection": {
"tagStatus": "tagged",
"tagPrefixList": ["v", "[0-9]"],
"countType": "imageCountMoreThan",
"countNumber": 999999
},
"action": {
"type": "expire"
}
},
{
"rulePriority": 2,
"description": "Keep last 20 branch builds",
"selection": {
"tagStatus": "tagged",
"tagPrefixList": ["main-", "develop-"],
"countType": "imageCountMoreThan",
"countNumber": 20
},
"action": {
"type": "expire"
}
}
]
}
Lifecycle Policy Best Practices
Tag strategy alignment:
- Use consistent tag prefixes (prod-, staging-, dev-)
- Separate release tags (semantic versions) from build tags (commit SHAs); note that tagPrefixList matches literal prefixes only, so version patterns need tagPatternList wildcards (see Policy 2)
- Never delete semantic version releases (keep indefinitely for rollback)
Testing strategy:
- Always run start-lifecycle-policy-preview before applying
- Start conservative (keep more images), tighten gradually
- Monitor deleted images for 30 days to catch unintended cleanup
Cost-benefit analysis:
- Calculate storage saved: (deleted images × avg size × $0.10/GB/month)
- Calculate scan costs saved: (deleted images × rescan frequency × $0.09)
- Example: Deleting 50 × 2GB images saves $10/month storage + potential scan costs
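The arithmetic above in script form, for quick what-if estimates (all inputs assumed; adjust to your registry):
# Back-of-envelope lifecycle-policy savings
DELETED_IMAGES=50    # images the policy would expire
AVG_SIZE_GB=2        # average image size in GB
STORAGE_SAVED=$(echo "$DELETED_IMAGES * $AVG_SIZE_GB * 0.10" | bc)  # $0.10/GB/month
SCAN_SAVED=$(echo "$DELETED_IMAGES * 0.09" | bc)                    # $0.09/scan
echo "Storage saved: \$${STORAGE_SAVED}/month; scan cost avoided: up to \$${SCAN_SAVED}/month"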
ECR Replication
Why Replication Matters
Multi-region deployments:
- Latency: ECS in us-west-2 pulling from ECR in us-east-1 adds ~60ms per layer pull
- Cross-region data transfer: $0.02/GB from us-east-1 to us-west-2 (a 5GB image costs $0.10 per pull)
- Availability: Regional ECR outage blocks deployments if images not replicated
- Compliance: Data residency requirements (GDPR, data sovereignty) may require in-region images
Cross-account deployments:
- Centralized CI/CD: Build in central account, deploy to workload accounts
- Multi-tenant architectures: Different customer accounts need same images
- Security isolation: Prevent production accounts from pushing images (only pull)
Replication Configuration
Cross-region replication:
{
"rules": [
{
"destinations": [
{
"region": "us-west-2",
"registryId": "123456789012"
},
{
"region": "eu-west-1",
"registryId": "123456789012"
}
],
"repositoryFilters": [
{
"filter": "prod-*",
"filterType": "PREFIX_MATCH"
}
]
}
]
}
aws ecr put-replication-configuration \
--replication-configuration file://replication-config.json
Cross-account replication:
{
"rules": [
{
"destinations": [
{
"region": "us-east-1",
"registryId": "999999999999"
}
]
}
]
}
Replication filters use PREFIX_MATCH on repository names, so "*" is not a wildcard here; omitting repositoryFilters entirely (as above) replicates every repository in the source registry.
Destination account policy (allow replication):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowReplication",
"Effect": "Allow",
"Principal": {
"Service": "ecr.amazonaws.com"
},
"Action": [
"ecr:CreateRepository",
"ecr:ReplicateImage"
],
"Condition": {
"StringEquals": {
"aws:SourceAccount": "123456789012"
}
}
}
]
}
Replication Behavior
Key characteristics:
- Automatic: Replication triggers on image push to source repository
- Asynchronous: Images replicate within minutes (not instant)
- Filter-based: Only images matching filters replicate (reduce costs)
- Repository creation: ECR automatically creates destination repositories if they don’t exist
- Tag preservation: Tags and manifests replicated exactly
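Because replication is asynchronous, a deployment pipeline should confirm an image has landed before rolling out in the destination region; ECR exposes this directly:
# Check per-destination replication status for a just-pushed image
aws ecr describe-image-replication-status \
  --repository-name prod-my-app \
  --image-id imageTag=v1.2.3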
Replication costs:
- Data transfer: $0.02/GB cross-region within US (varies by region pair)
- Storage: Pay for storage in each region ($0.10/GB/month per region)
- No replication fee: ECR doesn’t charge for replication itself (only data transfer + storage)
Example cost:
- 10 images × 2GB each = 20GB
- Replicate from us-east-1 to us-west-2: 20GB × $0.02 = $0.40 one-time
- Store in both regions: 20GB × $0.10 × 2 regions = $4/month
Runtime Security
Container Runtime Threats
Common runtime attack vectors:
- Container escape:
- Exploiting kernel vulnerabilities to break out of container namespace
- Accessing host filesystem via misconfigured volume mounts
- Example: docker run -v /:/host mounts the entire host filesystem
- Privilege escalation:
- Running containers as root unnecessarily
- Granting excessive capabilities (e.g., CAP_SYS_ADMIN)
- Example: A container with privileged: true has full host access
- Lateral movement:
- Compromised container scanning internal network
- Exploiting vulnerable services on other containers/hosts
- Example: Container with overly permissive security group accessing internal databases
- Data exfiltration:
- Malicious code sending sensitive data to external endpoints
- Example: Compromised web app container uploading customer data to attacker-controlled S3 bucket
Runtime Security Best Practices
1. Use minimal base images:
# ❌ Bad: Full OS image (1.2GB, hundreds of packages)
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
# ✅ Good: Slim image (far fewer packages, minimal attack surface)
FROM python:3.11-slim
# ✅ Even better, distroless: FROM gcr.io/distroless/python3-debian12
Why distroless:
- No shell (prevents reverse shell attacks)
- No package manager (can’t install malicious tools)
- Minimal libraries (reduces vulnerability count from 500+ to <50)
2. Run as non-root user:
# Create non-root user
RUN useradd -m -u 1000 appuser
USER appuser
# Copy files owned by appuser
COPY --chown=appuser:appuser . /app
WORKDIR /app
ECS task definition:
{
"containerDefinitions": [
{
"name": "my-app",
"user": "1000",
"readonlyRootFilesystem": true
}
]
}
3. Drop unnecessary capabilities:
{
"containerDefinitions": [
{
"name": "my-app",
"linuxParameters": {
"capabilities": {
"drop": ["ALL"],
"add": ["NET_BIND_SERVICE"]
}
}
}
]
}
4. Enable read-only root filesystem:
{
"containerDefinitions": [
{
"name": "my-app",
"readonlyRootFilesystem": true,
"mountPoints": [
{
"sourceVolume": "tmp",
"containerPath": "/tmp"
}
]
}
],
"volumes": [
{
"name": "tmp",
"host": {}
}
]
}
5. Network segmentation:
{
"taskDefinition": {
"networkMode": "awsvpc"
},
"networkConfiguration": {
"awsvpcConfiguration": {
"securityGroups": ["sg-app-only"],
"subnets": ["subnet-private-1a"]
}
}
}
Security group principle of least privilege:
- Inbound: Only ALB security group on port 8080
- Outbound: Only database security group on port 5432, HTTPS to internet for external APIs
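Expressed as CLI calls that reference security groups rather than CIDR ranges (group IDs here are illustrative, matching the names above):
# Allow inbound 8080 only from the ALB's security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-app-only \
  --protocol tcp --port 8080 \
  --source-group sg-alb
# Allow outbound 5432 only to the database's security group
aws ec2 authorize-security-group-egress \
  --group-id sg-app-only \
  --ip-permissions 'IpProtocol=tcp,FromPort=5432,ToPort=5432,UserIdGroupPairs=[{GroupId=sg-db}]'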
Runtime Detection & Response
AWS GuardDuty for ECS/EKS:
GuardDuty Runtime Monitoring detects:
- Suspicious process execution (e.g., shell spawned in container)
- Unexpected network connections (e.g., connection to known C2 server)
- File system access anomalies (e.g., reading /etc/shadow)
- Privilege escalation attempts
Enabling GuardDuty Runtime Monitoring:
aws guardduty create-detector --enable --finding-publishing-frequency FIFTEEN_MINUTES
aws guardduty update-detector \
--detector-id <detector-id> \
--features '[
{
"Name": "ECS_RUNTIME_MONITORING",
"Status": "ENABLED",
"AdditionalConfiguration": [
{
"Name": "ECS_FARGATE_AGENT_MANAGEMENT",
"Status": "ENABLED"
}
]
}
]'
Fargate runtime monitoring (automatic agent injection):
- GuardDuty automatically injects security agent sidecar into Fargate tasks
- No code changes required (enable via detector configuration)
- Agent monitors process, network, and file system activity
EC2-based ECS/EKS (manual agent installation):
# Install GuardDuty agent via SSM
aws ssm send-command \
--document-name "AWS-ConfigureAWSPackage" \
--targets "Key=tag:Environment,Values=production" \
--parameters '{"action":["Install"],"name":["AmazonGuardDutyAgent"]}'
Secrets Management for Containers
Why Secrets Management Matters
Common anti-patterns:
- Hardcoded in Dockerfile:
# ❌ NEVER DO THIS
ENV DATABASE_PASSWORD=supersecret123
- Credentials in image layers (visible to anyone with pull access)
- Exposed in docker history and ECR
- Credentials can’t rotate without rebuilding the image
- Passed via environment variables:
# ❌ Risky: Visible in process list, logs, crash dumps
docker run -e DATABASE_PASSWORD=supersecret123 my-app
- Visible in docker inspect
- Leaked in application logs if printed
- Visible in
- Stored in config files:
# ❌ Bad: Config file baked into the image
database:
  password: supersecret123
- Same issues as hardcoding in the Dockerfile
AWS Secrets Manager Integration
Store secrets in Secrets Manager:
aws secretsmanager create-secret \
--name prod/myapp/database \
--secret-string '{
"username": "admin",
"password": "randomly-generated-password-here",
"host": "mydb.cluster-xyz.us-east-1.rds.amazonaws.com",
"port": 5432,
"dbname": "production"
}'
ECS task definition (inject at runtime):
{
"family": "my-app",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"containerDefinitions": [
{
"name": "app",
"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
"secrets": [
{
"name": "DATABASE_URL",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/myapp/database:username::"
},
{
"name": "DATABASE_PASSWORD",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/myapp/database:password::"
}
]
}
]
}
IAM permissions (task execution role):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue"
],
"Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/myapp/*"
},
{
"Effect": "Allow",
"Action": [
"kms:Decrypt"
],
"Resource": "arn:aws:kms:us-east-1:123456789012:key/abc-123-def-456"
}
]
}
How it works:
- ECS fetches secret from Secrets Manager at task startup
- Secrets injected as environment variables into container
- Application reads from environment variables (no code changes)
- Secrets never stored in image or visible in ECR
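A useful sanity check is confirming the registered task definition carries only ARNs, never plaintext values:
# Verify secrets are referenced by ARN (valueFrom), not embedded in the task definition
aws ecs describe-task-definition \
  --task-definition my-app \
  --query 'taskDefinition.containerDefinitions[].secrets'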
SSM Parameter Store Integration
For non-sensitive config the standard tier is free and simpler (SecureString parameters can hold secrets too):
aws ssm put-parameter \
--name /prod/myapp/api-endpoint \
--value "https://api.example.com" \
--type String
aws ssm put-parameter \
--name /prod/myapp/database-password \
--value "supersecret123" \
--type SecureString \
--key-id alias/aws/ssm
ECS task definition:
{
"secrets": [
{
"name": "API_ENDPOINT",
"valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/myapp/api-endpoint"
},
{
"name": "DATABASE_PASSWORD",
"valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/myapp/database-password"
}
]
}
Secrets Manager vs Parameter Store:
| Feature | Secrets Manager | Parameter Store (Standard) | Parameter Store (Advanced) |
|---|---|---|---|
| Cost | $0.40/secret/month + $0.05/10K API calls | Free | $0.05/parameter/month |
| Rotation | Built-in Lambda rotation | Manual | Manual |
| Value size | 64KB | 4KB | 8KB |
| Versioning | Automatic | Manual | Automatic |
| Cross-region replication | Built-in | Manual | Manual |
| Use case | Database credentials, API keys | Application config | Large config, frequent changes |
Decision framework:
- Use Secrets Manager for: Database passwords, API keys, OAuth tokens (anything requiring rotation)
- Use Parameter Store for: Static config (endpoints, feature flags, non-sensitive settings)
Kubernetes Secrets (EKS)
External Secrets Operator (sync from Secrets Manager):
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: aws-secretsmanager
spec:
provider:
aws:
service: SecretsManager
region: us-east-1
auth:
jwt:
serviceAccountRef:
name: external-secrets-sa
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: database-credentials
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secretsmanager
target:
name: database-secret
data:
- secretKey: password
remoteRef:
key: prod/myapp/database
property: password
Pod consumes secret:
apiVersion: v1
kind: Pod
metadata:
name: my-app
spec:
containers:
- name: app
image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
env:
- name: DATABASE_PASSWORD
valueFrom:
secretKeyRef:
name: database-secret
key: password
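To verify the operator actually synced the secret (decoding values like this is for debugging only):
# Confirm the ExternalSecret materialized a Kubernetes Secret
kubectl get externalsecret database-credentials
kubectl get secret database-secret -o jsonpath='{.data.password}' | base64 -d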
Cost Optimization Strategies
ECR Pricing Model
Charges:
- Storage: $0.10/GB/month (amount of data stored)
- Data transfer: Standard AWS data transfer pricing
- Free: Within same region
- $0.02/GB: Cross-region (e.g., us-east-1 to us-west-2)
- $0.09/GB: To internet
- Enhanced scanning: $0.09 per image scan (first scan per image per repo per month)
Cost Optimization Techniques
1. Lifecycle policies (reduce storage):
# Before: 100 images × 2GB = 200GB × $0.10 = $20/month
# After (lifecycle policy keeps 20): 20 images × 2GB = 40GB × $0.10 = $4/month
# Savings: $16/month (80%)
2. Multi-stage builds (reduce image size):
# ❌ Before: 1.5GB image (includes build tools)
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# ✅ After: 200MB image (only runtime dependencies)
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
FROM node:18-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/server.js"]
Savings: 1.5GB → 200MB = 1.3GB saved per image × 20 images = 26GB × $0.10 = $2.60/month
3. Layer caching (reduce data transfer):
# ✅ Good: Copy dependency files first (rarely change)
COPY package.json package-lock.json ./
RUN npm install
# Then copy source code (changes frequently)
COPY . .
RUN npm run build
Why this matters: Docker caches layers. When source code changes but dependencies don’t, Docker reuses cached dependency layer, reducing upload size from 1.5GB to 10MB (only source code).
4. Regional replication strategy:
# ❌ Wasteful: Replicate all images to all regions
# Cost: 100 images × 2GB × 3 regions = 600GB × $0.10 = $60/month
# ✅ Strategic: Replicate only production images to active regions
# Cost: 10 prod images × 2GB × 2 regions = 40GB × $0.10 = $4/month
5. Basic vs enhanced scanning:
# Enhanced scanning cost: 100 images × $0.09 = $9/month (if rescanned)
# Basic scanning cost: $0 (included)
# Strategy: Use enhanced scanning for production repos, basic for dev/test
# Savings: ~$5-7/month for typical workload
Cost Monitoring
CloudWatch metric filters:
aws cloudwatch put-metric-alarm \
--alarm-name ecr-storage-high \
--alarm-description "Alert when ECR storage exceeds 500GB" \
--metric-name RepositoryStorageSize \
--namespace AWS/ECR \
--statistic Average \
--period 86400 \
--threshold 500000000000 \
--comparison-operator GreaterThanThreshold
Cost allocation tags:
aws ecr tag-resource \
--resource-arn arn:aws:ecr:us-east-1:123456789012:repository/my-app \
--tags Key=CostCenter,Value=Engineering Key=Environment,Value=Production
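For an ad-hoc view of where storage is going, image sizes can be summed per repository (these are compressed sizes and shared layers are counted per image, so treat the result as an upper bound):
# Rough stored-GB estimate for one repository
aws ecr describe-images \
  --repository-name my-app \
  --query 'imageDetails[].imageSizeInBytes' \
  --output json | jq 'add / 1e9'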
Integration Patterns
CI/CD Pipeline Integration
GitHub Actions workflow:
name: Build and Push to ECR
on:
push:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
role-to-assume: arn:aws:iam::123456789012:role/GithubActionsRole
aws-region: us-east-1
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v1
- name: Build, tag, and push image
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
ECR_REPOSITORY: my-app
IMAGE_TAG: ${{ github.sha }}
run: |
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
- name: Wait for scan
run: |
aws ecr wait image-scan-complete \
--repository-name my-app \
--image-id imageTag=${{ github.sha }}
- name: Check scan results
run: |
CRITICAL=$(aws ecr describe-image-scan-findings \
--repository-name my-app \
--image-id imageTag=${{ github.sha }} \
--query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
--output text)
if [ "$CRITICAL" != "None" ] && [ "$CRITICAL" -gt 0 ]; then
echo "❌ Critical vulnerabilities found"
exit 1
fi
ECS Deployment Integration
ECS task definition (reference ECR image):
{
"family": "my-app",
"taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"containerDefinitions": [
{
"name": "app",
"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:prod-v1.2.3",
"cpu": 256,
"memory": 512,
"portMappings": [
{
"containerPort": 8080,
"protocol": "tcp"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/my-app",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
}
}
]
}
Blue/green deployment (CodeDeploy):
# appspec.yaml
version: 0.0
Resources:
- TargetService:
Type: AWS::ECS::Service
Properties:
TaskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/my-app:2"
LoadBalancerInfo:
ContainerName: "app"
ContainerPort: 8080
PlatformVersion: "LATEST"
Hooks:
- BeforeInstall: "LambdaFunctionToValidateBeforeInstall"
- AfterInstall: "LambdaFunctionToValidateAfterInstall"
- BeforeAllowTraffic: "LambdaFunctionToValidateBeforeTrafficShift"
- AfterAllowTraffic: "LambdaFunctionToValidateAfterTrafficShift"
EKS Deployment Integration
Kubernetes deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 3
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: app
image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.2.3
imagePullPolicy: Always
ports:
- containerPort: 8080
imagePullSecrets:
- name: ecr-registry-secret
ECR pull secret (for clusters outside EKS; the token expires after 12 hours, so it must be refreshed on a schedule):
kubectl create secret docker-registry ecr-registry-secret \
--docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
--docker-username=AWS \
--docker-password=$(aws ecr get-login-password --region us-east-1)
IRSA (IAM Roles for Service Accounts) - Better approach:
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-app-sa
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/MyAppECRAccessRole
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
template:
spec:
serviceAccountName: my-app-sa
containers:
- name: app
image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.2.3
No explicit pull secret needed - EKS nodes authenticate to ECR through the node IAM role, while IRSA gives the pod itself scoped AWS permissions (e.g., to Secrets Manager) without long-lived credentials.
When to Use ECR vs Alternatives
ECR vs Docker Hub
| Consideration | AWS ECR | Docker Hub |
|---|---|---|
| Pull rate limits | Unlimited | 100/6h (anonymous), 200/6h (free), 5000/day (Pro) |
| Private images | Unlimited private repositories | 1 free private repo (free), unlimited (Pro $5/month) |
| Cost | $0.10/GB storage + data transfer | Free (public), $5-9/month (private) |
| AWS integration | Native (IAM, VPC endpoints, GuardDuty) | Requires credentials management |
| Performance | Low latency from AWS regions | Variable (internet-dependent) |
| Scanning | Built-in (basic free, enhanced $0.09/scan) | Paid feature (Pro+) |
| Use case | AWS-hosted workloads | Public open-source projects, non-AWS environments |
Decision: Use ECR for production AWS workloads; Docker Hub for public images or multi-cloud.
ECR vs Self-Hosted Registry (Harbor, Artifactory)
| Consideration | AWS ECR | Self-Hosted (Harbor) |
|---|---|---|
| Management overhead | Fully managed (zero ops) | Requires HA setup, upgrades, backups |
| Cost | $0.10/GB + scanning | EC2/RDS costs + licensing (if commercial) |
| Customization | Limited (AWS features only) | Full control (plugins, custom auth) |
| Multi-cloud | AWS-only | Works anywhere |
| Air-gapped environments | Not suitable | Works offline |
| Use case | AWS-native architectures | Multi-cloud, on-prem, air-gapped, custom workflows |
Decision: Use ECR unless you need multi-cloud, air-gapped, or highly customized workflows.
ECR Public vs Docker Hub
| Consideration | ECR Public | Docker Hub |
|---|---|---|
| Pull rate limits | Unlimited | 100/6h (anonymous), 5000/day (Pro) |
| Bandwidth | Free (AWS CloudFront) | Free (for reasonable use) |
| Discoverability | Lower (newer platform) | Higher (established community) |
| AWS integration | Native | Requires credentials |
| Use case | Open-source projects targeting AWS users | General open-source projects |
Common Pitfalls
1. Not Enabling Tag Immutability for Production
Problem: Developer accidentally overwrites prod-v1.2.3 tag with different image, breaking deployments.
Solution:
aws ecr put-image-tag-mutability \
--repository-name my-app \
--image-tag-mutability IMMUTABLE
Prevents tag overwrites. Use immutable tags for production, mutable for dev/staging if needed.
2. Missing Lifecycle Policies
Problem: Hundreds of old images accumulate, costing $50+/month in storage and scanning.
Solution: Implement lifecycle policies from day one:
{
"rules": [
{
"rulePriority": 1,
"description": "Keep last 10 production, 5 staging, 3 dev",
"selection": {...},
"action": {"type": "expire"}
}
]
}
3. Hardcoding Secrets in Dockerfiles
Problem:
ENV DATABASE_PASSWORD=supersecret123
Credentials visible in image history and ECR.
Solution: Use Secrets Manager or Parameter Store:
{
"secrets": [
{
"name": "DATABASE_PASSWORD",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db"
}
]
}
4. Not Replicating Production Images
Problem: ECR outage in us-east-1 blocks deployments globally.
Solution: Replicate production images to DR region:
{
"rules": [{
"destinations": [{"region": "us-west-2"}],
"repositoryFilters": [{"filter": "prod-*"}]
}]
}
5. Running Containers as Root
Problem: Container escape grants root access to host.
Solution:
RUN useradd -m -u 1000 appuser
USER appuser
And in ECS:
{
"user": "1000",
"readonlyRootFilesystem": true
}
6. Large Base Images
Problem: FROM ubuntu creates 1.2GB image, consuming storage and slowing deployments.
Solution: Use slim/distroless images:
# ~200MB vs 1.2GB
FROM python:3.11-slim
# Or: FROM gcr.io/distroless/python3-debian12
7. Not Monitoring Scan Results
Problem: Critical vulnerabilities in production images go undetected for months.
Solution: Implement automated checks in CI/CD:
CRITICAL_COUNT=$(aws ecr describe-image-scan-findings \
--repository-name my-app --image-id imageTag=latest \
--query 'imageScanFindings.findingSeverityCounts.CRITICAL' --output text)
if [ "$CRITICAL_COUNT" != "None" ] && [ "$CRITICAL_COUNT" -gt 0 ]; then exit 1; fi
And enable EventBridge alerting:
{
"source": ["aws.ecr"],
"detail-type": ["ECR Image Scan"],
"detail": {
"finding-severity-counts": {
"CRITICAL": [{"exists": true}]
}
}
}
8. Forgetting Cross-Account Permissions
Problem: Replication fails silently because destination account doesn’t allow ecr:CreateRepository.
Solution: Add registry policy in destination account:
{
"Sid": "AllowReplication",
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::123456789012:root"},
"Action": ["ecr:CreateRepository", "ecr:ReplicateImage"],
"Resource": "arn:aws:ecr:us-east-1:999999999999:repository/*"
}
9. Not Using VPC Endpoints for ECR
Problem: Containers in private subnets pull images via NAT Gateway, incurring data transfer costs ($0.045/GB).
Solution: Create VPC endpoints for ECR:
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--vpc-endpoint-type Interface \
--service-name com.amazonaws.us-east-1.ecr.dkr \
--subnet-ids subnet-12345 \
--security-group-ids sg-12345
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--vpc-endpoint-type Interface \
--service-name com.amazonaws.us-east-1.ecr.api \
--subnet-ids subnet-12345 \
--security-group-ids sg-12345
# Image layers are served from S3, so also add the S3 gateway endpoint
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids rtb-12345
Eliminates NAT Gateway costs for ECR traffic.
Key Takeaways
ECR Fundamentals:
- ECR is a fully managed Docker registry with native AWS integration (IAM, VPC, CloudWatch)
- Private by default (requires IAM authentication); ECR Public available for open-source
- Charges: $0.10/GB storage, $0.02/GB cross-region transfer, $0.09/scan (enhanced)
Image Scanning:
- Basic scanning: Free, OS packages only, Clair-based
- Enhanced scanning: $0.09/scan, continuous rescanning, OS + application dependencies, CVSS scores
- Integrate scanning into CI/CD; fail builds on CRITICAL findings; remediate HIGH+ within SLAs
Lifecycle Policies:
- Automate cleanup with count-based (keep last N) or age-based (delete after X days) rules
- Typical policy: Keep 10 production, 5 staging, 3 dev; delete untagged after 1 day
- Saves 80%+ on storage and scanning costs for typical workloads
Replication:
- Cross-region: Low latency, high availability, compliance (data residency)
- Cross-account: Centralized CI/CD, multi-tenant architectures
- Cost: $0.02/GB data transfer + storage in each region
Runtime Security:
- Use distroless/slim base images to minimize attack surface
- Run as non-root, drop capabilities, read-only root filesystem
- Enable GuardDuty Runtime Monitoring for threat detection
- Segment networks with security groups (principle of least privilege)
Secrets Management:
- Never hardcode secrets in Dockerfiles or pass via environment variables
- Use Secrets Manager ($0.40/secret/month) for credentials requiring rotation
- Use Parameter Store (free) for static configuration
- Inject secrets at runtime via ECS task definition or Kubernetes External Secrets Operator
Cost Optimization:
- Lifecycle policies: Reduce storage 80% (e.g., $20/month → $4/month)
- Multi-stage builds: Reduce image size 85% (1.5GB → 200MB)
- Strategic replication: Replicate only production to active regions
- VPC endpoints: Eliminate NAT Gateway costs for ECR traffic
Integration:
- CI/CD: Authenticate with aws ecr get-login-password, scan on push, fail on critical vulnerabilities
- ECS: Reference images by digest for immutability; use task execution role for ECR pull permissions
- EKS: No explicit pull secrets needed (the node IAM role covers ECR pulls); use IRSA for pod-level AWS access and integrate with External Secrets Operator
When to Use ECR:
- Choose ECR for: AWS-native workloads, unlimited pull rate, native IAM integration, production deployments
- Consider alternatives for: Multi-cloud (Harbor), air-gapped (self-hosted), public open-source projects (Docker Hub)