AWS S3 for System Architects
Table of Contents
- What Problems S3 Solves
- S3 Fundamentals
- Storage Classes
- Cost Optimization Strategies
- Performance Optimization
- Security Best Practices
- Data Management
- High Availability and Reliability
- Integration Patterns
- Observability
- When to Use S3 vs Alternatives
- Common Pitfalls
- Key Takeaways
What Problems S3 Solves
Amazon S3 (Simple Storage Service) provides scalable, durable, and cost-effective object storage in the cloud, addressing fundamental data storage challenges:
Storage challenges solved:
Unlimited scalability without infrastructure management: S3 stores any amount of data without capacity planning, server provisioning, or storage array configuration. Organizations store exabytes of data without managing physical infrastructure. You upload objects; AWS handles durability, availability, and scaling.
11 9s durability (99.999999999%): S3 automatically replicates objects across multiple devices in at least three Availability Zones within a region. Designed to sustain the loss of two facilities simultaneously. For 10 million objects stored, you can expect to lose one object every 10,000 years on average.
Cost inefficiency of uniform storage pricing: Traditional storage charges the same rate regardless of access frequency. S3 offers eight storage classes with prices ranging from $0.023/GB (Standard) to $0.00099/GB (Glacier Deep Archive). Objects accessed once per year cost 96% less than frequently accessed objects. Intelligent-Tiering automates this optimization without manual intervention.
Complex data lifecycle management: Managing data retention, archival, and deletion across thousands of objects requires automation. S3 lifecycle policies automatically transition objects between storage classes or delete them based on age. A single policy can manage billions of objects without custom scripts or manual processes.
Data accessibility and sharing: Sharing large datasets traditionally requires file transfer protocols, VPNs, or physical media. S3 presigned URLs provide time-limited access to specific objects without AWS credentials. CloudFront integration delivers content globally with low latency. Cross-account access enables secure data sharing between AWS accounts.
Lack of data protection and compliance: Organizations need versioning, encryption, access logging, and compliance controls. S3 provides all of these capabilities natively with features like Object Lock for regulatory compliance, versioning for accidental deletion protection, and encryption by default.
S3 serves as the foundation for data lakes, backup/archive, content distribution, static website hosting, and application data storage across AWS.
S3 Fundamentals
Object Storage Model
S3 stores data as objects within buckets. Each object consists of:
- Key: Unique identifier within the bucket (can include “/” to simulate folders)
- Value: The actual data (0 bytes to 5 TB per object)
- Metadata: System and user-defined key-value pairs
- Version ID: Unique identifier for object versions (when versioning enabled)
Bucket naming:
- Globally unique across all AWS accounts
- 3-63 characters, lowercase letters, numbers, hyphens
- Must not be formatted as IP addresses
- Cannot be changed after creation
Object keys and prefixes:
S3 uses a flat structure, not a true file system. The key data/2025/january/file.txt creates the appearance of folders through the S3 console, but S3 stores it as a single key. For performance, distribute keys across multiple prefixes to achieve higher request rates.
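A minimal sketch of how "folders" are really just shared key prefixes, using boto3 (the bucket name and keys are hypothetical):

import boto3

s3 = boto3.client("s3")

# Upload two objects whose keys share a prefix; no folder is ever created.
s3.put_object(Bucket="example-bucket", Key="data/2025/january/file.txt", Body=b"hello")
s3.put_object(Bucket="example-bucket", Key="data/2025/february/file.txt", Body=b"world")

# Delimiter="/" makes the API group keys by prefix, which is what the console
# renders as folders; the underlying namespace is still flat.
resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="data/2025/", Delimiter="/")
for p in resp.get("CommonPrefixes", []):
    print(p["Prefix"])   # data/2025/february/, data/2025/january/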
Consistency Model
S3 provides strong read-after-write consistency for all operations as of December 2020. After a successful PUT (new object) or DELETE, subsequent reads immediately return the latest version. This applies to both new objects and overwrites of existing objects.
What this means:
- Write an object → immediately read the latest version
- Overwrite an object → immediately read the new data
- Delete an object → immediately receive 404 on GET
- No eventual consistency delays for any operations
Implications for architecture: You can use S3 as a strongly consistent data store without building additional consistency logic. S3 Event Notifications trigger immediately when objects are created, updated, or deleted.
Durability and Availability
| Storage Class | Durability | Availability | Availability SLA |
|---|---|---|---|
| Standard | 99.999999999% (11 9s) | 99.99% | 99.9% |
| Intelligent-Tiering | 99.999999999% | 99.9% | 99% |
| Standard-IA | 99.999999999% | 99.9% | 99% |
| One Zone-IA | 99.999999999% (single AZ) | 99.5% | 99% |
| Glacier Instant Retrieval | 99.999999999% | 99.9% | 99% |
| Glacier Flexible Retrieval | 99.999999999% | 99.99% (after restore) | 99.9% |
| Glacier Deep Archive | 99.999999999% | 99.99% (after restore) | 99.9% |
| Express One Zone | 99.999999999% (single AZ) | 99.95% | 99.9% |
Durability = probability data won’t be lost over time
Availability = probability you can access data when you need it
One Zone storage classes store data in a single Availability Zone. If that AZ experiences an outage or is destroyed, data may be lost. Use One Zone classes only for reproducible data or when you have cross-region replication configured.
Current Limitations and Constraints
Object size limits:
- Maximum object size: 5 TB
- Maximum single PUT operation: 5 GB
- Objects larger than 100 MB should use multipart upload
- Objects larger than 5 GB must use multipart upload
Request rate limits (per prefix, per second):
- 3,500 PUT/COPY/POST/DELETE requests
- 5,500 GET/HEAD requests
- These limits apply per prefix; use multiple prefixes to scale beyond these limits
Bucket limits:
- 100 buckets per account by default (can request increase to 1,000)
- Bucket names are globally unique and cannot be changed
- Buckets cannot be nested
Lifecycle policy limits:
- 1,000 lifecycle rules per bucket
- Lifecycle transitions have minimum storage duration requirements (30 days for Standard-IA, 90 days for Glacier, etc.)
Object metadata:
- System-defined metadata cannot be modified (e.g., Content-Type must be set on upload)
- User-defined metadata limited to 2 KB
- Metadata keys must be lowercase with hyphens
Versioning:
- Once enabled, cannot be fully disabled (only suspended)
- Each version counts toward storage costs
- No automatic cleanup of old versions without lifecycle policies
Storage Classes
S3 offers eight storage classes optimized for different access patterns and durability requirements. Choosing the right class can reduce costs by 50-96% depending on access frequency.
S3 Standard
Purpose: Frequently accessed data requiring high throughput and low latency
Pricing (US East):
- Storage: $0.023/GB/month
- PUT/COPY/POST: $0.005 per 1,000 requests
- GET: $0.0004 per 1,000 requests
When to use:
- Active datasets accessed multiple times per week
- Content distribution requiring immediate access
- Big data analytics with frequent queries
- Mobile/gaming applications with real-time access
Characteristics:
- Stores data across ≥3 Availability Zones
- Designed for 99.99% availability
- No retrieval fees
- No minimum storage duration
S3 Intelligent-Tiering
Purpose: Data with unknown, changing, or unpredictable access patterns
How it works: Automatically moves objects between five access tiers based on actual usage:
- Frequent Access (default): Objects accessed recently
- Infrequent Access: Objects not accessed for 30 consecutive days (saves 40% vs Standard)
- Archive Instant Access: Objects not accessed for 90 consecutive days (saves 68% vs Standard)
- Archive Access (optional): Objects not accessed for 90-730 days (saves 71% vs Standard)
- Deep Archive Access (optional): Objects not accessed for 180-730 days (saves 95% vs Standard)
Pricing (US East):
- Storage: Varies by tier (Frequent Access = $0.023/GB, Infrequent = $0.0125/GB, Archive Instant = $0.0040/GB)
- Monitoring: $0.0025 per 1,000 objects (objects <128 KB not monitored)
- No retrieval fees (unlike Standard-IA)
- Same request pricing as S3 Standard
When to use:
- Data lakes with unpredictable query patterns
- Long-lived data with changing access patterns
- When you want automatic cost optimization without manual lifecycle rules
- When retrieval fees would be unpredictable or expensive
Best practice: Use Intelligent-Tiering as the default for new data when access patterns are unknown. The monitoring fee ($0.30/year for 10,000 objects) is typically offset by storage savings after 30-90 days of infrequent access.
Important: Objects smaller than 128 KB are always stored in Frequent Access tier and not monitored. For workloads with many small files, use lifecycle policies to Standard-IA instead.
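The optional Archive Access and Deep Archive Access tiers are enabled per bucket. A minimal sketch with boto3 (the bucket name and configuration ID are hypothetical):

import boto3

s3 = boto3.client("s3")

# Opt objects in this bucket into the optional archive tiers of Intelligent-Tiering.
# Objects not accessed for 90 days move to Archive Access; after 180 days,
# to Deep Archive Access.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-bucket",
    Id="archive-tiers",
    IntelligentTieringConfiguration={
        "Id": "archive-tiers",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)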
S3 Standard-IA (Infrequent Access)
Purpose: Data accessed less frequently but requiring rapid access when needed
Pricing (US East):
- Storage: $0.0125/GB/month (46% cheaper than Standard)
- PUT/COPY/POST: $0.01 per 1,000 requests (2x Standard)
- GET: $0.001 per 1,000 requests (2.5x Standard)
- Retrieval fee: $0.01/GB
Minimum requirements:
- 30-day minimum storage duration (early deletion fees apply)
- 128 KB minimum billable object size (smaller objects charged as 128 KB)
When to use:
- Backups and disaster recovery data accessed quarterly
- Older data accessed occasionally but requiring immediate availability
- Data transitioning to archive after 30+ days
Cost comparison (1 TB stored for 1 year, retrieved once):
- Standard: $276 storage + $0 retrieval = $276
- Standard-IA: $150 storage + $10 retrieval = $160 (42% savings)
Don’t use for:
- Data accessed more than once per month (retrieval fees exceed Standard)
- Objects smaller than 128 KB (minimum charge makes it expensive)
- Short-lived data (<30 days; early deletion fees apply)
S3 One Zone-IA
Purpose: Infrequently accessed, recreatable data where single-AZ storage is acceptable
Pricing (US East):
- Storage: $0.01/GB/month (20% cheaper than Standard-IA, 57% cheaper than Standard)
- Same request and retrieval pricing as Standard-IA
Risk: Data stored in a single Availability Zone. If that AZ is destroyed (fire, flood, etc.), data is lost. AWS does not replicate One Zone-IA data across AZs.
When to use:
- Thumbnails or transcoded media easily regenerated from source
- Replica copies when primary data is in Standard or another region
- Non-critical data with acceptable loss risk
Best practice: Enable Cross-Region Replication to a Standard or Standard-IA class in another region if data loss is unacceptable. The replication cost may still be cheaper than using Standard-IA in the primary region.
S3 Express One Zone
Purpose: High-performance, latency-sensitive applications requiring single-digit millisecond access
Performance:
- 10x faster data access compared to S3 Standard
- Consistent single-digit millisecond latency
- Hundreds of thousands of requests per second per bucket
- 50% lower request costs compared to S3 Standard
Pricing (US East):
- Storage: $0.16/GB/month (7x more expensive than Standard)
- PUT/POST: $0.0025 per 1,000 requests (50% cheaper than Standard)
- GET: $0.0002 per 1,000 requests (50% cheaper than Standard)
Architecture:
- Directory buckets (vs general purpose buckets in other classes)
- Must specify Availability Zone at bucket creation
- Co-locate with compute in the same AZ for optimal performance
When to use:
- Machine learning training data requiring rapid access
- High-frequency analytics workloads
- Real-time data processing with strict latency requirements
- Financial trading applications with sub-10ms requirements
Don’t use for:
- Durability-critical data without replication (single-AZ risk)
- Infrequently accessed data (storage cost 7x higher than Standard)
- Cross-region access patterns (designed for same-AZ access)
Best practice: Use Express One Zone for hot data with extreme performance requirements, but replicate critical data to Standard or another region.
S3 Glacier Storage Classes
Glacier classes provide low-cost archival storage for data accessed rarely. All Glacier classes provide 99.999999999% durability across ≥3 AZs.
Glacier Instant Retrieval
Purpose: Archive data requiring immediate access once per quarter
Pricing (US East):
- Storage: $0.004/GB/month (83% cheaper than Standard)
- Retrieval: $0.03/GB
- Retrieval time: Milliseconds (same as Standard)
Minimum requirements:
- 90-day minimum storage duration
- 128 KB minimum billable object size
When to use:
- Medical imaging accessed occasionally but requiring instant access
- News media archives queried for research
- Regulatory archives accessed during audits (quarterly or less)
Cost comparison (1 TB stored for 1 year, retrieved 4 times):
- Standard: $276 storage + $0 retrieval = $276
- Glacier Instant: $48 storage + $120 retrieval = $168 (39% savings)
Glacier Flexible Retrieval (formerly Glacier)
Purpose: Archives accessed 1-2 times per year with retrieval times of minutes to hours
Pricing (US East):
- Storage: $0.0036/GB/month (84% cheaper than Standard)
- Retrieval: $0.02/GB (Bulk), $0.01/GB (Standard), $0.03/GB (Expedited)
- Retrieval time: 5-12 hours (Bulk), 3-5 hours (Standard), 1-5 minutes (Expedited)
Minimum requirements:
- 90-day minimum storage duration
- 40 KB overhead per object for metadata (affects cost for small objects)
When to use:
- Annual financial audits and compliance archives
- Backup data accessed only for disaster recovery
- Scientific data archives queried occasionally
Retrieval options (a restore sketch follows this list):
- Bulk ($0.02/GB): 5-12 hours, lowest cost, ideal for large restores
- Standard ($0.01/GB): 3-5 hours, default option
- Expedited ($0.03/GB): 1-5 minutes, for urgent access
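A minimal sketch of initiating a restore from Glacier Flexible Retrieval with boto3 (the bucket, key, and 7-day availability window are hypothetical):

import boto3

s3 = boto3.client("s3")

# Ask S3 to stage a temporary copy of the archived object for 7 days,
# using the low-cost Bulk retrieval tier.
s3.restore_object(
    Bucket="example-archive-bucket",
    Key="audit/2018/report.pdf",
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}},
)

# Poll head_object until the Restore field reports ongoing-request="false",
# then GET the object as usual.
status = s3.head_object(Bucket="example-archive-bucket", Key="audit/2018/report.pdf")
print(status.get("Restore"))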
Glacier Deep Archive
Purpose: Long-term retention for data accessed less than once per year
Pricing (US East):
- Storage: $0.00099/GB/month (96% cheaper than Standard)
- Retrieval: $0.02/GB (Standard), $0.0025/GB (Bulk)
- Retrieval time: within 12 hours (Standard), within 48 hours (Bulk)
Minimum requirements:
- 180-day minimum storage duration
- 40 KB overhead per object
When to use:
- 7-10 year regulatory retention (financial, healthcare, legal)
- Data archival for compliance with no expected access
- Tape replacement for long-term backup
Cost comparison (1 TB stored for 7 years, never retrieved):
- Standard: $1,932
- Glacier Flexible: $302
- Glacier Deep Archive: $83 (96% savings vs Standard)
Best practice: Deep Archive is the most cost-effective option for long-term storage, but the 180-day minimum and 12-48 hour retrieval times make it suitable only for data you rarely or never access.
Storage Class Selection Framework
| Access Frequency | Latency Requirement | Durability Requirement | Recommended Class |
|---|---|---|---|
| Daily/Weekly | Milliseconds | Multi-AZ | S3 Standard |
| Unknown/Changing | Milliseconds | Multi-AZ | Intelligent-Tiering |
| Frequent, latency-critical | Single-digit milliseconds | Single-AZ acceptable | Express One Zone |
| Monthly | Milliseconds | Multi-AZ | Standard-IA |
| Monthly | Milliseconds | Single-AZ acceptable | One Zone-IA |
| Quarterly | Milliseconds | Multi-AZ | Glacier Instant Retrieval |
| 1-2x/year | Minutes to hours acceptable | Multi-AZ | Glacier Flexible Retrieval |
| <1x/year or never | Hours acceptable | Multi-AZ | Glacier Deep Archive |
Decision tree (a small helper sketch follows this list):
- Do you need sub-10ms latency? → Express One Zone
- Is access pattern unknown? → Intelligent-Tiering
- How often is data accessed?
- Daily/weekly → Standard
- Monthly → Standard-IA (or One Zone-IA if recreatable)
- Quarterly → Glacier Instant Retrieval
- 1-2x/year → Glacier Flexible Retrieval
- <1x/year → Glacier Deep Archive
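A small helper that encodes the decision tree above; the frequency thresholds mirror the table and are assumptions to adapt, not AWS rules (the return values mirror the S3 StorageClass API names):

def pick_storage_class(accesses_per_year: float, needs_sub_10ms: bool = False,
                       pattern_known: bool = True, recreatable: bool = False) -> str:
    # Mirrors the decision tree: latency first, unknown patterns second, then frequency.
    if needs_sub_10ms:
        return "EXPRESS_ONEZONE"
    if not pattern_known:
        return "INTELLIGENT_TIERING"
    if accesses_per_year >= 52:          # daily/weekly
        return "STANDARD"
    if accesses_per_year >= 12:          # monthly
        return "ONEZONE_IA" if recreatable else "STANDARD_IA"
    if accesses_per_year >= 4:           # quarterly
        return "GLACIER_IR"
    if accesses_per_year >= 1:           # 1-2x/year
        return "GLACIER"
    return "DEEP_ARCHIVE"

print(pick_storage_class(accesses_per_year=2))   # GLACIER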
Cost Optimization Strategies
Properly configuring S3 storage can reduce costs by 50-96% depending on access patterns. Lifecycle policies represent the highest-impact optimization method.
Lifecycle Policies
Lifecycle policies automatically transition objects between storage classes or delete them based on age. A single policy can manage billions of objects without custom scripts.
Common lifecycle transitions:
Standard → Intelligent-Tiering (0 days) → Archive after 90 days
Standard → Standard-IA (30 days) → Glacier Flexible (90 days) → Glacier Deep Archive (365 days)
Standard → Glacier Instant (90 days) → Glacier Flexible (365 days)
Transition rules:
- Objects must remain in Standard for ≥30 days before transitioning to Standard-IA or One Zone-IA
- Objects must remain in Standard or Standard-IA for ≥90 days before transitioning to Glacier Instant
- No minimum for transitions to Intelligent-Tiering
- Transitions from Standard to Glacier Instant skip Standard-IA
Cost considerations:
- No data retrieval charges for lifecycle transitions
- Per-request ingestion charges apply when objects are transitioned
- Each transition incurs a PUT request cost (Standard to Standard-IA = $0.01 per 1,000 objects)
Small object penalty: For each object in Glacier, S3 adds 40 KB of chargeable overhead for metadata. Transitioning many small objects to Glacier may cost more than keeping them in Standard.
Example (1,000 objects, 10 KB each):
- Standard: 10 MB × $0.023/GB = $0.00023/month
- Glacier Flexible: 10 MB of data plus per-object overhead (32 KB billed at Glacier rates, 8 KB at Standard rates) ≈ 42 MB × $0.0036/GB + 8 MB × $0.023/GB ≈ $0.00033/month
The overhead makes Glacier roughly 40% more expensive here. Only transition objects >128 KB to Glacier classes.
Best practice lifecycle policy (general purpose; a boto3 sketch follows this outline):
Days 0-30: Standard (or Intelligent-Tiering if access pattern unknown)
Days 30-90: Standard-IA
Days 90-365: Glacier Instant Retrieval
Days 365+: Glacier Flexible Retrieval or Deep Archive
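A sketch of that policy as a boto3 lifecycle configuration (the bucket name is hypothetical; adjust the day thresholds and final archive class to your retention needs):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tiered-retention",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
                # Clean up abandoned multipart uploads as a side benefit.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)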
Advanced lifecycle actions:
- Delete expired object delete markers: Cleans up delete markers after all versions deleted
- Delete incomplete multipart uploads: Removes abandoned uploads after 7 days (saves storage costs)
- Expire current versions: Automatically delete objects after X days
- Permanently delete noncurrent versions: Delete old versions after X days
Intelligent-Tiering as Default
Use case: When access patterns are unknown or change over time
How it saves money: After 30 days without access, objects automatically move to Infrequent Access tier (40% savings). After 90 days, objects move to Archive Instant Access (68% savings). No retrieval fees.
Cost comparison (1 TB data, unknown access pattern, stored 1 year):
| Scenario | Standard | Intelligent-Tiering | Savings |
|---|---|---|---|
| Accessed weekly | $276 | $276 (stays Frequent) | $0 |
| Accessed once after 60 days | $276 | $221 (moves to Infrequent) | $55 (20%) |
| Accessed once after 120 days | $276 | $138 (moves to Archive Instant) | $138 (50%) |
| Never accessed | $276 | $78 (stays Archive Instant) | $198 (72%) |
Best practice: Set Intelligent-Tiering as the default storage class for new buckets when access patterns are unpredictable. The monitoring fee ($0.0025 per 1,000 objects) is offset by savings after 30-90 days.
Request Cost Optimization
Request pricing (S3 Standard, US East):
- PUT/COPY/POST/LIST: $0.005 per 1,000 requests
- GET/SELECT: $0.0004 per 1,000 requests
Strategies:
- Batch small files: Uploading 1 million 1 KB files costs $5 in PUT requests. Combining them into 1,000 objects of 1 MB each costs $0.005 (99.9% savings). Use TAR/ZIP archives for many small files (see the sketch after this list).
- Use S3 Select: Query data inside objects without retrieving entire files. Retrieving a full 1 GB object to extract 10 MB costs $0.0004 for the GET plus about $0.09 in data transfer. S3 Select costs $0.0004 for the GET + $0.002 to scan 1 GB + a fraction of a cent to return and transfer the 10 MB, roughly $0.003 total.
- Cache frequently accessed objects: Put CloudFront in front of frequently accessed content. With an 80-95% cache hit ratio, most GETs never reach S3, and data transfer from S3 to CloudFront is free, so both S3 request and transfer costs drop sharply.
- Reduce LIST operations: LIST costs $0.005 per 1,000 requests, and listing 1 million objects takes about 1,000 paginated requests. Listing that bucket 100 times/day costs roughly $0.50/day ($15/month). Use S3 Inventory ($0.0025 per 1 million objects listed) to get daily object lists instead of repeated API calls.
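A minimal sketch of the batching idea: bundle many small files into one archive locally and upload a single object (the paths, bucket, and key are hypothetical):

import tarfile
import boto3

s3 = boto3.client("s3")

# One compressed archive replaces thousands of tiny PUTs (and later GETs/LISTs).
with tarfile.open("logs-2025-01-13.tar.gz", "w:gz") as tar:
    tar.add("local-logs/", arcname="logs-2025-01-13")

s3.upload_file("logs-2025-01-13.tar.gz", "example-bucket",
               "batched/logs-2025-01-13.tar.gz")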
Data Transfer Cost Optimization
Data transfer pricing (out to internet):
- First 10 TB/month: $0.09/GB
- Next 40 TB/month: $0.085/GB
- Next 100 TB/month: $0.07/GB
- Over 150 TB/month: $0.05/GB
Strategies:
- Use CloudFront: Data transfer from S3 to CloudFront is free. CloudFront to the internet is cheaper than S3 direct ($0.085/GB vs $0.09/GB for the first 10 TB). Caching reduces S3 GET requests.
- Use S3 Transfer Acceleration sparingly: Transfer Acceleration costs $0.04-$0.08/GB on top of standard transfer costs. Only use it for long-distance uploads where speed matters. For most uploads, direct S3 upload is cheaper.
- VPC Endpoints for internal traffic: Data transfer from EC2 to S3 via a gateway VPC endpoint has no data transfer charges. Without a VPC endpoint, traffic from private subnets goes through a NAT gateway and incurs data processing charges ($0.045/GB).
- Requester Pays: For public datasets, enable Requester Pays so data consumers pay transfer costs. The requester must include the x-amz-request-payer header in requests (see the sketch after this list).
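A minimal sketch of a Requester Pays download with boto3, which sets the x-amz-request-payer header for you (the bucket and key are hypothetical):

import boto3

s3 = boto3.client("s3")

# The caller acknowledges it will be billed for the request and data transfer.
resp = s3.get_object(
    Bucket="example-public-dataset",
    Key="genomes/sample-001.vcf.gz",
    RequestPayer="requester",
)
data = resp["Body"].read()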
Storage Cost Analysis
Monthly cost comparison (1 TB data, US East):
| Storage Class | Storage Cost | Retrieval Cost (1x) | Total Monthly Cost |
|---|---|---|---|
| Standard | $23.00 | $0.00 | $23.00 |
| Intelligent-Tiering (Frequent) | $23.00 | $0.00 | $23.30 (with monitoring) |
| Intelligent-Tiering (Infrequent) | $12.50 | $0.00 | $12.80 (with monitoring) |
| Intelligent-Tiering (Archive Instant) | $4.00 | $0.00 | $4.30 (with monitoring) |
| Standard-IA | $12.50 | $10.00 | $22.50 |
| One Zone-IA | $10.00 | $10.00 | $20.00 |
| Glacier Instant | $4.00 | $30.00 | $34.00 |
| Glacier Flexible | $3.60 | $20.00 | $23.60 |
| Glacier Deep Archive | $0.99 | $20.00 | $20.99 |
Key insight: For data accessed more than once per month, Intelligent-Tiering or Standard-IA is cheaper than Glacier classes due to retrieval fees. Glacier is only cost-effective for rarely accessed data.
Cost Monitoring Tools
AWS Cost Explorer:
- Filter by S3 service and storage class
- Identify which buckets and storage classes drive costs
- Set up monthly cost anomaly alerts
S3 Storage Lens:
- Free tier: Account-level metrics (object count, storage by class)
- Advanced tier ($0.20 per million objects analyzed): Bucket-level metrics, activity trends, recommendations
S3 Storage Class Analysis:
- Analyzes access patterns for a bucket
- Recommends lifecycle policies based on actual usage
- Free tool, export results to S3
Best practice: Enable S3 Storage Lens (free tier) and review monthly. Create lifecycle policies based on Storage Class Analysis recommendations.
Performance Optimization
S3 can scale to handle thousands of requests per second per prefix. Proper architecture and configuration maximize throughput and minimize latency.
Request Rate Scaling
Per-prefix limits:
- 3,500 PUT/COPY/POST/DELETE requests per second
- 5,500 GET/HEAD requests per second
Key insight: These limits apply per prefix, not per bucket. A bucket with 10 prefixes can handle 35,000 writes/second and 55,000 reads/second.
Prefix strategy:
Bad (single prefix):
bucket/image1.jpg
bucket/image2.jpg
bucket/image3.jpg
→ Limited to 5,500 GET/s
Good (multiple prefixes using hash):
bucket/a1b2/image1.jpg
bucket/c3d4/image2.jpg
bucket/e5f6/image3.jpg
→ Scales to 5,500 GET/s per prefix = 16,500+ GET/s total
Common prefix patterns:
- Hash-based: Use the first 4 characters of the object's hash as the prefix (bucket/a1b2/objectname)
- Random: Generate a random prefix (bucket/rand1234/objectname)
- Date-based (avoid for high-volume writes): bucket/2025/01/13/logfile.txt creates a hot prefix on the current date
Best practice: For high-request workloads, distribute objects across 10-100 prefixes using hash or random prefix generation.
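A minimal sketch of hash-based prefix generation (the 4-character prefix length and key layout are assumptions to tune for your request rates):

import hashlib

def prefixed_key(object_name: str) -> str:
    # Derive a stable 4-hex-character prefix from the object name so reads and
    # writes spread across up to 65,536 prefixes instead of hammering one.
    prefix = hashlib.md5(object_name.encode()).hexdigest()[:4]
    return f"{prefix}/{object_name}"

print(prefixed_key("image1.jpg"))   # e.g. "a1b2/image1.jpg"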
Multipart Upload
Multipart upload divides large files into smaller parts uploaded in parallel, improving speed and resilience.
When to use:
- Recommended: Objects >100 MB
- Required: Objects >5 GB (single PUT limit)
Performance benefits:
- Upload parts in parallel (faster total upload time)
- Retry only failed parts (not entire file)
- Upload parts out of order
- Upload while object is being created (streaming)
Configuration (AWS CLI):
- multipart_threshold: Start multipart upload when the file exceeds this size (default 8 MB)
- multipart_chunksize: Size of each part (default 8 MB, range 5 MB to 5 GB)
- Optimal chunk size: 25-100 MB for files >1 GB
Example (1 GB file uploaded as 50 MB parts, 6 parts in parallel, on a connection where a single stream sustains roughly 100 Mbps):
- Single PUT: ~90 seconds
- Multipart: ~30 seconds (parallel streams use more of the available bandwidth)
Best practice: Use AWS SDKs or CLI, which handle multipart upload automatically. For files >100 MB, explicitly configure chunk size to 25-100 MB for optimal performance.
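A sketch of the same tuning in code with boto3's transfer manager, which performs the multipart upload automatically (the file, bucket, and part size are assumptions):

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
    multipart_chunksize=50 * 1024 * 1024,   # 50 MB parts
    max_concurrency=6,                      # parts uploaded in parallel
)

s3.upload_file("backup-2025-01-13.bin", "example-bucket",
               "backups/backup-2025-01-13.bin", Config=config)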
Cleanup: Enable lifecycle policy to delete incomplete multipart uploads after 7 days. Abandoned multipart uploads consume storage and incur costs.
{
"Rules": [
{
"Id": "DeleteIncompleteUploads",
"Status": "Enabled",
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
}
]
}
S3 Transfer Acceleration
Transfer Acceleration routes uploads through CloudFront edge locations and AWS backbone network, reducing latency for long-distance transfers.
How it works: Users upload to nearest CloudFront edge location (over 400 globally) → data transferred to S3 over AWS backbone network → optimized TCP/IP for long-haul transfers.
Performance improvement:
- 50-500% faster for long-distance transfers (>1,000 miles)
- Greatest benefit for uploads from remote regions (Asia to US East, Europe to Asia-Pacific)
- Minimal benefit for same-region uploads
Pricing:
- $0.04/GB for uploads over US, Europe, Japan
- $0.08/GB for all other regions
- No charge if Transfer Acceleration doesn’t improve transfer speed
When to use:
- Users uploading large files from geographically dispersed locations
- Mobile apps with global user base
- Backup/DR data from remote sites
Don’t use for:
- Same-region transfers (negligible benefit, added cost)
- Small files (<1 MB; overhead exceeds benefit)
- Cost-sensitive workloads (adds $0.04-$0.08/GB)
Test performance: AWS provides speed comparison tool at https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
Best practice: Enable Transfer Acceleration only for buckets with global user uploads. Use CloudFront signed URLs for downloads (cheaper and faster than Transfer Acceleration for downloads).
Byte-Range Fetches
Byte-range fetches retrieve specific portions of an object instead of the entire file, improving performance and reducing costs.
How to use:
Include Range: bytes=0-1023 header in GET request to retrieve first 1 KB of object.
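A minimal sketch of a ranged GET with boto3 (the bucket and key are hypothetical):

import boto3

s3 = boto3.client("s3")

# Fetch only the first 1 KB, e.g. to inspect a file header without
# downloading the whole object.
resp = s3.get_object(
    Bucket="example-bucket",
    Key="videos/large-recording.mp4",
    Range="bytes=0-1023",
)
header = resp["Body"].read()
print(len(header))   # 1024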
Use cases:
- Read file headers: Retrieve metadata from first few KB (EXIF data, file format headers)
- Resume downloads: Download remaining bytes after network failure
- Parallel downloads: Download different byte ranges in parallel threads
- Seek to position: Stream video by retrieving only requested timestamp range
Performance benefit (10 GB video file, user seeks to 50% position):
- Full download: 10 GB × $0.09/GB transfer = $0.90 cost, ~90 seconds over 1 Gbps
- Byte-range (5 GB-10 GB): 5 GB × $0.09/GB transfer = $0.45 cost, ~45 seconds
Best practice: Use byte-range fetches for large media files (video, audio) and random-access patterns. Not beneficial for full-file downloads (same cost, added complexity).
CloudFront Integration
CloudFront provides edge caching for S3 content, reducing latency and S3 request costs.
Performance benefits:
- Lower latency: Edge locations serve content from cache (10-50ms vs 100-200ms from S3 region)
- Reduced S3 load: Cache hit ratio of 80-95% reduces S3 GET requests by 80-95%
- Lower costs: Data transfer from S3 to CloudFront is free, and every cache hit eliminates an S3 GET ($0.0004 per 1,000 requests); CloudFront's own request charge is roughly $0.0075-$0.01 per 10,000 HTTPS requests
Configuration:
- Create CloudFront distribution with S3 bucket as origin
- Set TTL (Time To Live) for caching: 1 hour to 1 day for static content
- Use Origin Access Control (OAC) to restrict S3 access to CloudFront only
Cost comparison (1 million requests, 1 MB average response size, ≈1 TB delivered):
| Delivery Method | Request Cost | Transfer Cost | Total Cost |
|---|---|---|---|
| Direct from S3 | $0.40 | $90.00 (1 TB at $0.09/GB) | $90.40 |
| CloudFront (80% cache hit) | $0.75 CloudFront + $0.08 S3 misses | $85.00 (1 TB at $0.085/GB; S3 → CloudFront free) | ≈$85.83 |
| CloudFront (95% cache hit) | $0.75 CloudFront + $0.02 S3 misses | $85.00 (1 TB at $0.085/GB; S3 → CloudFront free) | ≈$85.77 |
Key insight: At this volume CloudFront's per-GB savings are modest; the bigger wins are lower latency, a large reduction in S3 request load, and cheaper transfer at higher volume tiers. For infrequently accessed, single-region content, direct S3 access is simpler and effectively as cheap.
Best practice: Use CloudFront for static websites, images, videos, and public datasets accessed by geographically distributed users. Don’t use CloudFront for private data accessed by single-region internal applications.
S3 Select and Glacier Select
S3 Select retrieves subsets of data using SQL queries instead of retrieving entire objects, reducing data transfer and improving performance.
How it works:
Query CSV, JSON, or Parquet files with SQL SELECT statements. S3 processes query server-side and returns only matching rows.
Example:
SELECT name, age FROM s3object WHERE age > 30
Object size: 1 GB (10 million rows). Matching rows: 10 MB (100,000 rows).
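A sketch of that query with boto3's select_object_content; handling the event stream is the part that differs from a normal GET (the bucket, key, and CSV layout are assumptions):

import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="example-bucket",
    Key="people/records.csv",
    ExpressionType="SQL",
    Expression="SELECT s.name, s.age FROM s3object s WHERE CAST(s.age AS INT) > 30",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# Results arrive as an event stream of Records chunks, not a single body.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")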
Cost comparison:
| Method | Data Scanned | Data Returned | Request Cost | Scan Cost | Return Cost | Transfer Cost | Total Cost |
|---|---|---|---|---|---|---|---|
| Full GET | 1 GB | 1 GB | $0.0004 | $0 | $0 | $0.09 | ≈$0.09 |
| S3 Select | 1 GB | 10 MB | $0.0004 | $0.002 | $0.00001 | $0.0009 | ≈$0.0033 |
Pricing:
- Scanned: $0.002 per GB
- Returned: $0.0007 per GB
Performance benefit: Retrieving 10 MB from 1 GB file over 100 Mbps connection:
- Full GET: ~80 seconds to download 1 GB, then filter client-side
- S3 Select: ~5 seconds to download 10 MB (94% faster)
Limitations:
- Supports CSV, JSON, Parquet only (not binary formats)
- Maximum 256 MB output per query
- Limited SQL functionality (no JOINs, no GROUP BY)
Best practice: Use S3 Select for log analysis, CSV processing, and JSON filtering where you need <10% of data. For complex queries or full dataset analysis, use Athena instead.
Security Best Practices
S3 security follows a defense-in-depth approach using multiple layers: IAM policies, bucket policies, Access Control Lists (ACLs), encryption, and network controls.
Block Public Access (Default Since 2023)
All new S3 buckets created after April 2023 have Block Public Access enabled by default. This prevents accidental public exposure of sensitive data.
Four Block Public Access settings:
- BlockPublicAcls: Reject PUT requests with public ACLs
- IgnorePublicAcls: Ignore existing public ACLs
- BlockPublicPolicy: Reject bucket policies granting public access
- RestrictPublicBuckets: Limit access to buckets with public policies to AWS services and authorized users within the bucket owner's account
Best practice: Keep Block Public Access enabled at the account level unless you have specific public bucket requirements (static websites, public datasets). Account-level settings cannot be overridden per bucket, so if you must host public buckets, leave the account-level setting off, enable Block Public Access on every bucket by default, and disable it only on the specific public buckets with documented justification.
Common mistake: Disabling Block Public Access account-wide to fix one public bucket. This exposes all buckets to misconfiguration risk.
Encryption at Rest
All S3 buckets have encryption enabled by default as of January 2023. New objects are automatically encrypted with SSE-S3 unless you specify a different method.
Encryption options:
| Method | Key Management | Performance | Use Case |
|---|---|---|---|
| SSE-S3 | AWS-managed keys | No impact | Default, simplest option |
| SSE-KMS | AWS KMS customer-managed keys | Request throttling possible | Audit key usage, key rotation, access control |
| DSSE-KMS | Dual-layer encryption with KMS | Request throttling possible | Compliance requiring dual encryption |
| SSE-C | Customer-provided keys | No impact | Customer controls keys outside AWS |
SSE-S3 (default):
- AWS manages encryption keys
- AES-256 encryption
- No additional cost
- No key rotation management required
SSE-KMS:
- Use AWS KMS customer-managed keys (CMKs)
- Audit key usage via CloudTrail
- Automatic key rotation (optional)
- Cost: $0.03 per 10,000 requests (KMS API calls)
- Throttling risk: KMS has request quotas (5,500-30,000 requests/second depending on region). High-volume S3 workloads may hit KMS limits.
Best practice: Use SSE-S3 for most workloads. Use SSE-KMS when you need audit trails of key usage or fine-grained access control over encryption keys. For high-volume workloads (>5,000 requests/second), request KMS quota increase or use SSE-S3.
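A minimal sketch of uploading with SSE-KMS via boto3 (the bucket, key, and KMS key alias are hypothetical):

import boto3

s3 = boto3.client("s3")

# Encrypt this object with a customer-managed KMS key instead of the default
# SSE-S3 key; every use of the key is auditable in CloudTrail.
with open("q4-financials.xlsx", "rb") as f:
    s3.put_object(
        Bucket="example-bucket",
        Key="reports/q4-financials.xlsx",
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/example-s3-key",
    )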
Enforce encryption: Use bucket policy to reject unencrypted uploads:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::bucket-name/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}
Encryption in Transit
Always use HTTPS (TLS) for data in transit. Reject unencrypted HTTP connections with bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}
Best practice: Apply this policy to all S3 buckets. This prevents accidental use of HTTP connections, which transmit data in plaintext.
IAM Policies vs Bucket Policies
IAM policies: Attached to users, groups, or roles. Control what actions the identity can perform.
Bucket policies: Attached to S3 buckets. Control who can access the bucket and what actions they can perform.
When to use each:
- IAM policy: Control access for users within your AWS account. Example: Grant EC2 instance role permission to read from S3 bucket.
- Bucket policy: Grant access to external AWS accounts, grant public access (static websites), or enforce bucket-level controls (require encryption, HTTPS).
Best practice: Use IAM policies for internal access control. Use bucket policies for cross-account access and bucket-level security enforcement (encryption, HTTPS).
Least privilege example (IAM policy for read-only access to specific prefix):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::bucket-name",
"arn:aws:s3:::bucket-name/app-data/*"
]
}
]
}
S3 Access Points
Access Points simplify access control for shared datasets by creating named endpoints with unique policies.
Problem: A single bucket policy becomes complex when managing access for multiple applications, each needing different permissions.
Solution: Create Access Points with individual policies.
Example (shared analytics bucket accessed by the finance and data science teams):
- Access Point 1 (finance-access-point): Grants read access to the finance/* prefix for the finance team
- Access Point 2 (analytics-access-point): Grants read/write access to the analytics/* prefix for the data science team
- Bucket policy: Delegates access control to the Access Points
Benefits:
- Simplifies policy management (separate policy per Access Point vs one complex bucket policy)
- Unique endpoint per application (easier to audit and monitor)
- VPC-specific Access Points (restrict access to specific VPC)
Best practice: Use Access Points for shared buckets accessed by multiple applications or teams. Create VPC-restricted Access Points for internal applications.
VPC Endpoints for S3
VPC endpoints allow private connectivity from VPC to S3 without internet gateway, reducing data transfer costs and improving security.
Gateway endpoint (recommended for S3):
- Free (no hourly charge, no data processing charge)
- Modify route table to route S3 traffic through endpoint
- Traffic stays on AWS network
Cost savings:
- Without VPC endpoint: EC2 → NAT Gateway → S3 (NAT Gateway costs $0.045/GB)
- With VPC endpoint: EC2 → VPC Endpoint → S3 ($0.00/GB)
For 1 TB/month transfer from EC2 to S3: NAT Gateway costs $45/month. VPC endpoint costs $0/month. Annual savings: $540.
Security benefit: S3 traffic never traverses internet. Use bucket policy to restrict access to specific VPC endpoint:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:SourceVpce": "vpce-1234567"
        }
      }
    }
  ]
}
Best practice: Create VPC endpoint for S3 in every VPC that accesses S3. Restrict sensitive buckets to specific VPC endpoints.
S3 Object Lock
Object Lock provides Write-Once-Read-Many (WORM) protection to prevent object deletion or modification for a fixed period or indefinitely.
Use cases:
- Regulatory compliance (SEC 17a-4, FINRA, HIPAA)
- Ransomware protection (prevent deletion during retention period)
- Data integrity (prevent tampering)
Modes:
- Governance mode: Users with special permissions can override retention or delete objects
- Compliance mode: No user can delete object until retention period expires (including root account)
Retention types:
- Retention period: Object locked until specific date
- Legal hold: Object locked indefinitely until legal hold removed
Best practice: Use Compliance mode for regulatory data. Use Governance mode for general data protection with emergency override capability. Enable MFA Delete for additional protection.
Enable Object Lock: Must be enabled at bucket creation. Cannot be enabled on existing buckets (must create new bucket and copy objects).
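A minimal sketch of applying a retention period to an object version with boto3 (the bucket, key, and date are hypothetical; the bucket must have been created with Object Lock enabled):

import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")

# Lock this object version until 2032; in COMPLIANCE mode no one,
# including the root user, can shorten or remove the lock.
s3.put_object_retention(
    Bucket="example-worm-bucket",
    Key="ledger/2024-statement.pdf",
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime(2032, 1, 1, tzinfo=timezone.utc),
    },
)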
Logging and Monitoring
S3 Server Access Logging:
- Logs all requests made to a bucket
- Delivered to a target S3 bucket (delivery can lag by several hours)
- Free (pay only for storage of log files)
- Use for audit trails, security analysis, compliance
CloudTrail S3 Data Events:
- Logs API calls to S3 objects (GetObject, PutObject, DeleteObject)
- Near real-time delivery to CloudWatch Logs or S3
- Cost: $0.10 per 100,000 events
- Use for real-time monitoring, alerting, integration with security tools
CloudWatch Metrics:
- BucketSizeBytes, NumberOfObjects (daily, free)
- Request metrics (PUT/GET/DELETE request counts, latency) (requires opt-in, $0.01 per 1,000 metrics)
Best practice: Enable Server Access Logging on all buckets for audit trails (low cost). Enable CloudTrail Data Events for sensitive buckets requiring real-time monitoring. Set CloudWatch alarms for anomalous request patterns (sudden spike in DELETE requests may indicate ransomware).
Data Management
Versioning
Versioning preserves all versions of an object, protecting against accidental deletion and overwrites.
How it works:
- Each PUT creates a new version with unique Version ID
- DELETE creates delete marker (does not permanently delete)
- Retrieve any previous version by specifying Version ID
Storage implications: Every version consumes storage. 1 GB object modified 10 times = 10 GB storage (10 versions × 1 GB each).
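A minimal sketch of enabling versioning and retrieving an older version with boto3 (the bucket, key, and version ID are hypothetical):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Every overwrite now keeps the prior version; list them and fetch one back.
versions = s3.list_object_versions(Bucket="example-bucket", Prefix="config/app.json")
for v in versions.get("Versions", []):
    print(v["VersionId"], v["IsLatest"], v["LastModified"])

# Substitute a real VersionId from the listing above.
old = s3.get_object(Bucket="example-bucket", Key="config/app.json",
                    VersionId="EXAMPLE_VERSION_ID")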
Cost optimization: Use lifecycle policies to delete old versions:
{
"Rules": [
{
"Id": "DeleteOldVersions",
"Status": "Enabled",
"NoncurrentVersionExpiration": {
"NoncurrentDays": 30
}
}
]
}
MFA Delete: Requires MFA authentication to permanently delete version or suspend versioning. Provides additional protection against accidental or malicious deletion.
Best practice: Enable versioning on buckets with critical data. Use lifecycle policies to delete noncurrent versions after 30-90 days. Enable MFA Delete for compliance-critical buckets.
Common mistake: Enabling versioning without lifecycle policy to clean up old versions. Storage costs grow indefinitely as objects are modified.
Replication
S3 Replication automatically copies objects between buckets in the same or different regions.
Replication types:
- Cross-Region Replication (CRR): Replicate to bucket in different AWS region
- Same-Region Replication (SRR): Replicate to bucket in same region
Requirements:
- Versioning must be enabled on source and destination buckets
- An IAM role that grants S3 permission to replicate objects on your behalf (source and destination buckets can be in the same or different AWS accounts)
Use cases:
CRR:
- Disaster recovery (replicate to different region)
- Lower latency access (replicate to region closer to users)
- Compliance (store copies in specific regions)
SRR:
- Data sovereignty (replicate to bucket in same region but different account)
- Log aggregation (replicate logs from multiple buckets to central bucket)
- Development/test replication (replicate production data to dev bucket)
Replication options:
- Replication Time Control (RTC): 99.99% of objects replicated within 15 minutes (SLA-backed)
- Delete marker replication: Replicate delete markers (disabled by default)
- Existing object replication (S3 Batch Replication): Replicate objects that existed before replication was enabled
Pricing:
- CRR: Pay for storage in destination region + data transfer between regions ($0.02/GB)
- SRR: Pay for storage in destination bucket + replication PUT requests ($0.005 per 1,000)
Cost example (1 TB replicated with CRR from us-east-1 to eu-west-1):
- Storage in eu-west-1: 1 TB × $0.023 = $23/month
- Data transfer: 1 TB × $0.02 = $20 one-time
- PUT requests: Negligible
Best practice: Use CRR for disaster recovery. Enable RTC for critical data requiring fast recovery. Use SRR for log aggregation and compliance requirements within the same region.
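A sketch of a basic CRR rule with boto3; the role ARN, bucket names, and destination storage class are assumptions, and both buckets must already have versioning enabled:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-primary-bucket",
    ReplicationConfiguration={
        # Role that S3 assumes to read from the source and write to the destination.
        "Role": "arn:aws:iam::111122223333:role/example-s3-replication-role",
        "Rules": [
            {
                "ID": "dr-copy",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-dr-bucket",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)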
S3 Event Notifications
S3 can trigger notifications when objects are created, deleted, or restored from Glacier.
Supported destinations:
- SNS Topic: Publish to SNS for fan-out to multiple subscribers
- SQS Queue: Queue events for asynchronous processing
- Lambda Function: Trigger serverless function immediately
- EventBridge: Route events to multiple targets with filtering
Event types:
- s3:ObjectCreated:* (Put, Post, Copy, CompleteMultipartUpload)
- s3:ObjectRemoved:* (Delete, DeleteMarkerCreated)
- s3:ObjectRestore:* (Post, Completed from Glacier)
Use cases:
- Generate thumbnails when images uploaded (Lambda trigger on ObjectCreated)
- Update search index when documents uploaded (SQS queue + worker processing)
- Send alerts when critical data deleted (SNS notification on ObjectRemoved)
- Trigger ETL pipeline when data files arrive (EventBridge routing to Step Functions)
EventBridge vs direct notifications:
- Direct notifications (SNS/SQS/Lambda): Lower latency (seconds), simpler setup
- EventBridge: Advanced filtering, multiple targets per event, integration with AWS services (Step Functions, ECS tasks)
Best practice: Use Lambda for simple, immediate processing. Use SQS for buffering high-volume events. Use EventBridge for complex workflows requiring filtering and multiple targets.
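A minimal sketch of a Lambda handler for ObjectCreated events; the processing step is a placeholder, and the event shape is the standard S3 notification format:

import urllib.parse

def handler(event, context):
    # One notification can carry several records, one per object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (spaces become '+', etc.).
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        print(f"New object s3://{bucket}/{key} ({size} bytes)")
        # ... generate a thumbnail, update an index, start a workflow, etc.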
Filtering: Filter events by prefix or suffix:
{
"LambdaFunctionConfigurations": [
{
"LambdaFunctionArn": "arn:aws:lambda:region:account:function:ProcessImages",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [
{"Name": "prefix", "Value": "images/"},
{"Name": "suffix", "Value": ".jpg"}
]
}
}
}
]
}
S3 Inventory
S3 Inventory provides scheduled reports of objects and metadata, eliminating the need for LIST API calls.
What it provides:
- Daily or weekly CSV/Parquet reports of all objects in bucket
- Object metadata (size, storage class, encryption status, replication status, last modified)
Use cases:
- Audit encryption status (identify unencrypted objects)
- Identify objects eligible for lifecycle transitions (analyze age and access patterns)
- Generate reports for compliance (list all objects with retention tags)
- Cost analysis (understand storage distribution by storage class)
Pricing: $0.0025 per 1 million objects listed per report
Best practice: Enable S3 Inventory on large buckets (>1 million objects) to avoid expensive LIST operations. Use Athena to query inventory reports for insights.
Example (analyze storage class distribution):
SELECT storage_class, COUNT(*), SUM(size)
FROM s3_inventory
GROUP BY storage_class;
High Availability and Reliability
S3 provides built-in high availability and durability without additional configuration, but understanding the model helps you design resilient architectures.
Durability: 99.999999999% (11 9s)
What this means: S3 stores objects redundantly across multiple devices in at least three Availability Zones. Designed to sustain the concurrent loss of data in two facilities.
For 10 million objects stored in S3, you can expect to lose one object every 10,000 years on average.
How S3 achieves durability:
- Data is redundantly stored across ≥3 AZs (except One Zone classes)
- Each object is checksummed on write and periodically verified
- Corrupted data is automatically detected and repaired
- Failed devices are detected and data rebuilt automatically
One Zone classes: One Zone-IA and Express One Zone store data in a single AZ. If that AZ is destroyed, data is lost. Use only for reproducible data or with cross-region replication.
Availability SLA
| Storage Class | Availability Design | Availability SLA |
|---|---|---|
| Standard | 99.99% | 99.9% |
| Intelligent-Tiering | 99.9% | 99% |
| Standard-IA | 99.9% | 99% |
| One Zone-IA | 99.5% (single AZ) | 99% |
| Express One Zone | 99.95% (single AZ) | 99.9% |
Availability = probability you can successfully retrieve data when requested
SLA = AWS credits if availability falls below this threshold
99.9% availability = up to 43 minutes of downtime per month
99.99% availability = up to 4.3 minutes of downtime per month
Best practice: For critical applications requiring >99.99% availability, implement multi-region replication and DNS failover with Route 53.
Multi-Region Architecture
For disaster recovery or global data distribution, replicate data across regions using Cross-Region Replication.
Architecture pattern (active-active, two regions):
- Primary region (us-east-1): S3 bucket with versioning enabled
- Secondary region (eu-west-1): S3 bucket with versioning enabled
- Bidirectional replication: us-east-1 ↔ eu-west-1
- Route 53 latency-based routing: Route requests to nearest region
- CloudFront (optional): Global edge caching for both regions
Failover scenario: If us-east-1 unavailable, Route 53 health checks detect failure and route all traffic to eu-west-1. When us-east-1 recovers, replication synchronizes data.
Cost (1 TB data, bidirectional CRR):
- Storage: 2 TB × $0.023 = $46/month (2 regions)
- Data transfer (CRR): 1 TB × $0.02 = $20 one-time per TB replicated
- Route 53: $0.50/month per hosted zone
Best practice: Use CRR with RTC (Replication Time Control) for critical data requiring fast failover. Test failover procedures quarterly.
S3 Storage Lens for Reliability
S3 Storage Lens provides organization-wide visibility into object storage usage and activity metrics.
Reliability metrics:
- Replication metrics: Replication pending bytes, replication latency
- Encryption status: Percentage of objects encrypted
- Versioning status: Percentage of buckets with versioning enabled
- Delete markers: Count of delete markers (may indicate accidental deletions)
Best practice: Enable S3 Storage Lens advanced metrics ($0.20 per million objects analyzed) for production accounts. Set up alerts for replication lag, unencrypted objects, or buckets without versioning.
Integration Patterns
Static Website Hosting
S3 can host static websites (HTML, CSS, JavaScript, images) without web servers.
Configuration:
- Enable static website hosting on bucket
- Specify index document (index.html) and error document (404.html)
- Make bucket public (or use CloudFront with Origin Access Control)
- Access via http://bucket-name.s3-website-region.amazonaws.com (a configuration sketch follows this list)
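A sketch of the hosting configuration with boto3 (the bucket name and documents are hypothetical):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="example.com",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "404.html"},
    },
)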
Cost (static website with 100 GB of content and 1 million page views/month, assuming ~100 KB per view, so roughly 100 GB transferred):
- Storage: 100 GB × $0.023 = $2.30/month
- Data transfer direct from S3: 100 GB × $0.09/GB = $9.00/month
- With CloudFront: transfer from S3 to CloudFront is free and CloudFront's internet rate starts at $0.085/GB, with further savings from CloudFront's free tier and volume discounts as traffic grows
Best practice: Always use CloudFront in front of S3 static website hosting. CloudFront adds HTTPS and custom domain support (the S3 website endpoint is HTTP only), lowers latency, absorbs most GET requests, and becomes cheaper than direct S3 transfer as traffic grows.
Custom domain:
- Create Route 53 hosted zone for example.com
- Create S3 bucket named example.com (must match the domain name)
- Create CloudFront distribution with S3 as origin
- Create Route 53 A record (alias) pointing to CloudFront distribution
Data Lake Architecture
S3 serves as the storage layer for data lakes, storing raw, processed, and curated data at scale.
Common architecture:
Raw Zone (S3 Standard-IA or Intelligent-Tiering)
→ ETL (AWS Glue)
→ Processed Zone (S3 Standard)
→ Analytics (Athena, Redshift Spectrum, EMR)
→ Curated Zone (S3 Standard-IA)
Storage class strategy:
- Raw zone: Intelligent-Tiering (access pattern unknown for new data)
- Processed zone: Standard (frequently queried by analytics)
- Curated zone: Standard-IA or Glacier (accessed for reports, less frequently)
Partitioning (critical for query performance): Organize data by partition keys to reduce amount scanned by queries:
s3://bucket/logs/year=2025/month=01/day=13/logs.parquet
Athena query with WHERE year=2025 AND month=01 scans only relevant partitions, reducing cost and improving performance.
Best practice: Use Parquet or ORC format for analytics (10x smaller than CSV, columnar compression). Partition by date or commonly-filtered dimensions. Use Glue Crawler to automatically catalog partitions.
Backup and Disaster Recovery
S3 provides durable, cost-effective storage for backups with flexible retrieval options.
Backup tier strategy:
| Backup Type | Access Frequency | Retention | Storage Class |
|---|---|---|---|
| Daily backups (recent) | Frequent (disaster recovery) | 7-30 days | Standard or Standard-IA |
| Weekly backups | Occasional (accidental deletion) | 90 days | Glacier Instant Retrieval |
| Monthly backups | Rare (compliance, audit) | 1-7 years | Glacier Flexible Retrieval |
| Annual archives | Almost never | 7-10 years | Glacier Deep Archive |
Lifecycle policy (automated backup tiering):
Days 0-30: Standard ($23/TB/month)
Days 30-90: Glacier Instant ($4/TB/month)
Days 90-365: Glacier Flexible ($3.60/TB/month)
Days 365+: Glacier Deep Archive ($0.99/TB/month)
Cost comparison (1 TB daily backup retained for 7 years):
- All Standard: 7 years × 1 TB × $23/month = $1,932
- Lifecycle policy: $253 total (87% savings)
S3 Batch Operations for restore: Restore thousands of objects from Glacier in parallel using S3 Batch Operations instead of individual API calls.
Best practice: Use AWS Backup for automated backup scheduling across AWS services (EC2, RDS, DynamoDB, EBS). AWS Backup stores backups in S3 with automatic lifecycle management.
Application Data Storage
S3 integrates with EC2, Lambda, and containers for application data storage.
Use cases:
- User uploads: Web/mobile apps upload files (images, videos, documents)
- Application assets: Static assets (CSS, JavaScript, fonts) served via CloudFront
- Data processing: Lambda processes files uploaded to S3 (image resizing, video transcoding, ETL)
- Configuration management: Store application configuration files, feature flags
Presigned URLs: Generate time-limited URLs for uploads/downloads without AWS credentials:
Upload flow:
- Client requests upload URL from application backend
- Backend generates presigned PUT URL (valid 15 minutes)
- Client uploads directly to S3 using presigned URL
- S3 triggers Lambda (via event notification) for post-processing
Benefits:
- Client uploads directly to S3 (doesn’t transit through application servers)
- No AWS credentials in client application
- Time-limited access (presigned URL expires after TTL)
Best practice: Use presigned URLs for all user uploads/downloads. Set short TTL (5-15 minutes). Validate file size, content type, and permissions on backend before generating presigned URL.
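A minimal sketch of step 2 of the upload flow above, generating a presigned PUT URL with boto3 (the bucket, key, and 15-minute TTL are assumptions):

import boto3

s3 = boto3.client("s3")

# The backend validates the request (user, size limit, content type),
# then hands the client a short-lived URL scoped to one key.
url = s3.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": "example-uploads-bucket",
        "Key": "uploads/user-123/avatar.png",
        "ContentType": "image/png",
    },
    ExpiresIn=900,  # 15 minutes
)
print(url)   # the client PUTs the file directly to this URL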
Observability
CloudWatch Metrics
S3 publishes metrics to CloudWatch for monitoring bucket and request activity.
Storage metrics (free, updated daily):
- BucketSizeBytes: Total bucket size in bytes
- NumberOfObjects: Object count
Request metrics (requires opt-in, $0.01 per 1,000 metrics):
- AllRequests, GetRequests, PutRequests, DeleteRequests: Request counts
- 4xxErrors, 5xxErrors: Error counts
- FirstByteLatency, TotalRequestLatency: Performance metrics
- BytesDownloaded, BytesUploaded: Data transfer
Best practice: Enable request metrics on production buckets to monitor request patterns, error rates, and latency. Set CloudWatch alarms for the following (one such alarm is sketched after this list):
- 4xxErrors > 5% of requests: Indicates permissions issues or missing objects
- 5xxErrors > 1% of requests: S3 service issues (rare) or misconfiguration
- Sudden spike in DeleteRequests: Possible ransomware or accidental deletion
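A sketch of one such alarm with boto3; the thresholds, SNS topic, and the EntireBucket filter ID (created when request metrics are enabled for the whole bucket) are assumptions to adjust:

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="example-bucket-4xx-spike",
    Namespace="AWS/S3",
    MetricName="4xxErrors",
    Dimensions=[
        {"Name": "BucketName", "Value": "example-bucket"},
        {"Name": "FilterId", "Value": "EntireBucket"},
    ],
    Statistic="Sum",
    Period=300,                 # 5-minute windows
    EvaluationPeriods=3,
    Threshold=100,              # tune to your normal error volume
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:example-s3-alerts"],
)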
CloudTrail Logging
CloudTrail logs S3 API calls for audit and security analysis.
Management events (free):
- CreateBucket, DeleteBucket, PutBucketPolicy, PutBucketVersioning
Data events ($0.10 per 100,000 events):
- GetObject, PutObject, DeleteObject, CopyObject
Use cases:
- Security audit: Who accessed what object and when?
- Compliance: Demonstrate data access for regulatory requirements
- Incident response: Investigate unauthorized access or data exfiltration
Best practice: Enable CloudTrail data events for sensitive buckets. Send CloudTrail logs to dedicated S3 bucket with Object Lock for tamper-proof audit trail.
Example query (Athena on CloudTrail logs):
SELECT eventtime, useridentity.principalid, requestparameters
FROM cloudtrail_logs
WHERE eventname = 'DeleteObject'
AND requestparameters LIKE '%sensitive-file.txt%';
S3 Server Access Logs
Server Access Logs record all requests made to a bucket, delivered to a target S3 bucket.
What’s logged:
- Requester IP address
- Request time
- Request type (GET, PUT, DELETE)
- Object key
- HTTP status code
- Error code (if any)
Pricing: Free (pay only for storage of log files)
Log delivery delay: Up to a few hours
Use cases:
- Analyze access patterns (most frequently accessed objects)
- Troubleshoot client errors (identify 403/404 errors)
- Security analysis (detect unusual access patterns)
Best practice: Enable Server Access Logging on all buckets for long-term audit trail. Use separate bucket for logs with lifecycle policy to delete logs after 90 days. Use Athena to query logs for analysis.
Athena query (top 10 most accessed objects):
SELECT key, COUNT(*) as request_count
FROM s3_access_logs
WHERE httpstatus = 200
GROUP BY key
ORDER BY request_count DESC
LIMIT 10;
S3 Storage Lens
Storage Lens provides organization-wide visibility into storage usage, activity trends, and cost optimization opportunities.
Free tier metrics:
- Storage by bucket, region, storage class
- Object count and average object size
- Encryption and versioning status
- Incomplete multipart uploads
Advanced tier metrics ($0.20 per million objects analyzed):
- Request metrics (PUT/GET/DELETE counts)
- Data transfer metrics
- Lifecycle activity
- Replication metrics
- Prefix-level metrics
Use cases:
- Identify buckets without encryption or versioning
- Find buckets with abandoned multipart uploads (storage waste)
- Analyze storage growth trends
- Discover cost optimization opportunities (objects eligible for lifecycle transitions)
Best practice: Enable Storage Lens free tier for all accounts. Review dashboard monthly to identify security misconfigurations and cost optimization opportunities. Enable advanced tier for production accounts with >1 million objects.
When to Use S3 vs Alternatives
S3 vs EBS
| S3 | EBS |
|---|---|
| Object storage (files, not blocks) | Block storage (formatted as file system) |
| Accessed via HTTP API | Attached to EC2 instance (appears as /dev/xvdf) |
| Unlimited scalability | Limited to 64 TB per volume |
| 11 9s durability (multi-AZ) | Durability within single AZ (snapshots to S3 for multi-AZ) |
| $0.023/GB/month (Standard) | $0.08-$0.125/GB/month (gp3, io2) |
| Higher throughput for large sequential reads | Lower latency for small random I/O |
Use S3 when:
- Storing files accessed by multiple applications/services
- Data shared across regions
- Cost-sensitive workloads (S3 3-5x cheaper than EBS)
- Data doesn’t require file system (databases, operating systems)
Use EBS when:
- Running databases requiring low-latency random I/O (MySQL, PostgreSQL, MongoDB)
- Operating system boot volumes
- Applications requiring file system semantics (POSIX)
- Single EC2 instance needs persistent storage
S3 vs EFS
| S3 | EFS |
|---|---|
| Object storage (API access) | Network file system (NFS) |
| Accessed via SDK, CLI, HTTP | Mounted to EC2/Lambda (appears as /mnt/efs) |
| Unlimited files per bucket | Unlimited files |
| $0.023/GB/month (Standard) | $0.30/GB/month (Standard), $0.043/GB/month (Infrequent Access) |
| Higher latency (10-100ms) | Lower latency (1-10ms) |
| No file system overhead | File system overhead |
Use S3 when:
- Data accessed by serverless applications (Lambda, Fargate)
- Data shared across regions
- Cost-sensitive workloads (S3 10x cheaper than EFS Standard)
- Object storage semantics acceptable (PUT/GET, no file locking)
Use EFS when:
- Multiple EC2 instances need shared file system
- Applications require POSIX file system semantics
- Legacy applications designed for NFS
- Low-latency access to shared data
Cost comparison (1 TB data, accessed monthly):
- S3 Standard-IA: $12.50/month + $10 retrieval = $22.50/month
- EFS Standard: $300/month
- EFS Infrequent Access: $43/month + $10 retrieval = $53/month
S3 vs FSx for Lustre
| S3 | FSx for Lustre |
|---|---|
| Object storage | High-performance parallel file system |
| 10-100ms latency | Sub-millisecond latency |
| Thousands of requests/second per prefix | Millions of IOPS, hundreds of GB/s throughput |
| $0.023/GB/month | $0.145-$1.000/GB/month (depends on deployment type) |
Use S3 when:
- Data doesn’t require high-performance file system
- Cost-sensitive workloads
- Data stored long-term (months/years)
Use FSx for Lustre when:
- High-performance computing (HPC) workloads
- Machine learning training requiring fast data access
- Genomics, financial modeling, video rendering
- Sub-millisecond latency required
Best practice: Use FSx for Lustre linked to S3. FSx imports data from S3, provides high-performance access during processing, then exports results back to S3. Pay for FSx only during processing (hours/days), store long-term in S3.
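As a rough illustration of that pattern, the file system can be created with an S3 import/export link via boto3. This is only a sketch under assumptions: the bucket, subnet ID, deployment type, and storage capacity below are placeholders, and valid values depend on your workload and the current FSx documentation.
```python
import boto3

fsx = boto3.client("fsx")

# Create a scratch Lustre file system linked to an S3 bucket (placeholders).
# ImportPath and ExportPath must refer to the same bucket; valid
# StorageCapacity values depend on the chosen DeploymentType.
response = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=1200,  # GiB
    SubnetIds=["subnet-0123456789abcdef0"],
    LustreConfiguration={
        "DeploymentType": "SCRATCH_2",
        "ImportPath": "s3://my-dataset-bucket/input/",
        "ExportPath": "s3://my-dataset-bucket/output/",
    },
)
print(response["FileSystem"]["FileSystemId"])
```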
Common Pitfalls
1. Enabling Versioning Without Lifecycle Policy
Problem: Every object modification creates a new version, and every version consumes storage indefinitely. A 1 GB file modified 100 times consumes roughly 101 GB (the current version plus 100 noncurrent versions of 1 GB each).
Cost impact: Storage costs grow without limit. A bucket with frequent updates can accumulate thousands of versions, increasing costs 10-100x.
Solution: Always configure lifecycle policy to delete noncurrent versions:
{
  "Rules": [
    {
      "ID": "DeleteOldVersions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 30
      }
    }
  ]
}
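Applying the rule can be automated across buckets. A minimal boto3 sketch (the bucket name is a placeholder; boto3 spells the rule identifier key as ID):
```python
import boto3

s3 = boto3.client("s3")

# Delete noncurrent versions 30 days after they are superseded.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-versioned-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "DeleteOldVersions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all objects
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```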
Best practice: Set noncurrent version expiration to 30-90 days depending on recovery requirements. Use S3 Storage Lens to identify buckets with excessive versions.
2. Transitioning Small Objects to Glacier
Problem: Each object archived to Glacier incurs about 40 KB of overhead: 32 KB of index and related metadata billed at Glacier rates, plus 8 KB billed at Standard rates. Transitioning small objects to Glacier can therefore cost more than keeping them in Standard.
Cost comparison (1,000 objects, 10 KB each):
- Standard: 10 MB × $0.023/GB ≈ $0.00023/month
- Glacier Flexible: (10 MB data + 32 MB Glacier overhead) × $0.0036/GB ≈ $0.00015/month, plus 8 MB of metadata at Standard rates ≈ $0.00018/month, for a total of ≈ $0.00033/month
The per-object overhead (plus lifecycle transition request charges) makes Glacier more expensive for small files.
Solution: Only transition objects >128 KB to Glacier. Use lifecycle filter to exclude small objects:
{
  "Rules": [
    {
      "ID": "TransitionLargeObjects",
      "Status": "Enabled",
      "Filter": {
        "ObjectSizeGreaterThan": 131072
      },
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}
Best practice: For small files, consider compressing/archiving into larger files (TAR, ZIP) before transitioning to Glacier.
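As an illustration of the compress-before-archive approach, a small Python sketch that bundles local files into a compressed TAR and uploads the single archive into a Glacier class (the bucket, local path, and key are placeholders):
```python
import tarfile
import boto3

# Bundle many small files into one compressed archive, then store the
# archive (not the individual files) in a Glacier storage class.
archive_path = "/tmp/small-files-2025-01.tar.gz"
with tarfile.open(archive_path, "w:gz") as tar:
    tar.add("/data/small-files/", arcname="small-files-2025-01")

s3 = boto3.client("s3")
s3.upload_file(
    archive_path,
    "my-archive-bucket",                     # placeholder bucket
    "archives/small-files-2025-01.tar.gz",
    ExtraArgs={"StorageClass": "GLACIER"},   # Glacier Flexible Retrieval
)
```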
3. Not Using Multipart Upload for Large Files
Problem: Uploading a file >100 MB as a single PUT request is slow, fails easily, and cannot be resumed.
Consequences:
- Upload failure requires re-uploading entire file
- No parallel upload (slower total time)
- Network interruptions cause complete failure
Solution: Use multipart upload (the AWS SDKs and CLI handle it automatically above a configurable threshold):
- Files >100 MB: Recommended
- Files >5 GB: Required (single PUT limit)
AWS CLI configuration (~/.aws/config):
[default]
s3 =
  multipart_threshold = 100MB
  multipart_chunksize = 50MB
Best practice: Configure SDK/CLI to use multipart upload for files >100 MB. Enable lifecycle policy to delete incomplete multipart uploads after 7 days (abandoned uploads consume storage).
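The same thresholds can be set in application code. A boto3 sketch using TransferConfig (file, bucket, and key are placeholders); upload_file switches to multipart upload above the threshold and uploads parts in parallel:
```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multipart upload kicks in above 100 MB, with 50 MB parts uploaded in parallel.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=50 * 1024 * 1024,
    max_concurrency=8,
)

s3.upload_file(
    "/data/large-backup.bin",   # placeholder local file
    "my-upload-bucket",         # placeholder bucket
    "backups/large-backup.bin",
    Config=config,
)
```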
4. Using ListObjects Instead of S3 Inventory
Problem: LIST requests cost $0.005 per 1,000 and return at most 1,000 keys each, so fully enumerating a 1-million-object bucket takes 1,000 requests. An application that re-lists the bucket 100 times/day issues 100,000 requests/day (about $0.50/day, $15/month), and the cost scales linearly with object count and listing frequency.
Solution: Use S3 Inventory for recurring object lists. Inventory generates a daily or weekly report for $0.0025 per million objects listed, and every consumer can read the same report instead of re-listing the bucket.
Cost comparison (1 million objects enumerated 100 times/day for 30 days):
- LIST API: 100 listings/day × 1,000 requests × $0.005 per 1,000 × 30 days = $15/month
- S3 Inventory (daily report): 30 × $0.0025 = $0.075/month (~99.5% savings)
Best practice: Use S3 Inventory for recurring object lists. Use Athena to query inventory reports. Reserve LIST API for small, one-time queries.
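Setting up a daily inventory report is a one-time configuration per bucket. A boto3 sketch, assuming placeholder bucket names and a destination bucket whose policy already allows S3 to write inventory reports:
```python
import boto3

s3 = boto3.client("s3")

# Publish a daily CSV inventory of current object versions to a reports bucket.
s3.put_bucket_inventory_configuration(
    Bucket="my-data-bucket",  # placeholder source bucket
    Id="daily-inventory",
    InventoryConfiguration={
        "Id": "daily-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-inventory-reports",  # placeholder
                "Format": "CSV",
                "Prefix": "inventory/my-data-bucket/",
            }
        },
    },
)
```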
5. Not Using VPC Endpoint for Internal S3 Access
Problem: EC2 instances accessing S3 without VPC endpoint route traffic through NAT Gateway, incurring $0.045/GB data processing charges.
Cost impact (1 TB/month transfer from EC2 to S3):
- Without VPC endpoint: $45/month (NAT Gateway)
- With VPC endpoint: $0/month
Solution: Create S3 Gateway Endpoint in VPC (free):
- VPC → Endpoints → Create Endpoint
- Service: com.amazonaws.region.s3
- Type: Gateway
- Select route tables
- Create endpoint
Best practice: Create S3 VPC endpoint in every VPC that accesses S3. Restrict sensitive buckets to specific VPC endpoints using bucket policies.
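Creating the gateway endpoint can also be done programmatically. A boto3 sketch (the VPC ID, region, and route table ID are placeholders):
```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Gateway endpoints are free; attaching them to route tables sends S3 traffic
# over the AWS network instead of through a NAT Gateway.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",             # placeholder
    ServiceName="com.amazonaws.us-east-1.s3",  # match your region
    RouteTableIds=["rtb-0123456789abcdef0"],   # placeholder
)
```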
6. Ignoring S3 Request Costs for High-Volume Workloads
Problem: PUT/GET requests cost $0.005 per 1,000 (PUT) and $0.0004 per 1,000 (GET). For workloads with billions of requests, request costs exceed storage costs.
Example (1 billion GET requests per month):
- Storage: 1 TB × $0.023 = $23/month
- GET requests: 1 billion × $0.0004 / 1,000 = $400/month
- Request costs are 17x storage costs
Solutions:
- Batch small files: Combine 1,000 files into one archive (1,000x fewer requests)
- Use CloudFront: Caching reduces S3 GET requests by 80-95%
- Use S3 Express One Zone: 50% lower request costs for high-volume workloads
Best practice: For high-request workloads (>1 million requests/day), evaluate CloudFront caching or S3 Express One Zone.
7. Not Configuring Lifecycle Policy for Multipart Uploads
Problem: Abandoned multipart uploads consume storage indefinitely. If client crashes during upload, parts remain in S3 and accumulate costs.
Cost impact: Organizations with frequent large uploads may have hundreds of GB in abandoned parts.
Solution: Configure lifecycle policy to abort incomplete uploads after 7 days:
{
"Rules": [
{
"Id": "AbortIncompleteUploads",
"Status": "Enabled",
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
}
]
}
Best practice: Apply this policy to all buckets receiving large uploads. Use S3 Storage Lens to identify buckets with incomplete multipart uploads.
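To check whether a bucket already has abandoned parts before the policy takes effect, a quick boto3 sketch (bucket name is a placeholder):
```python
import boto3

s3 = boto3.client("s3")

# List in-progress (possibly abandoned) multipart uploads for a bucket.
paginator = s3.get_paginator("list_multipart_uploads")
for page in paginator.paginate(Bucket="my-upload-bucket"):  # placeholder
    for upload in page.get("Uploads", []):
        print(upload["Key"], upload["UploadId"], upload["Initiated"])
```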
8. Public Bucket Misconfiguration
Problem: Accidentally making a bucket public exposes sensitive data. Although Block Public Access has been enabled by default on new buckets since April 2023, bucket policies or ACLs can still expose data if Block Public Access is disabled.
Common mistakes:
- Disabling Block Public Access account-wide to fix one bucket
- Using "Principal": "*" in bucket policy without conditions
- Granting public-read ACL during upload
Solution:
- Keep Block Public Access enabled at account level
- Selectively disable at bucket level for public buckets (static websites, public datasets)
- Use IAM Access Analyzer to detect public buckets
Best practice: Use IAM Access Analyzer to continuously monitor for public buckets. Set up EventBridge alert when bucket policy grants public access.
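Block Public Access can also be asserted per bucket during provisioning. A boto3 sketch (bucket name is a placeholder; the account-level equivalent lives in the s3control API):
```python
import boto3

s3 = boto3.client("s3")

# Re-assert all four Block Public Access settings on a bucket.
s3.put_public_access_block(
    Bucket="my-private-bucket",  # placeholder
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```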
9. Not Using Intelligent-Tiering for Unknown Access Patterns
Problem: Organizations manually create lifecycle policies based on assumptions about access patterns. If assumptions are wrong, data is transitioned to cold storage and incurs retrieval fees.
Example:
- Assumption: Data not accessed after 30 days
- Reality: Data accessed quarterly
- Result: Frequent retrievals from Glacier ($0.03/GB) cost more than keeping in Standard
Solution: Use Intelligent-Tiering when access patterns are unknown. No retrieval fees, automatic optimization based on actual access.
Best practice: Set Intelligent-Tiering as default storage class for new buckets. After 6-12 months, analyze access patterns using S3 Storage Class Analysis and create manual lifecycle policy if appropriate.
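In practice, "defaulting" to Intelligent-Tiering means specifying the class when objects are written (or via a lifecycle transition), since storage class is a per-object attribute. A boto3 sketch with placeholder file, bucket, and key:
```python
import boto3

s3 = boto3.client("s3")

# Write new objects directly into Intelligent-Tiering.
s3.upload_file(
    "/data/report.parquet",        # placeholder local file
    "my-analytics-bucket",         # placeholder bucket
    "reports/2025/report.parquet",
    ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
)
```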
10. Not Encrypting Data at Rest
Problem: Although S3 encrypts new objects by default with SSE-S3 (since January 2023), compliance requirements may mandate a specific method such as SSE-KMS, and nothing prevents uploads that request the wrong encryption.
Solution: Enforce the required encryption with a bucket policy:
{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::bucket-name/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": ["AES256", "aws:kms"]
}
}
}
Best practice: Use bucket policy to enforce encryption. For compliance workloads, use SSE-KMS and enable CloudTrail logging to audit key usage.
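Default encryption itself is configured on the bucket. For SSE-KMS workloads, a boto3 sketch that sets a KMS key as the bucket default and enables S3 Bucket Keys to reduce KMS request costs (bucket name and key ARN are placeholders):
```python
import boto3

s3 = boto3.client("s3")

# Make SSE-KMS the bucket default; Bucket Keys cut KMS request costs.
s3.put_bucket_encryption(
    Bucket="my-compliance-bucket",  # placeholder
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE",  # placeholder
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```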
11. Overlooking Cross-Region Data Transfer Costs
Problem: Cross-Region Replication and data retrieval from different regions incur data transfer charges ($0.02/GB).
Cost impact (1 TB replicated monthly with CRR):
- Data transfer: 1 TB × $0.02 = $20/month ($240/year)
- Over 3 years: $720 in transfer costs alone
Solution: Use Same-Region Replication (SRR) when cross-region replication is not required. For disaster recovery, evaluate whether continuous replication is necessary or whether periodic snapshots suffice.
Best practice: Evaluate the business requirement for cross-region replication. If RTO allows hours, consider S3 Batch Replication on-demand instead of continuous CRR.
12. Not Monitoring Replication Lag
Problem: Replication lag can delay disaster recovery. Without monitoring, you may not discover replication issues until failover is required.
Solution: Enable S3 Replication Time Control (RTC) for SLA-backed replication within 15 minutes. Monitor replication metrics in CloudWatch:
- ReplicationLatency: Time to replicate objects
- BytesPendingReplication: Bytes waiting to be replicated
Best practice: Set CloudWatch alarm for BytesPendingReplication > 1 GB or ReplicationLatency > 15 minutes. Use RTC for critical data requiring fast recovery.
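Wiring such an alarm looks roughly like the boto3 sketch below. The bucket names, rule ID, and SNS topic are placeholders, and the dimensions shown (SourceBucket, DestinationBucket, RuleId) assume the standard S3 replication metric dimensions, which are worth verifying for your setup:
```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when replication of any object takes longer than 15 minutes (900 s).
cloudwatch.put_metric_alarm(
    AlarmName="s3-replication-latency-high",
    Namespace="AWS/S3",
    MetricName="ReplicationLatency",
    Dimensions=[
        {"Name": "SourceBucket", "Value": "my-source-bucket"},  # placeholder
        {"Name": "DestinationBucket", "Value": "my-dr-bucket"},  # placeholder
        {"Name": "RuleId", "Value": "crr-rule-1"},               # placeholder
    ],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=900,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:s3-replication-alerts"],  # placeholder
)
```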
13. Retrieving Large Amounts of Data from Glacier Without Planning
Problem: Glacier retrieval takes hours (Flexible) to up to two days (Deep Archive Bulk). Retrieving 10 TB from Deep Archive with Standard retrieval costs $200 and completes within 12 hours.
Solution: Plan Glacier restores in advance. Use Bulk retrieval (within 48 hours, $0.0025/GB) instead of Standard (within 12 hours, $0.02/GB) when time permits.
Cost comparison (restore 10 TB from Glacier Deep Archive):
- Standard retrieval: 10 TB × $0.02/GB = $200, within 12 hours
- Bulk retrieval: 10 TB × $0.0025/GB = $25, within 48 hours (87.5% savings)
Best practice: For disaster recovery, store recent backups in Glacier Instant Retrieval (millisecond access) and older backups in Glacier Flexible/Deep Archive. Test restore procedures quarterly.
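A restore request for an archived object might look like the following boto3 sketch (bucket and key are placeholders); the restored copy remains available for the number of days requested:
```python
import boto3

s3 = boto3.client("s3")

# Request a Bulk restore of an archived object, keeping the temporary copy 7 days.
s3.restore_object(
    Bucket="my-backup-bucket",           # placeholder
    Key="backups/2023/db-dump.tar.gz",   # placeholder
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Bulk"},
    },
)
```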
14. Not Using S3 Batch Operations for Bulk Changes
Problem: Modifying millions of objects (change storage class, copy, encrypt) with individual API calls is slow, expensive, and error-prone.
Solution: Use S3 Batch Operations to perform bulk operations on millions/billions of objects:
- Change storage class for all objects in bucket
- Copy objects to another bucket
- Encrypt unencrypted objects
- Restore objects from Glacier
Pricing: $0.25 per job + $1.00 per million objects processed, plus standard S3 request costs
Best practice: Use S3 Batch Operations for one-time bulk changes. Use lifecycle policies for ongoing automated transitions.
15. Not Setting Appropriate Bucket Lifecycle Policies
Problem: Default S3 configuration stores all data in Standard storage class indefinitely. Without lifecycle policies, costs grow linearly with data volume.
Cost impact (10 TB stored for 5 years, never accessed after 90 days):
- No lifecycle policy: 60 months × 10 TB × $23/TB/month = $13,800
- With lifecycle (transition to Glacier Flexible after 90 days): about $2,740 (roughly 80% savings); transitioning to Glacier Deep Archive instead brings it to about $1,250 (roughly 91% savings)
Solution: Implement lifecycle policies for every bucket based on data access patterns. Start with conservative policies and adjust based on actual access patterns.
Best practice: Create organization-wide default lifecycle policies for new buckets. Use S3 Storage Class Analysis to identify lifecycle policy opportunities.
Key Takeaways
- Storage class selection is critical for cost optimization: Properly configuring storage classes can reduce costs by 50-96%. Use Intelligent-Tiering when access patterns are unknown, and lifecycle policies to automate transitions.
- S3 provides 11 9s durability by default: Data is automatically replicated across ≥3 Availability Zones. One Zone classes sacrifice multi-AZ durability for lower cost; use only for reproducible data or with cross-region replication.
- Security is layered: Use Block Public Access (enabled by default), bucket policies, IAM policies, encryption at rest (SSE-S3 default), and encryption in transit (HTTPS). VPC endpoints keep traffic private and reduce costs.
- Performance scales with prefixes: S3 supports 3,500 PUT/s and 5,500 GET/s per prefix. Distribute objects across multiple prefixes for high-throughput workloads. Use multipart upload for files >100 MB.
- Versioning requires lifecycle management: Enabling versioning without lifecycle policies causes storage costs to grow indefinitely. Always configure noncurrent version expiration (30-90 days).
- Lifecycle policies provide automated cost optimization: Transition objects between storage classes based on age. Only transition objects >128 KB to Glacier (40 KB overhead makes small objects expensive).
- Request costs matter for high-volume workloads: For billions of requests, request costs exceed storage costs. Use CloudFront caching, batch small files, or S3 Express One Zone for high-request workloads.
- Replication provides disaster recovery and compliance: Cross-Region Replication (CRR) replicates data to different regions. Use Replication Time Control (RTC) for SLA-backed replication within 15 minutes. Monitor replication lag with CloudWatch.
- S3 integrates seamlessly with AWS services: S3 Event Notifications trigger Lambda, SQS, SNS when objects are created/deleted. S3 serves as storage layer for data lakes (Athena, Glue), backups (AWS Backup), and static websites (CloudFront).
- Observability is built-in: Use CloudWatch metrics for request patterns, CloudTrail for audit trails, Server Access Logs for detailed request logs, and S3 Storage Lens for organization-wide visibility. Enable logging on all buckets.
- VPC endpoints eliminate data transfer costs: Use S3 Gateway Endpoint (free) to route EC2-to-S3 traffic privately, eliminating NAT Gateway costs ($0.045/GB). Restrict sensitive buckets to specific VPC endpoints.
- Multipart upload is required for large files: Files >5 GB must use multipart upload. Files >100 MB should use multipart upload for performance and resilience. Configure lifecycle policy to delete incomplete uploads after 7 days.
- S3 is the default choice for object storage: Use S3 unless you need file system semantics (EFS) or high-performance parallel file system (FSx for Lustre). S3 is 3-10x cheaper than alternatives for object storage workloads.
- Intelligent-Tiering eliminates guesswork: When access patterns are unknown or changing, use Intelligent-Tiering instead of manual lifecycle policies. No retrieval fees, automatic optimization based on actual access patterns.
- Test disaster recovery procedures: Enable Cross-Region Replication for critical data. Test restore procedures from Glacier quarterly. Use S3 Batch Operations for bulk restores. Monitor replication lag with CloudWatch alarms.