IaC State Management

Infrastructure & Cloud

Table of Contents

  1. What is State
  2. Do You Need State Management?
  3. Local vs. Remote State
  4. State Locking
  5. State Management Best Practices
  6. Common State Operations

What is State

State is the record of infrastructure resources currently deployed, tracked by IaC tools to:

  • Map configuration to real resources
  • Track resource metadata
  • Determine what changes are needed
  • Manage dependencies between resources

Why State Matters

Without state, IaC tools cannot:

  • Know what infrastructure currently exists
  • Determine what needs to be created, updated, or deleted
  • Track resource relationships
  • Detect configuration drift

State contains:

  • Resource IDs
  • Resource attributes
  • Dependencies between resources
  • Provider configurations
  • Sensitive data (passwords, private keys)

Do You Need State Management?

Cloud-native IaC tools handle state automatically - you don’t need to manage state yourself.

Tools That Manage State for You

AWS CloudFormation:

  • AWS manages all state internally
  • No state files to secure or back up
  • No risk of state corruption
  • No locking concerns
  • Built-in drift detection

Azure Resource Manager (ARM Templates / Bicep):

  • Azure manages deployment state
  • Integrated with Azure portal
  • Deployment history tracked automatically

Google Cloud Deployment Manager:

  • GCP tracks deployment state
  • Managed through GCP console

When State Management Is Your Responsibility

Terraform and Pulumi require you to manage state explicitly (unless using Terraform Cloud or Pulumi Cloud):

  • You must configure remote backends
  • You must implement locking mechanisms
  • You must secure sensitive data in state
  • You must handle backup and recovery
  • You must prevent state corruption

Should You Accept This Complexity?

Only add user-managed state when you can articulate specific reasons:

Valid reasons:

  • Multi-cloud requirements (managing AWS + Azure + GCP together)
  • Need for Terraform/Pulumi-specific features not available in cloud-native tools
  • Existing infrastructure already managed by Terraform/Pulumi
  • Organizational standard requires specific tooling

Not valid reasons:

  • “Terraform is popular”
  • “We know Terraform already” (unless you have multi-cloud needs)
  • “Terraform is industry standard” (CloudFormation is the standard for AWS-only)

The cost of user-managed state:

  • Risk of state corruption causing infrastructure issues
  • Complexity of securing state files containing sensitive data
  • Operational overhead of managing remote backends and locking
  • Recovery procedures when state issues occur
  • Additional infrastructure to support state management

If you’re deploying only to AWS, CloudFormation eliminates all of these risks.


Local vs. Remote State

Local State

What it is: State stored on local filesystem.

terraform.tfstate

Advantages:

  • Simple to get started
  • No additional setup required
  • Fast access

Disadvantages:

  • ❌ Not suitable for teams (no sharing)
  • ❌ No locking (concurrent changes dangerous)
  • ❌ Risk of loss/corruption
  • ❌ No backup/versioning
  • ❌ Contains sensitive data locally

When to use:

  • Learning and experimentation only
  • Single developer, non-critical infrastructure
  • Never for production

Remote State

What it is: State stored in a remote backend (S3, Azure Blob, GCS, Terraform Cloud).

Terraform S3 Backend:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Advantages:

  • ✅ Shared access for teams
  • ✅ State locking (prevents conflicts)
  • ✅ Encryption at rest
  • ✅ Versioning and backup
  • ✅ Audit logging
  • ✅ Centralized management

Disadvantages:

  • Requires setup
  • Depends on external service
  • Potential costs

When to use:

  • All team environments
  • Production infrastructure
  • Any collaborative work

AWS S3 + DynamoDB:

terraform {
  backend "s3" {
    bucket         = "terraform-state"
    key            = "path/to/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"  # For locking
    encrypt        = true
    kms_key_id     = "arn:aws:kms:..."  # Optional KMS encryption
  }
}

Azure Blob Storage:

terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "terraformstate"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
  }
}

Google Cloud Storage:

terraform {
  backend "gcs" {
    bucket = "terraform-state"
    prefix = "prod"
  }
}

Terraform Cloud:

terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "production"
    }
  }
}

State Locking

What it is: Preventing simultaneous state modifications that could corrupt state.

How Locking Works

  1. Process acquires lock before modifying state
  2. Lock prevents other processes from modifying state
  3. Lock released after operation completes or fails

Implementations

DynamoDB (AWS + S3 backend):

terraform {
  backend "s3" {
    bucket         = "terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"  # Locking table
  }
}

DynamoDB table structure:

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Consul:

terraform {
  backend "consul" {
    address = "consul.example.com"
    path    = "terraform/prod"
    lock    = true
  }
}

Force Unlock (Use Carefully)

Most IaC tools provide a force-unlock operation for stuck locks. Use with extreme caution.

When to force unlock:

  • Process crashed and left lock orphaned
  • Lock is demonstrably stale (check lock timestamp)
  • You’ve confirmed no one else is running operations

Never force unlock if:

  • Someone else might be running operations
  • Uncertain about lock state
  • During normal business hours without team communication

State Management Best Practices

1. Always Use Remote State for Teams

Never rely on local state files for production or team environments.

# ❌ BAD: No backend configured
terraform {
  # Uses local state
}

# ✅ GOOD: Remote backend
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
    encrypt = true
    dynamodb_table = "terraform-locks"
  }
}

2. Secure State Files

State contains sensitive data!

Enable encryption:

terraform {
  backend "s3" {
    bucket  = "terraform-state"
    key     = "terraform.tfstate"
    encrypt = true  # Enable encryption at rest
    kms_key_id = "arn:aws:kms:..."  # Use KMS for extra security
  }
}

Restrict access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::terraform-state/prod/*",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/TerraformRole"
      }
    }
  ]
}

Never commit state to Git:

# .gitignore
*.tfstate
*.tfstate.*

3. Backup State

Enable versioning:

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

Regular backups:

  • S3 versioning (automatic)
  • Cross-region replication
  • Periodic manual backups
  • Test restore procedures

4. Separate State by Environment

Don’t share state across environments:

s3://terraform-state/
├── dev/terraform.tfstate
├── staging/terraform.tfstate
└── production/terraform.tfstate

Or use separate buckets:

s3://terraform-state-dev/terraform.tfstate
s3://terraform-state-staging/terraform.tfstate
s3://terraform-state-prod/terraform.tfstate

5. Review Plans Before Applying

Always preview changes before applying them:

  • Generate a plan showing what will change
  • Review the plan carefully for unexpected changes
  • Save the plan and apply exactly what was reviewed
  • Never skip the preview step, especially in production

6. State File Security Checklist

  • Remote backend configured
  • Encryption at rest enabled
  • State locking enabled
  • Versioning enabled
  • Access restricted via IAM
  • State files not in Git
  • Regular backups tested
  • Separate state per environment

Common State Operations

IaC tools provide commands for viewing and managing state. Consult your tool’s documentation for current syntax:

Viewing State

You can:

  • List all resources tracked in state
  • Show details of specific resources
  • View output values
  • Inspect resource metadata and dependencies

Modifying State

Move operations:

  • Rename resources in state (update reference without recreating)
  • Move resources between modules
  • Reorganize infrastructure code without destroying resources

Remove operations:

  • Remove resources from state tracking (doesn’t delete the actual resource)
  • Useful when manually deleting resources or transferring ownership

Import operations:

  • Import existing infrastructure into state management
  • Add resources created outside IaC to your state
  • Essential for brownfield infrastructure adoption

Recovering from State Issues

Backup and restore:

  • Download current state as backup before risky operations
  • Restore from backup if state becomes corrupted
  • Use versioning features (S3 versioning) for automatic backups

State recovery process:

  1. Download backup from remote backend
  2. Verify backup integrity
  3. Restore backup to remote backend
  4. Validate infrastructure matches restored state

Migrating State

Backend migration:

  • Change state storage location (local → remote, or remote → different remote)
  • Tool-specific migration commands handle data transfer
  • Always backup state before migration
  • Verify state after migration completes

Found this guide helpful? Share it with your team:

Share on LinkedIn