December 2025

The IaC Testing Pyramid: Catching Infrastructure Bugs Before Production

A practical framework for testing Terraform and Bicep at every level

Infrastructure as Code changed how we deploy. But it didn't automatically change how we test.

Most teams treat IaC testing as an afterthought, if they test at all. The "testing strategy" is to run terraform apply in a dev environment and eyeball the results. Maybe someone reviews the PR. Maybe.

Then something breaks in production. A security group rule that was supposed to block traffic doesn't. A storage account is publicly accessible when it shouldn't be. A naming convention is wrong, and the monitoring dashboards can't find the resource. The kind of bugs that are obvious in hindsight but invisible in a code review.

The fix isn't more careful code reviews. It's a testing strategy that catches different types of errors at different stages. The same pyramid approach that works for application code, adapted for infrastructure.

The conceptual framework still applies

Application testing has mature patterns. Unit tests check individual functions. Integration tests check component interactions. End-to-end tests check user flows. Developers learn this on day one.

Infrastructure testing is murkier. What's a "unit" of infrastructure? How do you test a VNet without deploying it? What does "mocking" mean when your code talks directly to a cloud API?

The good news: the conceptual framework still applies. You want fast, cheap tests at the bottom catching simple errors, and slower, more realistic tests at the top catching integration issues. The specifics are just different.

Fast tests at the bottom, thorough tests at the top

[Figure: the IaC testing pyramid. Unit tests catch syntax and security issues, contract tests catch API mismatches, integration tests verify module deployment, compliance tests check governance requirements, platform tests validate ServiceNow behavior, and E2E tests confirm the full system flow.]

Each level catches different problems:

Level             | What It Catches                                       | Speed           | Cost
------------------|-------------------------------------------------------|-----------------|------------
Static Analysis   | Syntax errors, formatting, basic misconfigurations    | Seconds         | Free
Plan Validation   | Invalid references, type mismatches, provider errors  | Minutes         | Free
Deployment Tests  | Resources that don't actually work when deployed      | 10-30 min       | Cloud costs
Policy/Compliance | Security violations, governance failures              | Seconds-minutes | Free

Static analysis catches the obvious stuff in seconds

This is the base of the pyramid. Fast, cheap, catches the obvious stuff.

For Terraform:

# Syntax validation
terraform validate

# Formatting check
terraform fmt -check

# Linting for best practices
tflint

# Security scanning
tfsec
checkov -d .

For Bicep:

# Syntax validation (built into build)
az bicep build --file main.bicep

# Linting
az bicep lint --file main.bicep

# Security scanning
checkov -d . --framework bicep

What these catch:

  • Syntax errors (missing brackets, typos)
  • Invalid resource types or property names
  • Deprecated features
  • Obvious security issues (public IPs where they shouldn't be, missing encryption flags)
  • Formatting inconsistencies
  • Hardcoded secrets

When to run: On every commit, in pre-commit hooks, and as the first step in CI.

These tests run in seconds. There's no reason not to run them constantly.
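
A thin wrapper turns these commands into pipeline gates. Here's a minimal sketch using pytest and subprocess, in the roll-your-own spirit described later; the infra/ directory name is an assumption, and it presumes terraform, tflint, and tfsec are installed on the runner:

# test_static_checks.py: run each static analysis tool and fail the build on findings.
# Assumes terraform, tflint, and tfsec are installed; "infra" is a placeholder directory.
import subprocess

import pytest

IAC_DIR = "infra"

CHECKS = [
    ["terraform", "fmt", "-check", "-recursive", IAC_DIR],
    ["terraform", "-chdir=" + IAC_DIR, "init", "-backend=false", "-input=false"],
    ["terraform", "-chdir=" + IAC_DIR, "validate"],
    ["tflint", "--chdir=" + IAC_DIR],
    ["tfsec", IAC_DIR],
]

@pytest.mark.parametrize("cmd", CHECKS, ids=lambda c: " ".join(c))
def test_static_check(cmd):
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Each tool exits non-zero on findings; show its output so the failure is actionable.
    assert result.returncode == 0, f"{' '.join(cmd)} failed:\n{result.stdout}\n{result.stderr}"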

Plan validation catches what static analysis misses

Static analysis checks the code in isolation. Plan validation checks what the code would do if applied.

For Terraform:

terraform init
terraform plan -out=tfplan

# Convert to JSON for programmatic checks
terraform show -json tfplan > plan.json

For Bicep:

az deployment group what-if \
  --resource-group my-rg \
  --template-file main.bicep \
  --parameters @params.json

What plan validation catches:

  • Resources that reference non-existent dependencies
  • Type mismatches (string where number expected)
  • Provider authentication issues
  • Quota limitations (before you hit them during apply)
  • Drift detection (what's different from current state)

Programmatic plan testing:

The plan output is JSON. You can write assertions against it:

import json

with open('plan.json') as f:
    plan = json.load(f)

# Check that all VMs have encryption enabled
for resource in plan.get('resource_changes', []):
    if resource['type'] == 'azurerm_linux_virtual_machine':
        config = resource['change']['after']
        if config is None:
            continue  # resource is being destroyed; nothing to check
        assert config.get('encryption_at_host_enabled'), \
            f"VM {resource['address']} missing encryption"

Tools like terraform-compliance, conftest, and OPA (Open Policy Agent) let you write these checks declaratively:

# policy/vm.rego
package main

deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "azurerm_linux_virtual_machine"
    not resource.change.after.encryption_at_host_enabled
    msg := sprintf("VM %v must have encryption enabled", [resource.address])
}

When to run: On every PR, as a gate before merge.

Plan validation takes a few minutes but doesn't cost anything. No resources are actually created.
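
Wired into CI, the gate is just the plan commands above plus a policy run against the JSON. A minimal sketch, assuming conftest is installed and your Rego policies live in a policy/ directory:

# plan_gate.py: produce a plan, render it as JSON, and evaluate policies against it.
# Assumes terraform and conftest are installed and credentials are already configured.
import subprocess

def run(cmd, **kwargs):
    print("+", " ".join(cmd))
    return subprocess.run(cmd, check=True, **kwargs)

run(["terraform", "init", "-input=false"])
run(["terraform", "plan", "-out=tfplan", "-input=false"])

# terraform show -json prints the plan to stdout; save it for the policy tools
show = run(["terraform", "show", "-json", "tfplan"], capture_output=True, text=True)
with open("plan.json", "w") as f:
    f.write(show.stdout)

# conftest exits non-zero when any deny rule fires, which fails the CI job (check=True raises)
run(["conftest", "test", "plan.json", "--policy", "policy/"])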

Some bugs only appear when resources actually exist

Some failures only surface once resources exist: the storage account deploys, but the firewall rules don't work. The VM starts, but the startup script fails. The database is created, but the connection string is wrong.

Deployment tests actually create infrastructure, run validations against it, then tear it down.

Frameworks:

  • Terratest (Go): The most mature option. Create resources, run checks, destroy.
  • Kitchen-Terraform: Ruby-based, integrates with InSpec for compliance checks.
  • pytest + subprocess: Roll your own with Python (a minimal sketch follows the Terratest example below).

Example Terratest pattern:

// Requires the Terratest modules and testify:
//   github.com/gruntwork-io/terratest, github.com/stretchr/testify
package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/ssh"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestVmDeployment(t *testing.T) {
    terraformOptions := &terraform.Options{
        TerraformDir: "../examples/vm",
        Vars: map[string]interface{}{
            "environment": "test",
        },
    }

    // Deploy (and always destroy, even if an assertion fails)
    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Validate
    vmIP := terraform.Output(t, terraformOptions, "vm_private_ip")

    // Check SSH connectivity (loadKeyPair is a helper defined elsewhere in the suite)
    host := ssh.Host{
        Hostname:    vmIP,
        SshUserName: "testadmin", // must match the admin user the module configures
        SshKeyPair:  loadKeyPair(t),
    }
    ssh.CheckSshConnection(t, host)

    // Check that expected software is installed
    output := ssh.CheckSshCommand(t, host, "python3 --version")
    assert.Contains(t, output, "Python 3")
}
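
The same flow in the roll-your-own pytest style from the frameworks list: deploy in a fixture, assert against outputs, always destroy. A minimal sketch; the module path, output name, and reachability check are assumptions, and it presumes the test runner can reach the VM's private IP:

# test_vm_deployment.py: deploy the example module, validate it, always tear it down.
# Assumes terraform is installed and cloud credentials are available in the environment.
import json
import socket
import subprocess

import pytest

MODULE_DIR = "examples/vm"  # placeholder path to the module under test

def terraform(*args):
    return subprocess.run(
        ["terraform", "-chdir=" + MODULE_DIR, *args],
        check=True, capture_output=True, text=True,
    )

@pytest.fixture(scope="module")
def outputs():
    terraform("init", "-input=false")
    terraform("apply", "-auto-approve", "-input=false", "-var", "environment=test")
    try:
        yield json.loads(terraform("output", "-json").stdout)
    finally:
        # Always destroy, even if a test failed mid-run
        terraform("destroy", "-auto-approve", "-input=false", "-var", "environment=test")

def test_vm_listens_on_ssh(outputs):
    ip = outputs["vm_private_ip"]["value"]
    # Reachability only; a fuller test would SSH in and assert on the VM's actual state
    with socket.create_connection((ip, 22), timeout=30):
        pass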

What deployment tests catch:

  • Network connectivity issues
  • IAM/RBAC misconfigurations (can the app actually access the database?)
  • Startup script failures
  • Resource dependencies that don't work in practice
  • Cloud provider quirks that aren't documented

When to run: On PRs to main (or nightly). Not on every commit. Too slow and expensive.

Cost management: Use small resource sizes. Set up budget alerts on test subscriptions. Auto-destroy after a timeout in case tests fail mid-run.
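
For the auto-destroy safety net, one option is a scheduled job that deletes stale test resource groups. A minimal sketch using the Azure CLI from Python; the purpose and createdAt tags and the two-hour cutoff are assumptions about how the test harness labels what it creates:

# cleanup_test_rgs.py: delete test resource groups older than a cutoff.
# Run on a schedule in the test subscription to clean up after tests that died mid-run.
# Assumes the test harness tags every resource group it creates with
# purpose=iac-testing and an ISO-8601 createdAt timestamp (with UTC offset).
import json
import subprocess
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=2)

result = subprocess.run(
    ["az", "group", "list", "--tag", "purpose=iac-testing", "-o", "json"],
    check=True, capture_output=True, text=True,
)
groups = json.loads(result.stdout)

now = datetime.now(timezone.utc)
for group in groups:
    created = (group.get("tags") or {}).get("createdAt")
    if created and now - datetime.fromisoformat(created) > MAX_AGE:
        print(f"Deleting stale test resource group {group['name']}")
        subprocess.run(
            ["az", "group", "delete", "--name", group["name"], "--yes", "--no-wait"],
            check=True,
        )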

Policy tests enforce what the organization requires

This level checks whether your infrastructure meets organizational and regulatory requirements. It can run at multiple stages: pre-deployment (against plans) or post-deployment (against live resources).

Tools:

  • Azure Policy: Built into Azure. Deny non-compliant resources or audit them.
  • Open Policy Agent (OPA): General-purpose policy engine. Write policies in Rego.
  • Checkov: Scans IaC for compliance with CIS benchmarks, SOC2, HIPAA, etc.
  • Sentinel (HashiCorp): Policy as code for Terraform Cloud/Enterprise.

Example Azure Policy (deny public storage accounts):

{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.Storage/storageAccounts"
      },
      {
        "field": "Microsoft.Storage/storageAccounts/publicNetworkAccess",
        "equals": "Enabled"
      }
    ]
  },
  "then": {
    "effect": "deny"
  }
}

Example OPA policy (require tags):

package main

required_tags := {"CostCenter", "Owner", "Environment"}

deny[msg] {
    resource := input.resource_changes[_]
    resource.change.actions[_] == "create"
    tags := object.get(resource.change.after, "tags", {})
    missing := required_tags - object.keys(tags)
    count(missing) > 0
    msg := sprintf("Resource %v missing required tags: %v", [resource.address, missing])
}

When to run:

  • Pre-deployment (against plan): In CI, before apply. Fast, catches issues early.
  • Post-deployment (against live resources): Continuously. Catches drift and manual changes.

Azure Policy is particularly powerful because it can prevent non-compliant resources from being created, even if someone bypasses your CI/CD pipeline and deploys manually.
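
A small scheduled script on the post-deployment side catches drift that slipped past CI before Policy or an auditor does. A minimal sketch that flags publicly reachable storage accounts via the Azure CLI; treat the publicNetworkAccess field as an assumption to verify against your CLI version:

# scan_public_storage.py: flag storage accounts that allow public network access.
# A post-deployment drift check; Azure Policy remains the enforcement backstop.
import json
import subprocess
import sys

result = subprocess.run(
    ["az", "storage", "account", "list", "-o", "json"],
    check=True, capture_output=True, text=True,
)
accounts = json.loads(result.stdout)

# publicNetworkAccess mirrors the ARM property; assumed present in the CLI output
offenders = [a["name"] for a in accounts if a.get("publicNetworkAccess") == "Enabled"]

if offenders:
    print("Public network access enabled on:", ", ".join(offenders))
    sys.exit(1)

print("No publicly accessible storage accounts found.")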

"We don't have time to write tests"

"We don't have time to write tests."

Start with static analysis. Adding terraform validate, fmt -check, and tfsec to your pipeline takes 15 minutes and catches real bugs. You can add the higher levels incrementally.

"Deployment tests are too slow."

They are. That's why they're at the top of the pyramid, not the bottom. Run them nightly or on PRs to main, not on every commit. Use small resource sizes.

"Our IaC is too complex to test."

Complex IaC is exactly what needs testing. If you can't test it, you can't safely change it. Consider breaking it into smaller, testable modules.

"Azure Policy catches everything anyway."

Azure Policy is your last line of defense, not your only line. Catching issues at plan time is faster and less disruptive than having deployments fail because Policy blocked them.

What to run and when

Level  | Tools                          | Catches                                            | Runs
-------|--------------------------------|----------------------------------------------------|--------------
Static | validate, fmt, tflint, tfsec   | Syntax, formatting, obvious issues                 | Every commit
Plan   | plan + OPA/conftest            | Invalid references, type errors, policy violations | Every PR
Deploy | Terratest, Kitchen-Terraform   | Real-world failures, connectivity, IAM             | Merge/nightly
Policy | Azure Policy, continuous scans | Compliance, drift, governance                      | Continuously

The bottom line

Fast tests at the bottom. Thorough tests at the top. Compliance everywhere.

Stop hoping your IaC works. Prove it.