A Cloud Tagging Strategy That Actually Works
How to design, enforce, and maintain tags that don't decay into chaos
Everyone agrees tagging is important. Almost no one does it well.
The pattern is familiar: someone defines a tagging standard, sends an email, maybe creates a wiki page. For a few months, new resources get tagged. Then a project launches under deadline pressure and skips the tags. Then another. Six months later, 40% of resources are untagged, 30% have inconsistent values, and the remaining 30% use a schema that nobody remembers agreeing to.
Finance can't allocate costs. Security can't identify owners. Platform teams can't enforce lifecycle policies. The tagging standard exists on paper but not in practice.
The fix isn't more documentation or more reminders. It's enforcement at every layer, so that untagged resources can't exist in the first place.
Figure: Cloud Tagging Strategy: Schema + Enforcement. Defense in depth: enforce at every layer so untagged resources can't exist. Panels: Mandatory Tags (6); Controlled Vocabularies; Design Principles (minimal mandatory tags; enforcement at every layer; validating values, not just presence; connecting tags to cost reports).
If untagged resources can exist, they will exist.
Tagging standards exist on paper, not in practice
Tags decay because they're optional. Every system that allows "I'll add tags later" eventually ends up with untagged resources.
The decay happens through predictable paths:
- Manual deployments bypass automation: Someone uses the portal to create a quick test resource. No tags.
- Emergency changes skip process: Production is down, and someone provisions a fix directly. No tags.
- Templates get copied without customization: A team copies another team's Terraform and changes the resource names but not the tag values.
- No one owns tag quality: Everyone assumes someone else is checking.
- Values drift without validation: The "CostCenter" tag allows freetext, so you end up with "12345", "CC-12345", "cost center 12345", and "finance".
The only way to prevent decay is to make tagging mandatory and validated at creation time, not as a cleanup task afterward.
Define the purpose before the schema
Before defining which tags to require, clarify what you're trying to accomplish. Tags serve different purposes:
| Purpose | Questions Tags Answer |
|---|---|
| Cost allocation | Who pays for this? Which budget? Which project? |
| Ownership | Who do I contact about this resource? Who's accountable? |
| Operations | Is this production or dev? Can it be shut down at night? When was it created? |
| Compliance | Does this handle sensitive data? Which regulations apply? |
| Automation | Should backup run on this? Should patching apply? |
Different organizations weight these differently. A healthcare company cares deeply about compliance tags. A startup might only care about cost allocation. A mature enterprise needs all of them.
Define your purposes first, then derive the tags that serve them.
Six tags are enough
Here's a schema that works for most organizations. Start with these six, then add more only when you have a clear use case.
| Tag | Purpose | Example Values |
|---|---|---|
| CostCenter | Financial allocation | 12345, GRANT-2024-0042 |
| Owner | Accountability (email) | jsmith@company.com |
| Department | Organizational ownership | Research, IT, Finance |
| Environment | Lifecycle stage | Production, Staging, Development |
| Project | Business initiative | DataPlatform, WebsiteRedesign |
| Application | System or workload | PatientPortal, DataWarehouse |
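Keeping the schema as data, rather than only as a wiki table, lets each enforcement layer check against one definition. A minimal Python sketch follows; `REQUIRED_TAGS` and `missing_tags` are hypothetical names, not part of any Azure or Terraform API, and treating a blank value as missing is an assumption.

```python
# Hypothetical single source of truth for the six mandatory tag keys.
REQUIRED_TAGS = (
    "CostCenter",
    "Owner",
    "Department",
    "Environment",
    "Project",
    "Application",
)


def missing_tags(tags: dict[str, str]) -> list[str]:
    """Return the mandatory keys that are absent or blank, in schema order."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]


example = {
    "CostCenter": "12345",
    "Owner": "jsmith@company.com",
    "Department": "Research",
    "Environment": "Production",
    "Project": "DataPlatform",
}
print(missing_tags(example))  # ['Application']
```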
Why These Six?
CostCenter: Every dollar needs to trace to a budget. This is non-negotiable for any organization that does chargebacks or showbacks.
Owner: When something breaks at 2am, who gets paged? When a vulnerability is found, who remediates? An email address is better than a team name: teams get renamed and reorganized, while an email alias can be repointed centrally when ownership moves.
Department: Higher-level organizational grouping. Useful when you need to aggregate across projects or report to leadership.
Environment: Production resources get different treatment than dev resources (backup policies, change windows, uptime SLAs). This tag enables that differentiation.
Project: Business-level grouping. One project might span multiple applications. Useful for tracking initiative-level spending.
Application: Technical grouping. What system is this part of? Useful for operations, monitoring dashboards, and incident response.
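Tag-driven automation depends on values being predictable. As an illustration of the "Can it be shut down at night?" question from the purpose table, here is a sketch of how a nightly shutdown job might select resources by Environment. The `Resource` record, the function name, and the rule that only Development, Test, and Sandbox are eligible are assumptions.

```python
from dataclasses import dataclass, field

# Assumption: environments whose resources may be stopped outside working hours.
SHUTDOWN_ELIGIBLE = {"Development", "Test", "Sandbox"}


@dataclass
class Resource:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def nightly_shutdown_candidates(resources: list[Resource]) -> list[str]:
    """Names of resources whose Environment tag allows a nightly shutdown.

    Resources without an Environment tag are skipped: when in doubt, leave it running.
    """
    return [r.name for r in resources if r.tags.get("Environment") in SHUTDOWN_ELIGIBLE]


fleet = [
    Resource("vm-web-prod", {"Environment": "Production"}),
    Resource("vm-web-dev", {"Environment": "Development"}),
    Resource("vm-scratch", {"Environment": "Sandbox"}),
    Resource("vm-mystery"),  # untagged
]
print(nightly_shutdown_candidates(fleet))  # ['vm-web-dev', 'vm-scratch']
```

The lookup is an exact match, so a resource tagged `development` in lowercase would be skipped too, which is one more reason to enforce the vocabulary.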
What About More Tags?
Resist the urge to require 15 tags. Every additional required tag increases friction and reduces compliance. If a tag isn't actively used for cost allocation, automation, or reporting, it shouldn't be mandatory.
Optional tags are fine for teams that want more granularity. But mandatory tags should be the minimum set that serves real operational needs.
Enforcement at every layer, or none at all
A tag requirement that isn't enforced isn't a requirement. Each layer is a safety net for the ones above it.
Layer 1: Request Time (ServiceNow)
The service catalog form collects tag values as part of the request. The user can't submit without filling them in.
Implementation:
- Required fields for each mandatory tag
- Dropdowns or reference lookups for controlled vocabularies (Department, Environment)
- Freetext with validation patterns for structured values (email format for Owner)
- Cost center lookup against finance system to prevent typos
What this catches: Users who would otherwise deploy without thinking about tags.
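The Layer 1 rules come down to a few checks per field. This Python sketch only illustrates the logic; it is not ServiceNow configuration. `KNOWN_COST_CENTERS` stands in for the finance-system lookup, the department list is a hypothetical static list, and the email pattern is a deliberately loose format check rather than full address validation.

```python
import re

# Hypothetical stand-ins for the lookups a real catalog form would perform.
KNOWN_COST_CENTERS = {"12345", "GRANT-2024-0042"}  # finance-system lookup
DEPARTMENTS = {"Research", "IT", "Finance"}         # dropdown values
ENVIRONMENTS = {"Production", "Staging", "Development", "Test", "Sandbox"}
REQUIRED = ("CostCenter", "Owner", "Department", "Environment", "Project", "Application")

# Deliberately loose: one "@", a dot in the domain, no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_request(form: dict[str, str]) -> list[str]:
    """Return human-readable errors; an empty list means the form can be submitted."""
    values = {key: form.get(key, "").strip() for key in REQUIRED}
    errors = [f"{key} is required" for key, value in values.items() if not value]

    owner = values["Owner"]
    if owner and not EMAIL_PATTERN.match(owner):
        errors.append(f"Owner must be an email address, got {owner!r}")

    for key, allowed in (("Department", DEPARTMENTS), ("Environment", ENVIRONMENTS)):
        if values[key] and values[key] not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}, got {values[key]!r}")

    cost_center = values["CostCenter"]
    if cost_center and cost_center not in KNOWN_COST_CENTERS:
        errors.append(f"Unknown CostCenter {cost_center!r}")
    return errors


request = {
    "CostCenter": "CC-12345",
    "Owner": "jsmith",
    "Department": "Research",
    "Environment": "prod",
    "Project": "DataPlatform",
    "Application": "PatientPortal",
}
for error in validate_request(request):
    print(error)
```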
Layer 2: IaC Templates
Every Terraform module and Bicep template requires tag values as input variables. Tags aren't optional parameters. They're mandatory inputs that cause the template to fail if missing.
Terraform example:
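variable "required_tags" {
  description = "Mandatory tags for every resource"
  # No default, so a run without this variable fails (or prompts, if interactive).
  # The object type also rejects input that omits any of the six keys.
  type = object({
    CostCenter  = string
    Owner       = string
    Department  = string
    Environment = string
    Project     = string
    Application = string
  })

  # Presence alone isn't enough: reject blank values too.
  validation {
    condition     = alltrue([for k, v in var.required_tags : trimspace(v) != ""])
    error_message = "Every required tag must have a non-empty value."
  }
}

resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
  tags     = var.required_tags
}
What this catches: Developers who write IaC without including tags.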
Bonus: You can merge required tags with resource-specific tags:
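locals {
  common_tags = var.required_tags
  # merge() lets later arguments win on duplicate keys, so a key repeated
  # here would silently override the matching required tag.
  resource_tags = merge(local.common_tags, {
    ResourceType = "VirtualMachine"
    CreatedBy    = "Terraform"
  })
}
Layer 3: Pipeline Validation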
Before terraform apply runs, a validation step checks the plan output for required tags.
Implementation with OPA/Conftest:
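# policy/tags.rego
package main

import rego.v1

# A set (not an array), so the set difference below is valid.
required_tags := {"CostCenter", "Owner", "Department", "Environment", "Project", "Application"}

deny contains msg if {
	some resource in input.resource_changes
	"create" in resource.change.actions
	# Skip resource types that have no tags attribute at all.
	"tags" in object.keys(resource.change.after)
	missing := required_tags - tag_keys(resource.change.after.tags)
	count(missing) > 0
	msg := sprintf("Resource %v missing required tags: %v", [resource.address, missing])
}

# tags is null in the plan JSON when a resource sets no tags.
tag_keys(tags) := object.keys(tags) if is_object(tags)
tag_keys(tags) := set() if tags == null
Pipeline step: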
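- script: |
    # Assumes an earlier step saved the plan with: terraform plan -out=tfplan
    terraform show -json tfplan > tfplan.json
    conftest test tfplan.json --policy policy/
  displayName: 'Validate Tags'
What this catches: IaC that defines the tag variables but doesn't apply them to every resource, such as a resource block that never sets its tags argument.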
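Teams without OPA can run an equivalent check in plain Python against the JSON that `terraform show -json` produces. The sketch reads the `resource_changes`, `change.actions`, and `change.after` fields of that output; skipping resource types with no `tags` attribute and treating a null `tags` value as empty are assumptions about how such plans typically look, and the function names are hypothetical.

```python
import json

REQUIRED_TAGS = {"CostCenter", "Owner", "Department", "Environment", "Project", "Application"}


def find_violations(plan: dict) -> list[str]:
    """Return one message per resource being created without all required tags."""
    messages = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if "create" not in change.get("actions", []):
            continue
        after = change.get("after") or {}
        if "tags" not in after:
            continue  # resource type has no tags attribute
        tags = after["tags"] or {}  # null when the resource sets no tags
        missing = sorted(REQUIRED_TAGS.difference(tags))
        if missing:
            messages.append(f"Resource {rc['address']} missing required tags: {missing}")
    return messages


def check_plan_file(path: str) -> int:
    """Print violations found in a `terraform show -json` file; return an exit code."""
    with open(path) as f:
        violations = find_violations(json.load(f))
    for message in violations:
        print(message)
    return 1 if violations else 0


sample_plan = {
    "resource_changes": [
        {
            "address": "azurerm_resource_group.main",
            "change": {"actions": ["create"], "after": {"name": "rg-demo", "tags": {"CostCenter": "12345"}}},
        },
        {
            # Role assignments have no tags attribute, so they are skipped.
            "address": "azurerm_role_assignment.reader",
            "change": {"actions": ["create"], "after": {"role_definition_name": "Reader"}},
        },
    ]
}
for message in find_violations(sample_plan):
    print(message)
```

In a pipeline script, `sys.exit(check_plan_file("tfplan.json"))` would fail the step whenever a violation is found, the same pass/fail contract the Conftest step relies on.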
Layer 4: Azure Policy
The final safety net. Azure Policy evaluates every resource at creation time and can deny resources that don't comply.
Example policy (deny if CostCenter tag missing):
{
  "if": {
    "field": "tags['CostCenter']",
    "exists": "false"
  },
  "then": {
    "effect": "deny"
  }
}
Assign it at the management group level so it covers every subscription beneath that group, including ones provisioned outside your normal pipelines.
What this catches: Manual portal deployments, CLI commands, emergency changes. Anything that bypasses Layers 1-3.
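The example policy checks a single tag. To require all six in one definition, the condition can combine `exists` checks under `anyOf`, so the deny fires when any one of them is missing. The Python below only generates that `policyRule` JSON; the helper name is hypothetical, and the output still has to be wrapped in a policy definition and assigned.

```python
import json

REQUIRED_TAGS = ["CostCenter", "Owner", "Department", "Environment", "Project", "Application"]


def build_require_tags_rule(tag_names: list[str]) -> dict:
    """Build an Azure Policy policyRule that denies resources missing any listed tag."""
    return {
        "if": {
            "anyOf": [
                {"field": f"tags['{name}']", "exists": "false"}
                for name in tag_names
            ]
        },
        "then": {"effect": "deny"},
    }


print(json.dumps(build_require_tags_rule(REQUIRED_TAGS), indent=2))
```

One combined rule keeps assignments simple but doesn't say which tag was missing. Separate assignments of a parameterized definition, such as the built-in "Require a tag on resources", give clearer compliance results per tag.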
Why All Four Layers?
Each layer catches different failure modes:
| Layer | Catches |
|---|---|
| ServiceNow | Users who don't think about tags |
| IaC Templates | Developers who forget to include tag handling |
| Pipeline | Configuration errors in tag values |
| Azure Policy | Everything that bypasses automation |
If you only have Azure Policy, you find out about missing tags when deployment fails, after someone has waited through the whole pipeline. Earlier enforcement gives faster feedback.
If you only have IaC requirements, manual deployments bypass them entirely.
Defense in depth.
Dropdowns beat freetext
Freetext tags decay. If Environment allows any value, you'll end up with:
- Production
- production
- Prod
- prod
- PROD
- prd
- Live
Good luck writing a report that aggregates those.
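If freetext values have already drifted, mapping known variants to a canonical value is a reasonable stopgap until the vocabulary is enforced. In this sketch the variant table covers only the spellings listed above, mapping "Live" to Production is an assumption someone should confirm, and unrecognized values return `None` instead of being guessed.

```python
from collections import Counter
from typing import Optional

CANONICAL = ("Production", "Staging", "Development", "Test", "Sandbox")

# Known variants, keyed in lowercase. Assumption: "live" really means Production.
VARIANTS = {"prod": "Production", "prd": "Production", "live": "Production"}


def normalize_environment(value: str) -> Optional[str]:
    """Map a freetext Environment value to its canonical form, or None if unrecognized."""
    cleaned = value.strip().lower()
    for canonical in CANONICAL:
        if cleaned == canonical.lower():
            return canonical
    return VARIANTS.get(cleaned)


raw = ["Production", "production", "Prod", "prod", "PROD", "prd", "Live"]
print(len(Counter(raw)))                               # 7 distinct values before cleanup
print(Counter(normalize_environment(v) for v in raw))  # Counter({'Production': 7})
```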
For tags with a finite set of valid values, enforce the vocabulary:
| Tag | Allowed Values |
|---|---|
| Environment | Production, Staging, Development, Test, Sandbox |
| Department | (lookup from HR system or static list) |
Terraform validation example:
variable "environment" {
  type = string
  validation {
    condition     = contains(["Production", "Staging", "Development", "Test", "Sandbox"], var.environment)
    # Some Terraform versions reject an error_message that isn't a full sentence ending in a period.
    error_message = "Environment must be one of: Production, Staging, Development, Test, Sandbox."
  }
}
Let tags flow down the hierarchy
Not every resource needs to be tagged individually if inheritance is set up correctly.
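Resource Group inheritance: Azure resources don't inherit tags by default, but you can assign policies with the Modify effect (such as the built-in "Inherit a tag from the resource group if missing") that copy tags from the resource group onto child resources. Tag the resource group correctly, and VMs, disks, and NICs created inside it pick up those tags; existing resources need a remediation task.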
Subscription-level tags: For organization-wide tags that rarely change (like company name or regulatory scope), apply at the subscription level.
Management group tags: For tags that apply to entire environments or business units.
Management Group (e.g., "Production")
└── Subscription (e.g., "Research-Prod")
    └── Resource Group (e.g., "PatientPortal-RG")
        └── Resources (VMs, storage, etc.)
Tags can flow down, reducing the burden on individual resource deployments.
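The effect of an "inherit if missing" policy can be modeled as a layered merge in which the innermost scope wins. This sketch models the outcome only, not how Azure evaluates policy; the function name, scope names, and values are hypothetical.

```python
from collections.abc import Mapping


def effective_tags(*scopes: Mapping[str, str]) -> dict[str, str]:
    """Combine tags from the outermost scope to the innermost one.

    A key set at an inner scope keeps its own value; keys it lacks are
    inherited from the nearest outer scope that has them.
    """
    result: dict[str, str] = {}
    for scope_tags in scopes:      # outermost first
        result.update(scope_tags)  # inner scopes override outer ones
    return result


subscription = {"Department": "Research", "Environment": "Production"}
resource_group = {
    "Project": "DataPlatform",
    "Application": "PatientPortal",
    "CostCenter": "12345",
    "Owner": "jsmith@company.com",
}
vm = {"Owner": "oncall@company.com"}  # resource-level value wins

print(effective_tags(subscription, resource_group, vm))
```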
Requirements change. Plan for it.
Departments get renamed. Cost centers merge. You need a strategy for updating existing tags.
For controlled vocabulary changes:
- Add the new value to the allowed list
- Migrate resources to the new value
- Remove the old value from the allowed list
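The three steps can be sketched so that the old value is retired only once nothing still uses it. The in-memory resource list, the function names, and the IT-to-Technology rename are hypothetical; in practice the migration step would call your tagging tooling and may partially fail, which is what the final check guards against.

```python
def add_value(vocabulary: frozenset[str], value: str) -> frozenset[str]:
    """Step 1: allow the new value alongside the old one."""
    return vocabulary | {value}


def migrate(resources: list[dict[str, str]], key: str, old: str, new: str) -> int:
    """Step 2: retag resources using the old value; return how many changed."""
    changed = 0
    for tags in resources:
        if tags.get(key) == old:
            tags[key] = new
            changed += 1
    return changed


def remove_value(vocabulary: frozenset[str], resources: list[dict[str, str]],
                 key: str, value: str) -> frozenset[str]:
    """Step 3: retire the old value, refusing while any resource still uses it."""
    in_use = [tags for tags in resources if tags.get(key) == value]
    if in_use:
        raise ValueError(f"{value!r} is still used by {len(in_use)} resource(s)")
    return vocabulary - {value}


departments = frozenset({"Research", "IT", "Finance"})
resources = [{"Department": "IT"}, {"Department": "Research"}, {"Department": "IT"}]

departments = add_value(departments, "Technology")
moved = migrate(resources, "Department", "IT", "Technology")
departments = remove_value(departments, resources, "Department", "IT")
print(moved, sorted(departments))  # 2 ['Finance', 'Research', 'Technology']
```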
For bulk tag updates:
Azure provides tag operations at scale:
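# Merge a new Department value into every resource in a resource group.
# (Running az tag update against the resource group's own ID changes only the group's tags.)
for id in $(az resource list --resource-group my-rg --query "[].id" --output tsv); do
  az tag update --resource-id "$id" --operation Merge --tags Department=NewDepartmentName
done
Or use Azure Resource Graph to find resources with specific tag values and update them programmatically.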
For ongoing hygiene:
Schedule a monthly job that:
- Queries all resources via Resource Graph
- Identifies resources with missing or non-compliant tags
- Generates a report for remediation
- (Optionally) auto-remediates where the correct value can be inferred
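The core of that monthly job is a classification pass over resource records. This sketch assumes the records were already fetched (for example with a Resource Graph query) into dicts with `id`, `tags`, and `resourceGroupTags` keys; that record shape, the single vocabulary rule, and inferring a missing value from the resource group are all assumptions.

```python
from dataclasses import dataclass, field

REQUIRED = ("CostCenter", "Owner", "Department", "Environment", "Project", "Application")
ALLOWED = {"Environment": {"Production", "Staging", "Development", "Test", "Sandbox"}}


@dataclass
class Finding:
    resource_id: str
    missing: list[str] = field(default_factory=list)
    invalid: dict[str, str] = field(default_factory=dict)
    inferred: dict[str, str] = field(default_factory=dict)


def audit(records: list[dict]) -> list[Finding]:
    """Return a finding for every resource with at least one tag problem."""
    findings = []
    for rec in records:
        tags = rec.get("tags") or {}
        parent = rec.get("resourceGroupTags") or {}
        finding = Finding(rec["id"])
        for key in REQUIRED:
            value = tags.get(key)
            if not value:
                if parent.get(key):
                    # Candidate for auto-remediation: the resource group has a value to copy.
                    finding.inferred[key] = parent[key]
                else:
                    finding.missing.append(key)
            elif key in ALLOWED and value not in ALLOWED[key]:
                finding.invalid[key] = value
        if finding.missing or finding.invalid or finding.inferred:
            findings.append(finding)
    return findings


base = {"CostCenter": "12345", "Owner": "jsmith@company.com",
        "Department": "Research", "Project": "DataPlatform"}
records = [
    {"id": "vm-1", "tags": {**base, "Environment": "prod"},
     "resourceGroupTags": {"Application": "PatientPortal"}},
    {"id": "vm-2", "tags": {**base, "Environment": "Production", "Application": "DataWarehouse"}},
    {"id": "disk-3", "tags": None},
]
for finding in audit(records):
    print(finding)
```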
Tags that don't show in reports don't survive
Tags are only valuable if they flow into reporting. In Azure:
Cost Management filters: Native support for filtering and grouping by tag. Build dashboards that show cost by Department, Project, or CostCenter.
Exports: Schedule exports of cost data to storage accounts. Include tags in the export schema for downstream analysis in Power BI or custom tools.
Budgets: Create budgets scoped to specific tag values, for example with an alert when spend tagged Project=DataPlatform exceeds $10,000/month.
Chargeback reports: Generate monthly reports showing each cost center's consumption. Automate delivery to finance.
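Showback, chargeback, and tag-scoped budget alerts all come down to grouping cost rows by a tag value. This sketch assumes rows with a `cost` amount and a `tags` dict (check the actual schema of your Cost Management export), keeps an explicit `(untagged)` bucket so gaps stay visible, and uses the $10,000 DataPlatform budget from the Budgets example as a hypothetical limit. The function names are also hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

UNTAGGED = "(untagged)"


def cost_by_tag(rows: list[dict], key: str) -> dict[str, Decimal]:
    """Sum row costs by the value of one tag; rows without it land in UNTAGGED."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for row in rows:
        value = (row.get("tags") or {}).get(key) or UNTAGGED
        totals[value] += Decimal(row["cost"])
    return dict(totals)


def over_budget(totals: dict[str, Decimal], budgets: dict[str, Decimal]) -> list[str]:
    """Tag values whose total spend exceeds their budget."""
    return sorted(v for v, limit in budgets.items() if totals.get(v, Decimal(0)) > limit)


rows = [
    {"cost": "7200.00", "tags": {"Project": "DataPlatform"}},
    {"cost": "3100.50", "tags": {"Project": "DataPlatform"}},
    {"cost": "450.00", "tags": {"Project": "WebsiteRedesign"}},
    {"cost": "980.25", "tags": {}},
]
totals = cost_by_tag(rows, "Project")
print(totals)
print(over_budget(totals, {"DataPlatform": Decimal("10000")}))  # ['DataPlatform']
```

`Decimal` avoids floating-point rounding in currency totals, which matters once these numbers feed chargeback reports to finance.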
If tags aren't in your cost reports, no one will care about maintaining them. Make tags visible in the outputs that matter.
The bottom line
A tagging strategy that works has three characteristics:
- Minimal mandatory tags: Six is usually enough. More creates friction without value.
- Enforcement at every layer: Request time, IaC templates, pipeline validation, Azure Policy. Defense in depth.
- Controlled vocabularies: Freetext decays. Dropdowns and validation keep values consistent.
The goal isn't perfect tagging. It's tagging good enough that cost allocation works, ownership is clear, and operations can automate based on tag values.
Start with the six core tags. Enforce them at all four layers. Add more only when you have a demonstrated need.
Tags don't maintain themselves. Systems that require them do.