Infrastructure Change Policy

Infrastructure changes are graph transitions, not just code changes. A change that passes typecheck can still replace or delete resources, introduce new trust boundaries or costs, or lose data.

This page records the current shared-environment policy for Pulumi-managed infrastructure and cloud automation.

Ownership Boundary

Pulumi owns the shared cloud resource graph:

Resource names, tags, IAM roles, policies, hosted zones, certificates, buckets, distributions, runtime host, control plane, deploy documents, and stable outputs.
Provider-specific implementation code under infra/pulumi/src/providers/aws.
Provider-neutral output contracts where those concepts are real, such as registry, ingress, runtime host, media storage, runtime configuration, deployment target, and control plane facts.

The deployment workflow owns app release orchestration:

SHA-tagged image identity.
Backend/frontend image build and push.
Runtime config validation.
SSM deploy invocation.
Reset/seed controls.
Smoke checks.
Runtime release receipts and app-runtime rollback.
Telemetry summaries.

Do not let routine app CD quietly become infrastructure mutation.

Resource Lifecycle Classes

Use lifecycle classes when reviewing Pulumi previews, topology projections, runbooks, and deployment summaries.

Class	Meaning	Current Examples	CD Posture
`ephemeral`	Can be recreated without preserving identity or data.	Lambda code packages, generated cold-start object, short-retention logs.	May be updated automatically after the workflow and preview policy are trusted.
`replaceable`	May be replaced, but dependent outputs or runtime facts must be refreshed.	Disposable EC2 runtime host posture, dynamic origin targets.	Requires visible preview summary and follow-up readiness checks.
`persistent`	Holds data, identity, policy, budget, DNS, or history that should not be casually lost.	Media bucket, hosted zones, budgets, runtime root volume, IAM trust boundaries.	Requires explicit human approval for replacement/delete and may need retain/protect/runbook work.
`externally durable`	Durability is provided by another managed service or external system, but topology changes still need care.	Future managed database or separately mounted data volume.	Treat changes as migrations with validation, not routine updates.

When in doubt, classify the resource conservatively and ask what would be lost if the graph transition applied exactly as previewed.
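The table can be read as a lookup from lifecycle class to CD posture. The following is a minimal sketch of that lookup, not code from the Wavemap repository. The class names come from the table; the posture labels, `cdPostureByClass`, and `classifyConservatively` are hypothetical names, and defaulting an unknown resource to `persistent` is one possible reading of "classify conservatively".

```typescript
// Class names come from the table above; the posture labels and helper names
// are hypothetical shorthand, not identifiers from the Wavemap codebase.
type LifecycleClass = "ephemeral" | "replaceable" | "persistent" | "externally durable";

type CdPosture =
  | "auto-update-once-trusted"
  | "preview-summary-and-readiness-checks"
  | "human-approval-for-replace-or-delete"
  | "treat-as-migration";

const cdPostureByClass: Record<LifecycleClass, CdPosture> = {
  ephemeral: "auto-update-once-trusted",
  replaceable: "preview-summary-and-readiness-checks",
  persistent: "human-approval-for-replace-or-delete",
  "externally durable": "treat-as-migration",
};

// Assumed conservative default: a resource nobody has classified is
// reviewed as if it were persistent.
function classifyConservatively(
  logicalName: string,
  knownClasses: Record<string, LifecycleClass>,
): LifecycleClass {
  return knownClasses[logicalName] ?? "persistent";
}
```

With this shape, a reviewer tool can look up the posture for any previewed resource, and an unknown resource never falls into the automatic-update lane by accident.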

Preview And Update Policy

Routine app/API deploys should not run pulumi up.

Guarded infrastructure changes should:

Run Pulumi preview first.
Summarize creates, updates, replacements, deletes, protected resources, and IAM/trust-boundary changes.
Call out persistent or data-bearing resources explicitly.
Include the expected operator action for replacements, deletes, DNS delegation, SSM/runtime impact, and rollback.
Apply only after explicit approval.
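As an illustration of the summarize-then-approve steps above, the sketch below counts operations from a normalized list of preview steps and refuses to apply without a recorded approver. `PreviewStep`, `summarizeSteps`, and `mayApply` are hypothetical names, and the step shape is an assumed normalized form, not Pulumi's raw preview output.

```typescript
// Hypothetical normalized step shape; this is not Pulumi's raw preview JSON.
type ChangeOp = "create" | "update" | "replace" | "delete" | "same";

interface PreviewStep {
  urn: string;
  op: ChangeOp;
}

type ChangeTotals = Record<ChangeOp, number>;

// Count how many steps fall under each operation.
function summarizeSteps(steps: PreviewStep[]): ChangeTotals {
  const totals: ChangeTotals = { create: 0, update: 0, replace: 0, delete: 0, same: 0 };
  for (const step of steps) {
    totals[step.op] += 1;
  }
  return totals;
}

// Apply only when there is something to apply and an approver is recorded.
function mayApply(totals: ChangeTotals, approvedBy: string | undefined): boolean {
  const changeCount = totals.create + totals.update + totals.replace + totals.delete;
  return changeCount > 0 && approvedBy !== undefined && approvedBy.trim() !== "";
}
```

Keeping the approval check separate from the counting keeps the summary a pure review artifact: it can be generated and published without implying that anything was approved.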

Special handling is required for:

Stateful replacement or deletion.
Database or storage topology changes.
DNS delegation and certificate changes.
IAM broadening or OIDC trust changes.
Runtime host replacement.
Media bucket replacement or destroy.
Anything that can orphan, hide, or delete data.

Pulumi previews should be reviewed as graph-transition plans. They are not just “does the infra package compile?” evidence.

When a graph transition adds automation authority, also use Cloud Automation Permission Review to name the actor, cloud API actions, resource scope, trust boundary, evidence, and explicit non-goals before the automation is treated as ready to run.

Ingress Abuse Protection Posture

The deployed dev app currently relies on CloudFront as the public ingress boundary, private S3 origins through Origin Access Control, and EC2 app ingress restricted to the AWS-managed CloudFront origin-facing prefix list.

AWS WAF/Web ACLs are intentionally deferred for deployed dev while traffic is low and cost minimization remains the priority. Revisit this posture when request volume, anomalous wake traffic, spam, or abuse evidence justifies the recurring cost and operational surface.

For public-facing deployments such as production, this protection should be treated as a stronger baseline candidate rather than a last-resort add-on. A production review should consider an ingress Web ACL on CloudFront with rate-based rules and managed rule groups sized to normal traffic before the surface is opened broadly.

Pulumi State And Deployment Contract Boundary

Pulumi state is infrastructure-control-plane state. It records the full resource graph, provider identifiers, config encryption metadata, backend locking facts, and other implementation details needed by infra operators or approved infra automation. Routine app/docs deployment workflows should not treat direct Pulumi state access as a deploy input.

Deployment contracts are the sanitized deploy-facing artifacts derived from approved stack outputs. They should contain only the fields app/docs deployment needs: target, region, public URLs, registry references, runtime host paths, runtime configuration references, media/docs delivery facts, explicit deploy executor handles, and output-contract metadata. They should not include raw Pulumi exports, arbitrary provider physical IDs beyond explicit deploy handles, secret values, or state backend implementation details. The initial provider-neutral contract is wavemap.deployment-contract schema version 1.
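One way to picture the sanitization step is an allow-list projection over approved stack outputs. This is a sketch under assumptions: the schema name and version come from the paragraph above and `runtimeDeployment` appears elsewhere on this page, but the key names `schema` and `schemaVersion`, the other field names, and `toDeploymentContract` are hypothetical.

```typescript
// Hypothetical field names; the real wavemap.deployment-contract schema may differ.
const ALLOWED_CONTRACT_FIELDS = [
  "target",
  "region",
  "publicUrls",
  "registry",
  "runtimeHost",
  "runtimeConfiguration",
  "mediaDelivery",
  "docsDelivery",
  "runtimeDeployment",
] as const;

// Copy only allow-listed fields; anything else in the stack outputs is dropped.
function toDeploymentContract(stackOutputs: Record<string, unknown>): Record<string, unknown> {
  const contract: Record<string, unknown> = {
    schema: "wavemap.deployment-contract",
    schemaVersion: 1,
  };
  for (const field of ALLOWED_CONTRACT_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(stackOutputs, field)) {
      contract[field] = stackOutputs[field];
    }
  }
  return contract;
}
```

An allow-list fails closed: a new stack output stays out of the contract until someone adds it deliberately. It filters top-level fields only, so nested values would still need their own review.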

This boundary lets Wavemap move Pulumi itself to a self-managed backend without forcing routine CD to understand that backend. Pulumi backend profiles may need backend-specific login, lock, secret-provider, backup, and recovery checks; deployment contract reads/writes should instead be shaped around a provider-neutral artifact-store interface. The first contract store interface is TDeploymentContractStoreAdapter; it supports local-file no-cloud workflows and AWS S3 as the first private cloud-backed store. Future Azure Blob, GCS, or other private artifact stores should be added only as real adapters when needed.
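To make the artifact-store idea concrete, the sketch below shows a small store interface with an in-memory implementation. It is not the real `TDeploymentContractStoreAdapter`, whose shape is not shown on this page; `ContractStoreSketch`, `createInMemoryStore`, and `loadCurrentContract` are hypothetical, and the sketch is synchronous for brevity where a real adapter would likely be asynchronous.

```typescript
// Hypothetical, synchronous stand-in. The real TDeploymentContractStoreAdapter
// is not shown on this page and may look quite different.
interface ContractStoreSketch {
  readonly kind: string;
  readCurrent(pulumiStack: string): string | undefined;
  writeCurrent(pulumiStack: string, contractJson: string): void;
}

// In-memory store standing in for a local-file or S3-backed adapter.
function createInMemoryStore(): ContractStoreSketch {
  const objects: Record<string, string> = {};
  return {
    kind: "in-memory",
    readCurrent: (pulumiStack) => objects[pulumiStack],
    writeCurrent: (pulumiStack, contractJson) => {
      objects[pulumiStack] = contractJson;
    },
  };
}

// The caller depends only on the interface, never on the Pulumi backend.
function loadCurrentContract(store: ContractStoreSketch, pulumiStack: string): unknown {
  const body = store.readCurrent(pulumiStack);
  if (body === undefined) {
    throw new Error(`no current contract for stack ${pulumiStack} in ${store.kind} store`);
  }
  return JSON.parse(body);
}
```

Because `loadCurrentContract` sees only the interface, swapping the in-memory store for a file-backed or S3-backed one would not change the deploy-side code.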

For aws-dev, the first self-managed Pulumi backend target is a separate protected S3 backend at s3://wavemap-dev-pulumi-state-959516292206-us-east-2/pulumi-state, with Pulumi stack secrets moved to the passphrase secrets provider. That bucket is modeled separately from the deployment contract store, uses private bucket posture and versioning, and remains an infra-operator surface only. The live migration still requires explicit approval for the bucket apply, secret-provider change, stack export/import, backend login, backup handling, and post-import preview.

The AWS dev stack now provisions the first private deployment contract artifact store as an S3-backed aws-s3 store. Its current-contract objects live under deployment-contracts/<pulumiStack>/current.json, are stored in a non-public bucket with bucket-owner-enforced ownership, default SSE-S3 encryption, and S3 versioning, and are exported from Pulumi as deploymentContractStore so follow-up automation can configure the storage adapter without knowing raw provider state.
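The object key layout described above can be captured in one small helper. This is a sketch: `currentContractKey` is a hypothetical name, and the stack-name validation pattern is an assumption added so a malformed name cannot escape the prefix.

```typescript
// Builds the key for the current contract of a stack, following the
// deployment-contracts/<pulumiStack>/current.json layout described above.
function currentContractKey(pulumiStack: string): string {
  // Assumed validation: plain stack names only, starting with a letter or digit.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(pulumiStack)) {
    throw new Error(`unexpected stack name: ${pulumiStack}`);
  }
  return `deployment-contracts/${pulumiStack}/current.json`;
}
```

Centralizing the key format keeps readers and the future publisher agreed on one path, and lets the read grant stay scoped to the `deployment-contracts/` prefix.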

This artifact store is not the Pulumi state backend. It is the deploy-facing storage surface for sanitized contracts after an approved infra pass has produced stack outputs. The Pulumi-managed docs deploy role has read-only access to the deployment contract prefix, and routine docs deployment now reads that contract instead of direct Pulumi outputs. The app deploy contract projection now includes runtimeDeployment.executor for the app lane, currently shaped as an aws-ssm-send-command handle with document name and target instance ID. The app deploy contract-read grant is modeled as a separate Pulumi-managed policy attachment to the existing GitHub app deploy role, scoped to list/read the contract prefix only. The app/API CD workflow has the repo-side contract-read path and no longer requires a routine Pulumi credential, but live use still depends on applying that read grant and publishing a refreshed contract. The future contract-publishing writer still needs explicit review before it receives write grants.
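A deploy workflow that reads `runtimeDeployment.executor` from the contract should validate the handle before using it. The sketch below assumes field names `kind`, `documentName`, and `instanceId`; this page only states that the handle is an aws-ssm-send-command handle carrying a document name and a target instance ID, so treat the exact names and `parseExecutor` as hypothetical.

```typescript
// Hypothetical field names for the executor handle.
interface SsmSendCommandExecutor {
  kind: "aws-ssm-send-command";
  documentName: string;
  instanceId: string;
}

// Validate an untrusted value read from the contract before using it.
function parseExecutor(value: unknown): SsmSendCommandExecutor {
  if (typeof value !== "object" || value === null) {
    throw new Error("executor must be an object");
  }
  const candidate = value as Record<string, unknown>;
  if (candidate["kind"] !== "aws-ssm-send-command") {
    throw new Error("unsupported executor kind");
  }
  const documentName = candidate["documentName"];
  const instanceId = candidate["instanceId"];
  if (typeof documentName !== "string" || documentName === "") {
    throw new Error("executor is missing a document name");
  }
  if (typeof instanceId !== "string" || instanceId === "") {
    throw new Error("executor is missing a target instance ID");
  }
  return { kind: "aws-ssm-send-command", documentName, instanceId };
}
```

Failing fast on a missing or stale handle turns "the contract needs to be refreshed" into an explicit error instead of a fallback to Pulumi outputs.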

Operator Store And Access Model

The durable deployed-dev operator model separates state, sanitized deploy facts, runtime secrets, and workflow bootstrap values. Reaching for Pulumi state, GitHub secrets, or raw provider outputs during routine CD is a signal that the deploy contract boundary needs to be refreshed.

Surface	Current Location	Writers	Readers	Should Not Own	Remaining Live Gate
Pulumi state backend	`s3://wavemap-dev-pulumi-state-959516292206-us-east-2/pulumi-state` after migration. Pulumi Cloud remains the recovery source until the migration is proven.	Infra operators and future explicitly approved infra automation.	Infra operators and the manual infra-topology GitHub OIDC role through narrow backend read access.	Routine app/API CD, routine docs CD, deployment contract reads, runtime secrets, or public docs data.	Apply the protected backend bucket, migrate the stack, and prove S3-backed preview/output capture and backup handling.
Deployment contract store	`s3://wavemap-dev-deployment-contracts-20260518124223904700000001/deployment-contracts/<pulumiStack>/current.json`.	Approved contract publisher from reviewed Pulumi outputs.	Docs deploy role, app deploy role after the modeled read grant is applied, and infra operators.	Raw Pulumi exports, stack secrets, backend internals, or arbitrary provider output dumps.	Apply the app deploy read grant and publish a refreshed contract containing `runtimeDeployment.executor`.
Runtime parameter store	AWS SSM Parameter Store under `/wavemap/dev/runtime/*`.	Runtime config population command or approved operator workflow.	Runtime EC2 role reads values; cloud-plan jobs validate metadata and parameter references.	GitHub logs, workflow summaries, rendered env files, or deployment contracts carrying secret values.	Keep future secret additions in SSM and expose only names, kinds, or references through deploy contracts.
GitHub `dev` environment	GitHub environment variables and secrets for OIDC role ARNs, backend selectors, deployment contract bucket names, and manual workflow bootstrap values.	Repository operators through GitHub environment settings.	Specific workflows protected by the `dev` environment.	Cloud resource graph ownership, rendered runtime config, raw Pulumi exports, or long-term state.	Set S3 backend variables after migration proof; delete `PULUMI_ACCESS_TOKEN` only after the S3 path passes.
Private topology artifacts	Local private operator directories and short-retention GitHub workflow artifacts from infra-topology capture, projection, and review stages.	Manual infra-topology workflow or local operator commands.	Operators and reviewers.	Public docs content until sanitized, reviewed, and accepted through the figure ledger.	Prove the S3-backed infra-topology ingest lane after backend migration and before retiring Pulumi Cloud.
Public topology projections	Reviewed docs assets and Mermaid sidecars under `apps/wavemap-docs/src/content/docs/figures/topology` and `apps/wavemap-docs/public/operations/figures/topology`.	Human-reviewed docs changes.	Anyone reading the docs site.	Raw provider identifiers, workflow evidence, secrets, or private generated candidates.	Add automated reference checks only when the publication workflow needs them.

For deployed dev, this means routine app/API and docs CD should be able to operate from the deployment contract and their own narrow roles. Pulumi credentials, the Pulumi backend passphrase, raw exports, and backend object access remain infra-operator concerns, except for the manual infra-topology ingest role that reads the backend as private evidence.

Production Hardening Still Deferred

The deployed-dev model is intentionally cost-conscious and lightweight. A production environment should not inherit that posture without a separate review.

Production planning should revisit at least these hardening items:

Area	Production Baseline Candidate
Ingress abuse protection	CloudFront Web ACL with rate-based rules, managed rule groups, and alerting sized to expected public traffic.
Account and network isolation	Dedicated production AWS account, environment-specific trust boundaries, private networking review, and tighter egress.
Pulumi state durability	KMS-backed encryption review, restore drill, access review, encrypted stack-export backup policy, and lock recovery notes.
Deployment contract handling	Separate production artifact store, contract publishing writer review, and change evidence tied to release promotion.
Data durability	Managed database posture, backups, restore testing, monitoring, RTO/RPO ownership, and disaster-recovery expectations.
Release strategy	Staging or pre-production gate, cross-environment promotion, rollback proof, and blue/green or canary options.
Observability and audit	CloudTrail/log retention, alarms, budget guardrails, workflow evidence retention, and incident-ready dashboards.
Access governance	Scheduled review of GitHub environment protections, OIDC subjects, service roles, break-glass credentials, and rotation.
Media and CDN operations	Production-grade media lifecycle, cache invalidation policy, object retention posture, and CDN monitoring.

Preview Summary Contract

Before any workflow performs pulumi up, it must publish a preview summary that a human can review without reading raw Pulumi output first. The summary is a review artifact, not approval by itself.

The first summary format should include:

Section	Required Content
Target	Environment, Pulumi project, stack, cloud provider, selected ref or commit, workflow run, and actor.
Change totals	Counts for creates, updates, replacements, deletes, unchanged resources, and resources that could not be classified.
Review status	`no-op`, `review-required`, or `blocked`, with the reason for the status.
High-risk changes	Every replacement, delete, protected-resource change, data-bearing change, DNS/certificate change, IAM or OIDC trust change, and SSM document change.
Lifecycle classification	Resource lifecycle class when known: `ephemeral`, `replaceable`, `persistent`, `externally durable`, or `unclassified`.
Output contract changes	Added, removed, renamed, or meaningfully changed stack outputs that deployment, runtime config, smoke, docs hosting, or topology tooling consumes.
Required operator action	Approval needed, runbook to follow, post-apply validation, smoke lane, rollback path, or reason the preview should not be applied.
Evidence	Raw preview artifact location, summary schema version, timestamp, and any sanitization or redaction note.

Use conservative status rules:

no-op means the preview has no creates, updates, replacements, or deletes.
review-required means the preview is complete enough for human review but still needs explicit approval before apply.
blocked means the preview includes an unclassified replacement or delete, a persistent or data-bearing risk without an operator plan, IAM or trust broadening without a permission review, missing raw evidence, or any leaked secret or plaintext value.
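The status rules above can be sketched as one function over already-computed facts about a preview. `SummaryFacts` and `reviewStatus` are hypothetical names, and the generator that would produce these facts is not shown. The sketch assumes blocking conditions are checked first, so a preview with missing evidence is `blocked` even when it has no changes.

```typescript
type ReviewStatus = "no-op" | "review-required" | "blocked";

// Hypothetical input: facts a summary generator would compute from a preview.
interface SummaryFacts {
  creates: number;
  updates: number;
  replacements: number;
  deletes: number;
  unclassifiedReplaceOrDelete: boolean;
  persistentRiskWithoutOperatorPlan: boolean;
  iamBroadeningWithoutPermissionReview: boolean;
  rawEvidenceMissing: boolean;
  secretLeakDetected: boolean;
}

function reviewStatus(facts: SummaryFacts): { status: ReviewStatus; reasons: string[] } {
  const reasons: string[] = [];
  if (facts.unclassifiedReplaceOrDelete) reasons.push("unclassified replacement or delete");
  if (facts.persistentRiskWithoutOperatorPlan) reasons.push("persistent or data-bearing risk without an operator plan");
  if (facts.iamBroadeningWithoutPermissionReview) reasons.push("IAM or trust broadening without permission review");
  if (facts.rawEvidenceMissing) reasons.push("missing raw evidence");
  if (facts.secretLeakDetected) reasons.push("secret or plaintext leak");
  // Assumed precedence: any blocking reason wins over no-op and review-required.
  if (reasons.length > 0) {
    return { status: "blocked", reasons };
  }
  const changeCount = facts.creates + facts.updates + facts.replacements + facts.deletes;
  if (changeCount === 0) {
    return { status: "no-op", reasons: ["no creates, updates, replacements, or deletes"] };
  }
  return { status: "review-required", reasons: ["explicit approval needed before apply"] };
}
```

Returning the reasons alongside the status matches the summary contract's requirement that the review status carry the reason for the status.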

The summary may include Pulumi URNs, resource types, and stable logical names needed for review. Keep raw exports, provider physical IDs, secret values, arbitrary provider inputs/outputs, and workflow logs out of public docs. If a raw artifact is needed for debugging, keep it as private workflow evidence with short retention.

Cloud Portability Posture

Wavemap is AWS-first, not AWS-only-by-design.

Current convention:

Keep provider-neutral vocabulary where the concept is shared: cloudProvider, deploymentEnvironment, runtimeHost, mediaStorage, containerRegistry, ingress, controlPlane, and runtimeConfiguration.
Keep provider-specific resources and execution semantics inside provider-owned boundaries.
Use AWS-specific names where the code is honestly AWS-specific, such as S3 buckets, ECR repositories, SSM documents, CloudFront distributions, Lambda functions, and Route53 records.
Do not add placeholder Azure/GCP directories or workflow paths until a second provider is real enough to clarify the shape.
Do not over-abstract shell wrappers that currently execute AWS CLI, Docker, SSM, ECR, S3, CloudFront, or Pulumi behavior.

If a second provider becomes real, prefer a parallel provider adapter selected by typed target/profile logic over one generic wrapper that hides provider differences.
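A minimal sketch of "parallel adapters selected by typed target logic" follows. The `second-provider` member is a placeholder used only to show the shape of the selection; it does not suggest adding real directories or workflow paths before a second provider exists. All type and function names here are hypothetical.

```typescript
interface AwsTarget {
  provider: "aws";
  region: string;
}

// Placeholder for illustration only; no second provider exists today.
interface SecondProviderTarget {
  provider: "second-provider";
  location: string;
}

type DeployTarget = AwsTarget | SecondProviderTarget;

interface RuntimeDeployAdapter {
  provider: DeployTarget["provider"];
  describeExecutor(): string;
}

function awsAdapter(target: AwsTarget): RuntimeDeployAdapter {
  return { provider: "aws", describeExecutor: () => `aws-ssm-send-command in ${target.region}` };
}

function secondProviderAdapter(target: SecondProviderTarget): RuntimeDeployAdapter {
  return { provider: "second-provider", describeExecutor: () => `placeholder executor in ${target.location}` };
}

// The discriminated union makes the compiler reject an unhandled provider.
function selectAdapter(target: DeployTarget): RuntimeDeployAdapter {
  switch (target.provider) {
    case "aws":
      return awsAdapter(target);
    case "second-provider":
      return secondProviderAdapter(target);
    default: {
      const unreachable: never = target;
      throw new Error(`unsupported target: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Each adapter keeps its own execution semantics visible, which is the point of preferring parallel adapters over one generic wrapper.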

Topology Evidence Boundary

Raw infrastructure evidence can include provider-specific and sensitive-adjacent details. Keep the publication boundary strict:

Raw Pulumi exports, DOT labels, workflow evidence, provider physical IDs, and private generated candidates stay private.
Sanitized inventory and normalized graph data are review inputs.
Public docs receive only reviewed, sanitized projections.
Human publication decisions live in infra/topology-processing-reviews/figure-slot-decisions.json.
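The publication boundary above amounts to a strict filter: only projections that are both sanitized and reviewed may reach public docs. The sketch below illustrates that rule with hypothetical names (`TopologyArtifact`, `publishableArtifacts`); it does not model the figure-slot decisions file, whose format is not described here.

```typescript
type ArtifactKind = "raw-evidence" | "sanitized-inventory" | "normalized-graph" | "projection";

// Hypothetical artifact record used only for this illustration.
interface TopologyArtifact {
  name: string;
  kind: ArtifactKind;
  sanitized: boolean;
  reviewed: boolean;
}

// Only reviewed, sanitized projections cross the publication boundary.
function publishableArtifacts(artifacts: TopologyArtifact[]): TopologyArtifact[] {
  return artifacts.filter(
    (artifact) => artifact.kind === "projection" && artifact.sanitized && artifact.reviewed,
  );
}
```

Sanitized inventory and normalized graph data are excluded on purpose: they are review inputs, not publishable outputs.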

The topology pipeline is documented in Infra Topology Processing.

Current Deferred Policy Work

Implement the preview-summary generator and workflow gate before any workflow performs pulumi up.
Add automated checks that approved topology publication paths exist and are referenced from their target docs pages.
Model runtime database and volume state as provider-neutral topology nodes.
Add scheduled drift checks only when infrastructure churn or operator pain justifies them.
Exercise the runtime-host and media-bucket replacement runbooks only when a real preview or planned drill justifies the disruption.