Runbooks
Runbooks are the reviewed operator path for repeated maintenance, recovery, and high-signal proof work. They should stay short enough to follow under pressure, but explicit about approval gates, mutation scope, and evidence.
The CLI command surface is documented in the Wavemap CLI Command Reference. This page explains when and how the deployed-dev operations are used. Use the Deploy Dev and Smoke command reference sections for command ownership, wrapper paths, compatibility scripts, flags, and default mutation posture.
Deployed-Dev Boundary
These runbooks apply only to the shared deployed dev environment:
| Boundary | Current Value |
|---|---|
| Public app URL | https://dev.wavemap.app |
| Pulumi stack | aws-dev |
| Runtime region | us-east-2 |
| Runtime target | Sleepable EC2 host running Docker Compose |
| Runtime config store | AWS SSM Parameter Store under /wavemap/dev/runtime/* |
| Last-good receipt | /opt/wavemap/deployment/last-successful-runtime-release.json |
Durable examples use the public CLI form:
pnpm wavemap -- <command> [flags...]
Root scripts such as pnpm deploy:dev:runtime and pnpm smoke:dev remain compatibility aliases. Use them when a local note or workflow already does, but prefer public wavemap routes in new runbooks.
Examples that need live Pulumi stack outputs use this placeholder path:
/tmp/wavemap-aws-dev-outputs.json
Capture or supply that file through the approved deploy workflow, cloud-plan job, or local operator process before running commands that need live cloud facts. Do not paste secret values into the file or into runbook notes.
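A small guard can fail fast when the placeholder file has not been captured yet. The sketch below is illustrative only: require_outputs_file is a hypothetical helper, not part of the wavemap CLI, and it checks only that the file exists, is readable, and is non-empty. It does not validate the JSON or the stack it came from.

```shell
# Hypothetical helper (not part of the wavemap CLI): fail fast when the
# Pulumi outputs capture is missing, unreadable, or empty.
require_outputs_file() {
  outputs_file="${1:-/tmp/wavemap-aws-dev-outputs.json}"
  if [ ! -r "$outputs_file" ]; then
    echo "missing or unreadable outputs file: $outputs_file" >&2
    return 1
  fi
  if [ ! -s "$outputs_file" ]; then
    echo "outputs file is empty: $outputs_file" >&2
    return 1
  fi
  return 0
}
```

An operator could call it immediately before any command that takes --pulumi-outputs-json and stop when it returns non-zero.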
Shared Operator Gates
Before running a live command, confirm:
- The target is deployed dev, not local Docker, staging, or production.
- The selected checkout/ref is the one intended for deployment or verification.
- Any live command has been dry-run first when the command supports a dry-run posture.
- The command’s mutation gate is explicit, usually --execute.
- Secrets are not printed, copied into workflow summaries, or stored in GitHub when the runtime host should read them from SSM.
- The expected evidence is known before starting: SSM command ID, GitHub job summary, smoke result, discrepancy counts, or failure artifact.
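The first and fourth gates can be expressed as small shell checks. This is a minimal sketch under stated assumptions: confirm_dev_target and has_execute_flag are hypothetical helpers, and the expected values are copied from the Deployed-Dev Boundary table. They do not replace the human review described above.

```shell
# Hypothetical pre-flight guard: refuse to continue unless the caller has
# named the deployed-dev boundary explicitly. Expected values come from the
# Deployed-Dev Boundary table.
confirm_dev_target() {
  app_url="$1"
  stack="$2"
  region="$3"
  [ "$app_url" = "https://dev.wavemap.app" ] || { echo "unexpected app URL: $app_url" >&2; return 1; }
  [ "$stack" = "aws-dev" ] || { echo "unexpected Pulumi stack: $stack" >&2; return 1; }
  [ "$region" = "us-east-2" ] || { echo "unexpected region: $region" >&2; return 1; }
}

# Report whether the mutation gate (--execute) appears in an argument list.
has_execute_flag() {
  for arg in "$@"; do
    [ "$arg" = "--execute" ] && return 0
  done
  return 1
}
```

Checking for --execute separately keeps the dry-run and the live run visibly different in shell history and in any captured evidence.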
Operator Store Quick Reference
Use this table before deciding which credential, artifact, or workflow path to reach for. The durable store and access boundary is explained in Infrastructure Change Policy.
| Task | Normal Input | Actor | Gate |
|---|---|---|---|
| Routine app/API deploy | Current deployment contract plus app deploy role. | .github/workflows/deploy-dev.yml. | App deploy contract-read grant and refreshed contract must be live. No routine Pulumi token. |
| Routine docs deploy | Current deployment contract plus docs deploy role. | .github/workflows/deploy-docs.yml. | Docs role reads the private contract store. No routine Pulumi token. |
| Infrastructure mutation | Pulumi backend, Pulumi config, and infra operator credentials. | Local infra operator, or future explicitly approved infra workflow. | Preview summary, human approval, and runbook-specific post-apply evidence. |
| Manual infra-topology ingest | S3 Pulumi backend URL, backend region, infra-topology OIDC role, passphrase. | .github/workflows/infra-topology-ingest.yml or local operator capture. | Self-managed backend migration and narrow backend-read role must be proven before retiring Pulumi Cloud. |
| Runtime secret value change | SSM Parameter Store path under /wavemap/dev/runtime/*. | Runtime config population command or approved operator process. | GitHub should carry references or bootstrap values, not decrypted runtime secrets. |
| Deployment contract publication | Reviewed Pulumi outputs projected into wavemap.deployment-contract v1. | Approved contract publisher. | Future writer permission still needs least-privilege review before receiving artifact-store write access. |
| Public topology figure publication | Reviewed sanitized topology projection and figure ledger decision. | Human docs edit after private capture/projection review. | Raw captures and private generated candidates stay outside public docs. |
Pulumi State Backend Migration
Use this runbook to move the aws-dev Pulumi stack from Pulumi Cloud to the self-managed private S3 backend. This is an
infra-operator operation, not routine app/API or docs CD.
Target backend:
s3://wavemap-dev-pulumi-state-959516292206-us-east-2/pulumi-state
Before approval:
- Confirm no pulumi up, docs/app deploy, runtime replacement, media bucket replacement, or topology capture is in progress.
- Confirm the checked-out repo includes the modeled provider.aws.pulumiStateBackend bucket resource.
- Confirm the infra operator AWS identity can create and later read/write the state bucket and objects under pulumi-state/.pulumi/*.
- Create or retrieve the aws-dev Pulumi passphrase in the operator secret store.
- Decide where the plaintext migration export will live temporarily and where an encrypted backup will be stored after import.
Apply the backend bucket only after explicit approval:
pnpm -C infra/pulumi run preview
pnpm -C infra/pulumi run up -- --yes
Point the current shell at the passphrase file:
export PULUMI_CONFIG_PASSPHRASE_FILE="$HOME/.config/wavemap/pulumi/aws-dev.passphrase"
Move stack secrets to the passphrase provider while still on the old backend:
pnpm -C infra/pulumi exec pulumi stack change-secrets-provider passphrase --stack aws-dev
Review the Pulumi.aws-dev.yaml diff before continuing.
Create a strict-permission migration export:
umask 077
export WAVEMAP_PULUMI_MIGRATION_BACKUP_DIR="/private/tmp/wavemap-pulumi-state-migration-$(date -u +%Y%m%dT%H%M%SZ)"
mkdir -p "$WAVEMAP_PULUMI_MIGRATION_BACKUP_DIR"
pnpm -C infra/pulumi exec pulumi stack export \
  --stack aws-dev \
  --show-secrets \
  --file "$WAVEMAP_PULUMI_MIGRATION_BACKUP_DIR/aws-dev.stack.json"
Treat this export as secret-bearing plaintext. Do not upload it to GitHub artifacts, docs, chat, or shared storage.
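Because the export relies on umask 077 for protection, it can help to confirm the resulting permissions before continuing. The following is a sketch, not part of the runbook tooling: is_private_path is a hypothetical helper that reads the mode string from ls -ld and succeeds only when group and other have no permissions.

```shell
# Hypothetical check: succeed only when a path grants no group or other
# permissions, which is what `umask 077` should produce for newly created
# files and directories.
is_private_path() {
  # Characters 5-10 of the `ls -ld` mode string are the group and other bits.
  group_other_bits="$(ls -ld "$1" 2>/dev/null | cut -c5-10)"
  [ "$group_other_bits" = "------" ]
}
```

It could be run against both the backup directory and the exported stack file; a failure on either suggests the umask was not in effect when the path was created.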
Import into the self-managed backend:
export WAVEMAP_PULUMI_STATE_BACKEND_URL="s3://wavemap-dev-pulumi-state-959516292206-us-east-2/pulumi-state"
pnpm -C infra/pulumi exec pulumi logout
pnpm -C infra/pulumi exec pulumi login "$WAVEMAP_PULUMI_STATE_BACKEND_URL"
pnpm -C infra/pulumi exec pulumi stack init aws-dev --secrets-provider passphrase
pnpm -C infra/pulumi exec pulumi stack import \
  --stack aws-dev \
  --file "$WAVEMAP_PULUMI_MIGRATION_BACKUP_DIR/aws-dev.stack.json"
Verify before retiring Pulumi Cloud assumptions:
pnpm -C infra/pulumi exec pulumi whoami --verbose
pnpm -C infra/pulumi exec pulumi preview --stack aws-dev
pnpm -C infra/pulumi exec pulumi stack output --stack aws-dev --json > /tmp/wavemap-aws-dev-outputs.json
aws s3 ls s3://wavemap-dev-pulumi-state-959516292206-us-east-2/pulumi-state/.pulumi/stacks/
Expected evidence:
- Preview/apply evidence for the protected state backend bucket.
- A pulumi whoami --verbose result showing the S3 backend URL.
- A no-surprise pulumi preview --stack aws-dev from the S3 backend.
- A fresh stack output capture from the S3 backend.
- A reviewed plan for preserving an encrypted stack backup and deleting the local plaintext migration export.
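The whoami evidence can be checked mechanically instead of by eye. This is a minimal sketch, assuming the operator has captured the whoami output as text: whoami_shows_backend is a hypothetical helper that looks for the expected backend URL as a fixed string and makes no assumption about the rest of the output layout.

```shell
# Hypothetical check: confirm that captured `pulumi whoami --verbose` output
# mentions the expected self-managed backend URL. The URL is matched as a
# fixed string, so the check does not depend on the exact output layout.
whoami_shows_backend() {
  whoami_output="$1"
  expected_backend_url="$2"
  printf '%s\n' "$whoami_output" | grep -F -q -- "$expected_backend_url"
}
```

A non-zero result is a reason to stop before running preview, since the shell may still be logged in to the old backend.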
Recovery posture:
- Before S3 import succeeds, the old Pulumi Cloud stack remains the recovery source.
- After the passphrase provider change, the passphrase is required even when operating temporarily from the old backend.
- After import succeeds, S3 object versioning and Pulumi history help recover from backend object drift, but they do not replace a deliberate encrypted stack-export backup.
After the migrated backend is proven, update the GitHub dev environment for manual infra-topology ingest:
- Variable: PULUMI_BACKEND_URL=s3://wavemap-dev-pulumi-state-959516292206-us-east-2/pulumi-state.
- Variable: PULUMI_STATE_BACKEND_AWS_REGION=us-east-2.
- Secret: INFRA_TOPOLOGY_AWS_ROLE_TO_ASSUME, scoped to the infra-topology state backend role.
- Secret: PULUMI_CONFIG_PASSPHRASE, matching the aws-dev passphrase provider.
Then remove PULUMI_ACCESS_TOKEN from the GitHub dev environment and revoke the Pulumi Cloud token if it is no longer
needed outside Wavemap. This is a live credential cleanup step and should be done deliberately, after the S3 backend
preview and topology ingest path have both passed.
GitHub Workflow Dispatch Recipe Selection
Use this runbook when manually dispatching .github/workflows/deploy-dev.yml for deployed dev.
The dispatch posture is recipe-first. Start with a named run_profile, then add explicit add-ons only when the proof you
need is outside that profile. Treat custom and raw stage toggles as deliberate exceptions, not the normal operator
interface.
Before dispatch:
- Confirm the selected git_ref is the branch or SHA intended for this proof.
- Confirm whether the run should be non-mutating, app-runtime mutating, data-destructive, media-mutating, or lifecycle-disruptive.
- Confirm a heavier profile is worth its cost, runtime disruption, and artifact noise.
- Decide what evidence will make the run useful before starting it.
Profile selection:
| Profile | Use When | Boundary |
|---|---|---|
| preflight | Checking repo-local deploy contracts through GitHub Actions. | No cloud authentication and no live mutation. |
| cloud-plan | Checking the deployment contract, GitHub OIDC, live SSM metadata, and deploy dry-runs. | Cloud-authenticated but non-mutating. |
| deploy-endpoint | Normal deployed-dev app/API deploy proof. | Builds/pushes images, deploys the app runtime, gates endpoint smoke on migration status. |
| deploy-seeded-browser | Proving the seeded route and browser basics after app or data-shape changes. | Includes destructive database reset before seeded and browser smoke. |
| deploy-media | Proving API, S3, public media URL, CloudFront media delivery, browser rendering, and drift counts. | Includes destructive reset, temporary media mutation, browser media proof, and DB/S3 report. |
| deploy-lifecycle | Proving shutdown, cold-start page, wake, and browser reload behavior. | Includes destructive reset and deliberately stops the runtime host for recovery proof. |
| custom | One-off debugging or an intentionally unusual stage combination. | Operator owns the full prerequisite and evidence story. |
Automatic develop deploys use the conservative deploy-endpoint profile and do not read manual dispatch inputs. That
profile now includes the migration status gate and migration-only repair before endpoint smoke; destructive reset and
seeded data remain off.
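The Boundary column can be reduced to a lookup that an operator script consults before dispatch. The sketch below is an interpretation of the table, not workflow code: profile_impact is a hypothetical helper, and its labels reuse the consequence categories from the pre-dispatch checklist.

```shell
# Hypothetical classifier derived from the profile table: print the heaviest
# consequences a run_profile carries so an operator can gate on them.
profile_impact() {
  case "$1" in
    preflight|cloud-plan) echo "non-mutating" ;;
    deploy-endpoint) echo "app-runtime-mutating" ;;
    deploy-seeded-browser) echo "data-destructive" ;;
    deploy-media) echo "data-destructive media-mutating" ;;
    deploy-lifecycle) echo "data-destructive lifecycle-disruptive" ;;
    custom) echo "operator-defined" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}
```

Rejecting unknown names keeps a typo in the profile from being treated as a harmless run.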
Optional add-ons:
| Add-On | Pair With | Use When |
|---|---|---|
| run_docker_preflight=true | Any profile | You want slower Docker build verification during preflight. |
| validate_cloud_env_contract=true | Any profile | You want GitHub-owned bootstrap values checked before a cloud job needs them. |
| run_db_migrate=false | custom deploy runs | You want the migration status gate to fail instead of applying pending migrations. |
| run_wake_smoke=true | deploy-endpoint or any deploy profile | You want the cheap wake-path endpoint proof after ordinary endpoint smoke. |
| run_browser_routing_smoke=true | deploy-endpoint or any deploy profile | You want non-destructive Chromium login/routing proof before any database reset. |
| run_cold_start_browser_smoke=true | deploy-media or custom with seeded prerequisites | You want lifecycle proof in addition to media proof, or a one-off stopped-host recovery check. |
| run_media_smoke=true and run_browser_media_smoke=true | deploy-lifecycle or custom with seeded prerequisites | You want media proof in addition to lifecycle proof. |
Keep deploy-media and deploy-lifecycle separate for normal use. A composite proof is allowed through explicit
add-ons, but it should stay deliberate because media proof mutates app media and lifecycle proof deliberately stops the
runtime host.
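One way to keep a dispatch deliberate is to print the full command for review before running it. The sketch below assumes the GitHub CLI is the dispatch path, which this page does not state, and that the workflow inputs are named run_profile and git_ref as used above. print_dispatch_command is a hypothetical helper that only prints; it never dispatches.

```shell
# Hypothetical helper: print (not run) a GitHub CLI dispatch command for
# deploy-dev.yml. Assumes workflow inputs named run_profile and git_ref;
# add-ons are passed through unchanged as key=value pairs.
print_dispatch_command() {
  run_profile="$1"
  git_ref="$2"
  shift 2
  command_line="gh workflow run deploy-dev.yml -f run_profile=$run_profile -f git_ref=$git_ref"
  for add_on in "$@"; do
    command_line="$command_line -f $add_on"
  done
  printf '%s\n' "$command_line"
}
```

The printed line can be pasted into the run notes as evidence of the selected profile and add-ons.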
Expected evidence:
- Workflow run number or URL.
- Selected ref, resolved commit SHA, selected profile, and selected add-ons.
- Resolved stage summary.
- SSM command IDs for runtime deploy, database status/migrate, reset, release-record, rollback, or discrepancy-report stages when they run.
- Smoke results, elapsed times, and artifact names for any browser, media, or lifecycle failures.
- A short note explaining operator intent when custom is used.
Use local commands for focused manual repair, dry-run planning, or when a workflow job’s summary points at a specific operator action.
Runtime And Data Lifecycle Quick Reference
The deployed-dev runtime is cost-first and disposable, but different operations have different data consequences. Use Deployed Dev Lifecycle for the full lifecycle matrix, teardown gradations, and expected evidence. Use Data Durability And Recovery for the current disposable-data posture, backup learning-drill boundary, and future recovery gates. Use Media Workflow And Validation when choosing between media smoke, browser media smoke, and discrepancy reporting.
| Operation | Runtime Behavior | Database Behavior | Media Behavior | Operator Gate |
|---|---|---|---|---|
| Runtime deploy | Pulls selected backend/frontend images and restarts Compose on the current host. | Preserved unless the deploy also runs an explicit reset or migration path. | Unchanged. | runtime deploy --execute after images and runtime config are ready. |
| Host stop or automatic inactivity shutdown | Stops the EC2 instance; root EBS remains attached. | Preserved across stop/start. | Unchanged. | Shutdown Lambda or approved lifecycle proof. |
| Runtime rollback | Redeploys the last-good app image pair through the runtime deploy document. | Not rolled back. | Not rolled back. | runtime rollback --execute, followed by endpoint smoke. |
| Database status | Runs a read-only migration ledger check in the deployed API container. | Read-only. | Unchanged. | database status, before migration repair or deploy schema gates. |
| Database migrate | Runs pending Drizzle migrations in the deployed API container. | Schema mutation only; existing rows should be preserved. | Unchanged. | database migrate --execute, followed by status and endpoint smoke. |
| Database reset | Runs migrate, base seed, and deterministic dev-data seed in the deployed API container. | Destructive; non-seed rows can be deleted. | Unchanged; objects can become application-orphaned. | database reset --execute, followed by seeded smoke. |
| Media discrepancy report | Runs a read-only DB/S3 comparison in the deployed API container. | Read-only. | Read-only. | media discrepancy-report --execute for live SSM execution. |
| EC2/runtime replacement | Replaces or destroys the host through infrastructure change. | At risk unless a separate backup, snapshot, or migration runbook is used. | Unchanged unless media infrastructure also changes. | Runtime host replacement after approval. |
| Media bucket replacement or destroy | Not a runtime-host action. | Rows may reference missing media after replacement/delete. | At risk. | Media bucket replacement after approval. |
Host stop is cost control. It is not database cleanup, media cleanup, backup, rollback, or deploy-state mutation.
Host Stop And Wake Recovery
Use this runbook when the deployed-dev runtime host is stopped, may be stopped, or needs a deliberate stopped-host recovery proof.
Host stop is a shared-environment disruption. Prefer the deploy-lifecycle workflow profile when the goal is a normal
stopped-host proof, because the workflow captures shutdown, cold-start, browser, and smoke evidence in one place. Use the
manual checks below for focused recovery, control-plane debugging, or when an automatic inactivity stop has already
occurred.
Before deliberate host stop:
- Confirm the target is https://dev.wavemap.app, Pulumi stack aws-dev, and runtime region us-east-2.
- Confirm no deploy, reset, media proof, or demo is in progress.
- Confirm whether endpoint wake recovery is enough, or whether the browser must return to its original destination.
- Confirm the seeded baseline is valid before choosing browser cold-start recovery.
- Confirm the stop is only cost-control or lifecycle proof. Do not combine it with database reset, media cleanup, rollback, backup, replacement, or stack teardown without a separate operator decision.
Deliberate host stop is owned by the shutdown Lambda. It has no public app route. This is a live cloud mutation and
needs explicit operator approval at run time. If a local operator invokes it outside the deploy-lifecycle workflow,
derive the shutdown function name from the approved deployment contract’s resource prefix or another reviewed infra
output, then use an approved AWS operator identity:
aws lambda invoke \
  --region us-east-2 \
  --function-name "<shutdownFunctionName>" \
  /tmp/wavemap-dev-shutdown-response.json
After stopping the host, confirm the app URL serves the cold-start page rather than a raw CloudFront, connection, or app error. The cold-start page should appear at the original app URL while the runtime host is stopped or warming up.
For endpoint wake recovery, run:
pnpm wavemap -- smoke dev --wake
This calls the same-origin /__wake path, waits for frontend readiness, and then replays endpoint smoke. Use it when the
host may simply be asleep, when validating the wake Lambda and dynamic origin refresh, or when the seeded browser
baseline is not known to be valid.
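The wait-for-readiness step is a bounded polling loop. The sketch below illustrates that pattern in general terms; it is not the implementation behind the smoke command, and wait_until is a hypothetical helper whose check command is supplied by the caller.

```shell
# Hypothetical polling helper: run a check up to max_attempts times, sleeping
# delay_seconds between failed attempts, and succeed as soon as it passes.
wait_until() {
  max_attempts="$1"
  delay_seconds="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only when another attempt will follow.
    [ "$attempt" -le "$max_attempts" ] && sleep "$delay_seconds"
  done
  return 1
}
```

Bounding the attempts matters here because a wake that never reaches readiness should end in a visible failure, not an indefinite wait.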
For browser destination recovery from an intentionally stopped host, run:
pnpm wavemap -- smoke dev cold-start-browser
This starts at the seeded Sorsari artist route, expects the cold-start page at that original URL, lets the page call
/__wake, waits for readiness, verifies the browser reloads into the original destination, and then replays endpoint and
seeded checks.
If the host was stopped automatically by the inactivity monitor, treat recovery the same way:
- Use pnpm wavemap -- smoke dev --wake for cheap endpoint recovery.
- Use pnpm wavemap -- smoke dev cold-start-browser only when the seeded baseline is still valid and browser recovery is the evidence you need.
- Do not add a database reset only to make browser recovery easier; reset remains a separate destructive decision.
If runtime deploy starts while the host is stopped, the runtime deploy wrapper should wake the host and retry SSM while the instance comes back online. Use the Runtime Deploy runbook and capture resume telemetry from the deploy output.
Expected evidence:
- Shutdown Lambda response or workflow stage summary when a deliberate stop was performed.
- Confirmation that the stopped app URL served the cold-start page.
- Wake smoke result or cold-start browser smoke result.
- Resume telemetry if runtime deploy performed the wake.
- Playwright artifacts, shutdown response, cold-start precheck HTML, or workflow summary links when recovery fails.
Failure triage:
| Symptom | First Place To Look |
|---|---|
| Cold-start page does not appear for the stopped app URL. | CloudFront custom error behavior, app-origin connection timeout settings, and static cold-start origin. |
| Wake call succeeds but readiness times out. | Wake Lambda logs, EC2 instance state, dynamic app-origin.dev.wavemap.app DNS update, and app startup logs. |
| Runtime deploy reports early SSM target errors. | Resume telemetry; retryable SSM registration delay is expected while a stopped host comes back. |
| Endpoint wake passes but browser recovery fails. | Cold-start page client reload behavior, original destination routing, and Playwright artifacts. |
Runtime Host Replacement
Use this runbook when a Pulumi preview proposes replacing or destroying the deployed-dev EC2 runtime host, or when an operator intentionally chooses to recreate the runtime host for AMI, bootstrap, host-size, VPC, subnet, security-group, or instance-profile work.
Runtime host replacement is not host stop, runtime deploy, runtime rollback, or database reset. It is an infrastructure change that can discard the containerized Postgres data path and the host-side last-good runtime release receipt.
Before approval:
- Confirm the target is deployed dev, Pulumi stack aws-dev, and region us-east-2.
- Confirm no deploy, reset, media proof, or demo is in progress.
- Run Pulumi preview and identify every create, replacement, delete, IAM change, DNS change, SSM document change, and runtime output change.
- Confirm whether the preview changes only runtime-host resources or also touches media storage, CloudFront, DNS, IAM, or SSM runtime configuration.
- Decide the database outcome before applying: accept data loss and reseed, preserve through a separate backup/restore drill, or abort the replacement.
- Decide whether the host-side last-good release receipt matters. Replacement can remove it, so plan to record a fresh receipt after a known-good deploy.
- Confirm runtime app secrets and config remain owned by SSM Parameter Store and do not need to be copied from the old host.
Preview the infrastructure change:
pnpm -C infra/pulumi run preview
Apply only after explicit approval for the replacement and data outcome:
pnpm -C infra/pulumi run up
After apply, capture fresh stack outputs:
pnpm -C infra/pulumi exec pulumi stack output --stack aws-dev --json > /tmp/wavemap-aws-dev-outputs.json
Validate the captured target and runtime contracts:
pnpm wavemap -- deploy dev cloud-target --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
pnpm wavemap -- deploy dev runtime-config live --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
pnpm wavemap -- deploy dev runtime-env --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Deploy an intended image pair to the new host:
pnpm wavemap -- deploy dev bundle --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
pnpm wavemap -- deploy dev runtime deploy --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
pnpm wavemap -- smoke dev
If the chosen database outcome was “disposable reset”, reseed and prove the seeded baseline:
pnpm wavemap -- deploy dev database reset --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
pnpm wavemap -- smoke dev seeded
After endpoint smoke passes, record a fresh last-good receipt:
pnpm wavemap -- deploy dev runtime record-release --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
Expected evidence:
- Pulumi preview summary naming the runtime-host replacement and any dependent resource changes.
- Explicit approval note covering the database outcome.
- Refreshed deployment contract after apply.
- Cloud-target, runtime-config, and runtime-env readiness results.
- Runtime deploy SSM command ID against the new instance.
- Endpoint smoke result, and seeded smoke result if reset was selected.
- Fresh last-good release receipt after the new host is proven.
Failure triage:
| Symptom | First Place To Look |
|---|---|
| Runtime deploy cannot target the instance. | Refreshed deployment contract, EC2 instance state, SSM managed-instance registration, and instance profile. |
| SSM command runs but Docker or Compose setup fails. | EC2 bootstrap user data, Docker service status, Compose plugin install, ECR login, and /opt/wavemap paths. |
| App URL does not reach the new host. | Dynamic app-origin.dev.wavemap.app record, CloudFront app origin settings, security group ingress, and port 80. |
| App starts but seeded routes are missing. | Database outcome decision, Postgres data path, migration/reset output, and seeded smoke artifacts. |
| Rollback cannot find a receipt. | Expected after host replacement unless a fresh receipt has been recorded on the new host. |
Media Bucket Replacement Or Destroy
Use this runbook when a Pulumi preview proposes replacing, destroying, or recreating the deployed-dev media S3 bucket or
its delivery path. This includes changes that alter the bucket physical name, CloudFront media origin, bucket policy,
origin access control, runtime media outputs, or runtime MEDIA_S3_BUCKET_NAME value.
The deployed-dev media bucket is intentionally disposable and currently uses forced cleanup at the infrastructure level. That makes replacement possible, not routine. Bucket replacement can delete objects and can leave database rows pointing at media that no longer exists.
Before approval:
- Confirm the target is deployed dev, Pulumi stack aws-dev, and region us-east-2.
- Confirm no media smoke, upload test, database reset, or demo is in progress.
- Run Pulumi preview and identify every media bucket, bucket policy, CloudFront, IAM, runtime config, and output change.
- Decide the object-data outcome before applying: accept deletion, preserve through a separate object-copy plan, or abort.
- Decide the database outcome before applying: keep rows as-is, run destructive reset after replacement, or plan a separate row migration for locator fields.
- Remember that S3 media rows store storageLocation/thumbnailStorageLocation; copying objects to a new bucket may still require a DB locator migration if old rows should remain deletable and reconcilable.
- Confirm GitHub Actions will not decrypt runtime media secrets; the runtime host reads media config through SSM-rendered env files.
If current DB/S3 drift matters, capture a read-only baseline before replacement:
pnpm wavemap -- deploy dev media discrepancy-report --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
Preview the infrastructure change:
pnpm -C infra/pulumi run preview
Apply only after explicit approval for object deletion/preservation and database handling:
pnpm -C infra/pulumi run up
After apply, capture fresh stack outputs:
pnpm -C infra/pulumi exec pulumi stack output --stack aws-dev --json > /tmp/wavemap-aws-dev-outputs.json
Refresh runtime media configuration and re-render the host env through runtime deploy:
pnpm wavemap -- deploy dev cloud-target --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
pnpm wavemap -- deploy dev runtime-config plan --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
pnpm wavemap -- deploy dev runtime-config populate --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute
pnpm wavemap -- deploy dev runtime-config live --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
pnpm wavemap -- deploy dev runtime deploy --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
Then prove app and media behavior:
pnpm wavemap -- smoke dev
pnpm wavemap -- smoke dev media --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute
pnpm wavemap -- smoke dev browser-media
pnpm wavemap -- deploy dev media discrepancy-report --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
If the chosen database outcome was “disposable reset”, run the reset before media/browser-media proof:
pnpm wavemap -- deploy dev database reset --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
pnpm wavemap -- smoke dev seeded
Expected evidence:
- Pulumi preview summary naming the bucket replacement/delete and dependent CloudFront/IAM/runtime-config changes.
- Explicit approval note covering object handling and database handling.
- Optional pre-replacement discrepancy report when preserving or interpreting existing media matters.
- Fresh Pulumi outputs capture after apply.
- Runtime-config plan/populate/live evidence showing the new media bucket value is ready without printing secrets.
- Runtime deploy SSM command ID proving the host env was re-rendered.
- Endpoint smoke, media smoke, browser media smoke, and post-replacement discrepancy report.
Failure triage:
| Symptom | First Place To Look |
|---|---|
| Upload fails after replacement. | Runtime MEDIA_S3_BUCKET_NAME, runtime host role media policy, bucket existence, and bucket public-access block. |
| Upload succeeds but public media 403s. | CloudFront media origin, bucket policy for origin access control, path-prefix strip function, and object key. |
| Existing media rows render missing images. | Object handling decision, DB locator fields, copied object keys, and discrepancy report output. |
| Delete or cleanup targets the old bucket. | Stored storageLocation / thumbnailStorageLocation values and any planned locator migration. |
| Browser-media smoke fails but API smoke passes. | CloudFront /media/* routing, image URL returned by the API, browser test artifacts, and cache behavior. |
Runtime Config Readiness
Use this runbook when a deploy is blocked by missing runtime configuration, a runtime parameter changed, or an operator needs to verify SSM parameter readiness before runtime deploy.
This is the operational procedure for runtime config readiness. Source ownership lives in Configuration And Secrets, and the deployed-dev environment shape lives in Deployed Dev Environment. There is no separate runtime-config operations page until the operator surface grows beyond this runbook.
Runtime config source ownership:
- Pulumi outputs expose parameter names and value kinds, not plaintext secrets.
- SSM Parameter Store owns deployed app runtime config and runtime secrets.
- GitHub Actions should not decrypt or log SecureString runtime values.
- Frontend NEXT_PUBLIC_* values are browser build inputs, not runtime SSM parameters.
For the broader source-ownership and secret-handling convention, see Configuration And Secrets.
Plan the required runtime configuration without cloud calls:
pnpm wavemap -- deploy dev runtime-config plan
pnpm wavemap -- deploy dev runtime-config plan --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Plan SSM population for non-secret String parameters:
pnpm wavemap -- deploy dev runtime-config populate --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Write only the selected non-secret String parameters after the operator accepts the plan:
pnpm wavemap -- deploy dev runtime-config populate --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute
Check live SSM metadata without reading parameter values:
pnpm wavemap -- deploy dev runtime-config live --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Verify the host env rendering contract without contacting the runtime host:
pnpm wavemap -- deploy dev runtime-env --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Stop if readiness is blocked. Fix the missing name/type/source problem first, then rerun the readiness check. Secret creation or rotation is a separate operator action; do not use runtime-config populate to create SecureString parameters.
Expected evidence:
- Runtime config plan groups, required/optional counts, and redacted secret placeholders.
- Live metadata readiness showing required parameters present with expected String/SecureString types.
- No decrypted secret values in logs, local files, workflow summaries, or artifacts.
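A metadata-only readiness check can be illustrated with plain text processing. This sketch is not how the runtime-config live command works internally: param_has_type is a hypothetical helper, and it assumes the operator already has tab-separated name and type lines, for example from an AWS CLI metadata listing that returns no values. The parameter names in any real check come from the runtime config plan.

```shell
# Hypothetical metadata check: read "name<TAB>type" lines on stdin and
# confirm the named parameter is present with the expected type. Only
# metadata is inspected; no parameter value is read or printed.
param_has_type() {
  expected_name="$1"
  expected_type="$2"
  awk -F '\t' -v name="$expected_name" -v type="$expected_type" '
    $1 == name && $2 == type { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}
```

Because only names and types pass through the pipe, the check cannot leak a SecureString value into logs or summaries.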
Runtime Deploy
Use this runbook to deploy an already selected app-runtime image pair to deployed dev. Routine branch CD uses the
deploy-endpoint profile; local runtime deploy is mainly for focused repair, replaying a deploy after images already
exist, or proving the runtime handoff directly.
Preconditions:
- Runtime config readiness is green.
- Backend and frontend images for the selected sha-<full-git-sha> tag exist in ECR, or the image build/push step will run before runtime deploy.
- The deployment facts file points at the intended account, stack, region, runtime instance, and SSM document. In routine CD, this is the sanitized deployment contract read from the private artifact store.
- Database reset, media proof, and lifecycle proof are selected only if they are part of the intended recipe.
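The image tag form in the preconditions can be derived and validated before checking ECR. This is a minimal sketch: image_tag_for_sha is a hypothetical helper, and it assumes a 40-character lowercase hexadecimal SHA as produced by git rev-parse HEAD in a SHA-1 repository.

```shell
# Hypothetical helper: build the immutable image tag from a full
# 40-character git SHA and reject abbreviated or malformed values.
image_tag_for_sha() {
  git_sha="$1"
  case "$git_sha" in
    *[!0-9a-f]*) echo "not a lowercase hex SHA: $git_sha" >&2; return 1 ;;
  esac
  if [ "${#git_sha}" -ne 40 ]; then
    echo "expected a full 40-character SHA, got ${#git_sha} characters" >&2
    return 1
  fi
  printf 'sha-%s\n' "$git_sha"
}
```

Rejecting abbreviated SHAs avoids looking up a tag that was never pushed and mistaking the miss for a missing image.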
Plan the runtime bundle:
pnpm wavemap -- deploy dev bundle --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Dry-run the runtime deploy command:
pnpm wavemap -- deploy dev runtime deploy --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Execute the runtime deploy:
pnpm wavemap -- deploy dev runtime deploy --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
Then run endpoint smoke:
pnpm wavemap -- smoke dev
The live runtime deploy sends the Pulumi-modeled SSM document to the runtime host. If the host is stopped, the wrapper uses the same-origin wake path and retries SSM while the instance comes back online. The deploy document renders env files from SSM Parameter Store, logs into ECR on-host, pulls the selected images, and starts Docker Compose detached.
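The retry-while-waking behavior described above can be pictured with a small shell helper. This is a minimal sketch of a generic retry loop, not the wrapper's actual implementation; the `retry` function name, the attempt count, and the delay are all hypothetical.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# $1 = maximum attempts, $2 = seconds to wait between attempts,
# remaining arguments = the command to run.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    # Give up once the attempt budget is spent.
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 5 10 some-command` would run `some-command` up to five times with ten seconds between attempts and return non-zero if every attempt fails.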
Expected evidence:
- Runtime deploy SSM command ID.
- Resume telemetry when the host needed to wake.
- Successful command completion or a failure log that identifies the SSM send, wait, ECR pull, env render, or Compose stage.
- Endpoint smoke success after deploy.
Last-Good Runtime Release Receipt
The last-good receipt is the rollback input for deployed dev. It should be written only after endpoint smoke succeeds
for the current image pair.
Branch CD records the receipt automatically after a successful push-triggered deploy-endpoint run. Use the local command
only when manually repairing or reestablishing the rollback point after a known-good deploy.
Dry-run the receipt write:
pnpm wavemap -- deploy dev runtime record-release --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Write the receipt after smoke has passed:
pnpm wavemap -- deploy dev runtime record-release --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
The receipt records the source git SHA, immutable image tag, backend image URI, frontend image URI, deployment version, selected ref, and workflow run metadata. It lives on the runtime host at:
/opt/wavemap/deployment/last-successful-runtime-release.json
Do not write this receipt for a deploy that has not passed endpoint smoke. Doing so would teach rollback to restore an unproven image pair.
Expected evidence:
- Receipt-write SSM command ID.
- Receipt path.
- Source SHA, deployment version, and backend/frontend image tags matching the smoke-passing deploy.
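The smoke-before-receipt rule can be enforced mechanically during a manual repair. The sketch below chains the two documented commands so the receipt write runs only when endpoint smoke exits successfully. The function name `record_release_if_smoke_passes` and the `WAVEMAP` override variable are hypothetical conveniences, not part of the CLI.

```shell
# Hypothetical wrapper: write the last-good receipt only if endpoint smoke passes.
# WAVEMAP may be overridden (for example with a stub); the default is the
# public CLI form used throughout this page.
# $1 = path to the Pulumi stack outputs file.
record_release_if_smoke_passes() {
  outputs_json="$1"
  if ${WAVEMAP:-pnpm wavemap --} smoke dev; then
    ${WAVEMAP:-pnpm wavemap --} deploy dev runtime record-release \
      --pulumi-outputs-json "$outputs_json" --execute --github-output
  else
    # Leave the existing receipt untouched so rollback keeps a proven target.
    echo "endpoint smoke failed; receipt not written" >&2
    return 1
  fi
}
```

A manual repair would call it as `record_release_if_smoke_passes /tmp/wavemap-aws-dev-outputs.json` after the deploy itself has completed.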
Runtime Rollback
Runtime rollback is an app-container rollback only. It reads the host-side last-good receipt and redeploys that recorded backend/frontend image pair through the existing runtime deploy document.
Rollback does not roll back:
- Database schema or rows.
- SSM runtime parameters.
- Media objects.
- Pulumi infrastructure.
- Destructive reset outcomes.
Use rollback when a runtime deploy or endpoint smoke failure leaves deployed dev on a bad app-runtime image pair and a
known-good receipt exists. On push-triggered develop deploys, the workflow attempts this automatically after a selected
runtime deploy or endpoint smoke failure.
Dry-run the rollback target:
pnpm wavemap -- deploy dev runtime rollback --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Execute rollback:
pnpm wavemap -- deploy dev runtime rollback --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
Then rerun endpoint smoke:
pnpm wavemap -- smoke dev
If rollback smoke fails, keep the failed release visible in the workflow summary and choose the next operator action explicitly: forward fix, manual deploy of a known SHA, data reset, or infrastructure investigation.
Expected evidence:
- Rollback SSM command ID.
- Restored source SHA, image tag, deployment version, and receipt path.
- Endpoint smoke result after rollback.
- Clear note if rollback was skipped because no last-good receipt existed.
Endpoint Diagnostics
Use this runbook before repair work when deployed dev is reachable but an application route appears slow, stuck, or
generically broken. The command captures public HTTP evidence only; it does not use AWS credentials, SSM, Docker,
Pulumi, or database access.
Capture baseline evidence:
pnpm wavemap -- deploy dev diagnostics endpoints
Include seeded artist details when seeded data should exist:
pnpm wavemap -- deploy dev diagnostics endpoints --seeded-artist
Use a stable evidence directory when the transcript needs to be attached to a workflow note or investigation:
pnpm wavemap -- deploy dev diagnostics endpoints --evidence-dir /tmp/wavemap-endpoint-diagnostics
Expected evidence:
- Evidence directory path.
- Per-endpoint headers, body, curl timing metadata, and curl stderr files.
- HTTP status, response content type, CloudFront cache header, and backend x-request-id when present.
- Short JSON body previews for failing or diagnostic API routes.
Interpretation posture:
- /en/ping passing means CloudFront can reach the frontend container.
- /api/v1/health passing means the frontend rewrite can reach the backend process.
- /api/v1/ready passing means the backend can reach Postgres for a shallow SELECT 1 check.
- Application route failures after readiness passes usually point at app logic, schema compatibility, data shape, or downstream service behavior rather than a stopped host.
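The layered reading above can be sketched as a shell helper that names the first failing layer. This is an illustration only, not what the diagnostics command does internally. It assumes "passing" means an HTTP 200 status, and the names `probe_status` and `first_failing_layer` are hypothetical.

```shell
# Hypothetical probe: print only the HTTP status code for a path on deployed dev.
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://dev.wavemap.app$1"
}

# Hypothetical classifier: name the first layer that is not passing.
# $1 = status of /en/ping, $2 = status of /api/v1/health, $3 = status of /api/v1/ready.
# Assumes a passing probe returns HTTP 200.
first_failing_layer() {
  if [ "$1" != "200" ]; then
    echo frontend      # CloudFront cannot reach the frontend container
  elif [ "$2" != "200" ]; then
    echo backend       # frontend rewrite cannot reach the backend process
  elif [ "$3" != "200" ]; then
    echo database      # backend cannot complete the shallow Postgres check
  else
    echo application   # all probes pass; look at app logic, schema, or data shape
  fi
}
```

A manual check could combine the two as `first_failing_layer "$(probe_status /en/ping)" "$(probe_status /api/v1/health)" "$(probe_status /api/v1/ready)"`.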
Database Migration Status And Repair
Routine deploy-endpoint CD runs this same posture automatically after runtime deploy and before endpoint smoke:
read-only status first, typed gate decision second, migration-only repair only when the live DB is behind, then status
again before smoke. Manual use is still useful for incident repair, educational diagnosis, or custom deploy runs where
run_db_migrate=false should turn schema drift into an explicit failed gate.
Use this runbook when deployed dev is reachable but DB-backed routes fail with schema errors, or before a migration-only
repair. This path is narrower than reset: it compares migration state and can run pending migrations without seeding or
deleting rows.
Capture live stack outputs first when needed:
pnpm -C infra/pulumi exec pulumi stack output --stack aws-dev --json > /tmp/wavemap-aws-dev-outputs.json
Run the read-only migration status check:
pnpm wavemap -- deploy dev database status
Status reads the deployed API image’s Drizzle journal and the live database’s drizzle.__drizzle_migrations ledger, then
emits status markers such as:
WAVEMAP_DATABASE_STATUS_STATE=behind
WAVEMAP_DATABASE_STATUS_EXPECTED_COUNT=21
WAVEMAP_DATABASE_STATUS_APPLIED_COUNT=13
WAVEMAP_DATABASE_STATUS_PENDING_COUNT=8
Interpretation posture:
- up-to-date: The live database matches the deployed API image’s migration journal.
- behind: The live database is missing migrations present in the deployed API image. Prefer migration-only repair.
- ahead: The database has migration ledger rows newer than this deployed image. Stop and avoid applying older code.
- hash-mismatch or drift: Migration history may have been rewritten or skipped. Stop and inspect before repair.
- ledger-missing or empty: Treat as a fresh or damaged DB state; do not assume migration-only repair is safe without reviewing the data posture.
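Because the status markers are plain NAME=value text, an operator can pull individual values out of captured status output with standard tools. The sketch below assumes each marker is printed on its own line. `status_marker` and `pending_is_consistent` are hypothetical helper names, and the consistency check assumes the pending count is simply expected minus applied, which holds for the example markers (21 - 13 = 8).

```shell
# Hypothetical helper: read one marker value from status output on stdin.
# $1 = marker name, for example WAVEMAP_DATABASE_STATUS_STATE.
# Assumes each marker appears on its own line as NAME=value.
status_marker() {
  sed -n "s/^$1=//p" | head -n 1
}

# Hypothetical sanity check: pending should equal expected minus applied.
# $1 = expected count, $2 = applied count, $3 = pending count.
pending_is_consistent() {
  [ "$(( $1 - $2 ))" -eq "$3" ]
}
```

For instance, piping saved status output into `status_marker WAVEMAP_DATABASE_STATUS_STATE` prints the state value on its own, ready to compare against the states listed above.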
CD converts those states into a gate decision before endpoint smoke:
- pass: Status is up-to-date; endpoint smoke may run.
- migrate: Status is behind and migration repair is enabled; run database migrate, recheck status, then smoke.
- block: Status is missing, behind without migration permission, or any drift/ahead/history-risk state; stop for manual review.
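The state-to-gate mapping above is small enough to express as a single shell function. This is an illustrative sketch, not the CD implementation. The function name `gate_decision` and its second argument are hypothetical; the state and decision names come from the lists above.

```shell
# Hypothetical helper: map a database status state to a gate decision.
# $1 = status state as reported by the status check (may be empty if missing).
# $2 = "true" when migration repair is enabled for this run (assumed flag).
gate_decision() {
  state="$1"
  migrate_enabled="${2:-false}"
  case "$state" in
    up-to-date)
      echo pass
      ;;
    behind)
      if [ "$migrate_enabled" = "true" ]; then
        echo migrate
      else
        echo block
      fi
      ;;
    *)
      # Missing status, ahead, hash-mismatch, drift, ledger-missing, empty,
      # or any unrecognized state: stop for manual review.
      echo block
      ;;
  esac
}
```

Defaulting every unrecognized state to `block` keeps the sketch fail-closed, which matches the "stop for manual review" posture for history-risk states.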
Dry-run the migration-only command:
pnpm wavemap -- deploy dev database migrate --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Apply pending migrations only:
pnpm wavemap -- deploy dev database migrate --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
Then verify:
pnpm wavemap -- deploy dev database status
pnpm wavemap -- deploy dev diagnostics endpoints
pnpm wavemap -- smoke dev
Expected evidence:
- Before/after database status output.
- SSM command ID for any database migrate --execute run.
- Endpoint diagnostics or smoke success proving the original failing route now works.
Database Reset
Use this runbook only for deployed dev. The database is disposable, but reset is still an explicit destructive action.
Decision points before reset:
- Confirm the target is https://dev.wavemap.app, Pulumi stack aws-dev, and deployment environment dev.
- Confirm losing non-seed database rows is acceptable.
- Confirm existing S3 media objects may outlive reset because reset does not delete the media bucket.
- Decide whether this is a normal disposable reset or a backup/restore learning drill.
- Prefer deploy-seeded-browser, deploy-media, or deploy-lifecycle over a custom stage combination unless the one-off shape is deliberate.
Dry-run the reset:
pnpm wavemap -- deploy dev database reset --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Execute the reset:
pnpm wavemap -- deploy dev database reset --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
Then prove the canonical seed baseline:
pnpm wavemap -- smoke dev seeded
The reset runs migrations, base seed, and deterministic dev-data seed inside the deployed API container. Seeded smoke is the minimum proof that the canonical seeded route and API state exist again.
The canonical first seeded app route is:
/en/artist/ef839db3-ae41-4af9-9078-a8d211089962
Known reset warning posture:
- Historical live reset proof reported event-series import warnings.
- Treat those warnings as seed-data cleanup work, not as a failed reset, when migrations, base seed, deterministic dev-data seed, and seeded smoke all pass.
- If the warning shape changes, treat it as fresh evidence.
Expected evidence:
- Database reset SSM command ID.
- Reset command success or failure log.
- Seeded smoke success.
- Follow-up media discrepancy report when DB/S3 divergence matters.
Media Discrepancy Report
Use this runbook after destructive reset, media smoke, manual media testing, or any investigation where database rows and S3 objects may have drifted.
Use Media Workflow And Validation for deciding whether a media change needs this report or a different proof lane.
The report is read-only. It should surface:
- Media rows whose canonical or thumbnail URL points at an S3 object that no longer exists.
- S3 media objects under the deployed-dev media prefix that are no longer referenced by any media row.
- Counts and sampled identifiers/keys without printing app secrets or signed credentials.
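Conceptually, the first two items in this list are the two halves of a set difference between keys referenced by media rows and keys present in the bucket. The sketch below shows that idea on two plain text files of object keys. It is a conceptual illustration only: the real report runs inside the deployed API container, and the function name, input files, and output labels here are hypothetical.

```shell
# Conceptual sketch only; not the backend report.
# $1 = file of object keys referenced by media rows, one per line.
# $2 = file of object keys present under the media prefix, one per line.
media_discrepancies() {
  refs_sorted="$(mktemp)"
  objs_sorted="$(mktemp)"
  # comm requires sorted input; -u also drops duplicate keys.
  sort -u "$1" > "$refs_sorted"
  sort -u "$2" > "$objs_sorted"
  # Keys referenced by a row but absent from the bucket.
  comm -23 "$refs_sorted" "$objs_sorted" | sed 's/^/missing-object /'
  # Keys present in the bucket but referenced by no row.
  comm -13 "$refs_sorted" "$objs_sorted" | sed 's/^/orphan-object /'
  rm -f "$refs_sorted" "$objs_sorted"
}
```

Like the report itself, the sketch only reads and prints; it never deletes rows or objects.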
Dry-run the report:
pnpm wavemap -- deploy dev media discrepancy-report --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json
Execute the read-only report on the runtime host:
pnpm wavemap -- deploy dev media discrepancy-report --pulumi-outputs-json /tmp/wavemap-aws-dev-outputs.json --execute --github-output
The live command sends an SSM command that runs the backend discrepancy report inside the deployed API container, so DB access and S3 listing use the rendered runtime environment. It does not delete rows or objects.
Non-zero discrepancy counts are telemetry, not automatic cleanup and not automatically a deploy failure. Cleanup remains a separate approval-gated action that should print the exact DB rows or object keys it will touch before mutation.
Expected evidence:
- Media discrepancy SSM command ID.
- Report scope.
- Inspected row/object counts.
- Total discrepancy count and discrepancy-kind summary.
- Any sampled rows or object keys needed for follow-up.
Evidence And Failure Triage
Use workflow summaries first. Deployed-dev workflow summaries and job summaries should identify the stable facts an operator needs without turning the docs site into a run archive:
- Selected profile and active add-ons.
- Selected ref, resolved commit SHA, and deployment version or image tag.
- Resolved stages.
- Runtime deploy, reset, rollback, release-record, and discrepancy SSM command IDs when those jobs run.
- Smoke result and elapsed time.
- First job log or artifact to inspect on failure.
Browser-style jobs upload Playwright artifacts on failure. Database reset and media smoke lanes upload failure-only logs when selected. Keep raw logs, command outputs, and live identifiers in private workflow artifacts unless they are intentionally sanitized for docs.
Use this routing rule when deciding where evidence belongs:
| Evidence Kind | Durable Home |
|---|---|
| Stable procedure, expected evidence, mutation boundary, failure triage, or redaction rule. | Curated docs under apps/wavemap-docs/src/content/docs. |
| Unsettled experiment, proof interpretation, temporary timing adjustment, or implementation log. | Working notes under apps/wavemap-docs/working-notes. |
| Change-specific proof that a PR, deploy, or workflow run behaved correctly. | Pull request description, workflow summary, or private workflow/job summary. |
| Raw command output, Lambda payload, browser trace, screenshot, downloaded artifact, or log. | Private artifacts with appropriate retention, unless intentionally sanitized before publication. |
| Run identifier, timestamp, commit SHA, SSM command ID, instance ID, public IP, or proof URL. | Private evidence, PR/workflow context, or working note while it is actively useful for follow-up. |
A live proof graduates to curated docs only after it has been reduced to the reusable lesson. For example, document that cold-start browser proof should capture the shutdown response, cold-start precheck HTML, Playwright failure artifacts, and final smoke result. Do not publish the specific workflow run number, EC2 instance ID, SSM command UUID, public IP, or timing measurement unless that identifier itself is part of a reviewed operator decision.
When evidence changes an operator path, update the runbook. When evidence only proves a single run, keep it with the run.
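When an excerpt is intentionally sanitized for docs, a first mechanical pass can mask the most recognizable live identifiers before a human review. The filter below is a heuristic sketch and not a reviewed redaction tool. The function name and placeholder labels are hypothetical, and the patterns cover only EC2-style instance IDs, UUIDs, and IPv4 addresses.

```shell
# Hypothetical first-pass redaction filter for text read on stdin.
# Heuristic only: a human still reviews the result before publication.
redact_live_identifiers() {
  sed -E \
    -e 's/i-[0-9a-f]{8,17}/[instance-id]/g' \
    -e 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/[uuid]/g' \
    -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/[ip]/g'
}
```

The filter does not touch run numbers, commit SHAs, or timestamps, so those still need a deliberate keep-or-remove decision.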
Publishing Policy
Publish only reviewed operator paths, expected evidence rubrics, and sanitized examples that teach a stable pattern.
Do not publish:
- Raw logs, command output, Lambda responses, browser traces, screenshots, or downloaded artifacts.
- One-off proof narratives that do not change future operator behavior.
- Live cloud identifiers such as instance IDs, public IPs, SSM command UUIDs, or temporary proof URLs.
- Workflow run numbers, commit SHAs, and timestamps unless they are intentionally part of a historical decision record.
- Temporary proof acceleration details such as shortened inactivity windows.
Working notes may keep this material while a decision is still moving. Once the decision settles, promote the procedure, evidence expectation, or redaction rule into curated docs and leave the noisy proof trail in PRs, workflow summaries, or private artifacts.