# Setup runbook: Wiki.js on EKS

Exact layer apply sequence, prerequisites, and validation checks.

## Prerequisites

Before running any layer, ensure the following are in place:

- **AWS resources:** Domain (or subdomain), Route 53 hosted zone, ACM certificate in the deployment account (same region as the ALB), and two IAM roles: a GitHub OIDC role in the deployment account and an assumable role in the domain account for 01-dns-main.
- **Detailed steps:** For step-by-step instructions and copy-paste AWS CLI commands to create these resources, see [Prerequisites](prerequisites.md).
- **Roles:**
  - **deployment_account_role_arn** - role to assume for the deployment account (where all resources are deployed); most layers ask only for this.
  - **domain_account_role_arn** - role in the account that has the domain/hosted zone; used only by 01-dns-main to create the DNS role there.
  - **dns_assume_role_arn** - the DNS role created by 01-dns-main (published as `dns_role_arn` in Parameter Store); assumed by the deployment role in layer 50 for Route 53.
  - **01-dns-main** asks for both **domain_account_role_arn** and **deployment_account_role_arn** (the workflow assumes the latter to run Terraform).
- **Terraform variables:** Each layer has a `terraform.tfvars` with required values. Set `env`, `region`, and `prefix` (e.g. `wikijs`) per layer; the prefix is used for Parameter Store paths and resource naming (no hardcoded prefix in code). Bootstrap expects `domain_name`, `wikijs_fqdn`, `hosted_zone_id`, and `acm_cert_arn` from the prerequisites (see [Prerequisites](prerequisites.md)); it writes these to Parameter Store under `<prefix>/<region>/<env>/00-bootstrap/`.
- **Resource naming convention:** All named resources follow `${var.prefix}-${var.region}-<name>-${var.env}`; the `<name>` part is set per resource in `terraform.tfvars` (e.g. `tfstate_name`, `dns_assume_role_name`, `vpc_name`). See [NAMING_CONVENTION.md](../plan-and-requirements/NAMING_CONVENTION.md) for the full convention and per-layer name variables.
- For **01-dns-main:** Do **not** put `domain_account_role_arn` or `deployment_account_role_arn` in `terraform.tfvars`. When dispatching tf-01-dns-main-provision or tf-01-dns-main-destroy, you are asked for **two** role inputs: **domain_account_role_arn** (role in the account that has the domain/hosted zone) and **deployment_account_role_arn** (the deployment role; the workflow also assumes this role to run Terraform). The values are passed to Terraform via workflow inputs → extra_tf_vars. The layer reads `hosted_zone_id` from Parameter Store (published by bootstrap).
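
The naming convention and the Parameter Store layout described above can be sketched as two small shell helpers. This is an illustration only, not code from the repository: the function names `resource_name` and `ssm_path` are hypothetical, and the leading slash on the path is an assumption taken from the Quick checks command later in this runbook.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository) that mirror the conventions above.

# resource_name <prefix> <region> <name> <env>
# Mirrors ${var.prefix}-${var.region}-<name>-${var.env}.
resource_name() {
  printf '%s-%s-%s-%s\n' "$1" "$2" "$3" "$4"
}

# ssm_path <prefix> <region> <env> <layer> <key>
# Builds the Parameter Store path a layer publishes to or reads from.
# The leading slash is an assumption based on the Quick checks command.
ssm_path() {
  printf '/%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, with `vpc` as an assumed value of `vpc_name`, `resource_name wikijs us-east-1 vpc dev` prints `wikijs-us-east-1-vpc-dev`, and `ssm_path wikijs us-east-1 dev 00-bootstrap hosted_zone_id` prints `/wikijs/us-east-1/dev/00-bootstrap/hosted_zone_id`.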

## Layer apply order

1. **00-bootstrap** - Remote state bucket, KMS keys, Parameter Store contract. Must be applied first.
2. **01-dns-main** - Creates the IAM DNS role (**dns_assume_role_arn**) in the domain account (where the domain and hosted zone are). That role has a trust relationship allowing **deployment_account_role_arn** to assume it; the layer publishes `dns_role_arn` to Parameter Store. Apply it after bootstrap (it reads `hosted_zone_id` from there) and before layer 50, which runs as the deployment role and assumes **dns_assume_role_arn** to create the Route 53 record.
3. **10-network** - VPC, subnets (public, private, database), security groups (ALB, workload, RDS, endpoints), and VPC endpoints (S3 gateway; ECR, STS, Secrets Manager, SSM, CloudWatch Logs). Publishes `vpc_id`, subnet IDs (JSON), and SG IDs to Parameter Store under `<prefix>/<region>/<env>/10-network/`.
4. **20-eks** - EKS Auto Mode cluster. Reads `vpc_id`, `private_subnet_ids`, `workload_sg_id`, `alb_sg_id` from Parameter Store (10-network). Publishes `cluster_name`, `oidc_issuer_url`, `oidc_provider_arn`, `cluster_security_group_id`, `node_security_group_id`, `cluster_endpoint` under `<prefix>/<region>/<env>/20-eks/`.
5. **30-data-rds** - RDS PostgreSQL for Wiki.js. Reads `database_subnet_ids` and `rds_sg_id` from Parameter Store (10-network). Creates RDS PostgreSQL (Multi-AZ, storage encryption, automated backups, deletion protection enabled by default, managed master password in Secrets Manager). Publishes `endpoint`, `port`, `db_name`, `secret_arn` under `<prefix>/<region>/<env>/30-data-rds/`.
6. **35-storage-s3-assets** - S3 assets bucket (SSE-KMS, versioning, BPA, lifecycle) and IRSA role for Wiki.js. Reads `oidc_issuer_url`, `oidc_provider_arn`, `cluster_name` from Parameter Store (20-eks). Publishes `bucket_name`, `bucket_arn`, `kms_key_arn`, `wikijs_irsa_role_arn`, `bucket_prefix` under `<prefix>/<region>/<env>/35-storage-s3-assets/`.
7. **40-platform** - Cluster add-ons and namespaces. Reads `cluster_name` from Parameter Store (20-eks). Provisions namespaces (`argocd`, `wikijs`), EKS addons (aws-ebs-csi-driver, aws-secrets-store-csi-driver), Secrets Manager provider (Helm), Fluent Bit for CloudWatch Logs (Helm), and an EBS StorageClass. Publishes `argocd_namespace`, `wikijs_namespace`, `secrets_store_csi_addon_version`, `storage_class_name` under `<prefix>/<region>/<env>/40-platform/`.
8. **45-argocd** - ArgoCD installation. Reads `cluster_name` from Parameter Store (20-eks) and `argocd_namespace` from Parameter Store (40-platform). Installs Argo CD via the official Helm chart (argo-cd) into the argocd namespace (internal-only by default: ClusterIP, no Ingress). Publishes `argocd_namespace`, `argocd_server_url`, and, when admin credentials are from Secrets Manager, `argocd_admin_credentials_secret_arn` under `<prefix>/<region>/<env>/45-argocd/`.
   - **Admin credentials:** By default Terraform auto-generates the username and password, stores them in a new Secrets Manager secret, and syncs them into the cluster; retrieve the credentials via the AWS Console or CLI using the secret ARN published to Parameter Store at `45-argocd/argocd_admin_credentials_secret_arn`.
   - **Optional external UI:** Set `argocd_server_fqdn` (e.g. `argocd.wikijs.talorlik.com`) in `terraform.tfvars` to expose the Argo CD admin UI on a subdomain; the layer then enables Ingress with ALB and TLS (reads `hosted_zone_id`, `acm_cert_arn` from 00-bootstrap and `dns_role_arn` from 01-dns-main) and optionally creates a Route 53 A record (may require a second apply after the Ingress/ALB exists).
   - **Optional repo credentials:** Optionally syncs repo credentials from Secrets Manager into the argocd namespace via the platform Secrets Store CSI driver (SecretProviderClass).
9. **50-app-wikijs** - Wiki.js via ArgoCD, ingress, and the Route 53 A (alias) record for the Wiki.js FQDN (created in the domain account by assuming **dns_assume_role_arn** from 01-dns-main). Layer 50 **retrieves `dns_role_arn` from Parameter Store** (01-dns-main) during plan/apply; no workflow input is required. Creates the ArgoCD Application pointing to `apps/wikijs` (Helm wrapper for the Requarks Wiki.js chart), the SecretProviderClass for RDS and Wiki.js app secrets, and (when the ALB exists) the Route 53 record and Parameter Store outputs.
   - **Two-apply note:** The ALB is created by EKS Auto Mode only after ArgoCD syncs the Wiki.js Ingress. The first apply creates the ArgoCD Application and SecretProviderClass; the Route 53 record and SSM outputs (`alb_dns_name`, `alb_hosted_zone_id`, `application_url`) are created only once the ALB exists. After ArgoCD has synced Wiki.js and the ALB is created, run **tf-50-app-wikijs-provision** **again** to create the Route 53 record and publish the outputs. This holds for both options: Option B does not re-run layer 50 automatically, so trigger tf-50-app-wikijs-provision once manually.
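
One way to decide whether layer 50 still needs its second apply is to check whether `alb_dns_name` has been published. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the leading slash on the parameter path is assumed from the Quick checks command in this runbook.

```bash
#!/usr/bin/env bash
# Sketch only: decide whether layer 50 still needs its second apply.

# alb_outputs_published <prefix> <region> <env>
# Succeeds (exit 0) when alb_dns_name can be read from Parameter Store.
# Note: a credentials or permissions error also counts as "not published" here.
alb_outputs_published() {
  aws ssm get-parameter \
    --name "/$1/$2/$3/50-app-wikijs/alb_dns_name" \
    --query "Parameter.Value" --output text >/dev/null 2>&1
}

# second_apply_hint <prefix> <region> <env>
# Prints a one-line recommendation based on the check above.
second_apply_hint() {
  if alb_outputs_published "$@"; then
    echo "outputs published; no second apply needed"
  else
    echo "outputs missing; re-run tf-50-app-wikijs-provision once the ALB exists"
  fi
}
```

Running `second_apply_hint wikijs us-east-1 dev` after ArgoCD has synced gives a quick signal; it does not confirm that the ALB itself is healthy.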

## How to apply

You can run provisioning in one of two ways. The **layer apply order** (00 → 01 → … → 50) above is used by **both** options: Option A runs each layer manually in that sequence; Option B runs them automatically in order.

### Option A - One by one

Run each layer with its own workflow in this order:

1. **tf-00-bootstrap-provision** - Inputs: env, region, **deployment_account_role_arn**.
2. **tf-01-dns-main-provision** - Inputs: env, region, **deployment_account_role_arn**, **domain_account_role_arn**.
3. **tf-10-network-provision** - Inputs: env, region, **deployment_account_role_arn**.
4. **tf-20-eks-provision**, **tf-30-data-rds-provision**, **tf-35-storage-s3-assets-provision**, **tf-40-platform-provision**, **tf-45-argocd-provision**, **tf-50-app-wikijs-provision** - Each: env, region, **deployment_account_role_arn**.

Workflow files live under `.github/workflows/` (e.g. `tf-01-dns-main-provision.yaml`). Trigger via **Actions** → select the workflow → **Run workflow**, or on push to the layer path (defaults: dev, us-east-1).
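
If you prefer the GitHub CLI over the Actions UI, the Option A sequence could be dispatched with `gh workflow run`. The sketch below only prints the commands (a dry run) so you can review them first. It assumes every workflow file is named `<workflow>.yaml` like the example above, and that the input names match the ones listed in this section; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: print the Option A dispatch commands in layer order (dry run).
# Assumes each workflow file is named tf-<layer>-provision.yaml.

LAYERS="00-bootstrap 01-dns-main 10-network 20-eks 30-data-rds 35-storage-s3-assets 40-platform 45-argocd 50-app-wikijs"

# print_dispatch_commands <env> <region> <deployment_role_arn> <domain_role_arn>
print_dispatch_commands() {
  local layer extra
  for layer in $LAYERS; do
    extra=""
    # Only 01-dns-main takes the second role input.
    if [ "$layer" = "01-dns-main" ]; then
      extra=" -f domain_account_role_arn=$4"
    fi
    echo "gh workflow run tf-${layer}-provision.yaml -f env=$1 -f region=$2 -f deployment_account_role_arn=$3${extra}"
  done
}
```

Run the printed commands one at a time and wait for each workflow run to finish before dispatching the next, since later layers read what earlier layers publish to Parameter Store.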

### Option B - All in one

Run **tf-all-provision** once. You provide a single set of inputs: **env**, **region**, **deployment_account_role_arn**, **domain_account_role_arn**. The workflow runs layers 00 → 01 → … → 50 in sequence. Ensure the same env, region, and prefix are used in all layers' `terraform.tfvars` (or CI defaults); the workflows default prefix to `wikijs`.

### Backend and workspaces

- Backend config for non-bootstrap layers is built from Parameter Store (`<prefix>/<region>/<env>/00-bootstrap/tfstate_bucket` and `tfstate_kms_key_arn`; prefix defaults to `wikijs` in CI). Workspace `<region>-<env>` is selected after init.
- If you use a different **prefix** in `terraform.tfvars`, ensure it matches in bootstrap and that the all-in-one workflow uses that prefix if supported (currently workflows default to `wikijs`).
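
To reproduce the backend setup locally for a non-bootstrap layer, something like the following could work. It is a sketch under assumptions, not the workflow's actual script: the state object key (`<layer>/terraform.tfstate`) and both function names are hypothetical, and the `-or-create` flag is a convenience that the workflows may not use.

```bash
#!/usr/bin/env bash
# Sketch only: build `terraform init` backend arguments for a non-bootstrap layer.
# The state object key ("<layer>/terraform.tfstate") is an assumption.

# backend_args <bucket> <kms_key_arn> <region> <layer>
backend_args() {
  printf '%s\n' \
    "-backend-config=bucket=$1" \
    "-backend-config=kms_key_id=$2" \
    "-backend-config=region=$3" \
    "-backend-config=key=$4/terraform.tfstate" \
    "-backend-config=encrypt=true" \
    "-backend-config=use_lockfile=true"
}

# workspace_name <region> <env>
# Mirrors the <region>-<env> workspace selected after init.
workspace_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Example (requires AWS credentials and a Terraform version that supports use_lockfile):
#   base="/wikijs/us-east-1/dev/00-bootstrap"
#   bucket=$(aws ssm get-parameter --name "$base/tfstate_bucket" --query "Parameter.Value" --output text)
#   kms=$(aws ssm get-parameter --name "$base/tfstate_kms_key_arn" --query "Parameter.Value" --output text)
#   terraform init $(backend_args "$bucket" "$kms" us-east-1 10-network)
#   terraform workspace select -or-create "$(workspace_name us-east-1 dev)"
```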

## Validation

- **Bootstrap:** Parameter Store bootstrap contract exists under `<prefix>/<region>/<env>/00-bootstrap/*` (tfstate_bucket, tfstate_kms_key_arn, domain_name, wikijs_fqdn, hosted_zone_id, acm_cert_arn).
- **Backend:** Terraform backend initializes with S3 state and `use_lockfile=true` for every non-bootstrap layer.
- **After layer 01-dns-main:** Parameter Store key `<prefix>/<region>/<env>/01-dns-main/dns_role_arn` exists (layer 50 retrieves it from here).
- **After layer 10-network:** Parameter Store keys under `<prefix>/<region>/<env>/10-network/` exist (vpc_id, public_subnet_ids, private_subnet_ids, database_subnet_ids, alb_sg_id, workload_sg_id, rds_sg_id, endpoint_sg_id).
- **After layer 20-eks:** Parameter Store keys under `<prefix>/<region>/<env>/20-eks/` exist (cluster_name, oidc_issuer_url, oidc_provider_arn, cluster_security_group_id, node_security_group_id, cluster_endpoint).
- **After layer 30-data-rds:** Parameter Store keys under `<prefix>/<region>/<env>/30-data-rds/` exist (endpoint, port, db_name, secret_arn). RDS is reachable only from the cluster security group; managed master password secret ARN is in Parameter Store (no plaintext in state).
- **After layer 35-storage-s3-assets:** Parameter Store keys under `<prefix>/<region>/<env>/35-storage-s3-assets/` exist (bucket_name, bucket_arn, kms_key_arn, wikijs_irsa_role_arn, bucket_prefix). S3 assets bucket meets SSE-KMS, versioning, block public access, and lifecycle rules; Wiki.js service account can access via IRSA.
- **After layer 40-platform:** Parameter Store keys under `<prefix>/<region>/<env>/40-platform/` exist (argocd_namespace, wikijs_namespace, secrets_store_csi_addon_version, storage_class_name).
- **After layer 45-argocd:** Parameter Store keys under `<prefix>/<region>/<env>/45-argocd/` exist (argocd_namespace, argocd_server_url; when admin credentials are from Secrets Manager, also argocd_admin_credentials_secret_arn).
- **After layer 50-app-wikijs (first apply):** ArgoCD Application and SecretProviderClass exist; Route 53 record and SSM outputs may be absent until ALB exists (see two-apply note above).
- **After layer 50-app-wikijs (second apply, once ALB exists):** Parameter Store keys under `<prefix>/<region>/<env>/50-app-wikijs/` exist (alb_dns_name, alb_hosted_zone_id, application_url).
- **After full stack:** EKS is reachable; OIDC/IRSA is functional; RDS and S3 are accessible from the cluster; ArgoCD is synced; Wiki.js is reachable over HTTPS on its dedicated hostname; monitoring, logging, and alerting are in place (e.g. Fluent Bit, EKS control-plane logs) to satisfy the observability requirement.
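
For the final HTTPS check, a small probe with `curl` is usually enough. The sketch below is illustrative: the function names are hypothetical, and treating any 2xx or 3xx response as "reachable" is an assumption about how Wiki.js responds at its root URL.

```bash
#!/usr/bin/env bash
# Sketch only: probe the Wiki.js hostname over HTTPS.

# http_status <url>
# Prints the HTTP status code; curl reports 000 when the request fails.
http_status() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}' "$1" 2>/dev/null || true
}

# is_reachable_status <code>
# Succeeds for 2xx and 3xx responses (assumed to mean the site is up).
is_reachable_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (wiki.example.com is a placeholder for your wikijs_fqdn):
#   is_reachable_status "$(http_status "https://wiki.example.com")" && echo "Wiki.js reachable"
```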

### Quick checks

To verify Parameter Store after a layer, use the AWS CLI (replace `<prefix>`, `<region>`, `<env>` with your values):

```bash
aws ssm get-parameter --name "/<prefix>/<region>/<env>/00-bootstrap/tfstate_bucket" --query "Parameter.Value" --output text
```
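
To check every expected key for a layer in one pass, the per-layer key lists from the Validation section can be wrapped in a loop. This is a sketch rather than a script shipped with the repository; `expected_keys` and `check_layer` are hypothetical names, and the conditional `argocd_admin_credentials_secret_arn` key is left out because it is not always published.

```bash
#!/usr/bin/env bash
# Sketch only: verify the Parameter Store keys listed under Validation.

# expected_keys <layer>
# Prints the keys the Validation section lists for that layer.
expected_keys() {
  case "$1" in
    00-bootstrap) echo "tfstate_bucket tfstate_kms_key_arn domain_name wikijs_fqdn hosted_zone_id acm_cert_arn" ;;
    01-dns-main) echo "dns_role_arn" ;;
    10-network) echo "vpc_id public_subnet_ids private_subnet_ids database_subnet_ids alb_sg_id workload_sg_id rds_sg_id endpoint_sg_id" ;;
    20-eks) echo "cluster_name oidc_issuer_url oidc_provider_arn cluster_security_group_id node_security_group_id cluster_endpoint" ;;
    30-data-rds) echo "endpoint port db_name secret_arn" ;;
    35-storage-s3-assets) echo "bucket_name bucket_arn kms_key_arn wikijs_irsa_role_arn bucket_prefix" ;;
    40-platform) echo "argocd_namespace wikijs_namespace secrets_store_csi_addon_version storage_class_name" ;;
    45-argocd) echo "argocd_namespace argocd_server_url" ;;
    50-app-wikijs) echo "alb_dns_name alb_hosted_zone_id application_url" ;;
    *) return 1 ;;
  esac
}

# check_layer <prefix> <region> <env> <layer>
# Prints one line per key and returns non-zero if any key is missing.
check_layer() {
  local key status=0
  for key in $(expected_keys "$4"); do
    if aws ssm get-parameter --name "/$1/$2/$3/$4/$key" >/dev/null 2>&1; then
      echo "ok      $key"
    else
      echo "missing $key"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_layer wikijs us-east-1 dev 10-network` lists each network key as `ok` or `missing`. For 50-app-wikijs, missing keys after the first apply are expected until the ALB exists.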

See [architecture](../architecture/architecture.md) and [security considerations](../security/security-considerations.md) for post-deploy verification.

## GitHub Environments

Each workflow job uses a GitHub Environment of the form `tf-<env>-<layer>` (e.g. `tf-dev-00-bootstrap`, `tf-dev-50-app-wikijs`). Create these in **Settings** → **Environments** if you want to use required reviewers to gate destroy. See [teardown](teardown.md) for destroy confirmation.
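
Creating the environments by hand for every layer is repetitive, so the names can be generated and, if you choose, created through the GitHub REST API. The sketch below is an assumption-laden example: `environment_names` is a hypothetical helper, the layer list is taken from the apply order in this runbook, and required reviewers still have to be configured separately.

```bash
#!/usr/bin/env bash
# Sketch only: generate the tf-<env>-<layer> environment names.

# environment_names <env>
# Prints one environment name per layer, in apply order.
environment_names() {
  local layer
  for layer in 00-bootstrap 01-dns-main 10-network 20-eks 30-data-rds \
               35-storage-s3-assets 40-platform 45-argocd 50-app-wikijs; do
    echo "tf-$1-$layer"
  done
}

# Example (requires an authenticated gh CLI, run inside a clone of the repository;
# gh fills in {owner} and {repo} from the current repository):
#   environment_names dev | while read -r name; do
#     gh api --method PUT "repos/{owner}/{repo}/environments/$name"
#   done
```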
