CIS490/bootstrap/app.py
max a93a3ff221 bootstrap: auto-issue mTLS leaves to enrolled lab hosts (closes #9, refs #3)
Adds a pull-based cert distribution path so install-lab-host.sh can
fetch its own leaf cert without operator intervention. Removes the
ssh-from-Pi requirement that blocked elliott-lab.

How the chicken-and-egg gets solved: a freshly wg-enrolled lab host
already has WG access (gatekept by iptmonads at L4) and trusts the
Caddy local CA (bundled in this repo at etc/caddy-root.crt). It
makes a single TLS call to https://bootstrap.wg/v1/cert/<host_id>
— no mTLS — gets back a tar of {ca.crt, leaf.pem, leaf.key},
extracts to /etc/cis490/certs/, and the shipper unblocks. Trust
boundary is "reached :443 over WG"; no operator action needed.
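
Seen from the lab host, the pull above could be sketched roughly as
follows. This is an illustration in Python, not the shell that
install-lab-host.sh actually runs. The names fetch_bundle and
extract_bundle are hypothetical, and the member whitelist assumes the
tar layout shown in the verification transcript (ca.crt, <host_id>.pem,
<host_id>.key). Any renaming of the extracted files is not covered here.

```python
from __future__ import annotations

import io
import ssl
import tarfile
import urllib.request
from pathlib import Path


def fetch_bundle(host_id: str, cacert: str, base: str = "https://bootstrap.wg") -> bytes:
    """Plain-TLS GET (no client cert); the server is verified against the bundled CA root."""
    ctx = ssl.create_default_context(cafile=cacert)
    url = f"{base}/v1/cert/{host_id}"
    with urllib.request.urlopen(url, context=ctx, timeout=30) as resp:
        return resp.read()


def extract_bundle(tar_bytes: bytes, host_id: str, dest: Path) -> list[str]:
    """Write only the three expected regular files into dest; refuse anything else."""
    expected = {"ca.crt", f"{host_id}.pem", f"{host_id}.key"}
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        for member in tar.getmembers():
            # Reject unknown names, directories, symlinks and the like.
            if member.name not in expected or not member.isfile():
                raise ValueError(f"unexpected tar member: {member.name!r}")
            out = dest / member.name
            out.write_bytes(tar.extractfile(member).read())
            # Private key: owner-only. Certificates may be world-readable.
            out.chmod(0o600 if member.name.endswith(".key") else 0o644)
            written.append(member.name)
    return sorted(written)
```

A caller would chain the two, for example
extract_bundle(fetch_bundle("elliott-lab", "etc/caddy-root.crt"), "elliott-lab", Path("/etc/cis490/certs")).
Reading members explicitly instead of calling extractall keeps a
malformed or hostile tarball from writing outside the target directory.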

bootstrap/
  app.py        Starlette: GET /v1/cert/{host_id}, GET /v1/health.
                Validates host_id charset, rate-limits per source IP,
                logs every mint with the X-Real-IP Caddy injects.
  __main__.py   uvicorn launcher; runs as root because the wg-pki CA
                private key is root-only.
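
Because host_id ends up as both a path component and a script argument,
the charset check carries real weight. A minimal sketch of one way to
validate it follows. The character class and the 64-character limit
come from app.py. The leading-alphanumeric rule is an assumption of this
sketch (it rules out ".", ".." and option-like values such as "-x"), and
is_valid_host_id is an illustrative name.

```python
import re

# Charset and length limit as in app.py. The first-character rule is this
# sketch's own tightening, not something the source confirms.
_HOST_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]{0,63}")


def is_valid_host_id(s: str) -> bool:
    # fullmatch anchors both ends, so a trailing newline is rejected too.
    return _HOST_ID.fullmatch(s) is not None
```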

etc/cis490-bootstrap.service
  systemd unit on 127.0.0.1:8446 with ProtectSystem=strict +
  narrow ReadWritePaths=/var/lib/wg-pki. ProtectHome=no because
  systemd's read-only mode hides /home contents (the issuer script
  the wrapper execs lives there).

scripts/issue-cis490-client-cert-wrapper.sh
  Adapter the bootstrap service shells out to. Resolves the actual
  wg-pki issuer script across the three plausible install layouts
  (/opt/wg-pki, /home/max/wg-pki, /home/max/.env/wg-pki) so a single
  copy of the unit file works on any operator's box. Forces
  --out-dir to /var/lib/wg-pki/issued so writes stay inside the
  service's narrow ReadWritePaths.
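
The layout probing could look roughly like this, shown in Python for
illustration even though the real wrapper is a shell script. The three
roots are the ones listed above and the script name is taken from the
app.py docstring. That the script sits directly under each root is an
assumption, and resolve_issuer is a hypothetical name.

```python
from __future__ import annotations

import os
from pathlib import Path

ISSUER_NAME = "issue-cis490-client-cert.sh"
CANDIDATE_ROOTS = (
    Path("/opt/wg-pki"),
    Path("/home/max/wg-pki"),
    Path("/home/max/.env/wg-pki"),
)


def resolve_issuer(roots=CANDIDATE_ROOTS, name: str = ISSUER_NAME) -> Path | None:
    """Return the first executable issuer script found, or None if no layout matches."""
    for root in roots:
        candidate = root / name  # assumed location: directly under the root
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Checking the candidates in a fixed order keeps the result deterministic
when more than one layout is present on the same box.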

scripts/install-lab-host.sh
  After scaffolding lab-host.toml, if /etc/cis490/certs/lab-host.pem
  is absent, curls bootstrap.wg with --cacert etc/caddy-root.crt
  (no chicken-and-egg), extracts, chowns/chmods. Skips silently if
  bootstrap.wg is unreachable so manual hand-carry remains possible.

scripts/install-receiver.sh
  Drops cis490-bootstrap.service alongside cis490-receiver and
  prints both as "enable --now" candidates. cis490-bootstrap is the
  thing that makes lab hosts self-provisioning.

etc/caddy-root.crt
  Bundled copy of wg-pki's published Caddy local CA root, so the
  bootstrap fetch can verify TLS without depending on a wg-pki
  clone that may or may not be on the lab host yet.

Verified live on the Pi:
  $ curl --cacert etc/caddy-root.crt https://bootstrap.wg/v1/cert/elliott-lab -o /tmp/x.tar
  HTTP 200 size=10240
  $ tar tf /tmp/x.tar
  ca.crt
  elliott-lab.key
  elliott-lab.pem
  $ openssl verify -CAfile … elliott-lab.pem
  /tmp/.../elliott-lab.pem: OK
  $ openssl x509 -subject … -noout
  subject=CN=elliott-lab

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 01:30:29 -05:00


"""``cis490-bootstrap``: auto-issue mTLS leaf certs to enrolled lab hosts.

This is the chicken-and-egg fix for first-time lab-host setup. A
freshly wg-enrolled device has WG access (and trusts the wg-pki CA)
but has no client cert yet, so it can't authenticate to the
mTLS-protected ``collector.wg``. This service exposes a *plain-TLS*
(no client-auth) endpoint that the lab host can call once during
``install-lab-host.sh`` to retrieve its leaf cert tarball.

Trust boundary: anything that reaches ``bootstrap.wg`` has already
passed iptmonads' WG-membership check at L4. No further
authentication is required for the bootstrap pull: by the time a
caller can connect at all, they're a peer the operator authorized.

The privilege boundary, on the other hand, is real: minting certs
requires the wg-pki CA private key (root-only at
``/var/lib/wg-pki/cis490-client-ca/ca.key``). This service therefore
runs as root in a tight sandbox (see ``etc/cis490-bootstrap.service``)
and shells out to ``issue-cis490-client-cert.sh`` for each mint.

Endpoints:

    GET /v1/cert/{host_id}   return tarball of {ca.crt, leaf.pem, leaf.key}
                             for ``host_id``. Cached: successive calls
                             return the same bytes.
    GET /v1/health           liveness probe (no auth needed).

Each mint is logged with the source IP (after Caddy's X-Real-IP
forward) so the operator has an audit trail of which devices have
fetched which certs.
"""
from __future__ import annotations

import logging
import re
import subprocess
import time
from pathlib import Path
from typing import Awaitable, Callable

from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import FileResponse, JSONResponse, Response
from starlette.routing import Route

log = logging.getLogger("cis490.bootstrap")

# Sane host_id charset, mirroring the charset the receiver enforces. In
# addition, the first character must be alphanumeric so that "." and ".."
# (path traversal) and option-like values such as "-x" are rejected before
# host_id is used as a path component and as a script argument.
_HOST_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]{0,63}")


def _is_valid_host_id(s: str) -> bool:
    # fullmatch, unlike match with "$", also rejects a trailing newline.
    return _HOST_ID_RE.fullmatch(s) is not None


def make_app(
    *,
    issuer_script: Path,
    issued_root: Path,
    rate_limit_window_s: float = 5.0,
) -> Starlette:
    """Build the Starlette app. Wired by the production launcher in
    ``bootstrap/__main__.py``; tests can pass synthetic paths."""
    issued_root.mkdir(parents=True, exist_ok=True)

    # Coarse per-IP rate limiter to make a casual scan annoying. Not
    # a real defense: the WG mesh is the actual perimeter.
    last_request: dict[str, float] = {}

    async def health(request: Request) -> Response:
        return JSONResponse({"status": "ok"})

    async def get_cert(request: Request) -> Response:
        host_id: str = request.path_params["host_id"]
        if not _is_valid_host_id(host_id):
            return JSONResponse({"error": "bad host_id"}, status_code=400)

        # Caddy forwards the original WG-side IP via X-Real-IP /
        # X-Forwarded-For; fall back to the direct peer if running
        # without Caddy in front (tests).
        src = (
            request.headers.get("x-real-ip")
            or (request.headers.get("x-forwarded-for") or "").split(",")[0].strip()
            or (request.client.host if request.client else "?")
        )

        now = time.monotonic()
        # A missing entry means "never seen". Defaulting to 0.0 would
        # rate-limit the first request whenever the monotonic clock is
        # still below the window.
        prev = last_request.get(src)
        if prev is not None and (now - prev) < rate_limit_window_s:
            return JSONResponse(
                {"error": "rate limited; back off"},
                status_code=429,
            )
        last_request[src] = now

        tar_path = issued_root / host_id / f"{host_id}.tar"
        if not tar_path.exists():
            log.info("minting cert for host_id=%s src=%s", host_id, src)
            try:
                # Note: subprocess.run blocks the event loop for the
                # duration of the mint (up to the 30 s timeout).
                subprocess.run(
                    [
                        str(issuer_script), host_id,
                        "--out-dir", str(issued_root / host_id),
                    ],
                    check=True,
                    capture_output=True,
                    text=True,
                    timeout=30,
                )
            except subprocess.CalledProcessError as e:
                log.error("issue script failed for %s: rc=%d stderr=%s",
                          host_id, e.returncode, e.stderr[:500])
                return JSONResponse(
                    {"error": "mint failed", "detail": e.stderr[:500]},
                    status_code=500,
                )
            except (OSError, subprocess.TimeoutExpired) as e:
                log.exception("issue script transport error for %s", host_id)
                return JSONResponse(
                    {"error": f"transport: {e}"},
                    status_code=500,
                )
        else:
            log.info("cache hit for host_id=%s src=%s", host_id, src)

        # The issuer can exit 0 without producing the tarball; check again.
        if not tar_path.exists():
            return JSONResponse({"error": "tarball not produced"}, status_code=500)

        return FileResponse(
            tar_path,
            media_type="application/x-tar",
            filename=f"{host_id}.tar",
            headers={
                "X-Cis490-Host-Id": host_id,
                "X-Cis490-Cert-Source-IP": src,
            },
        )

    routes = [
        Route("/v1/health", health, methods=["GET"]),
        Route("/v1/cert/{host_id}", get_cert, methods=["GET"]),
    ]
    return Starlette(routes=routes)
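

The per-IP limiter inside make_app can also be written as a small
standalone class, which makes it testable with a fake clock. This is a
sketch under assumptions, not code from the repository: PerKeyCooldown
and max_keys are hypothetical names, and pruning expired entries is an
addition of the sketch to stop the table growing without bound.

```python
from __future__ import annotations

import time
from typing import Callable


class PerKeyCooldown:
    """Allow at most one request per key within window_s seconds."""

    def __init__(
        self,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
        max_keys: int = 4096,
    ) -> None:
        self._window_s = window_s
        self._clock = clock
        self._max_keys = max_keys
        self._last: dict[str, float] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        prev = self._last.get(key)
        # A key never seen before is always allowed.
        if prev is not None and (now - prev) < self._window_s:
            return False
        if len(self._last) >= self._max_keys:
            # Drop entries whose cooldown has expired. This is not a hard
            # cap: keys still inside their window are kept.
            self._last = {
                k: t for k, t in self._last.items() if (now - t) < self._window_s
            }
        self._last[key] = now
        return True
```

As in get_cert, a rejected request does not refresh the timestamp, so a
client that keeps retrying is let through once the original window
elapses.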